# Educational Guide to nanochat

This folder contains a comprehensive educational guide to understanding and building your own Large Language Model (LLM) from scratch, using nanochat as a reference implementation.

## What's Included

This guide covers everything from mathematical foundations to practical implementation:

### 📚 Core Materials

- `01_introduction.md` - Overview of nanochat and the LLM training pipeline
- `02_mathematical_foundations.md` - All the math you need (linear algebra, probability, optimization)
- `03_tokenization.md` - Byte Pair Encoding (BPE) algorithm with detailed code walkthrough (a toy sketch of one merge step follows this list)
- `04_transformer_architecture.md` - GPT model architecture and components
- `05_attention_mechanism.md` - Deep dive into self-attention with implementation details
- `06_training_process.md` - Complete training pipeline from data loading to checkpointing
- `07_optimization.md` - Advanced optimizers (Muon + AdamW) with detailed explanations
- `08_putting_it_together.md` - Practical implementation guide and debugging tips
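
To give a flavor of what the tokenization chapter covers, here is a toy sketch of a single BPE merge step: count adjacent pairs, then replace the most frequent pair with a new token id. This is an illustrative simplification, not nanochat's actual tokenizer, which the chapter walks through in detail.

```python
# Toy sketch of one BPE merge step (illustrative only -- not nanochat's tokenizer).
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent token-id pairs and return the most common one."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get)

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw UTF-8 bytes
pair = most_frequent_pair(ids)             # the most common adjacent pair
ids = merge(ids, pair, 256)                # 256 = first id beyond the byte range
print(pair, ids)
```

Training a real tokenizer just repeats this merge step until the target vocabulary size is reached; chapter 03 covers the full implementation.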

### 🎯 Who This Is For

- **Beginners**: Start from first principles with clear explanations
- **Intermediate**: Deep dive into implementation details and code
- **Advanced**: Learn cutting-edge techniques (RoPE, Muon, MQA)

## How to Use This Guide

### Sequential Reading (Recommended for Beginners)

Read in order from 01 to 08. Each section builds on the previous ones:

Introduction → Math → Tokenization → Architecture → Attention → Training → Optimization → Implementation

### Topic-Based Reading (For Experienced Practitioners)

Jump directly to topics of interest:

- Want to understand tokenization? → Read `03_tokenization.md`
- Need to implement attention? → Read `05_attention_mechanism.md`
- Optimizing training? → Read `07_optimization.md`

### Code Walkthrough (Best for Implementation)

Read alongside the nanochat codebase:

- Read a section (e.g., "Transformer Architecture")
- Open the corresponding file (`nanochat/gpt.py`)
- Follow along with the code examples
- Modify and experiment

## Compiling to PDF

To create a single PDF document from all sections:

    cd educational
    python compile_to_pdf.py

This will generate `nanochat_educational_guide.pdf`.

**Requirements:**

- Python 3.7+
- pandoc
- LaTeX distribution (e.g., TeX Live, MiKTeX)

**Install dependencies:**

    # macOS
    brew install pandoc
    brew install basictex  # or MacTeX for full distribution

    # Ubuntu/Debian
    sudo apt-get install pandoc texlive-full

    # Python packages
    pip install pandoc
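
For a rough idea of what the compilation step does, the sketch below concatenates the chapter files and shells out to pandoc. It is only a sketch, assuming pandoc and a LaTeX engine are on your PATH; the actual `compile_to_pdf.py` may order chapters, set options, and handle a title page differently.

```python
# Minimal sketch of the PDF build (the real compile_to_pdf.py may differ).
import subprocess
from pathlib import Path

# Collect the numbered chapter files in order (01_... through 08_...).
chapters = sorted(Path(".").glob("0*_*.md"))
combined = "\n\n".join(p.read_text(encoding="utf-8") for p in chapters)
Path("_combined.md").write_text(combined, encoding="utf-8")

# Requires pandoc and a LaTeX engine (xelatex here) to be installed.
subprocess.run(
    ["pandoc", "_combined.md",
     "-o", "nanochat_educational_guide.pdf",
     "--pdf-engine=xelatex", "--toc"],
    check=True,
)
```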

## Key Features of This Guide

### 🎓 Educational Approach

- **From first principles**: Assumes only basic Python and math knowledge
- **Progressive complexity**: Start simple, build up gradually
- **Concrete examples**: Real code from nanochat, not pseudocode

### 💻 Code-Focused

- **Deep code explanations**: Every important function is explained line-by-line
- **Implementation patterns**: Learn best practices and design patterns
- **Debugging tips**: Common pitfalls and how to avoid them

### 🔬 Comprehensive

- **Mathematical foundations**: Understand the "why" behind every technique
- **Modern techniques**: RoPE, MQA, Muon optimizer, softcapping (a small softcapping sketch follows this list)
- **Full pipeline**: From raw text to deployed chatbot
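
As a taste of the techniques listed above, logit softcapping is simply a smooth clamp applied to logits so they cannot grow without bound. The cap value below (15.0) is an illustrative choice; the value nanochat uses and where the cap is applied are covered in the relevant chapters.

```python
# Sketch of logit softcapping: squashes values smoothly into (-cap, cap).
# The cap of 15.0 is illustrative, not necessarily nanochat's setting.
import torch

def softcap(logits: torch.Tensor, cap: float = 15.0) -> torch.Tensor:
    return cap * torch.tanh(logits / cap)

x = torch.tensor([-100.0, -1.0, 0.0, 1.0, 100.0])
print(softcap(x))  # large magnitudes saturate near +/-15; small values pass ~unchanged
```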

### 🚀 Practical

- **Runnable examples**: All code can be tested immediately
- **Optimization tips**: Make training fast and efficient
- **Scaling guidance**: From toy models to production systems

## What You'll Learn

By the end of this guide, you'll understand:

- ✅ How tokenization works (BPE algorithm)
- ✅ Transformer architecture in detail
- ✅ Self-attention mechanism (with RoPE, MQA)
- ✅ Training loop and data pipeline
- ✅ Advanced optimization (Muon + AdamW)
- ✅ Mixed precision training (BF16)
- ✅ Distributed training (DDP)
- ✅ Evaluation and metrics
- ✅ How to implement your own LLM
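
As a preview of the self-attention item above, the core computation fits in a few lines of PyTorch. This is a generic single-head causal attention sketch, not nanochat's implementation; RoPE, MQA, and the multi-head, batched version are what the attention chapter builds on top of this.

```python
# Generic causal self-attention for one head -- a sketch, not nanochat's code.
import torch
import torch.nn.functional as F

T, D = 8, 16                 # sequence length, head dimension
q = torch.randn(T, D)        # queries
k = torch.randn(T, D)        # keys
v = torch.randn(T, D)        # values

scores = (q @ k.T) / D**0.5                        # (T, T) scaled dot products
mask = torch.triu(torch.ones(T, T), diagonal=1).bool()
scores = scores.masked_fill(mask, float("-inf"))   # causal: no peeking at future tokens
weights = F.softmax(scores, dim=-1)                # each row sums to 1
out = weights @ v                                  # weighted mix of values, shape (T, D)
print(out.shape)
```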

## Prerequisites

**Essential:**

- Python programming
- Basic linear algebra (matrices, vectors, dot products)
- Basic calculus (derivatives, chain rule)
- Basic probability (distributions)

**Helpful but not required:**

- PyTorch basics
- Deep learning fundamentals
- Familiarity with Transformers

## Additional Resources

### Papers

- *Attention Is All You Need* - Original Transformer
- *Language Models are Few-Shot Learners* - GPT-3
- *Training Compute-Optimal Large Language Models* - Chinchilla scaling laws

## Contributing

Found an error or want to improve the guide?

- Open an issue on the main nanochat repository
- Suggest improvements or clarifications
- Share what topics you'd like to see covered

## License

This educational material follows the same MIT license as nanochat.

## Acknowledgments

This guide is based on the nanochat implementation by Andrej Karpathy. The code examples in the chapters are taken from the nanochat repository.

Special thanks to the open-source community for making LLM education accessible!

Happy learning! 🚀

If you find this guide helpful, please star the nanochat repository!