Overview
In the rapidly evolving world of AI, the complexity of building and understanding large language models (LLMs) can be daunting. Enter nanochat, a groundbreaking educational tool designed to demystify the entire LLM pipeline. Created by Andrej Karpathy, a founding member of OpenAI and former Director of AI at Tesla, nanochat offers a full-stack implementation of a ChatGPT-style LLM within a single, clean, minimal, and highly hackable codebase of approximately 8,000 lines. The project lets users run everything from tokenization and pretraining to finetuning, evaluation, inference, and even a web UI on a single 8XH100 GPU node, providing an unprecedented opportunity to grasp the internals of LLM development without the typical enterprise-scale overhead.
Released on October 13, 2025, via GitHub, nanochat represents the culmination of Karpathy’s educational philosophy and will serve as the capstone project for his LLM101n course being developed at Eureka Labs. Unlike his earlier project nanoGPT, which focused solely on pretraining, nanochat delivers the complete ChatGPT-style experience, including conversational finetuning, tool use capabilities, and deployment-ready inference.
Key Features
nanochat distinguishes itself as a meticulously crafted educational system rather than just another LLM toolkit, with distinct advantages for learners and researchers.
- Minimal, dependency-light codebase: Comprising approximately 8,000 lines of code across 45 files, with most components written in Python using PyTorch and a custom Rust-based tokenizer built via Maturin, making the entire system remarkably navigable and understandable.
- Complete LLM pipeline from scratch: Encompasses the full lifecycle: custom BPE tokenizer training with a 65,536-token vocabulary, base pretraining on the FineWeb-EDU dataset with CORE-metric evaluation across 22 benchmarks, mid-training on SmolTalk conversational data plus multiple-choice questions and tool use, supervised finetuning evaluated on ARC-Easy, ARC-Challenge, MMLU, GSM8K, and HumanEval, and an optional reinforcement learning stage using simplified GRPO on GSM8K.
- Production-grade inference engine: Features efficient KV cache implementation, optimized prefill and decode operations, Python interpreter integration within a lightweight sandbox for tool use capabilities, and both CLI and ChatGPT-style web interface for interaction.
- Runs on single 8XH100 GPU node: The complete pipeline executes via the speedrun.sh script on a single high-performance node, with flexible depth configurations from 560M parameters at depth 20 to larger models approaching GPT-3 Small compute budgets at depth 30, making advanced LLM development accessible at consumer cloud rental rates.
- Extensively documented and hackable: Clean, hand-written code without AI assistance prioritizes educational clarity over framework abstraction, includes detailed walkthrough documentation explaining architectural choices and training dynamics, generates comprehensive markdown report cards gamifying benchmarks and progress, and encourages community experimentation through maximum forkability.
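The inference-engine design mentioned above — a KV cache with separate prefill and decode phases — can be pictured with a toy single-head attention in plain Python. This is an illustrative sketch only, not nanochat's actual implementation; all names are hypothetical, and real engines use learned projection matrices, multiple heads, and batched tensors.

```python
import math

def attention(q, keys, values):
    """Scaled dot-product attention of one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the cached value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

def generate(prompt_embeddings, steps):
    k_cache, v_cache = [], []
    # Prefill: compute and cache keys/values for every prompt position once.
    for x in prompt_embeddings:
        k_cache.append(x)  # identity "projections" keep the toy minimal
        v_cache.append(x)
    x = prompt_embeddings[-1]
    outputs = []
    # Decode: each step attends over the cache, then extends it by one entry,
    # so earlier positions are never recomputed.
    for _ in range(steps):
        y = attention(x, k_cache, v_cache)
        outputs.append(y)
        k_cache.append(y)
        v_cache.append(y)
        x = y
    return outputs

out = generate([[1.0, 0.0], [0.0, 1.0]], steps=3)
print(len(out), len(out[0]))
```

The point of the cache is visible in the decode loop: cost per new token is proportional to the sequence length so far, rather than recomputing attention over the whole sequence from scratch each step.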
How It Works
Understanding nanochat’s capabilities becomes practical through its streamlined execution process designed for efficiency and educational transparency. The system operates through a well-orchestrated sequence accessible even to those new to LLM development.
Users begin by setting up their environment on a cloud GPU instance, typically renting an 8XH100 node at approximately 24 dollars per hour from providers like Lambda Labs or similar GPU-as-a-service platforms. After cloning the repository and activating the virtual environment with its minimal dependencies (primarily PyTorch), the entire training pipeline executes through the single speedrun.sh script.
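As a back-of-the-envelope check on the figures above, the rental cost of a run is simply the hourly rate times the duration. A sketch, assuming the ~24-dollar-per-hour rate cited here (actual provider pricing varies):

```python
HOURLY_RATE_USD = 24.0  # approximate 8XH100 node rate cited in the text

def run_cost(hours, rate=HOURLY_RATE_USD):
    """Estimated cloud rental cost for a training run of the given duration."""
    return hours * rate

print(run_cost(4))   # a roughly 4-hour run: 96.0 USD
print(run_cost(24))  # a full 24-hour run: 576.0 USD
```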
This script first trains a custom Byte Pair Encoding tokenizer in Rust on 2 billion characters from the dataset, achieving a compression ratio of approximately 4.8 characters per token, comparable to modern tokenizers. The pretraining phase then begins on FineWeb-EDU, a curated educational subset of the Common Crawl dataset that Karpathy repackaged as karpathy/fineweb-edu-100b-shuffle into fully shuffled 100MB parquet shards for efficient access. During pretraining, the system continuously evaluates CORE scores across diverse benchmarks including HellaSwag, ARC, BoolQ, PIQA, and others.
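The core BPE training loop is simple to picture: repeatedly count adjacent token pairs and merge the most frequent pair into a new token until the vocabulary target is reached. A toy pure-Python sketch of the idea — nanochat's actual tokenizer is written in Rust, operates on bytes, and is far more efficient:

```python
from collections import Counter

def bpe_train(text, num_merges):
    """Learn BPE merges over a character sequence; returns tokens and merges."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]  # most frequent adjacent pair
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # fuse the pair into a single new token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_train("low lower lowest", num_merges=4)
# Compression ratio, in the same characters-per-token sense reported above:
ratio = len("low lower lowest") / len(tokens)
print(tokens, round(ratio, 2))
```

Scaling this loop to billions of characters and 65,536 tokens is exactly why the real implementation lives in Rust rather than Python.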
Following base pretraining, mid-training introduces conversational structure using SmolTalk’s 460,000 user-assistant dialogues, MMLU auxiliary train split with 100,000 multiple-choice examples, and GSM8K’s 8,000 mathematical reasoning problems, incorporating special tokens for tool use schemas. The supervised finetuning stage refines the chat model on carefully selected examples totaling 21,400 instances from ARC-Easy, ARC-Challenge, GSM8K, and SmolTalk to strengthen world knowledge, mathematical reasoning, and conversational quality.
For teams seeking maximum performance, an optional reinforcement learning stage using simplified GRPO optimizes the model specifically on GSM8K mathematical problems. Throughout training, the system tracks metrics and generates a comprehensive report.md file summarizing all benchmarks, creating a gamified learning experience that encourages iterative improvement.
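At the heart of a simplified GRPO-style update is a group-relative advantage: several completions are sampled per prompt, and each completion's reward is centered against its group's mean, so only relative quality drives the policy gradient. A toy sketch of that advantage computation, assuming binary correctness rewards — nanochat's exact recipe may differ in details such as normalization:

```python
def grpo_advantages(rewards):
    """Group-relative advantages: each reward minus the group mean.

    Completions scoring above the group average get positive advantage
    (reinforced); those below get negative advantage (penalized).
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Four sampled answers to one GSM8K problem; 1.0 = correct, 0.0 = wrong.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # [0.5, -0.5, -0.5, 0.5]
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed, which keeps the RL stage small and readable.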
Once training completes and model weights are saved, users activate inference by running python -m scripts.chat_web, launching the ChatGPT-style web interface where they can interact with their custom-trained model, test its capabilities across various tasks, and observe firsthand how training decisions impact conversational quality and reasoning abilities.
Use Cases
nanochat’s unique positioning as an educational-first platform with production-grade architecture opens compelling applications for individuals and organizations seeking deeper AI understanding.
- Educational foundation for aspiring AI practitioners: Provides computer science students, bootcamp participants, and self-taught programmers with hands-on experience building ChatGPT-style systems from scratch, demystifying concepts like attention mechanisms, tokenization strategies, loss curve analysis, and hyperparameter optimization through direct experimentation rather than abstract theory.
- Academic research baseline and experimentation platform: Enables university researchers and graduate students to establish reproducible baselines for novel training techniques, architectural modifications, or dataset curation strategies, with the hackable codebase facilitating rapid prototyping of research hypotheses without navigating complex framework abstractions.
- Corporate AI literacy and capability assessment: Supports organizations building AI teams by providing technical hiring assessments where candidates demonstrate LLM understanding through nanochat modifications, offers executive education demonstrating realistic AI development costs and timelines, and helps technical leads evaluate whether to build custom models versus licensing commercial alternatives.
- Prototyping domain-specific micro LLMs: Allows specialized industries including legal tech, medical diagnostics, financial analysis, and regulatory compliance to experiment with training small LLMs on proprietary domain data, assess performance-cost trade-offs before committing to larger-scale development, and validate whether custom models provide sufficient value over general-purpose alternatives.
- Teaching AI ethics and bias analysis: Facilitates computer ethics courses and responsible AI workshops by enabling students to observe how training data composition affects model behavior, experiment with debiasing techniques and safety guardrails, and develop intuition about AI limitations and failure modes through systematic experimentation.
Pros & Cons
Like any specialized educational tool, nanochat presents distinct advantages and realistic limitations that users should understand before committing resources.
Advantages
nanochat delivers compelling benefits particularly valuable for educational contexts and research exploration.
- Exceptional educational transparency: Every component from tokenizer through deployment remains visible and understandable in clean Python and Rust code hand-written specifically for pedagogical clarity, enabling learners to trace how text transforms into tokens, how attention mechanisms combine information, and how training objectives shape model behavior.
- Fully open-source with permissive licensing: Release on GitHub under a permissive license enables unrestricted modification, commercial derivative works, academic publications, and community contributions, fostering collaborative improvement and knowledge sharing across the AI education ecosystem.
- Complete pipeline eliminating integration complexity: Having tokenization, pretraining, finetuning, evaluation, and deployment within a single cohesive repository removes the typical challenges of connecting disparate tools, managing version compatibility across frameworks, and debugging integration issues that plague production LLM development.
- Realistic cost and time transparency: Explicit documentation of dollar costs for different model scales paired with actual training times on standard cloud hardware provides learners and decision-makers with accurate expectations, contrasting sharply with enterprise LLM development where costs and timelines remain opaque.
- Active community and continuous improvement: Since its October 2025 release, the project has attracted significant community engagement with contributors sharing optimizations, architecture experiments, and training recipes, creating a collaborative learning environment that amplifies educational value beyond the initial codebase.
Disadvantages
Honest assessment of nanochat requires acknowledging inherent limitations stemming from its educational mission and resource constraints.
- Produces small models with limited capabilities: Even with 24-hour training generating approximately 560-600 million parameter models, nanochat outputs remain far below frontier model performance, exhibiting frequent hallucinations, naive reasoning, limited world knowledge, and conversational awkwardness comparable to GPT-2 rather than GPT-4 or Claude, making them unsuitable for production applications requiring reliability.
- Requires expensive high-end GPU infrastructure: Despite being billed as affordable for education, the 8XH100 GPU node requirement represents enterprise-grade hardware costing roughly 24 dollars per hour with a minimum 4-hour commitment, creating accessibility barriers for individual learners, students without institutional support, and researchers in resource-constrained settings who cannot justify several hundred dollars for experimentation.
- Not designed for production deployment: The codebase prioritizes educational clarity over production concerns including security hardening, scalability optimizations, monitoring instrumentation, and API design, meaning organizations cannot simply deploy nanochat models into customer-facing applications without substantial additional engineering investment.
- Limited multilingual and multimodal capabilities: Training focuses exclusively on English text understanding and generation without image comprehension, speech processing, or multilingual support that characterize modern LLMs, constraining applications to narrow use cases and preventing exploration of cross-modal learning increasingly central to AI development.
- Steep learning curve despite documentation: While cleaner than production frameworks, understanding nanochat still requires solid Python programming skills, basic deep learning concepts including backpropagation and optimization, familiarity with transformer architecture fundamentals, and comfort navigating command-line tools and cloud infrastructure, potentially overwhelming complete beginners despite educational intent.
How Does It Compare?
The landscape of LLM training frameworks and educational resources in October 2025 offers various approaches serving different learning objectives and technical requirements. Understanding nanochat’s positioning relative to alternatives clarifies its unique value proposition.
Educational LLM Training Projects
Andrej Karpathy’s earlier nanoGPT established the template for minimal, educational LLM implementations focused on pretraining transformer models. Released several years before nanochat, nanoGPT provided clean PyTorch code for training GPT-2 scale models and attracted widespread adoption in university courses and self-study programs. However, nanoGPT intentionally stopped at pretraining, omitting the conversational finetuning, instruction following, tool use, and chat interface that define modern ChatGPT-style assistants.
nanochat extends nanoGPT’s philosophy to the complete ChatGPT pipeline, adding mid-training on conversational data, supervised finetuning on knowledge and reasoning tasks, optional reinforcement learning, efficient inference engine with KV caching, tool use via Python sandbox, and production-ready web UI. This makes nanochat significantly more comprehensive for understanding modern LLM development while maintaining the same minimal, hackable codebase philosophy.
nanoLM represents another educational approach, focusing on accurate loss prediction across model scales through mu-scaling techniques. While valuable for understanding scaling laws and predicting larger model performance from smaller experiments, nanoLM emphasizes pretraining dynamics and compute-optimal training rather than the complete conversational AI pipeline that nanochat delivers. Researchers interested in scaling behavior and compute efficiency find nanoLM complementary to nanochat’s end-to-end approach.
Other educational projects like minGPT and picoGPT offer even more minimal implementations, sometimes under 500 lines, demonstrating core transformer concepts. These serve as excellent introductions to attention mechanisms and language modeling basics, but their extreme simplicity means they cannot train competitive models or demonstrate production considerations that nanochat addresses.
