# Karpathy LLM Wiki Pattern
Reference file summarizing Andrej Karpathy’s approach to building personal knowledge bases with LLMs and Obsidian. Published April 3, 2026.
## The Core Idea
Instead of using RAG (where an LLM re-discovers knowledge from raw documents every time you ask a question), the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files. The wiki compounds over time. When a new source is added, the LLM reads it, extracts key information, and integrates it into the existing wiki — updating entity pages, revising summaries, noting contradictions. The knowledge is compiled once and kept current, not re-derived on every query.
The human curates sources and asks questions. The LLM does the bookkeeping — summarizing, cross-referencing, filing, and maintenance.
Obsidian is the IDE. The LLM is the programmer. The wiki is the codebase.
## Three-Layer Architecture
### Layer 1: Raw Sources
- Curated collection of source documents (articles, papers, images, data files)
- Immutable — the LLM reads from them but never modifies them
- This is the source of truth
- Obsidian Web Clipper is recommended for ingesting web articles
### Layer 2: The Wiki
- A directory of LLM-generated markdown files
- Summaries, entity pages, concept pages, comparisons, synthesis
- The LLM owns this layer entirely — creates, updates, maintains cross-references
- The human reads it; the LLM writes it
### Layer 3: The Schema
- A configuration document (CLAUDE.md) that tells the LLM how the wiki is structured
- Defines conventions, workflows for ingesting sources, answering questions, and maintaining the wiki
- Co-evolved by the human and LLM over time
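The three layers map onto a simple directory layout. A minimal scaffold sketch (the directory and file names other than CLAUDE.md are illustrative assumptions, not prescribed by the source):

```python
from pathlib import Path

def scaffold_vault(root: Path) -> None:
    """Create the three-layer layout: raw sources, wiki pages, schema."""
    (root / "raw").mkdir(parents=True, exist_ok=True)  # Layer 1: immutable source documents
    wiki = root / "wiki"                               # Layer 2: LLM-maintained pages
    wiki.mkdir(exist_ok=True)
    schema = root / "CLAUDE.md"                        # Layer 3: conventions and workflows
    if not schema.exists():
        schema.write_text("# Wiki schema\n\nConventions for ingest, ask, and maintain.\n")
    for name in ("index.md", "log.md"):                # special files used for navigation
        path = wiki / name
        if not path.exists():
            path.write_text("")
```

The key property is the one-way dependency: the LLM reads `raw/` but only ever writes under `wiki/`.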
## Three Operations
### Ingest
Drop a new source into raw and tell the LLM to process it. The LLM reads the source, discusses takeaways, writes a summary page, updates the index, and updates relevant pages across the wiki. A single source might touch 10-15 pages.
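The bookkeeping half of an ingest step can be sketched as follows; `record_ingest`, its slug convention, and the exact line formats are hypothetical, and in practice the LLM writes richer pages and touches many more of them:

```python
import datetime
from pathlib import Path

def record_ingest(root: Path, title: str, summary: str) -> Path:
    """Write a summary page, add an index line, and append a log entry."""
    slug = title.lower().replace(" ", "-")  # hypothetical page-naming convention
    page = root / "wiki" / f"{slug}.md"
    page.write_text(f"# {title}\n\n{summary}\n")
    # index.md: one link plus a one-line summary per page
    with (root / "wiki" / "index.md").open("a") as f:
        f.write(f"- [[{slug}]]: {summary.splitlines()[0]}\n")
    # log.md: append-only record with a consistent, grep-parseable prefix
    today = datetime.date.today().isoformat()
    with (root / "wiki" / "log.md").open("a") as f:
        f.write(f"## [{today}] ingest | {title}\n")
    return page
```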
### Ask
Ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Good answers can be filed back into the wiki as new pages, so explorations compound in the knowledge base.
### Maintain
Periodic health check. Look for: contradictions between pages, stale claims, orphan pages with no inbound links, missing cross-references, important concepts lacking their own page, and data gaps that could be filled with a web search.
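One of these checks, orphan detection, is mechanical enough to sketch directly. A minimal version that scans inbound `[[wiki links]]` (the helper name and the exclusion of the index and log files are assumptions):

```python
import re
from pathlib import Path

# Capture the target of [[Page]], [[Page|alias]], or [[Page#heading]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki: Path) -> set[str]:
    """Return pages that no other page links to (maintenance candidates)."""
    pages = {p.stem for p in wiki.glob("*.md")}
    linked: set[str] = set()
    for p in wiki.glob("*.md"):
        for target in WIKILINK.findall(p.read_text()):
            linked.add(target.strip())
    # index.md and log.md are navigation files, not content pages
    return pages - linked - {"index", "log"}
```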
## Special Files
- index.md — content catalog, organized by category, with each page listed as a link plus a one-line summary. The LLM reads this first to find relevant pages. Works well at moderate scale (~100 sources, hundreds of pages) without embeddings or vector databases.
- log.md — append-only chronological record. Entries use a consistent prefix format (e.g., `## [2026-04-02] ingest | Article Title`) so it’s parseable with grep.
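Because every entry shares that prefix, `grep '^## \[' log.md` pulls out the full history; the same format parses in a few lines of Python (a sketch, assuming the exact entry shape shown above):

```python
import re

# One log entry per line: ## [YYYY-MM-DD] operation | Title
LOG_ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$", re.MULTILINE)

def parse_log(text: str) -> list[tuple[str, str, str]]:
    """Return (date, operation, title) triples from log.md contents."""
    return LOG_ENTRY.findall(text)
```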
## Scale and Limitations
Karpathy reports the pattern works well at ~100 articles and ~400K words without any vector database or embeddings. The index file is sufficient for navigation at this scale. For larger wikis, he recommends qmd (local markdown search with hybrid BM25/vector search) or building a simple search tool.
This is not a replacement for enterprise-scale RAG. It’s a lightweight, zero-infrastructure alternative for individuals and small teams dealing with hundreds (not millions) of documents.
## Relevance to Baseworks KB
The Baseworks KB already has several elements of this pattern:
- Markdown files with wiki links (bidirectional linking)
- CLAUDE.md as the schema layer
- Structured folders by domain area
- Claude Code skills for automated workflows
Potential adoptions to evaluate:
- Raw/wiki separation — Are there source materials (PDFs, transcripts, articles) that should be stored separately from the synthesized wiki pages? Currently most content is mixed.
- Index file — A master index of all key documents with one-line summaries could help Claude navigate the vault more efficiently, especially as it grows.
- Log file — An append-only operation log could supplement or replace the changelog pattern.
- Lint skill — A `/lint` skill that checks for orphan pages, broken links, contradictions, and missing cross-references across the vault.
- Ingest workflow — A standardized process for adding new source material (e.g., session transcripts, research articles) that automatically updates relevant pages.
## Source
Full gist: Karpathy LLM Wiki
Original tweet: x.com/karpathy/status/2039805659525644595