
Karpathy LLM Wiki Pattern

Created 2026-04-05
Tags: reference, architecture, knowledge-management, claude-code

Reference file summarizing Andrej Karpathy’s approach to building personal knowledge bases with LLMs and Obsidian. Published April 3, 2026.

Instead of using RAG (where an LLM re-discovers knowledge from raw documents every time you ask a question), the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files. The wiki compounds over time. When a new source is added, the LLM reads it, extracts key information, and integrates it into the existing wiki — updating entity pages, revising summaries, noting contradictions. The knowledge is compiled once and kept current, not re-derived on every query.

The human curates sources and asks questions. The LLM does the bookkeeping — summarizing, cross-referencing, filing, and maintenance.

Obsidian is the IDE. The LLM is the programmer. The wiki is the codebase.

Raw sources (the raw/ directory):

  • Curated collection of source documents (articles, papers, images, data files)
  • Immutable — the LLM reads from them but never modifies them
  • This is the source of truth
  • Obsidian Web Clipper is recommended for ingesting web articles

Wiki layer:

  • A directory of LLM-generated markdown files
  • Summaries, entity pages, concept pages, comparisons, synthesis
  • The LLM owns this layer entirely — creates, updates, maintains cross-references
  • The human reads it; the LLM writes it

Schema (CLAUDE.md):

  • A configuration document that tells the LLM how the wiki is structured
  • Defines conventions, workflows for ingesting sources, answering questions, and maintaining the wiki
  • Co-evolved by the human and LLM over time
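Concretely, the three layers might map to a vault layout like this (the raw/ directory and CLAUDE.md come from the pattern itself; the other names are illustrative):

```
vault/
├── CLAUDE.md        # schema: conventions and workflows for the LLM
├── index.md         # content catalog the LLM reads first
├── log.md           # append-only operation log
├── raw/             # immutable source documents (human-curated)
│   └── ...
└── wiki/            # LLM-owned synthesized pages
    ├── concepts/
    ├── entities/
    └── summaries/
```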

Ingest: drop a new source into raw/ and tell the LLM to process it. The LLM reads the source, discusses takeaways, writes a summary page, updates the index, and updates relevant pages across the wiki. A single source might touch 10-15 pages.
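As pseudocode, one ingest pass might look like the following. Every helper here (`llm`, `vault`, `today`) is hypothetical, not something the gist specifies; this only makes the sequence of steps above explicit:

```
# Pseudocode sketch of a single ingest pass (all helpers hypothetical)
def ingest(source_path, vault):
    source = vault.read(source_path)                      # read the new raw source
    summary = llm("Summarize; extract key entities", source)
    vault.write(f"wiki/summaries/{source_path.stem}.md", summary)
    for page in llm("Which existing pages does this affect?", vault.index):
        vault.update(page, llm("Integrate new info", source, vault.read(page)))
    vault.append("log.md", f"## [{today()}] ingest | {source_path.stem}")
    vault.update_index()                                  # file the new page in index.md
```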

Query: ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Good answers can be filed back into the wiki as new pages, so explorations compound in the knowledge base.

Maintain: run a periodic health check. Look for contradictions between pages, stale claims, orphan pages with no inbound links, missing cross-references, important concepts lacking their own page, and data gaps that could be filled with a web search.
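The mechanical parts of that check (orphans, broken links) are easy to automate. A minimal sketch, assuming Obsidian-style `[[wikilinks]]` and pages held as a name-to-markdown dict; contradiction and staleness detection would still need the LLM:

```python
import re

# Captures the target of [[Target]], [[Target|alias]], or [[Target#heading]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def lint(pages):
    """pages: dict mapping page name -> markdown body.
    Returns (broken_links, orphan_pages)."""
    linked = set()   # pages that receive at least one inbound link
    broken = set()   # (source page, missing target) pairs
    for name, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target in pages:
                linked.add(target)
            else:
                broken.add((name, target))
    # Orphans: pages no other page links to
    orphans = {name for name in pages if name not in linked}
    return broken, orphans
```

A `/lint` skill could run this over the vault and hand only the flagged pages to the LLM for deeper review.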

  • index.md — content catalog, organized by category, each page listed with a link and one-line summary. The LLM reads this first to find relevant pages. Works well at moderate scale (~100 sources, ~hundreds of pages) without embeddings or vector databases.
  • log.md — append-only chronological record. Entries use a consistent prefix format (e.g., ## [2026-04-02] ingest | Article Title) so it’s parseable with grep.
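Because every entry follows one prefix format, the log stays greppable and trivially parseable. A small sketch of reading it in Python (the entry format is taken from the example above; the function name is mine):

```python
import re

# Matches headings of the form: ## [YYYY-MM-DD] operation | Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$", re.MULTILINE)

def log_entries(log_text, op=None):
    """Return (date, operation, title) tuples, optionally filtered by operation."""
    entries = ENTRY.findall(log_text)
    return [e for e in entries if op is None or e[1] == op]
```

The same filtering is a one-liner at the shell, e.g. `grep '^## \[.*\] ingest' log.md`.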

Karpathy reports the pattern works well at ~100 articles and ~400K words without any vector database or embeddings. The index file is sufficient for navigation at this scale. For larger wikis, he recommends qmd (local markdown search with hybrid BM25/vector search) or building a simple search tool.

This is not a replacement for enterprise-scale RAG. It’s a lightweight, zero-infrastructure alternative for individuals and small teams dealing with hundreds (not millions) of documents.

The Baseworks KB already has several elements of this pattern:

  • Markdown files with wiki links (bidirectional linking)
  • CLAUDE.md as the schema layer
  • Structured folders by domain area
  • Claude Code skills for automated workflows

Potential adoptions to evaluate:

  1. Raw/wiki separation — Are there source materials (PDFs, transcripts, articles) that should be stored separately from the synthesized wiki pages? Currently most content is mixed.
  2. Index file — A master index of all key documents with one-line summaries could help Claude navigate the vault more efficiently, especially as it grows.
  3. Log file — An append-only operation log could supplement or replace the changelog pattern.
  4. Lint skill — A /lint skill that checks for orphan pages, broken links, contradictions, and missing cross-references across the vault.
  5. Ingest workflow — A standardized process for adding new source material (e.g., session transcripts, research articles) that automatically updates relevant pages.

Full gist: Karpathy LLM Wiki

Original tweet: x.com/karpathy/status/2039805659525644595