
Karpathy LLM Wiki Pattern

Created 2026-04-05
Tags: reference, architecture, knowledge-management, claude-code

Reference file summarizing Andrej Karpathy’s approach to building personal knowledge bases with LLMs and Obsidian. Published April 3, 2026.

Instead of using RAG (where an LLM re-discovers knowledge from raw documents every time you ask a question), the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files. The wiki compounds over time. When a new source is added, the LLM reads it, extracts key information, and integrates it into the existing wiki — updating entity pages, revising summaries, noting contradictions. The knowledge is compiled once and kept current, not re-derived on every query.

The human curates sources and asks questions. The LLM does the bookkeeping — summarizing, cross-referencing, filing, and maintenance.

Obsidian is the IDE. The LLM is the programmer. The wiki is the codebase.

Raw sources (the raw/ directory):

  • Curated collection of source documents (articles, papers, images, data files)
  • Immutable — the LLM reads from them but never modifies them
  • This is the source of truth
  • Obsidian Web Clipper is recommended for ingesting web articles

Wiki layer:

  • A directory of LLM-generated markdown files
  • Summaries, entity pages, concept pages, comparisons, synthesis
  • The LLM owns this layer entirely — creates, updates, maintains cross-references
  • The human reads it; the LLM writes it

Schema (CLAUDE.md):

  • A configuration document that tells the LLM how the wiki is structured
  • Defines conventions, workflows for ingesting sources, answering questions, and maintaining the wiki
  • Co-evolved by the human and LLM over time
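Concretely, the three layers might map to a vault layout like this (the raw/ directory and CLAUDE.md come from the pattern itself; the other names are illustrative):

```
vault/
├── CLAUDE.md        # schema: conventions and workflows for the LLM
├── index.md         # content catalog the LLM reads first
├── log.md           # append-only operation log
├── raw/             # immutable source documents (human-curated)
│   └── ...
└── wiki/            # LLM-owned synthesized pages
    ├── concepts/
    ├── entities/
    └── summaries/
```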

Ingest: drop a new source into raw/ and tell the LLM to process it. The LLM reads the source, discusses takeaways, writes a summary page, updates the index, and updates relevant pages across the wiki. A single source might touch 10-15 pages.
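As pseudocode, one ingest pass might look like the following. Every helper here (`llm`, `vault`, `today`) is hypothetical, not something the gist specifies; this only makes the sequence of steps above explicit:

```
# Pseudocode sketch of a single ingest pass (all helpers hypothetical)
def ingest(source_path, vault):
    source = vault.read(source_path)                      # read the new raw source
    summary = llm("Summarize; extract key entities", source)
    vault.write(f"wiki/summaries/{source_path.stem}.md", summary)
    for page in llm("Which existing pages does this affect?", vault.index):
        vault.update(page, llm("Integrate new info", source, vault.read(page)))
    vault.append("log.md", f"## [{today()}] ingest | {source_path.stem}")
    vault.update_index()                                  # file the new page in index.md
```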

Query: ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Good answers can be filed back into the wiki as new pages, so explorations compound in the knowledge base.

Maintain: run a periodic health check. Look for contradictions between pages, stale claims, orphan pages with no inbound links, missing cross-references, important concepts lacking their own page, and data gaps that could be filled with a web search.
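The mechanical parts of that check (orphans, broken links) are easy to automate. A minimal sketch, assuming Obsidian-style `[[wikilinks]]` and pages held as a name-to-markdown dict; contradiction and staleness detection would still need the LLM:

```python
import re

# Captures the target of [[Target]], [[Target|alias]], or [[Target#heading]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def lint(pages):
    """pages: dict mapping page name -> markdown body.
    Returns (broken_links, orphan_pages)."""
    linked = set()   # pages that receive at least one inbound link
    broken = set()   # (source page, missing target) pairs
    for name, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target in pages:
                linked.add(target)
            else:
                broken.add((name, target))
    # Orphans: pages no other page links to
    orphans = {name for name in pages if name not in linked}
    return broken, orphans
```

A `/lint` skill could run this over the vault and hand only the flagged pages to the LLM for deeper review.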

  • index.md — content catalog, organized by category, each page listed with a link and one-line summary. The LLM reads this first to find relevant pages. Works well at moderate scale (~100 sources, ~hundreds of pages) without embeddings or vector databases.
  • log.md — append-only chronological record. Entries use a consistent prefix format (e.g., ## [2026-04-02] ingest | Article Title) so it’s parseable with grep.
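Because every entry follows one prefix format, the log stays greppable and trivially parseable. A small sketch of reading it in Python (the entry format is taken from the example above; the function name is mine):

```python
import re

# Matches headings of the form: ## [YYYY-MM-DD] operation | Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$", re.MULTILINE)

def log_entries(log_text, op=None):
    """Return (date, operation, title) tuples, optionally filtered by operation."""
    entries = ENTRY.findall(log_text)
    return [e for e in entries if op is None or e[1] == op]
```

The same filtering is a one-liner at the shell, e.g. `grep '^## \[.*\] ingest' log.md`.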

Karpathy reports the pattern works well at ~100 articles and ~400K words without any vector database or embeddings. The index file is sufficient for navigation at this scale. For larger wikis, he recommends qmd (local markdown search with hybrid BM25/vector search) or building a simple search tool.

This is not a replacement for enterprise-scale RAG. It’s a lightweight, zero-infrastructure alternative for individuals and small teams dealing with hundreds (not millions) of documents.

The Baseworks KB already has several elements of this pattern:

  • Markdown files with wiki links (bidirectional linking)
  • CLAUDE.md as the schema layer
  • Structured folders by domain area
  • Claude Code skills for automated workflows

Potential adoptions to evaluate:

  1. Raw/wiki separation — Are there source materials (PDFs, transcripts, articles) that should be stored separately from the synthesized wiki pages? Currently most content is mixed.
  2. Index file — A master index of all key documents with one-line summaries could help Claude navigate the vault more efficiently, especially as it grows.
  3. Log file — An append-only operation log could supplement or replace the changelog pattern.
  4. Lint skill — A /lint skill that checks for orphan pages, broken links, contradictions, and missing cross-references across the vault.
  5. Ingest workflow — A standardized process for adding new source material (e.g., session transcripts, research articles) that automatically updates relevant pages.

Full gist: Karpathy LLM Wiki

Original tweet: x.com/karpathy/status/2039805659525644595