
Vault SQLite Index + Reliable Git Sync — Implementation Plan

Created 2026-03-24
Updated 2026-04-07
Tags infrastructure, vault, sqlite, git-sync, planning

Status: Implemented on Patrick’s Mac and VPS (2026-04-07)

Created: 2026-03-24

Implemented: 2026-04-07

Context: Inspired by the shift from PKM tools toward tool-agnostic, AI-queryable folder structures. This plan keeps Obsidian as the editing interface while adding a queryable relationship layer and fixing the sync reliability issues between machines.


Two issues with the current vault setup:

  1. Stale content between machines. The Obsidian Git plugin auto-commits every 10 minutes but auto-push is disabled (autoPushInterval = 0). Changes commit locally but don’t reach other machines until someone manually pushes. This causes frequent drift between Patrick’s Mac, Asia’s machines, and the VPS.

  2. No queryable relationship layer. The vault has rich wikilinks and tags, but there’s no way for AI (or VS Code) to quickly query “what links to this file?” or “what files have this tag?” without scanning every file. The existing qmd tool handles full-text and vector search but doesn’t track wikilinks or frontmatter metadata.

The vault already has qmd installed on both Mac and VPS with:

  • 1,154 files indexed for full-text (BM25) + vector search
  • MCP server configured for Claude at ~/.claude/mcp-servers/qmd-mcp.sh
  • Daily LaunchAgent at 3 AM (was broken: ran qmd embed without qmd update first — fixed as part of this implementation)

We do NOT build a content search index. We build a complementary relationship index (wikilinks + tags + frontmatter) and fix qmd’s staleness.
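The relationship data comes straight from the note text. The following is a minimal sketch of that extraction, assuming standard Obsidian syntax (`[[target]]`, `[[target|display text]]`, inline `#tag`). The regexes and function names are illustrative and are not taken from build-vault-index.py; frontmatter tags and embeds would need separate handling.

```python
import re
from typing import List, Optional, Tuple

# Matches [[target]], [[target|display text]] and [[target#heading|display]].
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+?)(?:\|([^\[\]]*))?\]\]")
# Inline tags such as #infrastructure or #git-sync. The lookbehind skips
# heading anchors inside links (e.g. the "#Setup" in [[notes/sync#Setup]]).
TAG_RE = re.compile(r"(?<![\w#])#([A-Za-z][\w/-]*)")


def extract_wikilinks(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (target_raw, display_text) pairs; display_text is None if absent."""
    links = []
    for match in WIKILINK_RE.finditer(text):
        target_raw = match.group(1).strip()
        display = match.group(2)
        links.append((target_raw, display.strip() if display is not None else None))
    return links


def extract_inline_tags(text: str) -> List[str]:
    """Return unique inline tags in first-seen order, without the leading '#'."""
    seen: List[str] = []
    for tag in TAG_RE.findall(text):
        if tag not in seen:
            seen.append(tag)
    return seen
```

Each pair returned by `extract_wikilinks` corresponds to the target_raw and display_text columns of one link row; resolving target_raw to an actual file path is a separate step.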


| File | Purpose |
| --- | --- |
| scripts/build-vault-index.py | Builds/updates vault-index.db, the relationship index (wikilinks, tags, frontmatter). Incremental by default, --full for complete rebuild. Zero external dependencies, Python 3.8+ compatible. |
| scripts/vault-sync.sh | Replaces Obsidian Git plugin: pull (rebase with merge fallback), commit, push every 5 min. Lockfile prevents concurrent runs. --pull-only flag for passive mirrors. Logs to ~/logs/vault-sync.log. |
| scripts/post-sync-hook.sh | Runs after pull brings changes: rebuilds vault-index.db (incremental) + runs qmd update if available. |
| scripts/git-hooks/post-merge | Git hook that triggers post-sync-hook after manual git pull. |
| scripts/git-hooks/post-rewrite | Git hook that triggers post-sync-hook after rebase. |
| scripts/launchd/com.baseworks.vault-sync.plist | macOS LaunchAgent: runs vault-sync.sh every 300 seconds (5 min), RunAtLoad=true. PATH includes /opt/homebrew/bin. |
| scripts/vault-audit.py | Health check for the entire system: CLAUDE.md size, index freshness, qmd status, vault structure, sync status, growth trends. Outputs human-readable report or --json. Logs metrics to scripts/audit-log.json for trend tracking. |

| File | Change |
| --- | --- |
| .gitignore | Added vault-index.db exclusion |
| CLAUDE.md (vault root) | Slimmed from ~225 to ~175 lines. Added vault-index.db documentation (schema, useful queries). Removed reference material that moved to dedicated files. |
| 00-inbox/claude-code-shared-context.md | Updated with new sync/index infrastructure details |

| Skill | Purpose |
| --- | --- |
| /vault-audit | Runs vault-audit.py and presents the health report. Available from any session. |

| File | Reason |
| --- | --- |
| scripts/README.md | Deferred: the scripts are self-documented via header comments and the plan document serves as the reference. Can be added if the scripts directory grows further. |

The SQLite database lives at the vault root. Gitignored — each machine rebuilds its own copy via post-sync-hook.sh or manual run.

Tables:

  • files — path (PK), title, tags_json, created, mtime, word_count
  • links — source_path, target_path (resolved), target_raw, display_text, resolved (0/1)
  • tags — file_path, tag
  • meta — key/value pairs: schema_version (1), last_build timestamp

Indexes: idx_links_source, idx_links_target, idx_tags_file, idx_tags_tag
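For orientation, the tables and indexes listed above could be created roughly as follows. This is a sketch only: the table, column, and index names come from this plan, but the column types, constraints, and the `create_schema` helper are assumptions rather than the actual DDL in build-vault-index.py.

```python
import sqlite3

SCHEMA_VERSION = 1

# Column types and constraints below are assumed, not copied from the script.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS files (
    path       TEXT PRIMARY KEY,
    title      TEXT,
    tags_json  TEXT,
    created    TEXT,
    mtime      REAL,
    word_count INTEGER
);
CREATE TABLE IF NOT EXISTS links (
    source_path  TEXT NOT NULL,
    target_path  TEXT,
    target_raw   TEXT NOT NULL,
    display_text TEXT,
    resolved     INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS tags (
    file_path TEXT NOT NULL,
    tag       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS meta (
    key   TEXT PRIMARY KEY,
    value TEXT
);
CREATE INDEX IF NOT EXISTS idx_links_source ON links(source_path);
CREATE INDEX IF NOT EXISTS idx_links_target ON links(target_path);
CREATE INDEX IF NOT EXISTS idx_tags_file ON tags(file_path);
CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables and indexes, then record the schema version."""
    conn.executescript(SCHEMA_SQL)
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES ('schema_version', ?)",
        (str(SCHEMA_VERSION),),
    )
    conn.commit()
```

Because every statement uses IF NOT EXISTS, calling `create_schema` on an existing database is harmless, which suits a file that each machine rebuilds locally.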

Useful queries:

-- Backlinks to a specific file
SELECT source_path FROM links WHERE target_path = 'path/to/file.md';
-- All files with a specific tag
SELECT file_path FROM tags WHERE tag = 'infrastructure';
-- Orphan files (no incoming links)
SELECT path FROM files WHERE path NOT IN (SELECT target_path FROM links WHERE resolved = 1);
-- Broken wikilinks
SELECT source_path, target_raw FROM links WHERE resolved = 0;
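To show what these queries return, here is a small self-contained run against an in-memory database with three made-up notes. The sample data is invented, only the columns the queries need are created, and storing NULL in target_path for unresolved links is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (path TEXT PRIMARY KEY);
CREATE TABLE links (source_path TEXT, target_path TEXT, target_raw TEXT, resolved INTEGER);
CREATE TABLE tags (file_path TEXT, tag TEXT);
""")
conn.executemany("INSERT INTO files VALUES (?)", [("a.md",), ("b.md",), ("c.md",)])
conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?)", [
    ("a.md", "b.md", "b", 1),           # a links to b
    ("c.md", "b.md", "b", 1),           # c links to b
    ("b.md", None, "missing-note", 0),  # b has a broken wikilink
])
conn.execute("INSERT INTO tags VALUES ('a.md', 'infrastructure')")


def column(sql, params=()):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql, params))


backlinks = column("SELECT source_path FROM links WHERE target_path = ?", ("b.md",))
tagged = column("SELECT file_path FROM tags WHERE tag = ?", ("infrastructure",))
orphans = column(
    "SELECT path FROM files WHERE path NOT IN "
    "(SELECT target_path FROM links WHERE resolved = 1)"
)
broken = conn.execute(
    "SELECT source_path, target_raw FROM links WHERE resolved = 0"
).fetchall()

print(backlinks)  # ['a.md', 'c.md']
print(tagged)     # ['a.md']
print(orphans)    # ['a.md', 'c.md']
print(broken)     # [('b.md', 'missing-note')]
```

The `resolved = 1` filter in the orphan query matters: in SQL, NOT IN against a subquery that contains a NULL matches no rows, so an unfiltered subquery would report no orphans as soon as one unresolved link had a NULL target_path.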

Performance: Full rebuild of ~754 files in under 2 seconds. Incremental updates (typical post-sync) under 0.1 seconds.
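One plausible way to keep incremental runs this cheap is to compare each file's modification time on disk against the mtime stored in the files table and reparse only the differences. The sketch below assumes exactly that; `plan_incremental_update` is a hypothetical helper, and the real script may detect changes differently.

```python
import os
import sqlite3
from typing import Dict, List, Tuple


def plan_incremental_update(
    conn: sqlite3.Connection, vault_root: str
) -> Tuple[List[str], List[str]]:
    """Return (changed, deleted) note paths relative to vault_root.

    changed: notes that are new or whose mtime differs from the indexed value.
    deleted: notes that are in the index but no longer on disk.
    """
    indexed: Dict[str, float] = dict(conn.execute("SELECT path, mtime FROM files"))

    on_disk: Dict[str, float] = {}
    for dirpath, dirnames, filenames in os.walk(vault_root):
        # Skip hidden directories such as .git and .obsidian.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(".md"):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, vault_root).replace(os.sep, "/")
                on_disk[rel] = os.path.getmtime(full)

    changed = sorted(p for p, m in on_disk.items() if indexed.get(p) != m)
    deleted = sorted(p for p in indexed if p not in on_disk)
    return changed, deleted
```

With no changes since the last build, both lists are empty and the run reduces to one directory walk plus one SELECT.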


  1. Acquire lockfile (/tmp/vault-sync.lock) — prevents concurrent runs, auto-clears stale locks
  2. git pull --rebase origin main (falls back to git pull --no-rebase on conflict)
  3. If local changes exist: git add -A, commit as vault sync: YYYY-MM-DD HH:MM:SS, git push origin main
  4. If pull brought new commits: run post-sync-hook.sh (rebuilds index + updates qmd)
  5. Log every action to ~/logs/vault-sync.log
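The control flow of steps 2 to 4 can be sketched in Python as below. The real implementation is the shell script vault-sync.sh; this is only an illustration of the ordering, with locking and logging omitted. The `Runner` abstraction, the `sync_once` name, and the `git rebase --abort` cleanup before the merge fallback are assumptions. The list above does not say how uncommitted changes are handled during the rebase pull, so the sketch simply treats any failed rebase pull as a reason to fall back.

```python
import subprocess
from datetime import datetime
from typing import Callable, List, Optional, Tuple

# A runner takes an argv list and returns (exit_code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]


def make_runner(repo_dir: str) -> Runner:
    """Build a runner that executes commands inside repo_dir."""
    def run(cmd: List[str]) -> Tuple[int, str]:
        proc = subprocess.run(cmd, cwd=repo_dir, capture_output=True, text=True)
        return proc.returncode, proc.stdout.strip()
    return run


def sync_once(
    run: Runner,
    on_new_commits: Optional[Callable[[], None]] = None,
    pull_only: bool = False,
    now: Optional[datetime] = None,
) -> List[str]:
    """Perform one pull/commit/push cycle and return the actions taken."""
    actions: List[str] = []
    _, head_before = run(["git", "rev-parse", "HEAD"])

    # Step 2: try a rebase pull, then fall back to a merge pull.
    code, _ = run(["git", "pull", "--rebase", "origin", "main"])
    if code == 0:
        actions.append("pull-rebase")
    else:
        run(["git", "rebase", "--abort"])  # assumed cleanup before the fallback
        code, _ = run(["git", "pull", "--no-rebase", "origin", "main"])
        if code != 0:
            return actions + ["pull-failed"]
        actions.append("pull-merge")
    _, head_after_pull = run(["git", "rev-parse", "HEAD"])

    # Step 3: commit and push only when the working tree has changes.
    _, dirty = run(["git", "status", "--porcelain"])
    if dirty and not pull_only:
        stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
        run(["git", "add", "-A"])
        run(["git", "commit", "-m", "vault sync: " + stamp])
        run(["git", "push", "origin", "main"])
        actions.append("push")

    # Step 4: run the post-sync hook only if the pull moved HEAD.
    if head_after_pull != head_before and on_new_commits is not None:
        on_new_commits()
        actions.append("hook")
    return actions
```

Passing the runner in as a parameter keeps the ordering testable without touching a real repository; in real use it would be `sync_once(make_runner("/path/to/vault"))`, where the path is a placeholder.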
| Machine | Method | Status | Notes |
| --- | --- | --- | --- |
| Patrick’s Mac | launchd plist (every 5 min) | Live | Installed at ~/Library/LaunchAgents/com.baseworks.vault-sync.plist |
| VPS (baseworks-agents) | cron */5 * * * * | Live | Also runs forum-content-sync (System 3) and questionnaire export (System 1) |
| Asia’s Macs | launchd plist | Pending | Scripts sync via git; needs plist install + git hooks path set |
| NAS (Synology DS920+) | --pull-only every 15 min | Deferred | Needs read-only PAT; not a priority since VPS handles all automation |
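For machines that still need the LaunchAgent, the settings described in this plan (300-second interval, RunAtLoad, PATH containing /opt/homebrew/bin) map onto standard launchd keys. The sketch below generates such a plist with Python's plistlib purely as an illustration; the actual file is the static plist in scripts/launchd. The Label value, the use of /bin/bash, the remaining PATH entries, and the `build_sync_plist` helper are assumptions.

```python
import plistlib


def build_sync_plist(script_path: str) -> bytes:
    """Return XML plist bytes for a LaunchAgent that runs the sync script."""
    job = {
        "Label": "com.baseworks.vault-sync",        # assumed to match the file name
        "ProgramArguments": ["/bin/bash", script_path],
        "StartInterval": 300,                        # seconds between runs (5 min)
        "RunAtLoad": True,                           # also run once when loaded
        "EnvironmentVariables": {
            # /opt/homebrew/bin comes from the plan; the rest is assumed.
            "PATH": "/opt/homebrew/bin:/usr/bin:/bin",
        },
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Placeholder path; substitute the vault's real location.
    print(build_sync_plist("/path/to/vault/scripts/vault-sync.sh").decode("utf-8"))
```

launchd does not expand `~` in ProgramArguments, so the script path should be absolute.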

The two systems currently coexist: vault-sync.sh handles the actual sync, while the Obsidian Git plugin remains installed with its auto-commit and auto-pull intervals set to 0. The plugin is kept for its status bar indicator and manual commit UI. Commits from the new system use the vault sync: prefix instead of the old vault backup: prefix. Full removal is deferred until stability is confirmed across all machines.


These questions were listed in the original plan draft. Here are the decisions:

  1. Sync interval: 5 minutes confirmed. Frequent enough that drift is negligible, infrequent enough that it does not trigger an excessive number of GitHub Actions deploys (which use cancel-in-progress).

  2. Commit message format: vault sync: YYYY-MM-DD HH:MM:SS — distinct from Obsidian Git’s vault backup: format so the source of each commit is identifiable in git log.

  3. Asia’s Obsidian Git plugin: Coexistence approach. Plugin settings sync via git but auto-intervals are set to 0. vault-sync.sh takes over the actual sync duty. Asia gets an inbox item with setup instructions for the launchd plist.

  4. NAS vault clone: Deferred. The VPS handles all automation needs. NAS pull-only mirror is nice-to-have but not blocking anything.

  5. qmd update frequency: Handled automatically by post-sync-hook.sh — runs qmd update after every pull that brings changes. The daily 3 AM qmd embed job remains for vector embedding (heavier operation). The broken daily job (missing qmd update step) was fixed.
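The "if available" behavior of the post-sync hook from decision 5 can be illustrated as follows. The real hook is the shell script post-sync-hook.sh; this Python sketch only shows the decision logic. The function names are hypothetical, and running the indexer with no arguments from the vault root is an assumption based on incremental mode being the default.

```python
import os
import shutil
import subprocess
import sys
from typing import Callable, List, Optional


def post_sync_commands(
    vault_root: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[List[str]]:
    """Return the commands to run after a pull that brought new commits."""
    indexer = os.path.join(vault_root, "scripts", "build-vault-index.py")
    commands = [[sys.executable, indexer]]  # incremental is the default mode
    if which("qmd") is not None:
        # Refresh qmd's text index only on machines where qmd is installed.
        commands.append(["qmd", "update"])
    return commands


def run_post_sync(vault_root: str) -> bool:
    """Run each command from the vault root; return True if all succeeded."""
    ok = True
    for cmd in post_sync_commands(vault_root):
        result = subprocess.run(cmd, cwd=vault_root)
        ok = ok and result.returncode == 0
    return ok
```

The heavier `qmd embed` step is left out on purpose, since it stays in the daily 3 AM job.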


LightRAG and graph database systems: Evaluated for AI retrieval enhancement. These tools are designed for thousands of documents with complex relationship graphs. The vault has ~754 files with explicit wikilinks — the SQLite relationship index provides the same structural awareness at zero ongoing cost (no external services, no API keys, no embedding costs). Will revisit if the vault grows past ~2,000 files or if cross-vault querying becomes needed.


  • build-vault-index.py completes in under 2 seconds and creates vault-index.db
  • File count in DB matches vault (~754 files as of 2026-04-07)
  • Broken links tracked and queryable
  • Incremental re-run (no changes) under 0.5 seconds
  • vault-sync.sh commits and pushes when files change
  • vault-sync.sh does nothing when vault is clean
  • vault-sync.sh --pull-only only pulls
  • Lockfile prevents concurrent runs (PID-based with stale lock detection)
  • launchd plist loads and runs on schedule (Patrick’s Mac)
  • Changes in Obsidian appear on GitHub within 5 minutes
  • vault-index.db not tracked by git
  • GitHub Actions deploy still triggers normally
  • Python 3.8 compatible
  • qmd daily job fixed (added qmd update before qmd embed)
  • /vault-audit skill runs full health check
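The PID-based lockfile check in the list above follows a common pattern, sketched here in Python for illustration. The real logic lives in vault-sync.sh, and the function names are hypothetical. The sketch is POSIX-only (macOS, Linux), because `os.kill(pid, 0)` is used as an existence probe.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(lock_path: str) -> bool:
    """Try to take the lock; clear it first if its owner is no longer running."""
    try:
        with open(lock_path) as fh:
            owner = int(fh.read().strip())
        if pid_alive(owner):
            return False          # another run is still in progress
        os.remove(lock_path)      # stale lock: the owner has exited
    except FileNotFoundError:
        pass                      # no lock present
    except ValueError:
        os.remove(lock_path)      # unreadable lock content, treat as stale

    try:
        # O_EXCL makes creation fail if another process won the race.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True


def release_lock(lock_path: str) -> None:
    """Remove the lock if it still exists."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A caller would wrap the sync in `if acquire_lock(path): try: ... finally: release_lock(path)` so that a crash between runs leaves at most a stale lock, which the next run clears.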

This infrastructure was designed with future AI retrieval and search enhancements in mind. Key extension points:

  1. Schema extensions. vault-index.db schema version is tracked in the meta table. The indexer can be extended to add new tables (e.g., embeddings, clusters, cross-references) without breaking existing queries. Bump SCHEMA_VERSION and add migration logic.

  2. Multi-source indexing. The VaultIndexer class accepts a vault_root parameter. It could be extended to accept multiple source paths for cross-project indexing (e.g., changelog repo + vault in one database).

  3. MCP server integration. The SQLite database is directly queryable by any MCP-connected tool. A dedicated MCP server for vault-index.db could expose relationship queries alongside qmd’s semantic search — giving Claude both structural and semantic search in one interface.

  4. Audit trend analysis. vault-audit.py logs metrics to scripts/audit-log.json after each run. Over time this provides growth curves, broken-link trends, and tag proliferation data that could inform automated maintenance decisions.

  5. Graph export. The links table is a complete directed graph of the vault. It can be exported to any graph format (DOT, GraphML, JSON) for visualization or analysis tools beyond Obsidian’s built-in graph view.