Vault SQLite Index + Reliable Git Sync — Implementation Plan
Status: Implemented on Patrick’s Mac and VPS (2026-04-07)

Created: 2026-03-24

Implemented: 2026-04-07

Context: Inspired by the shift from PKM tools toward tool-agnostic, AI-queryable folder structures. This plan keeps Obsidian as the editing interface while adding a queryable relationship layer and fixing the sync reliability issues between machines.
Problem Statement
Two issues with the current vault setup:

- Stale content between machines. The Obsidian Git plugin auto-commits every 10 minutes, but auto-push is disabled (`autoPushInterval = 0`). Changes commit locally but don’t reach other machines until someone manually pushes. This causes frequent drift between Patrick’s Mac, Asia’s machines, and the VPS.
- No queryable relationship layer. The vault has rich wikilinks and tags, but there’s no way for AI (or VS Code) to quickly query “what links to this file?” or “what files have this tag?” without scanning every file. The existing `qmd` tool handles full-text and vector search but doesn’t track wikilinks or frontmatter metadata.
Key Discovery: qmd Already Exists
The vault already has qmd installed on both Mac and VPS with:

- 1,154 files indexed for full-text (BM25) + vector search
- MCP server configured for Claude at `~/.claude/mcp-servers/qmd-mcp.sh`
- Daily LaunchAgent at 3 AM (was broken: ran `qmd embed` without `qmd update` first; fixed as part of this implementation)
We do NOT build a content search index. We build a complementary relationship index (wikilinks + tags + frontmatter) and fix qmd’s staleness.
What Was Built
Scripts Created

| File | Purpose |
|---|---|
| `scripts/build-vault-index.py` | Builds/updates `vault-index.db`, the relationship index (wikilinks, tags, frontmatter). Incremental by default, `--full` for complete rebuild. Zero external dependencies, Python 3.8+ compatible. |
| `scripts/vault-sync.sh` | Replaces the Obsidian Git plugin: pull (rebase with merge fallback), commit, push every 5 min. Lockfile prevents concurrent runs. `--pull-only` flag for passive mirrors. Logs to `~/logs/vault-sync.log`. |
| `scripts/post-sync-hook.sh` | Runs after a pull brings changes: rebuilds `vault-index.db` (incremental) and runs `qmd update` if available. |
| `scripts/git-hooks/post-merge` | Git hook that triggers `post-sync-hook.sh` after a manual `git pull`. |
| `scripts/git-hooks/post-rewrite` | Git hook that triggers `post-sync-hook.sh` after a rebase. |
| `scripts/launchd/com.baseworks.vault-sync.plist` | macOS LaunchAgent that runs `vault-sync.sh` every 300 seconds (5 min), `RunAtLoad=true`. PATH includes `/opt/homebrew/bin`. |
| `scripts/vault-audit.py` | Health check for the entire system: CLAUDE.md size, index freshness, qmd status, vault structure, sync status, growth trends. Outputs a human-readable report or `--json`. Logs metrics to `scripts/audit-log.json` for trend tracking. |
Modified Files
| File | Change |
|---|---|
| `.gitignore` | Added `vault-index.db` exclusion |
| `CLAUDE.md` (vault root) | Slimmed from ~225 to ~175 lines. Added `vault-index.db` documentation (schema, useful queries). Removed reference material that moved to dedicated files. |
| `00-inbox/claude-code-shared-context.md` | Updated with new sync/index infrastructure details |
Claude Code Skill Created
| Skill | Purpose |
|---|---|
| `/vault-audit` | Runs `vault-audit.py` and presents the health report. Available from any session. |
Not Created (Planned but Deferred)
| File | Reason |
|---|---|
| `scripts/README.md` | Deferred. The scripts are self-documented via header comments, and the plan document serves as the reference. Can be added if the scripts directory grows further. |
Database Schema (vault-index.db)
The SQLite database lives at the vault root. It is gitignored: each machine rebuilds its own copy via `post-sync-hook.sh` or a manual run.

Tables:

- `files`: path (PK), title, tags_json, created, mtime, word_count
- `links`: source_path, target_path (resolved), target_raw, display_text, resolved (0/1)
- `tags`: file_path, tag
- `meta`: key/value pairs: schema_version (1), last_build timestamp

Indexes: idx_links_source, idx_links_target, idx_tags_file, idx_tags_tag

Useful queries:

```sql
-- Backlinks to a specific file
SELECT source_path FROM links WHERE target_path = 'path/to/file.md';

-- All files with a specific tag
SELECT file_path FROM tags WHERE tag = 'infrastructure';

-- Orphan files (no incoming links)
SELECT path FROM files WHERE path NOT IN (SELECT target_path FROM links WHERE resolved = 1);

-- Broken wikilinks
SELECT source_path, target_raw FROM links WHERE resolved = 0;
```

Performance: Full rebuild of ~754 files in under 2 seconds. Incremental updates (typical post-sync) under 0.1 seconds.
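To try the queries without touching a real vault, the schema can be approximated in an in-memory database. This is only a sketch: the table, column, and index names come from the list above, but the column types, the sample rows, and the helper names (`demo_db`, `backlinks`, `orphans`, `broken_links`) are assumptions made for illustration, not the DDL that `build-vault-index.py` actually emits.

```python
import sqlite3

# Assumed DDL: names follow the documented tables and indexes; the types and
# constraints are guesses, not the real build-vault-index.py schema.
SCHEMA = """
CREATE TABLE files (path TEXT PRIMARY KEY, title TEXT, tags_json TEXT,
                    created TEXT, mtime REAL, word_count INTEGER);
CREATE TABLE links (source_path TEXT, target_path TEXT, target_raw TEXT,
                    display_text TEXT, resolved INTEGER);
CREATE TABLE tags  (file_path TEXT, tag TEXT);
CREATE TABLE meta  (key TEXT PRIMARY KEY, value TEXT);
CREATE INDEX idx_links_source ON links(source_path);
CREATE INDEX idx_links_target ON links(target_path);
CREATE INDEX idx_tags_file ON tags(file_path);
CREATE INDEX idx_tags_tag ON tags(tag);
"""


def demo_db():
    """Build an in-memory database with three hypothetical notes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO files (path, title) VALUES (?, ?)",
        [("a.md", "A"), ("b.md", "B"), ("c.md", "C")],
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?, ?, ?)",
        [
            ("a.md", "b.md", "b", None, 1),          # resolved link a -> b
            ("c.md", "b.md", "b", "see B", 1),       # resolved link c -> b
            ("a.md", None, "missing-note", None, 0), # broken wikilink
        ],
    )
    conn.execute("INSERT INTO tags VALUES (?, ?)", ("a.md", "infrastructure"))
    return conn


def backlinks(conn, path):
    """Paths of files that link to `path`."""
    rows = conn.execute(
        "SELECT source_path FROM links WHERE target_path = ? "
        "ORDER BY source_path",
        (path,),
    )
    return [row[0] for row in rows]


def orphans(conn):
    """Files with no incoming resolved links."""
    rows = conn.execute(
        "SELECT path FROM files WHERE path NOT IN "
        "(SELECT target_path FROM links WHERE resolved = 1) ORDER BY path"
    )
    return [row[0] for row in rows]


def broken_links(conn):
    """(source_path, target_raw) pairs for unresolved wikilinks."""
    return list(
        conn.execute(
            "SELECT source_path, target_raw FROM links WHERE resolved = 0"
        )
    )
```

The queries use bound parameters (`?`) rather than string formatting, so paths containing quotes do not break the SQL. The `resolved = 1` filter in the orphan subquery also keeps unresolved rows, which may have no `target_path`, out of the `NOT IN` comparison.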
Sync Architecture
How vault-sync.sh Works

- Acquire lockfile (`/tmp/vault-sync.lock`), which prevents concurrent runs and auto-clears stale locks
- `git pull --rebase origin main` (falls back to `git pull --no-rebase` on conflict)
- If local changes exist: `git add -A`, commit as `vault sync: YYYY-MM-DD HH:MM:SS`, `git push origin main`
- If pull brought new commits: run `post-sync-hook.sh` (rebuilds index + updates qmd)
- Log every action to `~/logs/vault-sync.log`
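The lockfile step is the least obvious part of the flow, so here is one way PID-based locking with stale-lock detection can work. This is a sketch in Python rather than the shell code in `vault-sync.sh`, and the function names (`pid_alive`, `acquire_lock`, `release_lock`) are hypothetical. It assumes a POSIX system and a lockfile that contains only the holder's PID.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(lock_path):
    """Take the lock; return False if a live process already holds it."""
    for _ in range(2):  # the second pass runs after clearing a stale lock
        try:
            # O_EXCL makes creation atomic: only one process can succeed.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(lock_path) as handle:
                    holder = int(handle.read().strip())
            except (ValueError, FileNotFoundError):
                holder = None  # unreadable or empty lock is treated as stale
            if holder is not None and pid_alive(holder):
                return False
            try:
                os.remove(lock_path)  # stale lock: clear it and retry
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return True
    return False


def release_lock(lock_path):
    """Remove the lock if it is still present."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A scheduled run would call `acquire_lock("/tmp/vault-sync.lock")`, exit quietly on `False`, and call `release_lock` in a cleanup handler. Treating an empty lockfile as stale is a simplification: a second process that reads the file in the instant before the PID is written could clear a valid lock.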
Deployment Status
| Machine | Method | Status | Notes |
|---|---|---|---|
| Patrick’s Mac | launchd plist (every 5 min) | Live | Installed at `~/Library/LaunchAgents/com.baseworks.vault-sync.plist` |
| VPS (baseworks-agents) | cron `*/5 * * * *` | Live | Also runs forum-content-sync (System 3) and questionnaire export (System 1) |
| Asia’s Macs | launchd plist | Pending | Scripts sync via git; needs plist install + git hooks path set |
| NAS (Synology DS920+) | `--pull-only` every 15 min | Deferred | Needs read-only PAT; not a priority since VPS handles all automation |
Obsidian Git Plugin Transition
Currently coexisting: `vault-sync.sh` handles the actual sync, while the Obsidian Git plugin remains installed with its auto-commit/auto-pull intervals set to 0. The plugin is kept for its status bar indicator and manual commit UI. Commits from the new system use the `vault sync:` prefix rather than the old `vault backup:` format. Full removal is deferred until stability is confirmed across all machines.
Decisions Made (Resolved Open Questions)
These questions were listed in the original plan draft. Here are the decisions:

- Sync interval: 5 minutes confirmed. Frequent enough that drift is negligible, infrequent enough that GitHub Actions deploys (which use cancel-in-progress) aren’t excessive.
- Commit message format: `vault sync: YYYY-MM-DD HH:MM:SS`, distinct from Obsidian Git’s `vault backup:` format so the source of each commit is identifiable in git log.
- Asia’s Obsidian Git plugin: Coexistence approach. Plugin settings sync via git but auto-intervals are set to 0. `vault-sync.sh` takes over the actual sync duty. Asia gets an inbox item with setup instructions for the launchd plist.
- NAS vault clone: Deferred. The VPS handles all automation needs. NAS pull-only mirror is nice-to-have but not blocking anything.
- qmd update frequency: Handled automatically by `post-sync-hook.sh`, which runs `qmd update` after every pull that brings changes. The daily 3 AM `qmd embed` job remains for vector embedding (heavier operation). The broken daily job (missing `qmd update` step) was fixed.
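The commit message decision makes it possible to attribute each commit by its subject line. The sketch below builds a message in the documented format and classifies subjects by prefix. The function names and the `"other"` label are hypothetical, and the real `vault-sync.sh` presumably builds its message in shell.

```python
from datetime import datetime

SYNC_PREFIX = "vault sync: "     # format used by vault-sync.sh
BACKUP_PREFIX = "vault backup:"  # format used by the Obsidian Git plugin


def sync_commit_message(now):
    """Build a commit subject in the vault sync: YYYY-MM-DD HH:MM:SS format."""
    return SYNC_PREFIX + now.strftime("%Y-%m-%d %H:%M:%S")


def commit_source(subject):
    """Guess which system produced a commit from its subject line."""
    if subject.startswith(SYNC_PREFIX):
        return "vault-sync.sh"
    if subject.startswith(BACKUP_PREFIX):
        return "Obsidian Git plugin"
    return "other"  # manual commits or anything else
```

Feeding each subject line from `git log --format=%s` through `commit_source` gives a quick count of how many commits still come from the plugin during the coexistence period.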
Evaluated and Skipped
LightRAG and graph database systems: Evaluated for AI retrieval enhancement. These tools are designed for thousands of documents with complex relationship graphs. The vault has ~754 files with explicit wikilinks, so the SQLite relationship index provides the same structural awareness at zero ongoing cost (no external services, no API keys, no embedding costs). Will revisit if the vault grows past ~2,000 files or if cross-vault querying becomes needed.
Verification (2026-04-07)
- `build-vault-index.py` completes under 2 seconds, creates `vault-index.db`
- File count in DB matches vault (~754 files as of 2026-04-07)
- Broken links tracked and queryable
- Incremental re-run (no changes) under 0.5 seconds
- `vault-sync.sh` commits and pushes when files change
- `vault-sync.sh` does nothing when vault is clean
- `vault-sync.sh --pull-only` only pulls
- Lockfile prevents concurrent runs (PID-based with stale lock detection)
- launchd plist loads and runs on schedule (Patrick’s Mac)
- Changes in Obsidian appear on GitHub within 5 minutes
- `vault-index.db` not tracked by git
- GitHub Actions deploy still triggers normally
- Python 3.8 compatible
- qmd daily job fixed (added `qmd update` before `qmd embed`)
- `/vault-audit` skill runs full health check
Future Development Hooks
This infrastructure was designed with future AI retrieval and search enhancements in mind. Key extension points:

- Schema extensions. The `vault-index.db` schema version is tracked in the `meta` table. The indexer can be extended to add new tables (e.g., `embeddings`, `clusters`, `cross-references`) without breaking existing queries. Bump `SCHEMA_VERSION` and add migration logic.
- Multi-source indexing. The `VaultIndexer` class accepts a `vault_root` parameter. It could be extended to accept multiple source paths for cross-project indexing (e.g., changelog repo + vault in one database).
- MCP server integration. The SQLite database is directly queryable by any MCP-connected tool. A dedicated MCP server for `vault-index.db` could expose relationship queries alongside qmd’s semantic search, giving Claude both structural and semantic search in one interface.
- Audit trend analysis. `vault-audit.py` logs metrics to `scripts/audit-log.json` after each run. Over time this provides growth curves, broken-link trends, and tag proliferation data that could inform automated maintenance decisions.
- Graph export. The links table is a complete directed graph of the vault. It can be exported to any graph format (DOT, GraphML, JSON) for visualization or analysis tools beyond Obsidian’s built-in graph view.
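As an illustration of the graph export idea, the following sketch turns the resolved rows of the `links` table into Graphviz DOT text. Nothing like this is part of the current scripts. The function names are hypothetical, and the only schema assumption is the documented `links` columns (`source_path`, `target_path`, `resolved`).

```python
import sqlite3


def dot_quote(text):
    """Escape backslashes and double quotes for a quoted DOT identifier."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def links_to_dot(conn, graph_name="vault"):
    """Render resolved links as a DOT digraph, one edge per unique pair."""
    rows = conn.execute(
        "SELECT DISTINCT source_path, target_path FROM links "
        "WHERE resolved = 1 ORDER BY source_path, target_path"
    )
    lines = ["digraph %s {" % graph_name]
    for source, target in rows:
        lines.append('  "%s" -> "%s";' % (dot_quote(source), dot_quote(target)))
    lines.append("}")
    return "\n".join(lines)
```

Opening the database with `sqlite3.connect("vault-index.db")` and writing the returned string to a `.dot` file would let Graphviz or a similar tool render it. Unresolved links are skipped because they have no target node.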
Related
- Vault Sync Systems: documents all three automated sync systems (questionnaire, segment feedback, forum content)
- Shared Infrastructure Context: VPS and infrastructure state
- check-wikilinks.py: existing wikilink checker (resolution logic reused by the indexer)
- Claude Code Skills Index: `/vault-audit` skill registered here