
Infrastructure Security Hardening Plan

Created 2026-04-29
Status in-progress
Tags plan, infrastructure, security, backups, privacy

Context: Following a structured risk audit of the Baseworks KB / vault project (2026-04-29), this plan addresses all identified gaps across six domains: permissions, backups, version control, reversibility, privacy, and documentation hygiene. The goal is a system where failure modes are visible before they become incidents, destructive operations require deliberate human action, and participant data is handled with explicit governance rather than accumulated convention.

Work in phases. Each phase is self-contained and can be completed in a single session. Later phases depend on earlier ones only where noted.

Audit source: See the 2026-04-29 conversation with Patrick for the full findings that produced this plan.


Why first: These are the highest-blast-radius gaps and require no external systems, no server access, and no coordination with Asia. They are configuration changes that take effect immediately.

Context: The global ~/.claude/settings.json and ~/.claude/settings.local.json on Patrick’s Mac (and likely on Asia’s machines) contain:

  • skipDangerousModePermissionPrompt: true — bypasses Claude Code’s last confirmation dialog before dangerous operations. A single misunderstood instruction can execute without a human checkpoint.
  • Pre-authorized Bash(sudo rm:*), Bash(rm:*), Bash(docker system prune:*) — catastrophic if triggered unexpectedly, and the dangerous mode bypass means no dialog appears.
  • A full n8n JWT API key hardcoded inline in the settings.local.json permissions allow list (lines containing N8N_KEY=eyJ...). This key is stored in plain text in a local config file and was almost certainly accidentally included when Claude Code wrote the permission during an n8n session.
  • Overly broad Bash(python3:*), Bash(osascript:*), Bash(ssh:*) with no host scoping.

Steps:

  • 1.1 Remove skipDangerousModePermissionPrompt: true from ~/.claude/settings.json — done 2026-04-29.

  • 1.2 Audit ~/.claude/settings.local.json for the embedded n8n JWT key — done 2026-04-30. Removed all three instances of the n8n JWT (eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...) and the surrounding stale session artifacts: the n8n workflow loop fragments (for id in ..., do, __NEW_LINE__... placeholders), the SSH+curl commands that referenced the key, and /tmp/modify_workflow.py:*. Action still needed: rotate the n8n API key in n8n’s admin UI (step 1.5 below) since the key was in plaintext.

  • 1.3 Review the full permissions allow list in ~/.claude/settings.local.json — done 2026-04-30. Changes made:

    • Removed Bash(sudo rm:*) — catastrophic, no longer pre-authorized
    • Removed Bash(rm:*) — broad removal; specific named cache paths (rm -rf /Users/vboy/Library/Caches/...) are still present and acceptable
    • Removed Bash(docker system prune:*) — will now prompt
    • Removed 19 stale one-off session entries: Astro build commands, specific head/wc -l commands against named session files, Bash(/Users/vboy/Documents/baseworks-changelog/CLAUDE-INSTRUCTIONS.md:*) (a markdown file listed as an executable), pdftoppm with hardcoded filename, brew install poppler, stale MCP Canva entry, /tmp/event-plain.html:*
    • Also removed 2026-04-30: Bash(python3:*) and Bash(osascript:*). Neither is a major threat but both are broader than needed globally. If these start generating frequent annoying prompts during normal work, restore them by running /update-config and adding the entries back to ~/.claude/settings.local.json, or edit the file directly. python3:* allows any Python script to run without prompting; osascript:* allows any AppleScript/JXA without prompting (slightly higher sensitivity — controls apps, can access keychain). The vault project settings already have python3 scripts/* scoped, so vault work is unaffected.
  • 1.4 Confirm Asia’s machines have the same fixes applied. Check ~/.claude/settings.json and ~/.claude/settings.local.json on both Asia’s MacBook Air and Mac Mini. The skipDangerousModePermissionPrompt setting in particular may be set on those machines as well if they were configured from the same template.

  • 1.5 Rotate the n8n API key — done 2026-04-30. Old key (expired, was in plaintext in settings.local.json) deleted. New key “Baseworks Deploy” created with expiry 2027-04-30. No scripts needed updating — key was not referenced anywhere on the VPS or in any repo. Rotate again by 2027-04-30.
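Step 1.4 repeats the same review on other machines. A minimal sketch of a scanner that could speed that up is shown here, assuming the settings files are plain JSON with the allow-list entries stored as strings. The function name `scan_claude_settings` and the pattern list are illustrative (drawn from the findings above), not part of Claude Code itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag risky entries in a Claude Code settings file.
# Returns 0 when nothing is flagged (or the file is absent), 1 otherwise.
scan_claude_settings() {
    local file="$1" found=0
    [ -f "$file" ] || { echo "skip: $file not found"; return 0; }

    # The confirmation-bypass flag set to true.
    if grep -qE '"skipDangerousModePermissionPrompt"[[:space:]]*:[[:space:]]*true' "$file"; then
        echo "RISK: skipDangerousModePermissionPrompt is true in $file"
        found=1
    fi

    # Broad pre-authorized commands with no argument scoping.
    # Path-scoped entries such as Bash(rm -rf /some/cache:*) do not match.
    if grep -nE 'Bash\((sudo rm|rm|docker system prune|python3|osascript|ssh):\*\)' "$file"; then
        echo "RISK: broad Bash permissions in $file (lines above)"
        found=1
    fi

    # Anything shaped like a JWT (three base64url segments starting with eyJ).
    # -q keeps the token itself out of the output.
    if grep -qE 'eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+' "$file"; then
        echo "RISK: JWT-like token in $file"
        found=1
    fi

    return "$found"
}
```

Running it against both `~/.claude/settings.json` and `~/.claude/settings.local.json` on each machine would give a quick first pass; it does not replace reading the allow list by eye.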


Phase 2 — Backup Verification and NAS Coverage


Why second: Backups exist on paper but have never been verified by a test restore. A backup you have never restored from is a backup you cannot rely on. The NAS is a single point of failure for media originals.

Context:

  • WordPress sites back up weekly (Sunday) to Backblaze B2. The daily health check confirms files are non-empty. But a corrupt dump that passes a size check looks identical to a valid one until restore time.
  • The NAS (Synology DS920+, RAID 5, ~22TB free) holds all media originals. There is no B2 backup job for the NAS. RAID 5 protects against single-drive failure only — not ransomware, accidental deletion, or Synology firmware corruption.
  • practice.baseworks.com is active daily but backed up only weekly, so up to 6 days of participant data (posts, profiles, progress) could be lost in a disaster.
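The gap between "file is non-empty" and "file restores cleanly" can be narrowed somewhat without a full restore. The sketch below is one possible check, under two assumptions that the plan does not confirm: the dumps are gzip-compressed, and they are produced by mysqldump (which `wp db export` wraps) with its default trailer comment. The function name `verify_sql_dump` is hypothetical. It catches truncated or corrupted archives only and is not a substitute for the test restore in step 2.1.

```shell
#!/usr/bin/env bash
# Hypothetical check: is a gzip-compressed SQL dump intact and complete?
# Assumes mysqldump's default trailer comment ("-- Dump completed ...") is present.
verify_sql_dump() {
    local dump="$1"

    # Non-empty file (the existing health check already covers this part).
    [ -s "$dump" ] || { echo "FAIL: $dump is missing or empty"; return 1; }

    # gzip -t reads the whole archive and verifies its checksum.
    gzip -t "$dump" 2>/dev/null || { echo "FAIL: $dump is not a valid gzip archive"; return 1; }

    # A dump cut off part-way lacks the trailer line that mysqldump writes last.
    if ! gzip -dc "$dump" | tail -n 5 | grep -q '^-- Dump completed'; then
        echo "FAIL: $dump has no completion marker (truncated dump?)"
        return 1
    fi

    echo "OK: $dump"
}
```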

Steps:

  • 2.1 Perform a test restore of staging.baseworks.com from its most recent B2 backup.

    • Use the existing b2 file download + wp db import restore path documented in 00-inbox/claude-code-shared-context.md
    • Confirm WordPress starts, pages load, and a sample of content is correct
    • Document: date performed, which backup file used, who verified, and any issues found
    • Add a note to this plan and to claude-code-shared-context.md when done
  • 2.2 Set up a NAS → B2 backup job. The NAS runs rclone. Configure a daily rclone sync from /volume1/baseworks/media/ to a dedicated B2 bucket (suggest: baseworks-nas-media-backup). Run from the NAS via Synology Task Scheduler or from baseworks-agents via ssh synology. Add to the daily health check in daily-infra-updates.sh so failures surface in the Slack alert.

  • 2.3 Add daily DB-only backups for practice.baseworks.com. The practice site has daily participant activity, so a weekly backup means up to 6 days of data loss. Add a daily DB-only backup (files can stay weekly) with a shorter retention (7 days rolling). This is a small change to the existing weekly backup cron structure.

  • 2.4 Document a restore runbook. A simple step-by-step document (can be a section in claude-code-shared-context.md) covering: how to list available backups, how to download from B2, how to import a WordPress DB, and how to verify the result. This is for use under pressure — the last place you want to be figuring out rclone flags is during an incident.

  • 2.5 Schedule a quarterly restore test. Add a cron entry (or a calendar reminder) to run step 2.1 every 3 months on a different site each time (rotate: staging → crm → practice). Log results each time.
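The 7-day rolling retention in step 2.3 could be sketched as follows, assuming the daily dumps are kept as local files before upload. The file naming pattern `practice-db-*.sql.gz` and the function name `prune_daily_dumps` are assumptions for illustration. If the dumps live only in B2, a bucket lifecycle rule or an age-filtered `rclone delete` would be the equivalent mechanism.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep a rolling window of daily DB dumps.
# The file naming pattern (practice-db-*.sql.gz) is an assumption.
prune_daily_dumps() {
    local dir="$1" keep_days="${2:-7}"
    [ -d "$dir" ] || { echo "ERROR: $dir is not a directory" >&2; return 1; }

    # find -mtime +N matches files whose age in whole days exceeds N,
    # so N = keep_days - 1 removes everything 7 or more days old when keep_days is 7.
    # -print lists each file as it is deleted, which is useful in a cron log.
    find "$dir" -maxdepth 1 -type f -name 'practice-db-*.sql.gz' \
        -mtime +"$((keep_days - 1))" -print -delete
}
```

Only files matching the pattern directly inside the given directory are touched, so pointing it at the wrong directory by mistake is unlikely to remove unrelated files.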


Phase 3 — VPS Script Version Control and n8n Workflow Git


Why third: The 2026-04-05 VPS wipe showed that ~/scripts/ can disappear without warning. The 2026-04-13 script drift incident showed what happens when running copies and committed copies diverge. n8n workflows are currently modified by editing PostgreSQL directly with no version history.

Context:

  • ~/scripts/ on baseworks-agents contains: daily-infra-updates.sh, slack-notify.sh, all weekly-backup-*.sh scripts, daily-backup-n8n.sh, vault-audit-slack.sh, and others. These are not in any git repo. If the VPS is wiped, they must be reconstructed from memory or from claude-code-shared-context.md documentation.
  • Forum sync scripts were fixed by moving them into the vault git repo (canonical: scripts/forum-content-sync.sh). The same approach should apply to all VPS scripts.
  • n8n’s “Vault Capture via Slack” workflow was patched by editing the PostgreSQL DB directly. No JSON export exists. A restore from an old B2 backup loses the patch.
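Drift between running and committed copies is straightforward to detect mechanically. A minimal sketch follows, assuming the live scripts sit in one directory and the committed copies in another with the same file names; the function name `report_script_drift` and both directory arguments are hypothetical. Once cron runs directly from the git checkout (step 3.4), a check like this mainly matters for any copies still left in ~/scripts/.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: report scripts whose live copy differs from
# (or is missing in) the git-tracked copy. Directory arguments are assumptions.
# Returns 0 when every live script matches its tracked copy, 1 otherwise.
report_script_drift() {
    local live_dir="$1" repo_dir="$2" drift=0 f name
    for f in "$live_dir"/*.sh "$live_dir"/*.py; do
        [ -e "$f" ] || continue            # glob matched nothing
        name=$(basename "$f")
        if [ ! -f "$repo_dir/$name" ]; then
            echo "UNTRACKED: $name"
            drift=1
        elif ! cmp -s "$f" "$repo_dir/$name"; then
            echo "DRIFT: $name"
            drift=1
        fi
    done
    return "$drift"
}
```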

Steps:

  • 3.1 Add ~/scripts/ on baseworks-agents to a git repo — done 2026-04-30. Option A chosen: scripts committed to sites/baseworks-agents/scripts/ in baseworks-changelog repo.

  • 3.2 Commit all current VPS scripts — done 2026-04-30. All 16 active scripts committed (weekly-backup × 5, daily-backup-n8n, daily-infra-updates, stale-cache-monitor, slack-notify, vault-audit-slack, questionnaire-export, people-webhook-heartbeat, qmd-embed-filtered, update-sitemap-reference, activity-thread-summarizer.sh + .py). B2 keys and Cloudflare API key extracted from scripts and moved to ~/.config/baseworks/vps-credentials on the VPS (chmod 600, not in git). Live ~/scripts/ copies updated to source from credentials file.

  • 3.3 Export n8n workflows — done 2026-04-30. 11 workflows exported with n8n export:workflow --all --pretty and committed to sites/baseworks-n8n/workflows/n8n-workflows-2026-04-30.json. Required: re-export and commit after any future workflow change (sudo docker exec baseworks-n8n n8n export:workflow --all --output=/tmp/n8n-workflows-all.json --pretty on the n8n VPS).

  • 3.4 Update crontab to run from git-tracked paths — done 2026-04-30. All cron entries for baseworks-agents scripts now point to /srv/baseworks/changelog/sites/baseworks-agents/scripts/. A dedicated changelog sync cron (git pull at 3:00 AM daily) runs before the backup window starts at 3:30 AM, ensuring scripts are always current. (Bootstrap-safe: git pull runs on the changelog repo as a separate cron, not inside a script that modifies itself.)
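Since step 3.2 has the scripts source a credentials file, a defensive loader can refuse to proceed when the file's permissions have loosened. The sketch below assumes the file contains plain shell variable assignments and that the host has GNU `stat` (true on a typical Linux VPS, not on macOS). The function name `load_credentials` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical loader: source the credentials file only if it is private.
# Uses GNU stat (-c '%a'), which assumes a Linux host.
load_credentials() {
    local file="${1:-$HOME/.config/baseworks/vps-credentials}" mode
    [ -r "$file" ] || { echo "ERROR: cannot read $file" >&2; return 1; }

    mode=$(stat -c '%a' "$file")
    case "$mode" in
        600|400) ;;                         # owner-only access
        *) echo "ERROR: $file has mode $mode, expected 600" >&2; return 1 ;;
    esac

    # shellcheck disable=SC1090
    . "$file"
}
```

A backup script would call `load_credentials || exit 1` near the top, so a mis-permissioned file fails loudly instead of silently leaking or silently running with empty keys.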


Phase 4 — Reversibility and Monitoring Improvements


Why fourth: Git provides reversibility for the vault, but destructive operations can be committed and pushed before anyone notices. WordPress changes can only be rolled back to the most recent weekly backup, which may be up to 6 days old. Some monitoring exists but has gaps.

Context:

  • vault-sync.sh runs git add -A unconditionally every 5 minutes. A mass file deletion (from a bad script run or a misunderstood Claude command) will be committed and pushed. Recovery is possible but requires noticing, identifying the bad commit, and running git checkout on affected paths.
  • The authorized_keys integrity check (added 2026-04-14) notifies within 24h of a key being removed — good. But there is no equivalent guard for vault bulk deletes or WP content changes.
  • The audit-log.json last entry is 2026-04-07. Either vault-audit.py stopped writing to it, or the audit cron stopped running. Worth investigating.
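A silent stop like the audit-log gap could be surfaced by a freshness check in the daily health run. The sketch below relies on the file's modification time, which assumes the audit script rewrites or appends to the log on every run; the function name `check_freshness` and the default threshold of 2 days are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical freshness check: warn when a file has not been modified
# within max_age_days. Relies on mtime, not on the log's contents.
check_freshness() {
    local file="$1" max_age_days="${2:-2}"
    [ -f "$file" ] || { echo "STALE: $file does not exist"; return 1; }

    # find prints the path only if the file's age in whole days
    # is at least max_age_days.
    if [ -n "$(find "$file" -mtime +"$((max_age_days - 1))" -print)" ]; then
        echo "STALE: $file not modified in the last $max_age_days days"
        return 1
    fi
    echo "FRESH: $file"
}
```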

Steps:

  • 4.1 Add a bulk-delete guard to vault-sync.sh. Before git add -A, count pending deletions:

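    # Count entries whose porcelain status shows a deletion in either column
    # (staged or working tree). Anchoring to the status columns avoids matching
    # file names that happen to contain " D".
    DELETE_COUNT=$(git status --porcelain | grep -cE '^(D.|.D) ' || true)
    if [ "$DELETE_COUNT" -gt 20 ]; then
        log "WARN: $DELETE_COUNT deletions pending; skipping auto-commit, manual review required"
        exit 0
    fi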

    Threshold of 20 is a starting point — adjust based on normal vault operations. This does not prevent deletion; it pauses the auto-commit so a human can review.

  • 4.2 Investigate the audit-log.json gap. SSH to baseworks-agents and check: (a) whether vault-audit.py is still in the cron, (b) when vault-audit-slack.sh last ran, (c) whether the audit output is going somewhere other than audit-log.json. If the audit stopped, restart it and identify why.

  • 4.3 Add a WP content change monitor for practice.baseworks.com. The practice site has active daily participants. Consider a simple daily DB row-count check in daily-infra-updates.sh — if the post/comment/user count drops by more than a threshold versus the previous day, flag it in the Slack alert. This is a lightweight tripwire, not a full audit.
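The comparison at the heart of step 4.3 can be sketched with integer arithmetic alone. The function name `count_dropped` and the default threshold of 10 percent are hypothetical; how the counts are obtained (for example a `wp db query` against the posts, comments, and users tables) and where yesterday's values are stored are left open here.

```shell
#!/usr/bin/env bash
# Hypothetical tripwire: succeed (exit 0) when the current count has dropped
# by more than threshold_pct percent relative to the previous count.
count_dropped() {
    local prev="$1" cur="$2" threshold_pct="${3:-10}"
    # No baseline or no decrease: nothing to flag.
    if [ "$prev" -le 0 ] || [ "$cur" -ge "$prev" ]; then
        return 1
    fi
    # Integer comparison without division:
    # (prev - cur) / prev > threshold_pct / 100
    [ $(( (prev - cur) * 100 )) -gt $(( prev * threshold_pct )) ]
}
```

In the health check this would read something like `if count_dropped "$PREV_POSTS" "$CUR_POSTS" 10; then` followed by a line appended to the Slack alert, where both variable names are placeholders. A drop of exactly the threshold does not trigger; only a larger one does.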


Phase 5 — Privacy and Participant Data Governance


Why fifth: This requires decisions, not just implementation. The technical setup exists; what’s missing is explicit governance and a decision record. This phase is primarily about making implicit choices explicit.

Context:

  • 02-areas/people/ contains named markdown files aggregating each participant’s data from form submissions, forum posts, and questionnaire responses. These files are committed to the vault git repo (private GitHub).
  • 02-areas/method-admin/audience/contact-inquiries/ contains 9 contact form submissions with real names, email addresses, and messages.
  • The forum content sync commits posts and replies from community participants to git.
  • The DM sync infrastructure is deployed and capable of committing private messages to git (currently 0 DM files, but the plumbing is running).
  • No visible retention policy, no documented participant disclosure, no deletion workflow.

Steps:

  • 5.1 Confirm GitHub repo access. Log into GitHub and verify: p-oancia/baseworks-kb-shared-brain is private, and the only collaborators are Patrick and Asia. No apps, bots, or third-party integrations with read access that aren’t needed.

  • 5.2 Make a documented decision on git as a store for participant data. The question is not whether to stop collecting it — it’s whether git (with its permanent, distributed history) is the right store for PII. Options:

    • Keep in git with explicit governance (access log, retention policy, deletion procedure)
    • Move people profiles and contact inquiries to a private database (e.g., the existing PostgreSQL on baseworks-n8n) and keep only non-PII notes in the vault
    • Keep in vault but mark files with privacy: pii frontmatter and add an automated check that these files are never in a public repo

    Record the decision and rationale in 03-resources/governance/data-privacy-policy.md (create this file).
  • 5.3 Establish a retention limit for contact inquiries. Decide: how long should contact form submissions be kept? Suggest: 2 years, then archive or delete. Add a script or a calendar reminder to review contact-inquiries annually.

  • 5.4 Address DM sync. Private 1-on-1 messages from participants being committed to git — even a private repo — is the sharpest privacy edge in the system. Decide: should DMs be synced to the vault at all? If yes, should they be excluded from git commits (gitignore the DM folder) and kept only as local working data on the VPS? If no, disable the DM sync path in forum-content-sync.py. Document the decision.

  • 5.5 Write a minimal participant data notice. A short internal document (not necessarily published) describing what data is collected from participants, where it’s stored, who has access, and how to request deletion. This is both a governance document and a forcing function — writing it surfaces anything that shouldn’t be collected.
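If the frontmatter option in step 5.2 is chosen, the first half of the automated check is finding the marked files. A minimal sketch follows, assuming YAML frontmatter delimited by `---` lines at the very top of each markdown file; the function name `list_pii_files` is hypothetical. Confirming that the repo itself is not public is a separate step (for example via the GitHub API) and is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print markdown files whose YAML frontmatter
# contains "privacy: pii". Only the block between the first two "---" lines counts.
list_pii_files() {
    local root="$1" f
    find "$root" -type f -name '*.md' | sort | while IFS= read -r f; do
        if awk '
            NR == 1 && $0 != "---" { exit 1 }   # file has no frontmatter
            NR > 1 && $0 == "---"  { exit 1 }   # frontmatter closed without a match
            NR > 1 && /^privacy:[ \t]*pii[ \t]*$/ { found = 1; exit 0 }
            END { if (!found) exit 1 }
        ' "$f"; then
            echo "$f"
        fi
    done
}
```

A mention of `privacy: pii` in the body of a note is ignored, so a file is only listed when it was deliberately marked.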


Phase 6 — Infrastructure Documentation Cleanup


Why last: Lower urgency. These are hygiene issues that don’t create immediate risk but accumulate technical debt.

Context:

  • claude-code-shared-context.md is committed to the vault repo and contains all server IPs, Tailscale IPs, SSH usernames, filesystem paths, expected authorized_keys fingerprints, and cron schedules. The repo is private, but the threat model depends entirely on that remaining true.
  • 132 broken wikilinks in a 754-file vault (about 0.18 broken links per file, or roughly one for every six files).
  • settings.local.json has accumulated many one-off session permissions that were never cleaned up — including full bash commands with embedded logic, stale SSH commands to specific IP/path combinations, and path-specific scp commands from individual past tasks.

Steps:

  • 6.1 Assess the server topology data in claude-code-shared-context.md. Decide whether IP addresses, usernames, and SSH fingerprints should remain in a git-tracked file. Options: move the sensitive fields to a separate file that is gitignored (but shared across machines via a separate mechanism), or accept the current arrangement with documented awareness that repo access must remain restricted. Record the decision.

  • 6.2 Run a settings.local.json permission audit. Go through the allow list and remove entries that are clearly stale session artifacts — the scp /tmp/closing-presentation-post-v7.html entry, the multi-line n8n workflow bash commands, and similar one-off commands that were added for a specific task and never removed. The goal is a permissions file that reflects ongoing needs, not a history of every command ever run.

  • 6.3 Run the wikilink checker and fix or remove broken links.

    python3 scripts/check-wikilinks.py

    Aim to get broken links below 20. Schedule a quarterly wikilink audit.

  • 6.4 Check the audit-log.json status (if not already addressed in Phase 4) and confirm the vault-audit pipeline is healthy end-to-end: script runs → writes JSON → Slack notification fires.


| Phase | Description | Status |
|---|---|---|
| 1 | Permission Hardening | In progress (1.1, 1.2, 1.3, 1.5 complete; 1.4 pending Asia coordination) |
| 2 | Backup Verification and NAS Coverage | Not started; resume here next session |
| 3 | VPS Script Version Control and n8n Workflow Git | Complete (2026-04-30) |
| 4 | Reversibility and Monitoring | Not started |
| 5 | Privacy and Participant Data Governance | Not started |
| 6 | Infrastructure Documentation Cleanup | Partial (6.2 complete 2026-04-30; 6.1, 6.3, 6.4 pending) |

Paused 2026-04-30. All acute risks resolved. Remaining phases are important but not urgent. Resume with Phase 2 (backup verification), which has the highest remaining value. Phases 2, 4, 5, and most of 6 do not require Asia’s machines. Step 1.4 is the only outstanding item for Asia.