People Profiles Aggregation — Phase 2 Plan

Created 2026-04-14

Updated 2026-04-19

Status implemented

Tags vault-structure, people, crm, automation, plan

Status (2026-04-19): Implemented. All seven Implementation Steps and the Vault Index Extension are in production. See Implementation summary 2026-04-19 below for a step-by-step completion log, including mid-course corrections to the original spec.

Earlier milestones for reference: Approved 2026-04-16; Asia’s 2026-04-17 review comments incorporated 2026-04-18 (§4, §5, §6, §8); Step 2 + 2.5 + 4 landed 2026-04-18; Steps 1/3 + Vault Index Extension + CRM webhook + Slack alerting landed 2026-04-19.

Implementation summary 2026-04-19

This section records how the spec actually came true — including deviations, corrections, and additions — so the plan stays a truthful reference rather than an aspirational document.

Step-by-step status

Plan step	Status	Notes
Step 1 — Formidable export plugins	✅ Done	Questionnaire plugins (Form 40 on baseworks.com + Form 74 on practice) + new Primer plugin `bw-primer-exports-cli.php` for Forms 66 (Squat Experience, 41 entries) and 67 (Segment Feedback, 131 entries).
Step 2 — `build-people-index.py`	✅ Done	Core aggregator operational.
Step 2.5 — Remote-join pass	✅ Done	Admin exclusion, `wp_users` join, `wp_bw_activity` tier signal, FluentCRM tag + custom-field reader (via `u4_baseworkscrm` SSH login), BuddyBoss cohort memberships.
Step 3 — Cron integration	✅ Done	New wrapper `scripts/people-pipeline.sh` chains questionnaire export → contact-inquiry export → primer exports → `build-people-index.py` → `build-vault-index.py`. Runs 6× daily on the agents VPS at `0 4,8,12,16,20,0`. Replaces the standalone `questionnaire-export.sh` cron.
Step 4 — `02-areas/people/index.md` MOC	✅ Done	Auto-generated by the aggregator (`write_moc()` in `build-people-index.py`) — grouped by tier → entry-path, sorted by recency within active/past. Re-generated on every run.
Step 5 — TT graduate batch import	✅ Done — folded into aggregator	No separate one-time import needed. `seed_tt_graduates()` in the aggregator reads `yjat=Yes` subscribers from FluentCRM and materializes thin records directly. 22 records seeded (22 YJAT=Yes subscribers, 5 merged with pre-existing records).
§7 — Vault index extension	✅ Done (2026-04-19)	Option (a) implemented: generic `frontmatter(file_path, key, value)` table in `vault-index.db`, plus an index on `(key, value)`. Schema bumped to version 2; a version-mismatch check auto-triggers full rebuild on first run. Enables SQL queries like `SELECT file_path FROM frontmatter WHERE key='entry-path' AND value='tokyo-studio-alumni'` instead of walking files in Python. 9,190 frontmatter rows indexed across the current 1,160 vault files.
Open — Privacy policy / terms update	Still open	Separate copy task, not blocking the implementation.

Corrections to the original spec

Contact-inquiry source — the plan said “Form 40” on baseworks.com, but that’s actually the event-participation questionnaire (already wired via bw-export questionnaire). The real contact/inquiry forms on baseworks.com are Form 2 (Contact Eng) and Form 8 (Contact Jap). A new exporter scripts/export-contact-inquiries.py pulls from both forms into 02-areas/method-admin/audience/contact-inquiries/, with a spam filter combining built-in heuristics (known spam domains like socialboozt.com, talkwithlead.com, thoughtwick.com; SEO/sales phrase patterns; internal test submissions) and manual _inquiry-{allow,deny}list.yml files under 02-areas/people/. Historical backfill: 8 entries processed → 5 spam, 3 genuine.

TT graduate seeding: seed_tt_graduates() originally dropped CRM custom fields other than yjat. This surfaced when the romaji_name integration landed: seeded TT records couldn’t get romaji slugs because the seeder copied only yjat: Yes into crm_custom_fields. Fixed 2026-04-19 to copy every field from lookups.crm_fields_by_email into the evidence record, so any CRM custom field (romaji_name, lead_background, movement_industry, etc.) now flows through to TT-seeded person files.

slug_for() fallback chain evolved twice during Step 2.5 and post-implementation work:

Original behavior: lowercased canonical_name → [a-z0-9]+ → strip hyphens. Japanese names stripped to empty → all collapsed on unknown.md (5 kanji-named TT graduates overwrote each other in the MOC with [奈美熊谷](/unknown/)-style broken wikilinks).
2026-04-19 fix (a): prefer CRM romaji_name when the canonical name has no Latin characters. Fixed the five collisions (nami-kumagai.md, shinobu-nakano.md, megumi-fukiage.md, momoko-fukuda.md, kaori-yamamoto.md).
2026-04-19 fix (b): added an email-local-part fallback for non-Latin names with no CRM record (e.g. Japanese contact-inquiry submitter 春菜清水 → haruna0502.md). Ensures every person file gets a stable ASCII slug even without CRM coverage.
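The fallback chain described above can be sketched as follows. This is a minimal illustration, not the actual slug_for() in build-people-index.py: the signature, the helper name _ascii_slug, and the example.com address are assumptions made for the example.

```python
import re


def _ascii_slug(text):
    """Lowercase, keep runs of [a-z0-9], and join them with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", (text or "").lower()))


def slug_for(canonical_name, romaji_name=None, email=None):
    """Return the first non-empty ASCII slug from the fallback chain.

    Order (hypothetical signature): canonical name, then the CRM
    romaji_name, then the local part of the email address.
    """
    email_local_part = (email or "").split("@")[0]
    for candidate in (canonical_name, romaji_name, email_local_part):
        slug = _ascii_slug(candidate)
        if slug:
            return slug
    # Nothing produced ASCII characters; such files need manual review.
    return "unknown"


print(slug_for("Caitlin Bartlett"))
print(slug_for("奈美熊谷", romaji_name="Nami Kumagai"))
print(slug_for("春菜清水", email="haruna0502@example.com"))
```

A name with no Latin characters produces an empty slug at the first step, so the chain falls through to the romaji name or the email local part instead of collapsing onto unknown.md.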

Additions beyond the original plan

FluentCRM “Romaji name” custom field. Added field romaji_name (slug, type text) and populated it for all 65 subscribers with non-ASCII names — 9 diacritic-folded (Noémie → Noemie, Véronique → Veronique, etc.) and 56 Hepburn romanizations of Japanese names, all human-reviewed (see changelog 2026-04-18 for the romanization table). The aggregator reads this field via people_remote.py and prefers it for slug generation.
CRM webhook for sub-minute freshness. Originally labelled future/optional. Now deployed:
- scripts/people-webhook-daemon.py — stdlib HTTP receiver on 127.0.0.1:9090, bearer-token auth, 30-second coalescing cooldown, async aggregator dispatch.
- Installed as a systemd service on the agents VPS.
- Publicly exposed at https://agents.baseworks.com/webhook/people-index via Cloudflare-proxied DNS + nginx reverse proxy.
- FluentCRM automation #42 “People aggregator — webhook on CRM change” (published) fires the webhook on tag application. Seeded with tags 74 (Event: Practice Meet Tokyo) and 103 (Tokyo Studio Alumni); Asia has an inbox task to widen the trigger set.
- End-to-end verified: 4 seconds from tag apply → aggregator exit=0 → vault updated.
- Full setup / rotation doc at people-webhook-setup.
Slack alerting to #agent-alerts. Three failure tiers:
1. systemd OnFailure=slack-notify-systemd-failure@%n.service for daemon crashes.
2. Daemon posts to Slack when build-people-index.py exits non-zero (or times out, or raises).
3. Daily heartbeat scripts/people-webhook-heartbeat.sh at 4:20 AM ET checks: service active, local /health 200, nginx→daemon path via --resolve (bypasses Cloudflare’s UA filter that blocks curl from the origin IP back to itself). Silent on quiet days; alerts only on real failure.

Safety-net note: the 4-hour pipeline cron continues to run independently of the webhook, so even a total webhook outage only degrades freshness, never correctness.
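The webhook daemon's bearer-token check and 30-second coalescing cooldown could look roughly like the sketch below. The class and function names are hypothetical, and the gate simply drops requests that arrive inside the cooldown window; the real daemon may instead schedule a trailing run.

```python
import hmac


class CoalescingGate:
    """Allow at most one aggregator dispatch per cooldown window."""

    def __init__(self, cooldown_seconds=30.0):
        self.cooldown = cooldown_seconds
        self._last_dispatch = None

    def should_dispatch(self, now):
        """Return True if a dispatch is allowed at time `now` (seconds)."""
        if self._last_dispatch is not None and now - self._last_dispatch < self.cooldown:
            return False  # still inside the cooldown; coalesce this event
        self._last_dispatch = now
        return True


def token_is_valid(authorization_header, expected_token):
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    supplied = authorization_header[len(prefix):]
    return hmac.compare_digest(supplied.encode(), expected_token.encode())


gate = CoalescingGate(cooldown_seconds=30.0)
decisions = [gate.should_dispatch(t) for t in (0.0, 10.0, 31.0)]
```

Passing the clock value in as an argument keeps the gate testable without sleeping; a live handler would pass time.monotonic().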

Current vault state (2026-04-19)

90 person files in 02-areas/people/ (19 active, 19 mention, 46 unknown, 3 inquiry, 0 past — tier counts from the auto-generated MOC).
167 primer assignment files (41 squat + 126 segment feedback + index stubs).
8 contact-inquiry files (5 spam-flagged and excluded from the aggregator; 3 genuine surfaced as tier: inquiry).
vault-index.db schema v2, 9,190 frontmatter rows across 1,160 vault files, supporting SQL queries on arbitrary frontmatter keys.

Purpose

Build a person-scoped aggregation layer in the vault that cross-references every way a student/client touches Baseworks: contact form submissions, CRM record, Primer assignment responses, questionnaire data, forum posts and DMs, session appearances, event participation. The goal is to surface patterns Obsidian’s graph and backlinks can make visible — patterns a relational CRM flattens out — while keeping Fluent CRM authoritative for operational contact state.

This is the editorial / knowledge layer on top of existing operational systems, not a replacement for any of them.

Decisions (finalized 2026-04-16)

1. One-way sync — Fluent CRM stays authoritative

Direction: read from Fluent CRM, Primer Formidable forms, BuddyBoss, contact forms → vault. Vault adds its own editorial layer (teaching observations, session notes, tags) that stays vault-local and never pushes back.
Why one-way: two-way introduces conflict resolution and a second failure mode. The operational/editorial separation already works between CRM (ops) and vault (strategy); this extends that pattern.
Future escape hatch: if vault-side observations need to become CRM tags (e.g., “needs check-in” based on a session note), build a separate small CLI tool that pushes a tag to Fluent CRM’s API. That tool is explicitly out of scope for Phase 2.

2. Cross-referencing scope — structural metadata only (Phase 2)

Person files hold counts, dates, tags, wikilinks — never verbatim content. Deeper dives happen by clicking through to the source file.

What Phase 2 enables (metadata joins via SQLite + wikilinks):

“All DMs from Tokyo studio alumni in the last six months” — find people with entry-path: tokyo-studio-alumni, follow links to DM files.
“Who in the Spring 2026 cohort hasn’t posted in a forum yet?” — find people tagged study-group-spring-2026, check which have no topic/reply links.
“How many people from each entry-path class completed Primer Segment 2?” — group person files by entry-path, count Primer S2 response links.

What needs Phase 3 (content-aware analysis, future):

“Which topics mention soreness, grouped by entry-path” — requires reading body text.
“Sentiment trend over time for a given person” — text analysis.

The dividing line: Phase 2 answers questions you can ask with joins over metadata and the link graph. Phase 3 answers questions that require reading the actual words.
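As an illustration of a Phase 2 style metadata join, the sketch below filters person files on two frontmatter keys at once. It assumes frontmatter is indexed as (file_path, key, value) rows, as the vault index extension in this plan describes; the file names and rows are invented sample data.

```python
import sqlite3


def people_by_entry_path_and_tier(conn, entry_path, tier):
    """Self-join the key/value table to filter on two frontmatter keys."""
    rows = conn.execute(
        """
        SELECT ep.file_path
        FROM frontmatter AS ep
        JOIN frontmatter AS t ON t.file_path = ep.file_path
        WHERE ep.key = 'entry-path' AND ep.value = ?
          AND t.key = 'tier' AND t.value = ?
        ORDER BY ep.file_path
        """,
        (entry_path, tier),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in for vault-index.db with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frontmatter (file_path TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO frontmatter VALUES (?, ?, ?)",
    [
        ("02-areas/people/person-a.md", "entry-path", "tokyo-studio-alumni"),
        ("02-areas/people/person-a.md", "tier", "active"),
        ("02-areas/people/person-b.md", "entry-path", "tokyo-studio-alumni"),
        ("02-areas/people/person-b.md", "tier", "past"),
        ("02-areas/people/person-c.md", "entry-path", "primer-enrollee"),
        ("02-areas/people/person-c.md", "tier", "active"),
    ],
)
active_alumni = people_by_entry_path_and_tier(conn, "tokyo-studio-alumni", "active")
```

Each extra frontmatter condition adds one more self-join on file_path, which is the cost of the generic key/value layout.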

3. Privacy — commit everything, match current practice

Decision: all person files (structural metadata only) are committed to git. All source files (DMs, questionnaire responses, forum posts) remain committed as they already are. No new gitignored tier.
Rationale: the GitHub repository is private (only Patrick, Asia, and the VPS deploy key have access). Event-participant files already commit medical self-reports (conditions: free-text) and personal disclosures. DM bodies are already committed. Person files add less sensitive data (links, dates, tags) than what’s already in the repo. Creating a new gitignored tier for Phase 2 while Phase 1 content is committed creates inconsistency without reducing actual exposure.
Deletion policy: honor right-to-erasure requests by removing the person’s data from the repo and scrubbing git history (BFG) if needed. Add a brief note to the privacy policy or terms of service: participant interaction data is stored securely on private servers for pedagogical purposes and can be deleted on request. No credit card data, passwords, or financially sensitive material is stored in the vault.
Privacy policy update: separate task, not blocking Phase 2 implementation.

4. Identity key

Primary join: email address. The only identifier that can be present for every person, including TT graduates and Tokyo studio alumni who may have no WP account on either site.
Secondary identifiers (independent, not hierarchical). A person may have an account on baseworks.com, on practice.baseworks.com, on both, or on neither. The aggregator records whichever IDs are present; none is a “fallback” for another. The full archetype (baseworks-only, practice-only, both, neither) is ultimately determined by CRM tags, not by which IDs are populated — see §6.
- practice_site_user_id — integer WP user ID on practice.baseworks.com. Same ID space as wp_bw_activity.user_id, BuddyBoss, Formidable Form 74 (Questionnaire), and Formidable Form 61 (Journal, stored in both areb_frm_items.user_id and Field 906 key xaewc2 — values always identical).
- baseworks_site_user_id — integer WP user ID on baseworks.com. Different ID space from practice; never assume the two are equal.
- user_login — practice-site BuddyBoss handle, used for @mention autocomplete in forum posts. Present only for practice-site registered users.
Source-site awareness. Any vault file that captures a WP user ID must record which site it came from. Form 40 (baseworks.com) emits baseworks_site_user_id; Form 74 and Form 61 (practice) emit practice_site_user_id. The aggregator reads the correct field based on the source file’s origin.
Display: full name from CRM / registration.
Collisions: two people with the same first and last name get a qualifier (city, year, or numeric suffix).

Resolved 2026-04-18 — Asia’s comment on using practice-site user_id as the secondary key is incorporated above. Journal Form 61 storage verified live: areb_frm_items.user_id and Field 906 meta both carry the integer WP user ID, values always match.

5. Person file creation trigger — auto-grow with tiers

The aggregator creates a file for every unique email it encounters across all sources. Files are stamped with a tier: field derived from what’s linked to them:

Tier	Criteria
`active`	Has an entry in `wp_bw_activity` within the last 12 months OR is in a current cohort group
`past`	Has entries in `wp_bw_activity` but most recent is more than 12 months ago, and no current cohort group
`inquiry`	Only contact-form submission(s), no platform activity or enrollment (after spam/sales filter — see below)
`mention`	Only appears as a named reference with no first-party data

Tier signal — wp_bw_activity. The bw-activity-plugin is the authoritative source for “is this person active on the platform.” Historical data was imported for every user (2,604 entries across 72 users, back to December 2020), so the plugin captures all learning activity — Primer lesson completions, in-person sessions, labs, practice, manual entries. Derivation query: SELECT user_id, MAX(activity_timestamp) AS last_activity, COUNT(*) AS entry_count FROM wp_bw_activity GROUP BY user_id. The aggregator joins this to practice_site_user_id on each person file.
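The tier criteria in the table above can be expressed as a small pure function. This is a sketch only: the function name and parameters are hypothetical, "12 months" is approximated as 365 days, and the spam filter for the inquiry tier is assumed to have run already.

```python
from datetime import date, timedelta


def classify_tier(last_activity, in_current_cohort, has_contact_form, today):
    """Map activity evidence to a tier following the criteria table.

    last_activity is the MAX(activity_timestamp) date from wp_bw_activity,
    or None when the person has no entries there.
    """
    cutoff = today - timedelta(days=365)  # assumption: 12 months ~ 365 days
    if in_current_cohort or (last_activity is not None and last_activity >= cutoff):
        return "active"
    if last_activity is not None:
        return "past"
    if has_contact_form:
        return "inquiry"
    return "mention"


today = date(2026, 4, 19)
tiers = [
    classify_tier(date(2026, 4, 17), False, False, today),
    classify_tier(date(2024, 1, 1), False, False, today),
    classify_tier(date(2024, 1, 1), True, False, today),
    classify_tier(None, False, True, today),
    classify_tier(None, False, False, today),
]
```

Taking today as a parameter keeps the classification deterministic, which matters for an aggregator that is meant to be idempotent.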

Inquiry tier — spam filter required. The contact-form stream on baseworks.com contains a mix of genuine inquiries, bot submissions, and sales pitches. Before a person file gets tier: inquiry, the aggregator runs a filter pass on the contact-form source files. Filter signals: empty or obviously fake names, disposable-email domains, submission body containing known spam tokens (SEO offers, crypto, “I can rank your website”), registration from known bot-pattern IPs. Filtered-out entries produce tier: spam person files (or are written to a separate _spam.md index) rather than polluting the inquiry pool. A manual allowlist/denylist YAML file (02-areas/people/_inquiry-filter.yml) overrides the automated filter when needed.
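A minimal sketch of such a filter pass follows. The function name, the pattern list, and the allowlist/denylist arguments are assumptions for illustration; the three domains are the known spam domains recorded in the implementation summary, and IP-based signals are left out.

```python
import re

# Known spam domains listed in the implementation summary.
SPAM_DOMAINS = {"socialboozt.com", "talkwithlead.com", "thoughtwick.com"}

# Hypothetical phrase patterns covering SEO and crypto pitches.
SPAM_PATTERNS = [
    re.compile(r"rank your website", re.IGNORECASE),
    re.compile(r"\bseo\b", re.IGNORECASE),
    re.compile(r"\bcrypto", re.IGNORECASE),
]


def classify_inquiry(name, email, body, allowlist=frozenset(), denylist=frozenset()):
    """Return 'inquiry' or 'spam'; manual lists override the heuristics."""
    email = email.strip().lower()
    if email in allowlist:
        return "inquiry"
    if email in denylist:
        return "spam"
    domain = email.rpartition("@")[2]
    if not name.strip() or domain in SPAM_DOMAINS:
        return "spam"
    if any(pattern.search(body) for pattern in SPAM_PATTERNS):
        return "spam"
    return "inquiry"


verdict = classify_inquiry("Jo", "jo@example.com", "Do you offer classes in Montreal?")
```

Checking the manual lists first means a human decision always wins over the automated signals, matching the override role described for the YAML file.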

Resolved 2026-04-18 — Asia’s comments on using wp_bw_activity as the tier signal and adding a spam filter to the inquiry tier are both incorporated above.

The MOC (02-areas/people/index.md) lists active and past prominently; inquiry and mention in a collapsed or secondary section. All tiers are searchable in the graph and via SQLite.

6. Entry path — explicit spec

Every person file carries an entry-path field describing how the person first connected with Baseworks. Stored once per person, not per message. DM and topic files stay lean and link to person files.

Values (extend as needed):

Value	Meaning
`tokyo-studio-alumni`	Pre-existing student from the physical studio in Tokyo
`teacher-training-graduate`	Completed a Baseworks teacher-training program
`primer-enrollee`	First contact was enrolling in the Primer on practice.baseworks.com
`study-group-winter-2026-montreal`	First contact was Winter 2026 Montreal Study Group enrollment
`study-group-spring-2026-montreal`	First contact was Spring 2026 Montreal Study Group enrollment
`practice-community-member`	Enrolled in general Practice Sessions, no cohort or TT history
`contact-inquiry`	First touch was a contact-form submission without subsequent enrollment
`unknown`	Can’t be determined — needs manual review

Data sources for derivation (priority order):

1. Manual override: 02-areas/people/_entry-path-overrides.yml, a flat dict of email: entry-path pairs. Takes priority over all automated sources. Patrick and Asia add entries only for people the automation gets wrong.
2. Fluent CRM / WP-Fusion tags: Asia is categorizing existing CRM contacts using WP-Fusion tags based on Lead Background / Movement Industry / Lead Source fields. Once tagged, a person’s CRM tag set maps directly to an entry-path value via a lookup table defined in the aggregator. This is the preferred signal for classification because it reflects the full archetype (where the person came from, which systems they use), not just a single enrollment event.
3. Teacher-training completion records: any existing TT graduate list (location TBD; may be a spreadsheet Patrick can provide, or a Fluent CRM tag). If email matches, entry-path is teacher-training-graduate. Past TT graduates on baseworks.com do not yet have CRM tags; Asia is adding them as part of the tagging pass. Until those tags land, this explicit TT list fills the gap.
4. BuddyBoss registration date: fallback only, for people without CRM tags. Users with user_registered before 2024-11-01 default to tokyo-studio-alumni (unless a TT record or manual override says otherwise). This cutoff is based on the empirical registration histogram: pre-November 2024 had ~65 users across 8 years (never more than 8/month), all from the pre-existing network. November 2024 alone saw 42 registrations, driven by the Open Day / Primer / Study Group pipeline. Once CRM tagging is complete for the legacy population, this heuristic becomes largely redundant.
5. BuddyBoss group memberships: bp_groups_members joined to bp_groups. If the person is in a cohort group, the cohort becomes their entry path (unless an earlier pathway already claimed them).
6. Primer course enrollment: if enrolled in Primer without any of the above, entry-path is primer-enrollee.
7. Fallback: unknown, flagged for manual review.
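The priority order above amounts to a first-match-wins chain, sketched here. The function signature and the example tag map are hypothetical (the real lookup table lives in the aggregator), and stickiness of an already-assigned entry-path is not modeled.

```python
from datetime import date

REGISTRATION_CUTOFF = date(2024, 11, 1)  # cutoff stated in this plan


def derive_entry_path(email, *, overrides=None, crm_tags=(), tag_map=None,
                      tt_emails=frozenset(), registered=None,
                      cohort_path=None, primer_enrolled=False):
    """Walk the sources in priority order and return the first match."""
    if overrides and email in overrides:
        return overrides[email]                      # 1. manual override
    for tag in crm_tags:
        if tag_map and tag in tag_map:
            return tag_map[tag]                      # 2. CRM tag lookup
    if email in tt_emails:
        return "teacher-training-graduate"           # 3. TT graduate list
    if registered is not None and registered < REGISTRATION_CUTOFF:
        return "tokyo-studio-alumni"                 # 4. registration date
    if cohort_path:
        return cohort_path                           # 5. cohort group
    if primer_enrolled:
        return "primer-enrollee"                     # 6. Primer enrollment
    return "unknown"                                 # 7. needs manual review


# Hypothetical mapping from a CRM tag name to an entry-path value.
tag_map = {"Tokyo Studio Alumni": "tokyo-studio-alumni"}
example = derive_entry_path(
    "a@example.com",
    overrides={"a@example.com": "contact-inquiry"},
    crm_tags=["Tokyo Studio Alumni"],
    tag_map=tag_map,
)
```

Because each source returns as soon as it matches, reordering the priorities only requires moving one block of the function.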

Once assigned, entry-path is sticky. A Tokyo studio alumna who later enrolls in a 2026 study group stays tokyo-studio-alumni — the entry path records how someone first came in. Current engagement is tracked via current-programs: in the person file.

Archetype comes from tags, not ID presence. The aggregator does not classify people by whether practice_site_user_id or baseworks_site_user_id is populated. A person with only a baseworks.com account, only a practice account, both, or neither is ultimately identified by their CRM tag set. ID presence is recorded as data but not used for classification logic.

Resolved 2026-04-18 — Asia is doing the CRM tagging now (including past TT graduates who don’t yet have tags). CRM tags promoted to priority #2, registration-date heuristic demoted to fallback #4. The aggregator can start while tagging is in progress; re-runs will pick up new tags as they land.

Asia’s comment: I tagged people who are past studio attendees with the WP-Fusion tag “Tokyo Studio Alumni” (tag 103). We also have custom fields in the CRM: YJAT: Yes means the person completed teacher training at the Tokyo studio.

7. Historical backfill

Aggregate everything already in the vault — DMs back to 2021, event-participants back to late 2024, all forum topics and activity.
Do NOT reach back into CRM for historical records not already in the vault. CRM is read-from-now-forward for new contacts.
Teacher-training graduates without platform accounts — handled as a separate batch import from whatever list Patrick provides. These person files will be thin (name, email, entry-path, maybe certification date) and can be enriched later if the person re-engages. Not part of the automated aggregator; a one-time import step.

8. Person file shape

One file per person at 02-areas/people/{firstname-lastname}.md (or qualified slug for collisions):

---
type: person
name: "Caitlin Bartlett"
email: caitlin@example.com
practice_site_user_id: 123
baseworks_site_user_id: 456
user_login: caitlin_b
profile_url: https://practice.baseworks.com/members/caitlin_b/
first_contact: 2026-01-15
entry-path: study-group-spring-2026-montreal
current-programs: [primer-in-progress, study-group-spring-2026]
tier: active
tags: [study-group-spring-2026, primer-in-progress]
crm-tags: [lead-source-open-day, movement-industry-yoga]
last_activity: 2026-04-17
activity_entry_count: 43
---

## Programs
- [Study Group Spring 2026 questionnaire](/areas/method-admin/audience/event-participants/caitlin-bartlett-260403/)
- [Primer Segment 2 assignment response](/primer-segment-2-caitlin-260320/)

## Forum activity
- [DM: question about breathing](/2026-03-22-caitlin-thread-56/)
- [Topic: soreness after Lesson 3](/2026-04-01-soreness-after-lesson-3/) (reply)

## Session appearances
- [Session 1 — Spring 2026](/2026-04-04-session-1-spring-2026/)

Both practice_site_user_id and baseworks_site_user_id are optional; either, both, or neither may be present on any given person file. Absence of a WP ID does not imply anything — TT graduates and Tokyo alumni may have neither.
crm-tags carries the raw Fluent CRM / WP-Fusion tags for the person; entry-path is the derived classification from those tags (plus overrides). Keeping both lets the aggregator re-derive entry-path whenever the tag → entry-path lookup changes.
last_activity and activity_entry_count come from wp_bw_activity via a grouped query and drive the tier classification.

Resolved 2026-04-18 — Asia’s comment on renaming user_id is applied, and baseworks_site_user_id added as a first-class sibling field since a substantial portion of users have accounts on both sites.

Files are regenerated by a script from authoritative sources on a schedule, not hand-maintained (except ## Editorial Notes sections, which are preserved through regeneration like the forum sync).
The existing vault-index.db SQLite index has the machinery for backlinks, tag search, and frontmatter queries — the aggregator leverages it.

Sequencing

Phase 1 (complete): community-forums-groups consolidation. Stabilized filenames, IDs, and frontmatter. Completed 2026-04-14.
Phase 2 (this plan): people profiles aggregation. Start now.
Phase 3 (future): analytics / pattern surfacing on top of the aggregated layer (content-aware queries). Not in scope for Phase 2.

Implementation steps

Step 1: Formidable export plugins

One export plugin per source form, modeled on bw-questionnaire-export-cli.php (Form 74). Each runs on a schedule, writes one Markdown file per respondent with identity frontmatter.

Forms to export:

Primer segment assignment forms — one per segment (form IDs TBD, Patrick to provide)
Contact form (Form 40 on baseworks.com) — if not already captured
Any other relevant forms surfaced during implementation

Step 2: `scripts/build-people-index.py`

The core aggregator. Walks all source folders, groups by email, emits one file per unique person into 02-areas/people/.

Vault-local input sources:

02-areas/method-admin/audience/event-participants/ — questionnaire responses (both Form 40 on baseworks.com and Form 74 on practice.baseworks.com)
02-areas/practice-platform/community-forums-groups/topics/ — forum topic participation
02-areas/practice-platform/direct-messages/ — DM threads
02-areas/practice-platform/community-forums-groups/activity/ — activity digest mentions
Primer assignment exports (once Step 1 exports are in place)
02-areas/people/_entry-path-overrides.yml — manual override lookup
02-areas/people/_inquiry-filter.yml — spam allowlist/denylist for the inquiry tier

Remote data sources (read via SSH + wp db query, or lightweight admin-only REST endpoint):

wp_bw_activity on practice.baseworks.com — grouped per user: last_activity, activity_entry_count (drives tier)
wp_users on practice.baseworks.com — email, user_registered, display_name (keyed on practice_site_user_id)
wp_users on baseworks.com — email, user_registered, display_name (keyed on baseworks_site_user_id)
Fluent CRM tags per contact (keyed on email)
BuddyBoss bp_groups_members — cohort membership (drives current-programs)

Processing logic:

Walk each vault source directory, read frontmatter from every .md file
Extract email + WP user ID from each source file, keyed to the correct site. Form 40 files → baseworks_site_user_id; Form 74 and Form 61 files → practice_site_user_id. Forum and DM files → user_login (maps to practice_site_user_id via wp_users lookup)
Pull remote data: activity summary, CRM tags, cohort memberships, registration dates
Group all identity evidence by email (canonical join key)
For each unique email, compute:
- entry-path via the priority chain: override → CRM tag map → TT list → registration-date fallback → groups → Primer → unknown
- tier from wp_bw_activity recency (active / past) or contact-form-only after spam filter (inquiry / spam) or mention-only
- first_contact from earliest source-file date across all evidence
- current-programs from active cohort memberships
Write/update 02-areas/people/{firstname-lastname}.md with frontmatter + wikilink sections
Preserve any existing ## Editorial Notes section (same pattern as forum sync)

Idempotent: running it repeatedly produces the same output if inputs haven’t changed.
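Preserving the ## Editorial Notes section through regeneration, while staying idempotent, might be done along these lines. The function names are hypothetical, and the sketch assumes the editorial section runs from its heading to the next level-2 heading or the end of the file.

```python
import re

# Match from the heading line up to the next "## " heading or end of text.
_EDITORIAL = re.compile(
    r"^## Editorial Notes[ \t]*$.*?(?=^## |\Z)",
    re.MULTILINE | re.DOTALL,
)


def extract_editorial_notes(existing_text):
    """Return the hand-written section, or '' if the file has none."""
    match = _EDITORIAL.search(existing_text)
    return match.group(0).rstrip() + "\n" if match else ""


def regenerate(existing_text, generated_text):
    """Replace everything except the editorial section with fresh output."""
    notes = extract_editorial_notes(existing_text)
    body = generated_text.rstrip() + "\n"
    return body + ("\n" + notes if notes else "")


old = "## Programs\n- old link\n\n## Editorial Notes\nKeep me.\n"
new = "## Programs\n- new link\n"
once = regenerate(old, new)
twice = regenerate(once, new)
```

Running the regeneration on its own output yields the same text, which is the property the aggregator relies on for repeat runs.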

Sibling pattern: scripts/build-program-dashboard.py does the same “flat source, generated views” pattern scoped by cohort. build-people-index.py does it scoped by person (email). The two scripts should feel like siblings.

Step 2.5: Remote-join pass (complete 2026-04-18)

Step 2 delivered a vault-local-only aggregator that produced 81 person files from 132 source files. This follow-up pass added remote data joins to fill the fields Step 2 left empty (tier, last_activity, activity_entry_count, crm-tags, missing emails) and to merge first-name-only records with their full-name counterparts.

Status (2026-04-18 evening): all five substeps landed.

2.5.1 admin exclusion → _admin-exclude.yml drops Patrick/Asia/vendor records.
2.5.2 wp_users join → 17 person files enriched with email/display_name/user_login/user_registered; 7 first-name orphans merged into full-name records (clementine, dawson, magali, mimi, elinor).
2.5.3 wp_bw_activity → tier classification working (19 active, 47 unknown). Unknown tier covers event-participants who never registered on practice, which resolves to inquiry once the contact-form source is wired in.
2.5.4 Fluent CRM reader → pulls tags + yjat/lead_background/movement_industry/list_source from crm.baseworks.com (requires sudo wp-cli, 500 tag-carrying subscribers). Entry-path priority chain (override → YJAT=Yes → tag map → cohort → unknown) active. Asia’s Tokyo Studio Alumni tagging flows in automatically on re-run.
2.5.5 bp_groups_members → current-programs populated with Primer Community / Practice Community / Montreal Study Group cohorts.

Final count: 70 person files, down from 81 after deduplication.

Original Step 2.5 work items (ordered; all landed per the status above):

Admin exclusion — Patrick (practice_site_user_id=1, baseworks_site_user_id=1) and Asia (practice_site_user_id=2) are currently classified as study-group enrollees because they authored topics in cohort groups. Fix: add an _admin-exclude.yml (or extend SYSTEM_USER_IDS in the script) so admins don’t get a person file at all, OR get one tagged entry-path: admin with an admin: true flag.
wp_users email lookup on both sites — the highest-value join. SSH + wp db query against both wp_users tables. For every practice_site_user_id we have, fetch email + display_name from practice; same for baseworks_site_user_id against baseworks.com. Fills missing emails on all forum/DM authors and event-participants. Also enables merging first-name-only records (e.g. clementine.md + clementine-morrigan.md → single file once the user_id resolves to the full display_name).
- Query pattern: SELECT ID, user_email, display_name, user_login, user_registered FROM wp_users WHERE ID IN (...)
- Cache result in scripts/.people-index-remote-cache.json (gitignored) with a TTL so repeat runs don’t hammer the DB.
wp_bw_activity per-user summary — drives tier. Single grouped query: SELECT user_id, MIN(activity_timestamp) AS first_activity, MAX(activity_timestamp) AS last_activity, COUNT(*) AS entry_count FROM wp_bw_activity GROUP BY user_id. Threshold: last_activity within 12 months → active, older → past. Writes last_activity and activity_entry_count on person files with matching practice_site_user_id.
Fluent CRM tag reader — reads tags per email from the CRM. Requires a new YAML lookup file 02-areas/people/_crm-tag-mapping.yml that maps CRM tag names to entry-path values (Asia can populate as she tags). Writes crm-tags frontmatter and derives entry-path via the mapping. Works today for whatever Asia has tagged; richer each re-run. Does not replace the cohort-based derivation — sits above it in the priority chain per §6.
BuddyBoss group memberships — SELECT user_id, group_id, bg.name FROM bp_groups_members bgm JOIN bp_groups bg ON bg.id = bgm.group_id WHERE is_confirmed = 1. Enriches current-programs with active cohort group memberships for people not already captured via event-participant questionnaires.

Architectural note: Steps 2.5.2 through 2.5.5 are all remote reads. Single approach: one helper module (scripts/people_remote.py or inline functions in build-people-index.py) that opens an SSH connection per site, runs the batched queries, caches results in a gitignored JSON blob (TTL ~1 hour), and exposes lookup dicts (by_practice_uid, by_baseworks_uid, by_email) the main aggregator consumes. Keeps the aggregator offline-capable for fast local dev iteration.

Cache file: scripts/.people-index-remote-cache.json — gitignored. Holds the last successful remote-query output keyed by query name with timestamps. Deleting the cache forces a fresh pull on next run.
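A TTL cache keyed by query name, as described above, could be as small as the sketch below. The function name, the JSON layout (fetched_at plus rows), and the demo data are assumptions; the injected now value exists only to make the example deterministic.

```python
import json
import tempfile
import time
from pathlib import Path


def cached_query(cache_path, name, fetch, ttl_seconds=3600, now=None):
    """Return cached rows for `name` if fresh, else call fetch() and store."""
    now = time.time() if now is None else now
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    entry = cache.get(name)
    if entry is not None and now - entry["fetched_at"] < ttl_seconds:
        return entry["rows"]
    rows = fetch()  # the expensive remote read happens only here
    cache[name] = {"fetched_at": now, "rows": rows}
    cache_path.write_text(json.dumps(cache, indent=2))
    return rows


# Demo with a fake remote query and a temporary cache file.
calls = []


def fake_fetch():
    calls.append(1)
    return [{"user_id": 7, "entry_count": 43}]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "people-index-remote-cache.json"
    first = cached_query(path, "activity_summary", fake_fetch, now=0)
    second = cached_query(path, "activity_summary", fake_fetch, now=1800)
    third = cached_query(path, "activity_summary", fake_fetch, now=4000)
```

Deleting the file empties the cache, so the next call falls through to fetch(), which matches the forced fresh pull described above.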

Dependencies for Step 2.5:

SSH access to bwsite_primo_82@5.180.253.171 (shared host for both sites). Confirmed working as of 2026-04-18 (used for Journal Form 61 inspection).
wp db query on both /var/www/baseworks.com and /var/www/practice.baseworks.com. Filter deprecation warnings with grep -v '^Deprecated:'.
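One way to wrap the remote read is sketched below: build the ssh argument list, then parse the output after dropping deprecation warnings. The function names are hypothetical, and the parser assumes wp db query emits tab-separated rows with a header line when its output is captured rather than shown on a terminal.

```python
import shlex


def build_wp_query_command(ssh_target, site_path, sql):
    """Return an argv list that runs `wp db query` on the remote host."""
    # shlex.join quotes the SQL so the remote shell sees it as one argument.
    remote = shlex.join(["wp", "db", "query", sql, f"--path={site_path}"])
    return ["ssh", ssh_target, remote]


def parse_wp_query_output(raw):
    """Drop 'Deprecated:' lines, then map each row to a dict by header."""
    lines = [
        line for line in raw.splitlines()
        if line.strip() and not line.startswith("Deprecated:")
    ]
    if not lines:
        return []
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


command = build_wp_query_command(
    "user@host",  # placeholder SSH target
    "/var/www/practice.baseworks.com",
    "SELECT ID, user_email FROM wp_users WHERE ID IN (7)",
)
sample = "Deprecated: something old\nID\tuser_email\n7\ta@example.com\n"
rows = parse_wp_query_output(sample)
```

The argv list can be handed to subprocess.run(command, capture_output=True, text=True), with the captured stdout passed to parse_wp_query_output.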

Step 3: Post-sync hook integration

Run the people aggregator after the forum sync cron and any export cron land new files. Same cron-on-agents pattern as existing sync systems (System 3). No new server config, no new UI.

Cron sequence:

forum-content-sync.py runs (every 15 min) — syncs topics, DMs, groups, forums, activity
Formidable export plugins run (every 4 hours or similar) — export assignment responses
build-people-index.py runs (every 4 hours, after exports) — regenerates person files
build-vault-index.py runs (after people index) — updates SQLite index with new/changed person files

Step 4: `02-areas/people/index.md` MOC

Auto-generated or hand-maintained MOC listing all people, grouped by tier and optionally by cohort / program / entry-path.

Step 5: TT graduate batch import (separate)

One-time import of teacher-training graduates who have no platform account. Source: list from Patrick (format TBD — spreadsheet, email list, or Fluent CRM tag export). Creates thin person files with entry-path: teacher-training-graduate and minimal metadata.

Vault index extension

The current vault-index.db schema (files, links, tags, meta tables) stores file-level metadata but not individual frontmatter fields. Phase 2 queries (e.g., “all people with entry-path X”) need to filter on specific frontmatter values.

Options:

(a) Add a frontmatter table — (file_path, key, value) rows for every frontmatter field. Generic, no schema changes needed per new field. Slightly slower for typed queries.
(b) Add columns to files — entry_path, tier, email, etc. Faster queries, but requires schema migration for each new field.

Recommendation: option (a) — generic frontmatter table. Matches the “walk files + read frontmatter” pattern used everywhere else, and new fields don’t require schema changes. Add indexes on (key, value) for the fields we query most.
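Option (a) might be set up as in the sketch below, including a schema-version check that triggers a rebuild on mismatch. The layout of the meta table (key/value rows with a schema_version key), the index name, and the function names are assumptions; list-valued fields are stored as one row per item.

```python
import sqlite3

SCHEMA_VERSION = 2


def ensure_schema(conn):
    """Create or rebuild the frontmatter table; True means a rebuild ran."""
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    row = conn.execute("SELECT value FROM meta WHERE key = 'schema_version'").fetchone()
    if row is not None and int(row[0]) == SCHEMA_VERSION:
        return False
    # Version missing or different: drop and recreate, then re-index all files.
    conn.execute("DROP TABLE IF EXISTS frontmatter")
    conn.execute(
        "CREATE TABLE frontmatter (file_path TEXT NOT NULL, key TEXT NOT NULL, value TEXT)"
    )
    conn.execute("CREATE INDEX idx_frontmatter_key_value ON frontmatter (key, value)")
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES ('schema_version', ?)",
        (str(SCHEMA_VERSION),),
    )
    conn.commit()
    return True


def index_frontmatter(conn, file_path, fields):
    """Replace all rows for one file; list values become one row per item."""
    conn.execute("DELETE FROM frontmatter WHERE file_path = ?", (file_path,))
    rows = []
    for key, value in fields.items():
        values = value if isinstance(value, list) else [value]
        rows.extend((file_path, key, str(item)) for item in values)
    conn.executemany("INSERT INTO frontmatter VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
rebuilt_first = ensure_schema(conn)
rebuilt_second = ensure_schema(conn)
index_frontmatter(
    conn,
    "02-areas/people/person-a.md",  # hypothetical file
    {"entry-path": "tokyo-studio-alumni", "tags": ["alumni", "tokyo"]},
)
matches = conn.execute(
    "SELECT file_path FROM frontmatter WHERE key = 'entry-path' AND value = ?",
    ("tokyo-studio-alumni",),
).fetchall()
```

Deleting a file's rows before re-inserting keeps repeated index runs from accumulating duplicates.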

Entry-path registration timeline (empirical basis)

Based on user_registered histogram pulled from practice.baseworks.com on 2026-04-16:

2016-03 to 2018-09:    8 users   — platform inception / testing
2020-02 to 2020-10:    5 users   — pandemic era
2021-03 to 2021-12:   22 users   — biggest early wave (TT + Tokyo alumni onboarding)
2022-02 to 2022-11:    9 users   — trickle (existing network)
2023-02 to 2023-11:    9 users   — trickle
2024-02 to 2024-07:   10 users   — gradual pickup
2024-10:                2 users
2024-11:              42 users   — SPIKE (Open Day / Primer launch)
2024-12:              20 users
2025-01 to 2025-12:   26 users   — steady enrollment
2026-01 to 2026-04:   58 users   — Spring cohort + continued growth
                     ___________
                     ~221 total users

Two eras:

Pre-November 2024 (~65 users): low and steady, never more than 8/month. Pre-existing network (Tokyo studio alumni, TT graduates, early testers, direct outreach).
November 2024 onward (~156 users): driven by Open Day / Primer / Study Group enrollment pipeline. 42 registrations in November 2024 alone marked the inflection point.

Cutoff: user_registered before 2024-11-01 → default tokyo-studio-alumni. Asia has been asked to review this breakdown (inbox item 2026-04-16).

Estimated effort

2–3 focused sessions spread over a week:

Session A (current): plan finalization + open-question resolution. Done.
Session B: Step 2 (build-people-index.py) — core aggregator, reading from existing vault sources. Possibly Step 4 (MOC).
Session C: Step 1 (Formidable export plugins, once form IDs provided) + Step 3 (cron integration) + Step 5 (TT batch import if list available).

Each step is a separate commit. Safe to revert at any point. Daily backups to Backblaze B2 via the vault-backup cron.

Open items (not blocking implementation)

Primer form IDs — ✅ Resolved 2026-04-19. Form 66 (Squat Experience), Form 67 (Segment Feedback).
TT graduate list — ✅ Resolved 2026-04-19. Folded into the aggregator via FluentCRM yjat=Yes seed (22 records); no separate one-time import needed.
Privacy policy / terms update — still open; separate copy task, not blocking.
Widen FluentCRM webhook trigger scope — Asia to review and expand the tag set on automation #42 beyond the starter pair (74, 103). Inbox item filed 2026-04-18.

Reference

community-forums-consolidation-plan — Phase 1 (complete)
forum-content-ingestion-plan — established the forum sync System 3
bw-questionnaire-export-cli.php — existing plugin pattern to generalize
scripts/build-vault-index.py — SQLite index the aggregator leverages
scripts/build-program-dashboard.py — sibling “flat source, generated views” script
02-areas/method-admin/audience/event-participants/ — closest existing seed (one file per questionnaire respondent)
00-inbox/claude-code-shared-context.md — infrastructure context