Forum Content Ingestion Plan — BuddyBoss to Vault
Overview
Pull forum topics, replies, group feed posts, group metadata, and admin direct messages from practice.baseworks.com (BuddyBoss) into the Obsidian knowledge base as structured markdown. The system follows the same architecture pattern as the questionnaire export system (System 1 in Vault Sync Systems).
Status: Live. Deployed 2026-04-07. Documented as System 3 in Vault Sync Systems.
Without this sync, student forum activity on practice.baseworks.com is invisible to the knowledge base. This means:
- Claude has no access to past discussions when contextualizing responses
- No centralized view of how students interact with content
- Forum content can’t be cross-referenced with session summaries, taxonomy, or resources
This is Phase 1 — data ingestion only. A future Phase 2 (response-drafting agent) depends on the KB having enough depth from session summaries, taxonomy, and resources to draw from meaningfully.
What Gets Pulled In
- Forum topics — discussion threads created in group forums
- Replies — all user responses on those topics (inline within the topic file)
- Group feed posts — activity/status updates posted in group feeds (monthly digests), with inline activity comments
- Groups — group metadata (name, description, members) for context and wikilinks
- Admin direct messages — DM threads involving Patrick or Asia (added 2026-04-08). One file per conversation thread, including broadcast messages to cohorts.
- Thread summaries — AI-generated structured metadata (topic, category, participants, resolution, tags) for settled activity threads (added 2026-04-08)
- Translations — automatic English translations appended to non-English messages in activity digests and DMs (added 2026-04-08)
Architecture
Data Source
- BuddyBoss REST API on practice.baseworks.com (for forums, topics, replies, groups)
- Direct SQL queries via `wp eval` (for activity and direct messages; see note below)
- No mu-plugin needed: BuddyBoss exposes REST endpoints natively
- Authentication: WP-CLI internal REST dispatch as admin user (ID 8) via SSH
Cloudflare workaround: Direct HTTP API calls from the VPS are blocked by Cloudflare's bot challenge (the same issue as Claude CLI OAuth). The script SSHes to baseworks-web and uses `wp eval` to dispatch REST requests internally via PHP, so no external HTTP calls are needed.
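A minimal sketch of that internal dispatch is below. It assumes `baseworks-web` resolves as an SSH host alias. It also assumes `wp` on that host finds the WordPress install from the login directory; pass the hypothetical `wp_path` argument otherwise.

The snippet builds PHP that creates a `WP_REST_Request`, runs it through `rest_do_request()`, and prints the response data as JSON. WP-CLI's global `--user=8` makes the request run as the admin user named in the Authentication line. The function names are illustrative and are not taken from `forum-content-sync.py`.

```python
import base64
import json
import shlex
import subprocess
from typing import Optional

SSH_HOST = "baseworks-web"  # SSH host alias used in this plan
WP_USER_ID = 8              # admin user the sync runs as


def _php_decoded(value: str) -> str:
    """Return a PHP expression that evaluates to `value`.

    Base64 keeps quotes and backslashes out of the PHP string literal.
    """
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"base64_decode('{encoded}')"


def build_rest_command(route: str, params: Optional[dict] = None,
                       wp_path: Optional[str] = None) -> list:
    """Build the ssh argv that dispatches one GET request inside WordPress."""
    query_json = json.dumps(params or {})
    php = (
        f"$req = new WP_REST_Request('GET', {_php_decoded(route)}); "
        f"$req->set_query_params(json_decode({_php_decoded(query_json)}, true)); "
        "$res = rest_do_request($req); "
        "echo wp_json_encode(rest_get_server()->response_to_data($res, false));"
    )
    wp_args = ["wp", "eval", php, f"--user={WP_USER_ID}"]
    if wp_path:
        wp_args.append(f"--path={wp_path}")
    # ssh hands the remote command to the remote shell as one string,
    # so the whole wp invocation is quoted once here.
    return ["ssh", SSH_HOST, shlex.join(wp_args)]


def rest_get(route: str, params: Optional[dict] = None,
             wp_path: Optional[str] = None):
    """Run the request over SSH and decode the JSON printed by wp eval."""
    result = subprocess.run(build_rest_command(route, params, wp_path),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `rest_get("/buddyboss/v1/topics", {"per_page": 1})` would return the first topic as a list of dicts. The route and parameters are base64-encoded into the PHP source so that quotes in values cannot break out of the PHP string literal.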
REST API limitation (discovered 2026-04-08): The BuddyBoss REST API `/buddyboss/v1/activity` does not return group activity for private groups, even when authenticated as admin. All 4 active groups are private. Activity sync and DM sync therefore use direct SQL queries against the `bp_activity` and `bp_messages_messages` tables via `wp eval` to bypass this limitation.
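The direct-SQL path can be sketched as PHP snippets for `wp eval` that query the tables through `$wpdb` and print JSON. The sketch rests on several assumptions that this plan does not confirm:

- The tables live under `$wpdb->base_prefix` with the standard BuddyPress column names.
- Activity comments are stored with `type = 'activity_comment'`.
- DM participants are listed in `bp_messages_recipients`.
- `date_recorded` and `date_sent` hold UTC values.

The admin IDs passed to `dm_php` are placeholders.

```python
from datetime import datetime
from typing import Iterable

# Note: activity_comment rows are not limited to group posts by this query;
# comments whose root post is not a group post can be dropped afterwards.
ACTIVITY_SQL_PHP = """global $wpdb;
$t = $wpdb->base_prefix . 'bp_activity';
$sql = $wpdb->prepare(
    "SELECT id, user_id, component, type, item_id, secondary_item_id, content, date_recorded
     FROM {$t}
     WHERE ((component = 'groups' AND type = 'activity_update') OR type = 'activity_comment')
       AND date_recorded > %s
     ORDER BY date_recorded",
    '__SINCE__'
);
echo wp_json_encode($wpdb->get_results($sql, ARRAY_A));"""

DM_SQL_PHP = """global $wpdb;
$m = $wpdb->base_prefix . 'bp_messages_messages';
$r = $wpdb->base_prefix . 'bp_messages_recipients';
$sql = $wpdb->prepare(
    "SELECT id, thread_id, sender_id, subject, message, date_sent
     FROM {$m}
     WHERE date_sent > %s
       AND thread_id IN (SELECT thread_id FROM {$r} WHERE user_id IN (__ADMIN_IDS__))
     ORDER BY thread_id, date_sent",
    '__SINCE__'
);
echo wp_json_encode($wpdb->get_results($sql, ARRAY_A));"""


def _checked_timestamp(since: str) -> str:
    # Only a strict MySQL DATETIME string may be embedded in the PHP source;
    # strptime raises ValueError for anything else.
    datetime.strptime(since, "%Y-%m-%d %H:%M:%S")
    return since


def activity_php(since: str) -> str:
    """Group posts and activity comments recorded after `since`."""
    return ACTIVITY_SQL_PHP.replace("__SINCE__", _checked_timestamp(since))


def dm_php(since: str, admin_ids: Iterable[int]) -> str:
    """New messages in any thread that includes one of `admin_ids`."""
    ids = ", ".join(str(int(i)) for i in admin_ids)
    if not ids:
        raise ValueError("admin_ids must not be empty")
    return (DM_SQL_PHP.replace("__ADMIN_IDS__", ids)
            .replace("__SINCE__", _checked_timestamp(since)))
```

The DM query returns only the new messages. Their `thread_id` values identify which thread files to regenerate, and each of those threads would then be re-read in full.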
Key Endpoints (verified 2026-04-07)
All endpoints are under /buddyboss/v1/ (not /buddypress/v1/):
- `/wp-json/buddyboss/v1/forums`: forum list
- `/wp-json/buddyboss/v1/topics`: discussion topics
- `/wp-json/buddyboss/v1/reply`: replies on topics
- `/wp-json/buddyboss/v1/activity`: activity feed (group posts)
- `/wp-json/buddyboss/v1/groups`: group metadata, plus members via `/{id}/members`
Also available: /buddyboss/v1/bb-topics (alternative topics endpoint with ordering).
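Collection endpoints such as `topics` return results one page at a time. The sketch below walks every page. It assumes a caller-supplied `fetch_page` that performs one request and returns the decoded items plus the `X-WP-TotalPages` header value that WordPress REST collection endpoints normally send.

WordPress core caps `per_page` at 100 by default. Whether the BuddyBoss endpoints use the same cap is an assumption here.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# fetch_page(route, params) -> (items on this page, X-WP-TotalPages value)
FetchPage = Callable[[str, Dict], Tuple[List[Dict], int]]


def iter_all(route: str, fetch_page: FetchPage, per_page: int = 100,
             **params) -> Iterator[Dict]:
    """Yield every item from a paged collection endpoint, page by page."""
    page = 1
    while True:
        query = {**params, "per_page": per_page, "page": page}
        items, total_pages = fetch_page(route, query)
        yield from items
        # Stop at the last page the server reports, or on an empty page.
        if not items or page >= total_pages:
            return
        page += 1
```

Stopping on an empty page as well as on the reported total guards against a missing or stale header.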
Scripts on VPS (/home/patrick/scripts/)
| File | Purpose | Schedule |
|---|---|---|
forum-content-sync.py | Python: API/SQL calls, markdown generation for forums, topics, activity, groups, DMs | Every 15 min |
forum-content-sync.sh | Bash wrapper: lockfile, git pull, run sync, run translations, git commit/push | Every 15 min |
translate-community-content.py | Detects non-English messages and appends English translations via Claude CLI | Every 15 min (called by forum-content-sync.sh) |
activity-thread-summarizer.py | Claude CLI: generates structured thread summaries for settled activity threads | Daily 5 AM UTC |
activity-thread-summarizer.sh | Bash wrapper for thread summarizer | Daily 5 AM UTC |
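The lockfile step in `forum-content-sync.sh` stops a slow run from overlapping the next 15-minute cron tick. The wrapper itself is bash. The sketch below illustrates the same idea in Python with `fcntl.flock` (Unix only). The lock path and the `AlreadyRunning` and `single_instance` names are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


class AlreadyRunning(RuntimeError):
    """Raised when another sync run already holds the lock."""


@contextmanager
def single_instance(lock_path: str):
    """Hold an exclusive, non-blocking flock on lock_path for the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another run holds the lock: skip this tick instead of waiting.
            raise AlreadyRunning(lock_path) from None
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

An flock is released automatically when the holding process exits, so a crashed run cannot leave a stale lock behind.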
Schedule
- Cron every 15 minutes (`*/15 * * * *`). Changed from every 2 hours on 2026-04-08: server impact is low and it keeps the KB near-realtime.
- Incremental sync: a state file tracks last-sync timestamps per content type
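A minimal sketch of the state file follows. It assumes `_sync-state.json` is a flat JSON object keyed by the timestamp names used in the incremental sync strategy (`topics_last_sync`, `activity_last_sync`, `dm_last_sync`). It also assumes timestamps are stored as UTC strings in MySQL `DATETIME` format.

The function names are hypothetical. Writing to a temp file and renaming it means a crash mid-write cannot leave a truncated state file.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Optional

# Timestamp keys named in the incremental sync strategy.
STATE_KEYS = ("topics_last_sync", "activity_last_sync", "dm_last_sync")


def load_state(path: str) -> dict:
    """Return the saved state, or an empty dict on the first run."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_state(path: str, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".sync-state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2, sort_keys=True)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def mark_synced(state: dict, key: str, when: Optional[datetime] = None) -> dict:
    """Record a UTC timestamp in MySQL DATETIME format for one content type.

    A naive `when` is interpreted as local time by astimezone().
    """
    if key not in STATE_KEYS:
        raise KeyError(key)
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    state[key] = when.strftime("%Y-%m-%d %H:%M:%S")
    return state
```

Recording the time a run started, rather than when it finished, errs toward re-fetching a few items instead of skipping anything written during the run.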
Data Flow
practice.baseworks.com (BuddyBoss REST API + direct SQL via wp eval)
  ↓
forum-content-sync.py on VPS (forums, topics, activity, groups, DMs)
  ↓ (generates markdown files)
translate-community-content.py (appends English translations)
  ↓ (git commit/push by forum-content-sync.sh)
Vault at /srv/baseworks/knowledge-base/
  ↓
GitHub → all machines (vault-sync.sh pulls every 5 min)
  ↓ (post-sync hook)
vault-index.db rebuilt + qmd updated

Daily (separate cron):
activity-thread-summarizer.py → appends structured metadata to settled threads
Vault Structure
02-areas/practice-platform/community-forums-groups/
  community-forums-groups.md      ← existing index (update Structure section)
  community-posts/                ← existing manual posts (keep as-is)
  forums/                         ← one file per forum
    index.md
    {forum-slug}.md
  topics/                         ← one file per topic + inline replies
    index.md
    {YYMMDD}-{topic-slug}.md
  groups/                         ← one file per BuddyBoss group
    index.md
    {group-slug}.md
  activity/                       ← group feed posts, monthly digests
    index.md
    {YYYY-MM}-activity-digest.md
    _thread-summary-state.json    ← gitignored, tracks summarized threads
  direct-messages/                ← admin DM threads (added 2026-04-08)
    index.md
    {YYMMDD}-{participant-slug}-thread-{id}.md
  _sync-state.json                ← gitignored, tracks last-sync timestamps
  _translation-state.json         ← gitignored, tracks translated messages
Design decisions:
- Replies are inline within topic files under a `## Replies` section. This keeps each discussion as one coherent, readable document rather than scattering it across dozens of files.
- Activity digests are monthly: group feed posts are aggregated by month, with activity comments inline as blockquotes under their parent post.
- Activity threads get structured summaries: after 48h of inactivity, Claude CLI generates metadata (topic, category, participants, resolution, tags), appended as a blockquote at the end of each thread.
- Each activity post has a permalink: `**Source:** https://practice.baseworks.com/news-feed/p/{id}/` links directly to the post on the platform.
- Groups get their own files: they are structural entities with metadata useful for wikilink context.
- DM threads are one file per conversation, with all messages inline in chronological order. Only threads involving admin users (Patrick, Asia) are captured. Admin messages are marked with an (admin) label. Broadcast threads (messages to full cohorts) are included.
- Non-English messages get inline translations: the original language comes first, with the English translation as a `> **[English translation]**` blockquote below. French and Japanese are detected automatically.
Frontmatter Schemas
Topic (topics/{YYMMDD}-{topic-slug}.md)
type: forum-topic
topic-id: 5678
title: "Session 3 Summary - Moderation, Calibration Tools"
slug: session-3-summary-moderation-calibration-tools
forum: "General Discussion"
forum-id: 1234
group: "Montreal Study Group Winter 2026 Cohort"
group-id: 56
author: "Patrick Oancia"
author-id: 1
status: publish
created: 2026-02-01
last-reply: 2026-02-04
reply-count: 3
source-url: "https://practice.baseworks.com/groups/.../discussion/..."
tags: [forum-topic, community, auto-synced]

Forum (forums/{forum-slug}.md)
type: forum
forum-id: 1234
title: "General Discussion"
slug: general-discussion
group: "Montreal Study Group Winter 2026 Cohort"
group-id: 56
status: open
created: 2026-01-15
topic-count: 12
last-activity: 2026-04-05
tags: [forum, community, auto-synced]

Group (groups/{group-slug}.md)
type: buddyboss-group
group-id: 56
title: "Montreal Study Group Winter 2026 Cohort"
slug: montreal-study-group-winter-2026-cohort
status: public
created: 2026-01-10
member-count: 18
forum-id: 1234
tags: [group, community, auto-synced]

Activity Digest (activity/{YYYY-MM}-activity-digest.md)
type: activity-digest
period: 2026-04
created: 2026-04-07
entry-count: 34
tags: [activity, community, auto-synced]

Topic File Body Format
# Session 3 Summary - Moderation, Calibration Tools
**Forum:** [General Discussion](/general-discussion/)
**Group:** [Montreal Winter 2026](/areas/practice-platform/community-forums-groups/groups/montreal-study-group-winter-2026-cohort/)
**Author:** Patrick Oancia | **Posted:** 2026-02-01
---
{topic body — HTML converted to markdown}
---
## Replies
### Reply by Marie-Anne Desjardins — 2026-02-02
{reply content}
### Reply by Patrick Oancia — 2026-02-03
{reply content}
---
## Related
- [Community Forums & Groups](/areas/practice-platform/community-forums-groups/community-forums-groups/)

Incremental Sync Strategy
- Topics: The state file stores a `topics_last_sync` timestamp. Each run fetches topics created or modified after that timestamp. For existing topics with new replies, the script compares `reply-count` in the frontmatter with the API value and re-fetches the topic if they differ.
- Forums/groups: Full fetch every run (few entities, under 10 each). The script compares content hashes and only writes a file if it changed.
- Activity: State stores `activity_last_sync`. Each run fetches new posts AND checks for new comments since the last sync. For each affected month, it regenerates the full digest from the database.
- Direct messages: State stores `dm_last_sync`. Each run checks `bp_messages_messages` for new messages since the last sync and only regenerates files for threads with new messages.
- Translations: State tracks a content hash per translated block. Only new or changed content is translated; existing translations are not redone.
- Thread summaries: State tracks the comment count at the time of summarization. A previously summarized thread is re-summarized if new comments appear.
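The per-content-type rules above reduce to a few small checks. The sketch makes three assumptions:

- Translation state maps a hypothetical block ID to the SHA-256 of the text that was translated.
- A thread counts as settled once 48 hours have passed since its last activity, the threshold given in the design decisions.
- New comments reset that window.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Optional

SETTLE_AFTER = timedelta(hours=48)  # inactivity window from the design decisions


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def write_if_changed(path: str, text: str) -> bool:
    """Forums/groups: rewrite the file only when its content hash differs."""
    try:
        with open(path, encoding="utf-8") as fh:
            if content_hash(fh.read()) == content_hash(text):
                return False
    except FileNotFoundError:
        pass  # first run for this entity: always write
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return True


def needs_translation(block_id: str, text: str, state: dict) -> bool:
    """Translations: new block, or source text changed since it was translated."""
    return state.get(block_id) != content_hash(text)


def needs_summary(last_activity: datetime, comment_count: int,
                  summarized_at_count: Optional[int], now: datetime) -> bool:
    """Thread summaries: settled for 48h, and never summarized or has new comments."""
    if now - last_activity < SETTLE_AFTER:
        return False  # thread still active
    return summarized_at_count is None or comment_count > summarized_at_count
```

Skipping unchanged files also keeps them out of each commit that `forum-content-sync.sh` makes.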
Implementation Steps
- Back up the practice.baseworks.com database before any API discovery or site interaction (see `CLAUDE-INSTRUCTIONS.md` in baseworks-changelog for the mandatory backup rule)
- Discover active endpoints (SSH to baseworks-web: `wp rest-api list`, or curl the API index from the VPS)
- Create an Application Password on practice.baseworks.com (wp-admin → Users → Profile)
- Add `BW_PRACTICE_API_USER` / `BW_PRACTICE_API_PASS` to the VPS `~/.bashrc` (before the interactive guard)
- Write `forum-content-sync.py`: API calls, markdown generation, deduplication, index updates
- Write `forum-content-sync.sh`: bash wrapper with lockfile, git pull, run Python, git commit/push
- Create the vault directory structure + `index.md` files
- Update the `community-forums-groups.md` structure section
- Add `_sync-state.json` to `.gitignore` (`_translation-state.json` and `_thread-summary-state.json` are gitignored as well)
- Test manually on the VPS: `bash /home/patrick/scripts/forum-content-sync.sh`
- Add cron entry: `*/15 * * * *`
- Document as System 3 in Vault Sync Systems
Verification Checklist
- `curl -u user:pass 'https://practice.baseworks.com/wp-json/buddyboss/v1/topics?per_page=1'` returns data
- A manual `forum-content-sync.sh` run generates the expected files
- `python3 scripts/build-vault-index.py --full` indexes the new files without errors
- `python3 scripts/check-wikilinks.py --changed` reports no broken links from the new files
- After one cron cycle, `forum-content-sync.log` shows a successful run
- Files appear on local machines after a `vault-sync.sh` cycle
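For the "generates the expected files" check, a spot-check of each file's frontmatter against the schemas in this plan can catch missing fields. This sketch assumes generated files open with standard `---`-delimited frontmatter.

It parses only flat `key: value` lines between the delimiters. That is enough for these schemas, but it is not a general YAML parser. File types without a schema here, such as DM threads, return `None`.

```python
from typing import Dict, Optional, Set

# Keys taken from the frontmatter schemas in this plan, keyed by `type`.
REQUIRED_KEYS: Dict[str, Set[str]] = {
    "forum-topic": {"topic-id", "title", "slug", "forum", "forum-id", "group",
                    "group-id", "author", "author-id", "status", "created",
                    "last-reply", "reply-count", "source-url", "tags"},
    "forum": {"forum-id", "title", "slug", "group", "group-id", "status",
              "created", "topic-count", "last-activity", "tags"},
    "buddyboss-group": {"group-id", "title", "slug", "status", "created",
                        "member-count", "forum-id", "tags"},
    "activity-digest": {"period", "created", "entry-count", "tags"},
}


def frontmatter_keys(text: str) -> Dict[str, str]:
    """Read flat `key: value` pairs between the opening and closing `---`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    found: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Skip indented lines and YAML list items; only top-level keys count.
        if ":" in line and not line.startswith((" ", "\t", "-")):
            key, _, value = line.partition(":")
            found[key.strip()] = value.strip()
    return found


def missing_keys(text: str) -> Optional[Set[str]]:
    """Keys the schema expects but the file lacks; None if the type is unknown."""
    fm = frontmatter_keys(text)
    expected = REQUIRED_KEYS.get(fm.get("type", ""))
    if expected is None:
        return None
    return expected - set(fm)
```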
Dependencies
- Python `requests` + `markdownify` (or `html2text`) on the VPS
- WordPress Application Password for practice.baseworks.com
- VPS SSH access to baseworks-web (needed on every sync run for the `wp eval` REST dispatch and direct SQL queries, not only for endpoint discovery)
Future Phases (not designed)
- Phase 1.5 — Forum moderation + content extraction: Detect off-topic posts (billing, consent, promotions), flag for human review, auto-remove + email redirect. Also extract programming insights from on-topic posts. See Forum Moderation & Content Extraction Plan. Status: planning, pending Asia’s review.
- Phase 2: Response-drafting agent using Claude Code CLI on VPS. Depends on KB having sufficient depth from session summaries, taxonomy, and method resources. The agent would read new topics, search the vault for relevant context, and draft responses for human review.
- Forum auto-tagging: Open item — forum posts from Primer lesson pages don’t auto-tag with lesson/segment reference. Needs a BuddyBoss snippet or mu-plugin. Separate from this ingestion system.
Related
- community-forums-consolidation-plan — next step: consolidate duplicate folders, improve naming and metadata
- Vault SQLite Index Sync Plan — sibling infrastructure plan
- Vault Sync Systems — architecture pattern and existing System 1 (questionnaire export)
- Segment Feedback Automation Plan — parallel System 2 (not yet built)
- Practice Site Platform Infrastructure
- Community Forums & Groups