
Forum Content Ingestion Plan — BuddyBoss to Vault

Created 2026-04-07
Updated 2026-04-08
Status live
Tags infrastructure, automation, vault-sync, forums, buddyboss

Pull forum topics, replies, group feed posts, and group metadata from practice.baseworks.com (BuddyBoss) into the Obsidian knowledge base as structured markdown. Follows the same architecture pattern as the questionnaire export system.

Status: Live. Deployed 2026-04-07. Documented as System 3 in Vault Sync Systems.


Student forum activity on practice.baseworks.com is invisible to the knowledge base. This means:

  • Claude has no access to past discussions when contextualizing responses
  • No centralized view of how students interact with content
  • Forum content can’t be cross-referenced with session summaries, taxonomy, or resources

This is Phase 1 — data ingestion only. A future Phase 2 (response-drafting agent) depends on the KB having enough depth from session summaries, taxonomy, and resources to draw from meaningfully.


  1. Forum topics — discussion threads created in group forums
  2. Replies — all user responses on those topics (inline within the topic file)
  3. Group feed posts — activity/status updates posted in group feeds (monthly digests), with inline activity comments
  4. Groups — group metadata (name, description, members) for context and wikilinks
  5. Admin direct messages — DM threads involving Patrick or Asia (added 2026-04-08). One file per conversation thread, including broadcast messages to cohorts.
  6. Thread summaries — AI-generated structured metadata (topic, category, participants, resolution, tags) for settled activity threads (added 2026-04-08)
  7. Translations — automatic English translations appended to non-English messages in activity digests and DMs (added 2026-04-08)

  • BuddyBoss REST API on practice.baseworks.com (for forums, topics, replies, groups)
  • Direct SQL queries via wp eval (for activity and direct messages — see note below)
  • No mu-plugin needed — BuddyBoss exposes endpoints natively
  • Authentication: WP-CLI internal REST dispatch as admin user (ID 8) via SSH

Cloudflare workaround: Direct HTTP API calls from the VPS are blocked by Cloudflare bot challenge (same issue as Claude CLI OAuth). The script SSHes to baseworks-web and uses wp eval to dispatch REST requests internally via PHP — no external HTTP calls needed.
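
The internal dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of forum-content-sync.py: the helper names `build_php` and `build_ssh_command` are hypothetical, and it assumes `wp` is on the remote PATH and runs from the WordPress root. The WordPress calls used (`wp_set_current_user`, `WP_REST_Request`, `rest_do_request`, `wp_json_encode`) are standard core functions.

```python
import base64
import json
import shlex

ADMIN_USER_ID = 8            # admin user ID named in this document
SSH_HOST = "baseworks-web"   # SSH host alias named in this document


def build_php(route: str, params: dict) -> str:
    """Return PHP that dispatches a GET request inside WordPress and echoes JSON.

    `route` is assumed to be a trusted constant without quote characters.
    """
    # Base64 keeps the query parameters free of PHP quoting problems.
    encoded = base64.b64encode(json.dumps(params).encode()).decode()
    return (
        f"wp_set_current_user({ADMIN_USER_ID}); "
        f"$req = new WP_REST_Request('GET', '{route}'); "
        f"$req->set_query_params(json_decode(base64_decode('{encoded}'), true)); "
        "$res = rest_do_request($req); "
        "echo wp_json_encode(rest_get_server()->response_to_data($res, false));"
    )


def build_ssh_command(route: str, params: dict) -> list[str]:
    """Return the argv that runs the dispatch remotely through `wp eval`."""
    # ssh passes the remote side a single shell string, so quote the PHP once.
    remote = "wp eval " + shlex.quote(build_php(route, params))
    return ["ssh", SSH_HOST, remote]
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and the captured stdout parsed with `json.loads`. Because the request never leaves PHP, Cloudflare is not involved at any point.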

REST API limitation (discovered 2026-04-08): The BuddyBoss REST API /buddyboss/v1/activity does not return group activity for private groups, even when authenticated as admin. All 4 active groups are private. Activity sync and DM sync use direct SQL queries against bp_activity and bp_messages_messages tables via wp eval to bypass this limitation.
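
A direct query of this kind might look like the sketch below, which builds the PHP to be passed to `wp eval` over SSH. It is an illustration under assumptions: the function name is hypothetical, the table is assumed to sit under the standard `$wpdb->prefix`, and filtering on `component = 'groups'` reflects the usual BuddyPress convention for group feed posts rather than anything confirmed by the live script.

```python
from datetime import datetime


def build_activity_query_php(since: datetime) -> str:
    """Return PHP that selects group feed rows newer than `since` and echoes JSON."""
    # Formatting a datetime object (rather than passing a raw string through)
    # keeps the value well-formed; $wpdb->prepare() still handles the quoting.
    since_sql = since.strftime("%Y-%m-%d %H:%M:%S")
    return (
        # `wp eval` runs code inside a function scope, so $wpdb must be imported.
        "global $wpdb; "
        # ASSUMPTION: the table lives under the standard WordPress prefix.
        "$table = $wpdb->prefix . 'bp_activity'; "
        "$sql = $wpdb->prepare("
        "\"SELECT id, user_id, item_id, type, content, date_recorded "
        "FROM {$table} WHERE component = %s AND date_recorded > %s "
        "ORDER BY date_recorded ASC\", "
        f"'groups', '{since_sql}'); "
        "echo wp_json_encode($wpdb->get_results($sql));"
    )
```

Reading the table directly sidesteps the REST API's visibility rules entirely, which is why private-group posts show up here but not in the /buddyboss/v1/activity response.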

All endpoints are under /buddyboss/v1/ (not /buddypress/v1/):

  • /wp-json/buddyboss/v1/forums — forum list
  • /wp-json/buddyboss/v1/topics — discussion topics
  • /wp-json/buddyboss/v1/reply — replies on topics
  • /wp-json/buddyboss/v1/activity — activity feed (group posts)
  • /wp-json/buddyboss/v1/groups — group metadata + members via /{id}/members

Also available: /buddyboss/v1/bb-topics (alternative topics endpoint with ordering).

| File | Purpose | Schedule |
| --- | --- | --- |
| forum-content-sync.py | Python: API/SQL calls, markdown generation for forums, topics, activity, groups, DMs | Every 15 min |
| forum-content-sync.sh | Bash wrapper: lockfile, git pull, run sync, run translations, git commit/push | Every 15 min |
| translate-community-content.py | Detects non-English messages, appends English translations via Claude CLI | Every 15 min (called by sync.sh) |
| activity-thread-summarizer.py | Claude CLI: generates structured thread summaries for settled activity threads | Daily 5 AM UTC |
| activity-thread-summarizer.sh | Bash wrapper for thread summarizer | Daily 5 AM UTC |

  • Cron every 15 minutes (*/15 * * * *). Changed from every-2-hours on 2026-04-08 — low server impact, keeps KB near-realtime.
  • Incremental sync: state file tracks last-sync timestamps per content type
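
A state file of this kind could be handled along the following lines. This is a sketch only: the function names and the timestamp format are assumptions, while the `<type>_last_sync` key pattern follows the `topics_last_sync`, `activity_last_sync`, and `dm_last_sync` names used in this document.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved state, or an empty dict on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write the state atomically so an interrupted run cannot truncate it."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2, sort_keys=True)
    os.replace(tmp_name, path)


def mark_synced(state: dict, content_type: str, when: datetime) -> dict:
    """Return a copy of `state` with `<content_type>_last_sync` set to `when` in UTC."""
    updated = dict(state)
    # ASSUMPTION: timestamps are stored as "YYYY-MM-DD HH:MM:SS" in UTC.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    updated[f"{content_type}_last_sync"] = stamp
    return updated
```

Saving the timestamp only after a content type has synced successfully means a failed run is simply retried from the previous checkpoint on the next cron cycle.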

practice.baseworks.com (BuddyBoss REST API + direct SQL via wp eval)
↓
forum-content-sync.py on VPS (forums, topics, activity, groups, DMs)
↓ (generates markdown files)
translate-community-content.py (appends English translations)
↓ (git commit/push by forum-content-sync.sh)
Vault at /srv/baseworks/knowledge-base/
↓
GitHub → all machines (vault-sync.sh pulls every 5 min)
↓ (post-sync hook)
vault-index.db rebuilt + qmd updated

Daily (separate cron):
activity-thread-summarizer.py → appends structured metadata to settled threads

02-areas/practice-platform/community-forums-groups/
community-forums-groups.md ← existing index (update Structure section)
community-posts/ ← existing manual posts (keep as-is)
forums/ ← one file per forum
index.md
{forum-slug}.md
topics/ ← one file per topic + inline replies
index.md
{YYMMDD}-{topic-slug}.md
groups/ ← one file per BuddyBoss group
index.md
{group-slug}.md
activity/ ← group feed posts, monthly digests
index.md
{YYYY-MM}-activity-digest.md
_thread-summary-state.json ← gitignored, tracks summarized threads
direct-messages/ ← admin DM threads (added 2026-04-08)
index.md
{YYMMDD}-{participant-slug}-thread-{id}.md
_sync-state.json ← gitignored, tracks last-sync timestamps
_translation-state.json ← gitignored, tracks translated messages
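
The filename patterns in the tree above can be produced with a few lines of Python. This is a sketch: the function names are hypothetical and the slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption, chosen because it reproduces the example slug `session-3-summary-moderation-calibration-tools` shown in this document.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def topic_filename(created: date, title: str) -> str:
    """Build a {YYMMDD}-{topic-slug}.md filename."""
    return f"{created:%y%m%d}-{slugify(title)}.md"


def digest_filename(year: int, month: int) -> str:
    """Build a {YYYY-MM}-activity-digest.md filename."""
    return f"{year:04d}-{month:02d}-activity-digest.md"
```

Note that this ASCII-only rule drops accented and Japanese characters, so a production version would need transliteration or a fallback such as the topic ID for non-English titles.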

Design decisions:

  • Replies are inline within topic files under a ## Replies section — keeps each discussion as one coherent, readable document rather than scattering across dozens of files
  • Activity digests are monthly — group feed posts aggregated by month, with activity comments inline as blockquotes under their parent post
  • Activity threads get structured summaries — after 48h of inactivity, Claude CLI generates metadata (topic, category, participants, resolution, tags) appended as a blockquote at the end of each thread
  • Each activity post has a permalink: a **Source:** https://practice.baseworks.com/news-feed/p/{id}/ line links directly to the post on the platform
  • Groups get their own files — structural entities with metadata useful for wikilink context
  • DM threads are one file per conversation — all messages inline chronologically. Only threads involving admin users (Patrick, Asia) are captured. Admin messages marked with (admin) label. Broadcast threads (messages to full cohorts) are included.
  • Non-English messages get inline translations — original language first, English translation as > **[English translation]** blockquote below. French and Japanese detected automatically.
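
The monthly digest layout described above (posts grouped by month, comments as blockquotes under their parent, a **Source:** permalink per post) could be generated roughly like this. The dict keys and the post heading format are illustrative assumptions, not the script's actual data model.

```python
from collections import defaultdict


def group_by_month(posts: list[dict]) -> dict[str, list[dict]]:
    """Group posts by the YYYY-MM prefix of their ISO date, oldest first."""
    months = defaultdict(list)
    for post in sorted(posts, key=lambda p: p["date"]):
        months[post["date"][:7]].append(post)
    return dict(months)


def render_post(post: dict) -> str:
    """Render one feed post with its comments as blockquotes and a permalink."""
    # ASSUMPTION: the "### author (date)" heading is illustrative only.
    lines = [f"### {post['author']} ({post['date']})", "", post["content"], ""]
    for comment in post.get("comments", []):
        header = f"**{comment['author']}** ({comment['date']}):"
        body = [header, *comment["content"].splitlines()]
        # Prefix every line so multi-line comments stay inside the blockquote.
        lines.extend(f"> {line}".rstrip() for line in body)
        lines.append("")
    lines.append(f"**Source:** https://practice.baseworks.com/news-feed/p/{post['id']}/")
    return "\n".join(lines)
```

Each key returned by `group_by_month` corresponds to one `{YYYY-MM}-activity-digest.md` file, whose body is the rendered posts for that month joined together.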

Topic (topics/{YYMMDD}-{topic-slug}.md)

type: forum-topic
topic-id: 5678
title: "Session 3 Summary - Moderation, Calibration Tools"
slug: session-3-summary-moderation-calibration-tools
forum: "General Discussion"
forum-id: 1234
group: "Montreal Study Group Winter 2026 Cohort"
group-id: 56
author: "Patrick Oancia"
author-id: 1
status: publish
created: 2026-02-01
last-reply: 2026-02-04
reply-count: 3
source-url: "https://practice.baseworks.com/groups/.../discussion/..."
tags: [forum-topic, community, auto-synced]

Forum (forums/{forum-slug}.md)

type: forum
forum-id: 1234
title: "General Discussion"
slug: general-discussion
group: "Montreal Study Group Winter 2026 Cohort"
group-id: 56
status: open
created: 2026-01-15
topic-count: 12
last-activity: 2026-04-05
tags: [forum, community, auto-synced]

Group (groups/{group-slug}.md)

type: buddyboss-group
group-id: 56
title: "Montreal Study Group Winter 2026 Cohort"
slug: montreal-study-group-winter-2026-cohort
status: public
created: 2026-01-10
member-count: 18
forum-id: 1234
tags: [group, community, auto-synced]

Activity Digest (activity/{YYYY-MM}-activity-digest.md)

type: activity-digest
period: 2026-04
created: 2026-04-07
entry-count: 34
tags: [activity, community, auto-synced]
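
Frontmatter blocks like the ones above can be emitted from a plain dict. The sketch below is hedged on two points: the `---` delimiters are the usual YAML frontmatter convention rather than something shown in this document, and the quoting rule (quote only strings containing a space or colon) is an assumption that happens to match the examples.

```python
def format_value(value) -> str:
    """Format one frontmatter value in the style of the examples in this plan."""
    if isinstance(value, list):
        return "[" + ", ".join(str(item) for item in value) + "]"
    if isinstance(value, str) and (" " in value or ":" in value):
        # Escape backslashes first, then quotes, for a valid double-quoted scalar.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)


def render_frontmatter(fields: dict) -> str:
    """Render fields, in insertion order, as a delimited frontmatter block."""
    lines = ["---"]
    lines += [f"{key}: {format_value(value)}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

For anything beyond flat scalars and simple tag lists, a YAML library would be the safer choice; this hand-rolled version only covers the field shapes that appear in the four schemas above.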

# Session 3 Summary - Moderation, Calibration Tools
**Forum:** [General Discussion](/general-discussion/)
**Group:** [Montreal Winter 2026](/areas/practice-platform/community-forums-groups/groups/montreal-study-group-winter-2026-cohort/)
**Author:** Patrick Oancia | **Posted:** 2026-02-01
---
{topic body — HTML converted to markdown}
---
## Replies
### Reply by Marie-Anne Desjardins — 2026-02-02
{reply content}
### Reply by Patrick Oancia — 2026-02-03
{reply content}
---
## Related
- [Community Forums & Groups](/areas/practice-platform/community-forums-groups/community-forums-groups/)
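
Assembling a topic file in the shape of the template above might look like the following sketch. The function name and dict keys are hypothetical; the body and reply content are assumed to be already converted from HTML to markdown, and the Forum, Group, and Related lines are omitted for brevity.

```python
def render_topic(topic: dict, replies: list[dict]) -> str:
    """Render a topic and its replies as one markdown document."""
    parts = [
        f"# {topic['title']}",
        f"**Author:** {topic['author']} | **Posted:** {topic['created']}",
        "---",
        topic["body"],
        "---",
        "## Replies",
    ]
    # Replies go inline, oldest first, so the thread reads chronologically.
    for reply in sorted(replies, key=lambda r: r["date"]):
        # Heading format copied from the template above.
        parts.append(f"### Reply by {reply['author']} — {reply['date']}")
        parts.append(reply["content"])
    # Blank lines between parts keep "---" from turning the preceding line
    # into a setext heading.
    return "\n\n".join(parts) + "\n"
```

Regenerating the whole file from the topic plus its full reply list on each change keeps the logic simple and avoids having to patch new replies into an existing document.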

  • Topics: State file stores topics_last_sync timestamp. Each run fetches topics created/modified after that timestamp. For existing topics with new replies, the script compares reply-count in frontmatter vs API — re-fetches if different.
  • Forums/groups: Full fetch every run (few entities, under 10 each). Compare content hash, only write if changed.
  • Activity: State stores activity_last_sync. Fetches new posts AND checks for new comments since last sync. For each affected month, regenerates the full digest from the database.
  • Direct messages: State stores dm_last_sync. Checks bp_messages_messages for new messages since last sync. Only re-generates files for threads with new messages.
  • Translations: State tracks content hash per translated block. Only translates new or changed content. Does not re-translate existing translations.
  • Thread summaries: State tracks comment count at time of summarization. Re-summarizes if new comments appear on a previously summarized thread.
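
The change-detection rules in the list above reduce to a few small predicates. The sketch below shows three of them under assumptions: the function names are hypothetical, SHA-256 is an arbitrary choice of content hash, and the 48-hour window is the inactivity threshold stated in the design decisions.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

SETTLE_AFTER = timedelta(hours=48)  # inactivity window before a thread is summarized


def topic_needs_refetch(frontmatter_reply_count: int, api_reply_count: int) -> bool:
    """Re-fetch a topic when the stored reply count differs from the API's."""
    return frontmatter_reply_count != api_reply_count


def write_if_changed(path: Path, content: str) -> bool:
    """Write `content` only if its hash differs from the file on disk."""
    data = content.encode("utf-8")
    if path.exists():
        if hashlib.sha256(path.read_bytes()).digest() == hashlib.sha256(data).digest():
            return False  # identical content: leave the file (and git) untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True


def thread_needs_summary(last_activity: datetime, comment_count: int,
                         summarized_count: Optional[int], now: datetime) -> bool:
    """Summarize settled threads that are new or gained comments since last time."""
    if now - last_activity < SETTLE_AFTER:
        return False  # still active: wait until the thread settles
    return summarized_count is None or comment_count != summarized_count
```

Skipping unchanged writes matters here because every modified file ends up in a git commit; without the hash check, a full fetch every 15 minutes would produce a steady stream of empty or noisy commits.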

  1. Backup practice.baseworks.com database before any API discovery or site interaction (see CLAUDE-INSTRUCTIONS.md in baseworks-changelog for the mandatory backup rule)
  2. Discover active endpoints (SSH to baseworks-web: wp rest-api list or curl API index from VPS)
  3. Create Application Password on practice.baseworks.com (wp-admin → Users → Profile)
  4. Add BW_PRACTICE_API_USER / BW_PRACTICE_API_PASS to VPS ~/.bashrc (before interactive guard)
  5. Write forum-content-sync.py — API calls, markdown generation, deduplication, index updates
  6. Write forum-content-sync.sh — bash wrapper with lockfile, git pull, run Python, git commit/push
  7. Create vault directory structure + index.md files
  8. Update community-forums-groups.md structure section
  9. Add _sync-state.json to .gitignore
  10. Test manually on VPS: bash /home/patrick/scripts/forum-content-sync.sh
  11. Add cron entry: */15 * * * *
  12. Document as System 3 in Vault Sync Systems

  • curl -u user:pass "https://practice.baseworks.com/wp-json/buddyboss/v1/topics?per_page=1" returns data
  • Manual forum-content-sync.sh run generates expected files
  • python3 scripts/build-vault-index.py --full indexes new files without errors
  • python3 scripts/check-wikilinks.py --changed reports no broken links from new files
  • After one cron cycle, forum-content-sync.log shows successful run
  • Files appear on local machines after vault-sync.sh cycle

  • Python requests + markdownify (or html2text) on VPS
  • WordPress Application Password for practice.baseworks.com
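  • VPS SSH access to baseworks-web (originally for the endpoint discovery step only; with the Cloudflare workaround, every sync run also dispatches its requests over this connection)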

  • Phase 1.5 — Forum moderation + content extraction: Detect off-topic posts (billing, consent, promotions), flag for human review, auto-remove + email redirect. Also extract programming insights from on-topic posts. See Forum Moderation & Content Extraction Plan. Status: planning, pending Asia’s review.
  • Phase 2: Response-drafting agent using Claude Code CLI on VPS. Depends on KB having sufficient depth from session summaries, taxonomy, and method resources. The agent would read new topics, search the vault for relevant context, and draft responses for human review.
  • Forum auto-tagging: Open item — forum posts from Primer lesson pages don’t auto-tag with lesson/segment reference. Needs a BuddyBoss snippet or mu-plugin. Separate from this ingestion system.