IBM Technology — Why AI Agents Need an Operating System
Source: Why AI Agents Need an Operating System — IBM Technology / Think Series (12:21)
Presenter: Bri Kopecki, AI Engineer, IBM
Status: Source review complete. Foundational vocabulary reference. Companion to nick-milo-ai-os-planning, simon-scrapes-agentic-os, chase-ai-agentic-os. Cross-source synthesis in ai-os-rework-synthesis.
Screenshots: media.baseworks.com/kb-dev/ibm-agent-os/ — NAS mirror: /volume1/baseworks/media/kb-dev/ibm-agent-os/
Applicability note: This is a 101-level conceptual framework, not an implementation guide. It does not prescribe folder structures, file formats, or specific tools. Its value is vocabulary and architecture: it names six kernel components, most of which the other three sources (Milo, Scrapes, Chase) implement in different ways without naming them consistently. Reading IBM first gives a shared language for comparing the practitioner sources.
The Core Problem: Agents Without an OS
“Right now, somewhere in the world, an AI agent is booking flights, writing code, and answering customer questions, and it has absolutely no idea what it did 5 minutes ago. It’s like giving a genius goldfish the keys to your company.”
The opening frames the problem precisely. AI agents — systems that can actually do things in the world (book flights, run code, send emails, talk to other agents) — are operating without coordination infrastructure. They:
- Forget everything between sessions
- Don’t know what tools they have access to
- Can’t explain what they did or why
- Have no concept of what they shouldn’t touch
- When multiple agents run together, they have no way to coordinate without conflict
This is what the video calls the “goldfish problem” — a brilliant, capable system with no continuity.
The OS Analogy
The argument is structural. A computer OS does four things:
- Manages memory (what stays active, what gets stored)
- Schedules tasks (who gets the CPU when, in what order)
- Controls access (who can read what, what’s off-limits)
- Keeps things from crashing into each other
Windows, macOS, and Linux are all variations on this structure. Without one, a computer is “just an expensive paperweight.”
An agent OS does the same things for AI agents. Agents are the applications. The agent OS kernel is the operating system. The infrastructure (computers, AI models, databases) is the hardware.
This analogy is IBM’s main contribution: a clean conceptual frame that unifies what the practitioner sources are all building.
The Three-Layer Architecture
┌─────────────────────────────────────┐
│              AI AGENTS              │  ← the workers
│  (travel agent, coding agent,       │
│   customer service agent...)        │
├─────────────────────────────────────┤
│           AGENT OS KERNEL           │  ← the management layer
│  (scheduler, memory, tools,         │
│   identity, observability,          │
│   guardrails + governance)          │
├─────────────────────────────────────┤
│           INFRASTRUCTURE            │  ← the hardware layer
│  (computers, AI models,             │
│   databases, tools)                 │
└─────────────────────────────────────┘

The middle layer, the kernel, is where the architectural work happens. Agents sit above it and use its services. Infrastructure sits below it and provides raw capabilities. The kernel coordinates between the two.
The Six Kernel Components
1. Scheduler / Orchestrator
When 10 agents all want to use the AI model at the same time, the scheduler decides who goes first. Priority is contextual: a live customer chat outranks a background summarization job. This component also coordinates sequential work between agents, routing output from one agent as input to another.
“The scheduler figures that out. Is the customer complaint more urgent than the weekly report?”
Maps to: skill chaining (Simon Scrapes), routines and cron scheduling (Chase AI)
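
The prioritization described above can be sketched with a priority queue. This is a minimal illustration, not something shown in the video: the names `Task`, `AgentScheduler`, `submit`, and `next_task`, and the convention that a lower number means more urgent, are all assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Task:
    priority: int                          # lower number = more urgent (assumed convention)
    seq: int                               # tie-breaker that preserves submission order
    agent: str = field(compare=False)
    description: str = field(compare=False)


class AgentScheduler:
    """Hypothetical scheduler: gives the shared model to the most urgent task."""

    def __init__(self) -> None:
        self._queue = []
        self._counter = itertools.count()

    def submit(self, agent: str, description: str, priority: int) -> None:
        task = Task(priority, next(self._counter), agent, description)
        heapq.heappush(self._queue, task)

    def next_task(self) -> Optional[Task]:
        # Pop the most urgent task, or return None when nothing is waiting.
        return heapq.heappop(self._queue) if self._queue else None


scheduler = AgentScheduler()
scheduler.submit("reporting-agent", "weekly report", priority=5)
scheduler.submit("support-agent", "live customer complaint", priority=1)
first = scheduler.next_task()  # the complaint, although it was submitted second
```

A real orchestrator would also route one agent's output into the next agent's input; that part is omitted here to keep the sketch short.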
2. Memory Manager
Fixes the goldfish problem. Three memory types:
| Type | Duration | Purpose |
|---|---|---|
| Short-term | Current session | Active conversation context |
| Long-term | Weeks / persistent | What happened last week, ongoing preferences |
| Episodic | Pattern-based | What worked or failed last time I tried this approach |
“Your HR agent remembers that you asked about parental leave last month, so when you come back, it doesn’t start from scratch.”
Maps to: Simon’s 6-level memory framework, Nick Milo’s AIOS/History/ session logs, Chase’s claude/memory auto-memory subfolder
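
To make the three memory types concrete, here is a rough in-memory sketch. The class name `MemoryManager` and all of its methods are hypothetical; a real system would persist long-term and episodic memory to disk or a database rather than keep them in Python objects.

```python
from collections import defaultdict


class MemoryManager:
    """Hypothetical stand-in for the three memory types in the table above."""

    def __init__(self):
        self.short_term = []                # current session only
        self.long_term = {}                 # survives across sessions
        self.episodic = defaultdict(list)   # outcomes recorded per approach

    def remember_turn(self, message):
        self.short_term.append(message)

    def remember_fact(self, key, value):
        self.long_term[key] = value

    def record_outcome(self, approach, succeeded):
        self.episodic[approach].append(bool(succeeded))

    def end_session(self):
        # Only the active conversation context is discarded.
        self.short_term.clear()

    def success_rate(self, approach):
        outcomes = self.episodic.get(approach, [])
        return sum(outcomes) / len(outcomes) if outcomes else None


memory = MemoryManager()
memory.remember_turn("User asked about parental leave")
memory.remember_fact("last_topic", "parental leave")
memory.record_outcome("link-to-policy-page", True)
memory.record_outcome("link-to-policy-page", False)
memory.end_session()
```

After `end_session()`, the conversation turns are gone but `long_term["last_topic"]` is still available, which is the behavior the HR-agent quote describes.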
3. Tool Manager
Agents need to interact with the real world: send emails, query databases, call APIs. The tool manager maintains a registry of available tools, enforces access permissions (who is allowed to use what), and runs tool execution in a sandbox.
“If an agent writes code and runs it, you really don’t want it accidentally deleting your production database. So the sandbox is kind of like a padded room where the agent can try things without burning down the house.”
The coding agent can write and execute Python, but only inside a specific folder scope. No access to passwords. No internet unless explicitly permitted.
Maps to: Chase’s local vs remote automation distinction (local tools only available on the machine; remote execution sandboxed by default)
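
The registry, permission check, and folder scope could look roughly like the sketch below. `ToolManager`, `make_scoped_resolver`, and the `/tmp/agent-workspace` path are invented for illustration. The path check only demonstrates folder scoping; it is not a real sandbox, which would need process or container isolation.

```python
from pathlib import Path


class ToolManager:
    """Hypothetical registry that checks permissions before running a tool."""

    def __init__(self):
        self._tools = {}    # tool name -> callable
        self._grants = {}   # agent name -> set of tool names it may use

    def register(self, name, func):
        self._tools[name] = func

    def grant(self, agent, tool_name):
        self._grants.setdefault(agent, set()).add(tool_name)

    def call(self, agent, tool_name, *args):
        if tool_name not in self._tools:
            raise KeyError(f"unknown tool: {tool_name}")
        if tool_name not in self._grants.get(agent, set()):
            raise PermissionError(f"{agent} may not use {tool_name}")
        return self._tools[tool_name](*args)


def make_scoped_resolver(root):
    """Return a tool that resolves paths but refuses anything outside root."""
    root = Path(root).resolve()

    def resolve_in_scope(relative_path):
        target = (root / relative_path).resolve()
        if not target.is_relative_to(root):   # Path.is_relative_to needs Python 3.9+
            raise PermissionError(f"{relative_path} is outside the agent's folder")
        return target

    return resolve_in_scope


tools = ToolManager()
tools.register("resolve_path", make_scoped_resolver("/tmp/agent-workspace"))
tools.grant("coding-agent", "resolve_path")
inside = tools.call("coding-agent", "resolve_path", "notes.txt")
```

Calling the same tool with `"../secrets.txt"`, or calling it as an agent that was never granted access, raises `PermissionError`.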
4. Identity Manager
Answers: who are you, and what are you allowed to do?
Agents need credentials: short-lived tokens that expire, permissions that limit scope, and a clear chain of authorization ("this agent is acting on behalf of this user"). Every action should be traceable to a human who authorized it.
“When your travel agent books a flight using your credit card, there’s a clear audit trail of who authorized what.”
Maps to: me-patrick.md / me-asia.md (operator identity), session attribution in AIOS/History/ (who ran this session and what they authorized)
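
A short-lived, scoped credential with an explicit "on behalf of" field might be modeled as follows. `AgentToken`, `issue_token`, the scope names, and the five-minute lifetime are assumptions for the example; a production system would more likely rely on an established standard such as OAuth rather than a hand-rolled token.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentToken:
    token_id: str
    agent: str
    on_behalf_of: str     # the human who authorized this agent
    scopes: frozenset     # actions the token permits
    expires_at: float     # Unix timestamp

    def allows(self, scope, now=None):
        now = time.time() if now is None else now
        return now < self.expires_at and scope in self.scopes


def issue_token(agent, user, scopes, ttl_seconds=300, now=None):
    """Issue a hypothetical token that expires after ttl_seconds."""
    now = time.time() if now is None else now
    return AgentToken(
        token_id=secrets.token_hex(8),
        agent=agent,
        on_behalf_of=user,
        scopes=frozenset(scopes),
        expires_at=now + ttl_seconds,
    )


# A fixed clock (now=1000.0) keeps the example deterministic.
token = issue_token("travel-agent", "alice", {"book_flight"}, ttl_seconds=300, now=1000.0)
```

Here `token.allows("book_flight", now=1100.0)` is true, while the same check after expiry, or for a scope that was never granted, is false. The `on_behalf_of` field is what ties each action back to a person.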
5. Observability
Every decision the agent makes, every tool it calls, every response it generates: all logged and traceable. If something goes wrong, you can rewind the tape.
“Your agent approved a refund and it shouldn’t have. With observability, you can trace back through the entire decision chain and figure out why.”
This is the “security camera system” — passive logging that becomes active only when you need to reconstruct what happened.
Maps to: AIOS/History/ session logs (Milo), session start hook logging (Simon Scrapes)
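
"Rewinding the tape" amounts to recording every event against a trace identifier and filtering by it later. The sketch below assumes an in-memory, append-only list; `AuditLog`, the event fields, and the sample event strings are made up for illustration.

```python
import json
import time


class AuditLog:
    """Hypothetical append-only log of agent decisions, tool calls, and responses."""

    def __init__(self):
        self._events = []

    def record(self, trace_id, agent, kind, detail):
        self._events.append({
            "ts": time.time(),
            "trace_id": trace_id,
            "agent": agent,
            "kind": kind,        # e.g. "decision", "tool_call", "response"
            "detail": detail,
        })

    def trace(self, trace_id):
        # Events are returned in the order they were recorded.
        return [e for e in self._events if e["trace_id"] == trace_id]

    def dump(self, trace_id):
        # One JSON object per line, convenient for grep or log shippers.
        return "\n".join(json.dumps(e) for e in self.trace(trace_id))


log = AuditLog()
log.record("refund-123", "support-agent", "decision", "customer judged eligible for refund")
log.record("refund-123", "support-agent", "tool_call", "refund issued for 80 USD")
log.record("order-456", "support-agent", "response", "order status sent")
chain = log.trace("refund-123")
```

`chain` contains only the two refund events, in order, which is the decision chain you would inspect to find out why the refund was approved.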
6. Guardrails + Governance
Two levels:
Guardrails — real-time filters on inputs and outputs:
- Input guardrails: is someone trying to trick the agent with a malicious prompt?
- Output guardrails: is the agent about to say something inappropriate or incorrect?
Governance — the policy layer:
- Some actions require human approval before execution
- Some data is off-limits entirely
- Some decisions are too consequential to automate

“Refunds under $50: automatically. Over $50: a human has to approve it.”
This is the “human in the loop” — the governance layer that defines which actions agents handle autonomously and which escalate to a person.
Maps to: no direct equivalent in the practitioner sources. All three assume a trusted human operator is always present. IBM addresses the multi-agent / enterprise context where agent actions affect real customers and money without human supervision.
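
The refund rule in the quote translates into a small policy function, sketched below alongside a deliberately naive input guardrail. The video does not say what happens at exactly $50; this sketch assumes it escalates to a human. The blocked-phrase list is a placeholder, since real prompt-injection detection is considerably harder than substring matching.

```python
from decimal import Decimal

AUTO_APPROVE_LIMIT = Decimal("50")                     # threshold from the quoted example
BLOCKED_PHRASES = ("ignore previous instructions",)    # placeholder list (assumption)


def input_guardrail(prompt):
    """Return True if the prompt passes the (very naive) input check."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def refund_decision(amount):
    """Governance policy: small refunds run automatically, larger ones escalate."""
    amount = Decimal(str(amount))
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    # Assumption: exactly 50 is treated as needing approval.
    if amount < AUTO_APPROVE_LIMIT:
        return "auto_approve"
    return "needs_human_approval"


small = refund_decision("49.99")
large = refund_decision(120)
```

Keeping the policy in one function, separate from the agent's own reasoning, means the threshold can be audited and changed without touching the agent.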
The Closing Argument
“Teams that implement agent operating system first will be able to scale AI systems efficiently and reliably. Everyone else will be stuck with expensive, fragile, goldfished experiments.”
“Without it, agents are brilliant but unreliable. With it, agents become infrastructure you can actually trust.”
The final statement: “The age of AI agents is here. The question is, who’s going to be the principal?”
Mapping IBM’s Six Components to the Practitioner Sources
| IBM Component | Nick Milo | Simon Scrapes | Chase AI |
|---|---|---|---|
| Scheduler / Orchestrator | Not addressed | Skill systems, cron, Channels | Org chart domain routing; Claude Desktop Routines |
| Memory Manager | AIOS/History/ session logs | 6-level memory framework | /raw→/wiki→/projects; claude/memory subfolder |
| Tool Manager | Skills in AIOS/Skills/ (manually invoked) | Progressive Disclosure skills; CLIs | Skill branches per domain; local vs remote distinction |
| Identity Manager | me.md — operator identity | user.md + SOUL.md | No explicit identity file; org chart implies operator role |
| Observability | History logs provide passive record | Session hook logs; skill output to vault | Vault Cleanup skill; session tracking implicit |
| Guardrails + Governance | Not addressed (solo, trusted operator) | Not addressed | Not addressed; human-in-loop implicit only |
Key observation: Between them, the three practitioner sources cover components 1–5 in some form, though with gaps (Milo does not address scheduling, and Chase has no explicit identity file). None addresses component 6 (Guardrails + Governance) explicitly, because all three assume a single trusted operator (you) rather than a multi-agent enterprise system interacting with external parties. For Baseworks, guardrails are currently implicit in the human-always-present model. This becomes a design concern only if autonomous agents begin acting without session-by-session operator oversight.
Applicability to Baseworks
Directly applicable as vocabulary:
The IBM framework provides the vocabulary for describing what the Baseworks OS needs to include. Each time a design decision comes up in implementation, these six component names clarify which layer is being addressed:
- “Should the session start hook load the inbox?” → Memory Manager question
- “Which skills can the VPS cron run unsupervised?” → Scheduler + Guardrails question
- “How does Claude know which operator is running the session?” → Identity Manager question
- “How do we trace what happened in a session after the fact?” → Observability question
Not directly applicable for implementation:
IBM gives no implementation guidance. No folder structures, no file formats, no specific tools. For implementation, the Milo / Simon Scrapes / Chase models are the reference.
The governance gap:
If Baseworks eventually moves toward autonomous multi-agent execution (e.g., scheduled agents that interact with practitioners or external contacts without Patrick or Asia in the loop), the Guardrails + Governance component becomes relevant. Not a current concern; worth flagging as a future design consideration.
Related
- AI OS Rework — Synthesis
- Nick Milo AI OS — companion source
- Simon Scrapes Agentic OS — companion source
- Chase AI Agentic OS — companion source
- Plans Index
- Source video: https://www.youtube.com/watch?v=IVGjBxqygmI