Skip to content

IBM Technology — Why AI Agents Need an Operating System

Created 2026-05-13
Tags planningai-osknowledge-basevault-development

Source: Why AI Agents Need an Operating System — IBM Technology / Think Series (12:21)

Presenter: Bri Kopecki, AI Engineer, IBM

Status: Source review complete. Foundational vocabulary reference. Companion to nick-milo-ai-os-planning, simon-scrapes-agentic-os, chase-ai-agentic-os. Cross-source synthesis in ai-os-rework-synthesis.

Screenshots: media.baseworks.com/kb-dev/ibm-agent-os/ — NAS mirror: /volume1/baseworks/media/kb-dev/ibm-agent-os/

Applicability note: This is a 101-level conceptual framework, not an implementation guide. It does not prescribe folder structures, file formats, or specific tools. Its value is vocabulary and architecture: it names the six kernel components that the other three sources (Milo, Scrapes, Chase) all implement in different ways without naming them consistently. Reading IBM first gives a shared language for comparing the practitioner sources.


Bri Kopecki, AI Engineer at IBM — IBM Think Series

“Right now, somewhere in the world, an AI agent is booking flights, writing code, and answering customer questions, and it has absolutely no idea what it did 5 minutes ago. It’s like giving a genius goldfish the keys to your company.”

The opening frames the problem precisely. AI agents — systems that can actually do things in the world (book flights, run code, send emails, talk to other agents) — are operating without coordination infrastructure. They:

  • Forget everything between sessions
  • Don’t know what tools they have access to
  • Can’t explain what they did or why
  • Have no concept of what they shouldn’t touch
  • When multiple agents run together, they have no way to coordinate without conflict

This is what the video calls the “goldfish problem” — a brilliant, capable system with no continuity.


Hand-drawn diagram: OS { Windows / MacOS / Linux → AI agents

The argument is structural. A computer OS does four things:

  1. Manages memory (what stays active, what gets stored)
  2. Schedules tasks (who gets the CPU when, in what order)
  3. Controls access (who can read what, what’s off-limits)
  4. Keeps things from crashing into each other

Windows, macOS, and Linux are all variations on this structure. Without one, a computer is “just an expensive paperweight.”

An agent OS does the same things for AI agents. Agents are the applications. The agent OS kernel is the operating system. The infrastructure (computers, AI models, databases) is the hardware.

This analogy is IBM’s main contribution: a clean conceptual frame that unifies what the practitioner sources are all building.


Three-layer cake: Agents (top, with candles) / Agent OS Kernel (middle) / Infrastructure (bottom)

┌─────────────────────────────────────┐
│ AI AGENTS │ ← the workers
│ (travel agent, coding agent, │
│ customer service agent...) │
├─────────────────────────────────────┤
│ AGENT OS KERNEL │ ← the management layer
│ (scheduler, memory, tools, │
│ identity, observability, │
│ guardrails + governance) │
├─────────────────────────────────────┤
│ INFRASTRUCTURE │ ← the hardware layer
│ (computers, AI models, │
│ databases, tools) │
└─────────────────────────────────────┘

The middle layer — the kernel — is where the architectural work happens. Agents sit above it and use its services. Infrastructure sits below it and provides raw capabilities. The kernel coordinates between the two.


Full kernel component list with three-layer cake: scheduler/orchestrator · memory man · tool man · identity man · observability · guardrails + gov.

When 10 agents all want to use the AI model at the same time, the scheduler decides who goes first. Priority is contextual: a live customer chat outranks a background summarization job. This component also coordinates sequential work between agents — routing output from one agent as input to another.

“The scheduler figures that out. Is the customer complaint more urgent than the weekly report?”

Maps to: skill chaining (Simon Scrapes), routines and cron scheduling (Chase AI)


Fixes the goldfish problem. Three memory types:

TypeDurationPurpose
Short-termCurrent sessionActive conversation context
Long-termWeeks / persistentWhat happened last week, ongoing preferences
EpisodicPattern-basedWhat worked or failed last time I tried this approach

“Your HR agent remembers that you asked about parental leave last month, so when you come back, it doesn’t start from scratch.”

Maps to: Simon’s 6-level memory framework, Nick Milo’s AIOS/History/ session logs, Chase’s claude/memory auto-memory subfolder


Agents need to interact with the real world: send emails, query databases, call APIs. The tool manager maintains a registry of available tools, enforces access permissions (who is allowed to use what), and runs tool execution in a sandbox.

“If an agent writes code and runs it, you really don’t want it accidentally deleting your production database. So the sandbox is kind of like a padded room where the agent can try things without burning down the house.”

The coding agent can write and execute Python, but only inside a specific folder scope. No access to passwords. No internet unless explicitly permitted.

Maps to: Chase’s local vs remote automation distinction (local tools only available on the machine; remote execution sandboxed by default)


Answers: who are you, and what are you allowed to do?

Agents need credentials — short-lived tokens that expire, permissions that limit scope, and a clear chain of authorization: “this agent is acting on behalf of this user.” Every action should be traceable to a human who authorized it.

“When your travel agent books a flight using your credit card, there’s a clear audit trail of who authorized what.”

Maps to: me-patrick.md / me-asia.md (operator identity), session attribution in AIOS/History/ (who ran this session and what they authorized)


Every decision the agent makes, every tool it calls, every response it generates — all logged and traceable. If something goes wrong, you can rewind the tape.

“Your agent approved a refund and it shouldn’t have. With observability, you can trace back through the entire decision chain and figure out why.”

This is the “security camera system” — passive logging that becomes active only when you need to reconstruct what happened.

Maps to: AIOS/History/ session logs (Milo), session start hook logging (Simon Scrapes)


Two levels:

Guardrails — real-time filters on inputs and outputs:

  • Input guardrails: is someone trying to trick the agent with a malicious prompt?
  • Output guardrails: is the agent about to say something inappropriate or incorrect?

Governance — the policy layer:

  • Some actions require human approval before execution
  • Some data is off-limits entirely
  • Some decisions are too consequential to automate

Human in the loop: guardrails + gov. → human → customer $ Y/N

“Refunds under $50: automatically. Over $50: a human has to approve it.”

This is the “human in the loop” — the governance layer that defines which actions agents handle autonomously and which escalate to a person.

Maps to: no direct equivalent in the practitioner sources. All three assume a trusted human operator is always present. IBM addresses the multi-agent / enterprise context where agent actions affect real customers and money without human supervision.


Recap list: scheduler/orchestrator · memory man · tool man · identity man · observability · guardrails + gov.

“Teams that implement agent operating system first will be able to scale AI systems efficiently and reliably. Everyone else will be stuck with expensive, fragile, goldfished experiments.”

“Without it, agents are brilliant but unreliable. With it, agents become infrastructure you can actually trust.”

The final statement: “The age of AI agents is here. The question is, who’s going to be the principal?”


Mapping IBM’s Six Components to the Practitioner Sources

Section titled “Mapping IBM’s Six Components to the Practitioner Sources”
IBM ComponentNick MiloSimon ScrapesChase AI
Scheduler / OrchestratorNot addressedSkill systems, cron, ChannelsOrg chart domain routing; Claude Desktop Routines
Memory ManagerAIOS/History/ session logs6-level memory framework/raw→/wiki→/projects; claude/memory subfolder
Tool ManagerSkills in AIOS/Skills/ (manually invoked)Progressive Disclosure skills; CLIsSkill branches per domain; local vs remote distinction
Identity Managerme.md — operator identityuser.md + SOUL.mdNo explicit identity file; org chart implies operator role
ObservabilityHistory logs provide passive recordSession hook logs; skill output to vaultVault Cleanup skill; session tracking implicit
Guardrails + GovernanceNot addressed (solo, trusted operator)Not addressedNot addressed; human-in-loop implicit only

Key observation: All three practitioner sources address components 1–5 in some form. None addresses component 6 (Guardrails + Governance) explicitly — because all three assume a single trusted operator (you) rather than a multi-agent enterprise system interacting with external parties. For Baseworks, guardrails are currently implicit in the human-always-present model. This only becomes a design concern if autonomous agents begin acting without session-by-session operator oversight.


Directly applicable — as vocabulary:

The IBM framework provides the vocabulary for what the Baseworks OS needs to build. Each time a design decision comes up in implementation, these six component names clarify which layer is being addressed:

  • “Should the session start hook load the inbox?” → Memory Manager question
  • “Which skills can the VPS cron run unsupervised?” → Scheduler + Guardrails question
  • “How does Claude know which operator is running the session?” → Identity Manager question
  • “How do we trace what happened in a session after the fact?” → Observability question

Not directly applicable — implementation:

IBM gives no implementation guidance. No folder structures, no file formats, no specific tools. For implementation, the Milo / Simon Scrapes / Chase models are the reference.

The governance gap:

If Baseworks eventually moves toward autonomous multi-agent execution (e.g., scheduled agents that interact with practitioners or external contacts without Patrick or Asia in the loop), the Guardrails + Governance component becomes relevant. Not a current concern; worth flagging as a future design consideration.