IBM Technology — Why AI Agents Need an Operating System

Created 2026-05-13

Tags: planning, ai-os, knowledge-base, vault-development

Source: Why AI Agents Need an Operating System — IBM Technology / Think Series (12:21)

Presenter: Bri Kopecki, AI Engineer, IBM

Status: Source review complete. Foundational vocabulary reference. Companion to nick-milo-ai-os-planning, simon-scrapes-agentic-os, chase-ai-agentic-os. Cross-source synthesis in ai-os-rework-synthesis.

Screenshots: media.baseworks.com/kb-dev/ibm-agent-os/ — NAS mirror: /volume1/baseworks/media/kb-dev/ibm-agent-os/

Applicability note: This is a 101-level conceptual framework, not an implementation guide. It does not prescribe folder structures, file formats, or specific tools. Its value is vocabulary and architecture: it names six kernel components, most of which the other three sources (Milo, Scrapes, Chase) implement in different ways without naming them consistently. Reading IBM first gives a shared language for comparing the practitioner sources.

The Core Problem: Agents Without an OS

Bri Kopecki, AI Engineer at IBM — IBM Think Series

“Right now, somewhere in the world, an AI agent is booking flights, writing code, and answering customer questions, and it has absolutely no idea what it did 5 minutes ago. It’s like giving a genius goldfish the keys to your company.”

The opening frames the problem precisely. AI agents — systems that can actually do things in the world (book flights, run code, send emails, talk to other agents) — are operating without coordination infrastructure. They:

Forget everything between sessions
Don’t know what tools they have access to
Can’t explain what they did or why
Have no concept of what they shouldn’t touch
Can’t coordinate with other agents without conflict when several run together

This is what the video calls the “goldfish problem” — a brilliant, capable system with no continuity.

The OS Analogy

Hand-drawn diagram: OS { Windows / MacOS / Linux → AI agents

The argument is structural. A computer OS does four things:

Manages memory (what stays active, what gets stored)
Schedules tasks (who gets the CPU when, in what order)
Controls access (who can read what, what’s off-limits)
Keeps things from crashing into each other

Windows, macOS, and Linux are all variations on this structure. Without one, a computer is “just an expensive paperweight.”

An agent OS does the same things for AI agents. Agents are the applications. The agent OS kernel is the operating system. The infrastructure (computers, AI models, databases) is the hardware.

This analogy is IBM’s main contribution: a clean conceptual frame that unifies what the practitioner sources are all building.

The Three-Layer Architecture

Three-layer cake: Agents (top, with candles) / Agent OS Kernel (middle) / Infrastructure (bottom)

┌─────────────────────────────────────┐
│           AI AGENTS                 │  ← the workers
│  (travel agent, coding agent,       │
│   customer service agent...)        │
├─────────────────────────────────────┤
│         AGENT OS KERNEL             │  ← the management layer
│  (scheduler, memory, tools,         │
│   identity, observability,          │
│   guardrails + governance)          │
├─────────────────────────────────────┤
│           INFRASTRUCTURE            │  ← the hardware layer
│  (computers, AI models,             │
│   databases, tools)                 │
└─────────────────────────────────────┘

The middle layer — the kernel — is where the architectural work happens. Agents sit above it and use its services. Infrastructure sits below it and provides raw capabilities. The kernel coordinates between the two.

The Six Kernel Components

Full kernel component list with three-layer cake: scheduler/orchestrator · memory man · tool man · identity man · observability · guardrails + gov.

1. Scheduler / Orchestrator

When 10 agents all want to use the AI model at the same time, the scheduler decides who goes first. Priority is contextual: a live customer chat outranks a background summarization job. This component also coordinates sequential work between agents — routing output from one agent as input to another.

“The scheduler figures that out. Is the customer complaint more urgent than the weekly report?”

Maps to: skill chaining (Simon Scrapes), routines and cron scheduling (Chase AI)
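
The prioritization described above can be sketched as a priority queue. This is a minimal illustration, not anything shown in the video: the `Scheduler` class, the `PRIORITY` table, and the agent names are all hypothetical, and the numeric priorities are assumptions chosen to match the examples (live chat first, weekly report last).

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels; a lower number runs first.
PRIORITY = {"live_chat": 0, "customer_complaint": 1, "weekly_report": 5}


@dataclass(order=True)
class Request:
    priority: int
    seq: int                          # tie-breaker: first come, first served
    agent: str = field(compare=False)
    kind: str = field(compare=False)


class Scheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, agent, kind):
        """Queue a request for the shared model."""
        request = Request(PRIORITY[kind], next(self._counter), agent, kind)
        heapq.heappush(self._heap, request)

    def next_request(self):
        """Pop the most urgent request, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


scheduler = Scheduler()
scheduler.submit("reporting-agent", "weekly_report")
scheduler.submit("support-agent", "customer_complaint")
scheduler.submit("chat-agent", "live_chat")

order = [scheduler.next_request().agent for _ in range(3)]
print(order)  # ['chat-agent', 'support-agent', 'reporting-agent']
```

A real orchestrator would also route one agent's output to the next agent; that part is left out here to keep the sketch focused on ordering.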

2. Memory Manager

Fixes the goldfish problem. Three memory types:

Type	Duration	Purpose
Short-term	Current session	Active conversation context
Long-term	Weeks / persistent	What happened last week, ongoing preferences
Episodic	Pattern-based	What worked or failed last time I tried this approach

“Your HR agent remembers that you asked about parental leave last month, so when you come back, it doesn’t start from scratch.”

Maps to: Simon’s 6-level memory framework, Nick Milo’s AIOS/History/ session logs, Chase’s claude/memory auto-memory subfolder
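
One way to picture the three memory types is as three stores with different lifetimes. The sketch below is an assumption-laden illustration: the `MemoryManager` class and its method names are hypothetical, and everything lives in process memory, whereas a real long-term store would be written to disk or a database.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryManager:
    short_term: list = field(default_factory=list)  # current session only
    long_term: dict = field(default_factory=dict)   # survives across sessions
    episodic: list = field(default_factory=list)    # (approach, worked) records

    def observe(self, message):
        self.short_term.append(message)

    def remember(self, key, value):
        self.long_term[key] = value

    def record_outcome(self, approach, worked):
        self.episodic.append((approach, worked))

    def end_session(self):
        # Only the short-term context is dropped at session end.
        self.short_term.clear()

    def past_outcomes(self, approach):
        """Return the recorded results for one approach, oldest first."""
        return [worked for a, worked in self.episodic if a == approach]


memory = MemoryManager()
memory.observe("User asked about parental leave")
memory.remember("last_hr_topic", "parental leave")
memory.record_outcome("send_policy_pdf", True)
memory.end_session()

print(memory.short_term)                        # []
print(memory.long_term["last_hr_topic"])        # parental leave
print(memory.past_outcomes("send_policy_pdf"))  # [True]
```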

3. Tool Manager

Agents need to interact with the real world: send emails, query databases, call APIs. The tool manager maintains a registry of available tools, enforces access permissions (who is allowed to use what), and runs tool execution in a sandbox.

“If an agent writes code and runs it, you really don’t want it accidentally deleting your production database. So the sandbox is kind of like a padded room where the agent can try things without burning down the house.”

The coding agent can write and execute Python, but only inside a specific folder scope. No access to passwords. No internet unless explicitly permitted.

Maps to: Chase’s local vs remote automation distinction (local tools only available on the machine; remote execution sandboxed by default)
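
The registry and permission check can be sketched in a few lines. The names here (`ToolManager`, `in_scope`, the `/workspace/coding-agent` root) are hypothetical. Note also that this only shows access control and a folder-scope check; it is not a sandbox, and real isolation would need OS-level or container-level mechanisms.

```python
from pathlib import PurePosixPath


class ToolManager:
    def __init__(self):
        self._tools = {}   # tool name -> callable
        self._grants = {}  # agent name -> set of permitted tool names

    def register(self, name, func):
        self._tools[name] = func

    def grant(self, agent, tool_name):
        self._grants.setdefault(agent, set()).add(tool_name)

    def call(self, agent, tool_name, *args):
        if tool_name not in self._tools:
            raise KeyError(f"unknown tool: {tool_name}")
        if tool_name not in self._grants.get(agent, set()):
            raise PermissionError(f"{agent} may not use {tool_name}")
        return self._tools[tool_name](*args)


def in_scope(path, root="/workspace/coding-agent"):
    """Return True if path is inside the allowed folder (no '..' escapes)."""
    p = PurePosixPath(path)
    return ".." not in p.parts and PurePosixPath(root) in (p, *p.parents)


tools = ToolManager()
tools.register("send_email", lambda to: f"sent to {to}")
tools.grant("support-agent", "send_email")

result = tools.call("support-agent", "send_email", "a@example.com")
print(result)  # sent to a@example.com
print(in_scope("/workspace/coding-agent/main.py"))  # True
print(in_scope("/etc/passwd"))                      # False
```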

4. Identity Manager

Answers: who are you, and what are you allowed to do?

Agents need credentials — short-lived tokens that expire, permissions that limit scope, and a clear chain of authorization: “this agent is acting on behalf of this user.” Every action should be traceable to a human who authorized it.

“When your travel agent books a flight using your credit card, there’s a clear audit trail of who authorized what.”

Maps to: me-patrick.md / me-asia.md (operator identity), session attribution in AIOS/History/ (who ran this session and what they authorized)
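
A short-lived, scoped credential with an "on behalf of" field might look like the following. This is a sketch under assumptions: `AgentToken`, `issue_token`, and `authorize` are hypothetical names, the five-minute lifetime is arbitrary, and a production system would use signed tokens rather than a plain in-memory object.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentToken:
    agent: str
    on_behalf_of: str   # the human who authorized the agent
    scopes: frozenset   # actions this token permits
    expires_at: float   # Unix timestamp


def issue_token(agent, user, scopes, ttl_seconds=300, now=None):
    now = time.time() if now is None else now
    return AgentToken(agent, user, frozenset(scopes), now + ttl_seconds)


def authorize(token, scope, now=None):
    """Allow the action only if the token is unexpired and covers the scope."""
    now = time.time() if now is None else now
    return now < token.expires_at and scope in token.scopes


# Fixed timestamps make the example deterministic.
token = issue_token("travel-agent", "user-123", {"book_flight"}, now=1000.0)

print(authorize(token, "book_flight", now=1100.0))   # True
print(authorize(token, "issue_refund", now=1100.0))  # False: out of scope
print(authorize(token, "book_flight", now=1300.0))   # False: expired
```

Because every token carries `on_behalf_of`, any action it authorizes can be attributed to a specific person.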

5. Observability

Every decision the agent makes, every tool it calls, every response it generates — all logged and traceable. If something goes wrong, you can rewind the tape.

“Your agent approved a refund and it shouldn’t have. With observability, you can trace back through the entire decision chain and figure out why.”

This is the “security camera system”: logging that runs all the time but is consulted only when you need to reconstruct what happened.

Maps to: AIOS/History/ session logs (Milo), session start hook logging (Simon Scrapes)
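
"Rewinding the tape" amounts to an append-only event log that can be filtered by session. The sketch below is illustrative only: `TraceLog`, the event fields, and the sample events are hypothetical, and a real log would be written to durable storage rather than kept in a list.

```python
import json
import time


class TraceLog:
    def __init__(self):
        self._events = []

    def record(self, session_id, agent, kind, detail):
        """Append one event; events are never edited or removed."""
        self._events.append({
            "ts": time.time(),
            "session": session_id,
            "agent": agent,
            "kind": kind,
            "detail": detail,
        })

    def replay(self, session_id):
        """Return one session's events in the order they were recorded."""
        return [e for e in self._events if e["session"] == session_id]

    def dump(self):
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(e) for e in self._events)


log = TraceLog()
log.record("s1", "refund-agent", "decision", "order flagged as damaged")
log.record("s1", "refund-agent", "tool_call", "issue_refund(order=42)")
log.record("s2", "chat-agent", "response", "greeting sent")

chain = [e["kind"] for e in log.replay("s1")]
print(chain)  # ['decision', 'tool_call']
```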

6. Guardrails + Governance

Two levels:

Guardrails — real-time filters on inputs and outputs:

Input guardrails: is someone trying to trick the agent with a malicious prompt?
Output guardrails: is the agent about to say something inappropriate or incorrect?

Governance — the policy layer:

Some actions require human approval before execution
Some data is off-limits entirely
Some decisions are too consequential to automate

Human in the loop: guardrails + gov. → human → customer $ Y/N

“Refunds under $50: automatically. Over $50: a human has to approve it.”

This is the “human in the loop” — the governance layer that defines which actions agents handle autonomously and which escalate to a person.

Maps to: no direct equivalent in the practitioner sources. All three assume a trusted human operator is always present. IBM addresses the multi-agent / enterprise context where agent actions affect real customers and money without human supervision.
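
The refund rule quoted above is small enough to write down directly. This is a sketch, not IBM's implementation: `route_refund` and `AUTO_APPROVE_LIMIT` are hypothetical names, and because the quote does not say what happens at exactly $50, the code assumes that amount escalates to a human.

```python
from decimal import Decimal

AUTO_APPROVE_LIMIT = Decimal("50")  # threshold taken from the quoted example


def route_refund(amount):
    """Return 'auto' for small refunds and 'human' when approval is needed."""
    amount = Decimal(str(amount))
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    # Assumption: exactly $50 escalates, since the quote leaves it undefined.
    return "auto" if amount < AUTO_APPROVE_LIMIT else "human"


print(route_refund("19.99"))  # auto
print(route_refund(50))       # human
print(route_refund("120"))    # human
```

Keeping the threshold in one named constant means the policy can change without touching the agent logic.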

The Closing Argument

Recap list: scheduler/orchestrator · memory man · tool man · identity man · observability · guardrails + gov.

“Teams that implement agent operating system first will be able to scale AI systems efficiently and reliably. Everyone else will be stuck with expensive, fragile, goldfished experiments.”

“Without it, agents are brilliant but unreliable. With it, agents become infrastructure you can actually trust.”

The final statement: “The age of AI agents is here. The question is, who’s going to be the principal?”

Mapping IBM’s Six Components to the Practitioner Sources

IBM Component	Nick Milo	Simon Scrapes	Chase AI
Scheduler / Orchestrator	Not addressed	Skill systems, cron, Channels	Org chart domain routing; Claude Desktop Routines
Memory Manager	AIOS/History/ session logs	6-level memory framework	/raw→/wiki→/projects; claude/memory subfolder
Tool Manager	Skills in AIOS/Skills/ (manually invoked)	Progressive Disclosure skills; CLIs	Skill branches per domain; local vs remote distinction
Identity Manager	`me.md` — operator identity	`user.md` + `SOUL.md`	No explicit identity file; org chart implies operator role
Observability	History logs provide passive record	Session hook logs; skill output to vault	Vault Cleanup skill; session tracking implicit
Guardrails + Governance	Not addressed (solo, trusted operator)	Not addressed	Not addressed; human-in-loop implicit only

Key observation: All three practitioner sources address components 2–5 in some form, and Simon Scrapes and Chase also address component 1 (Scheduler / Orchestrator), which Milo does not. None addresses component 6 (Guardrails + Governance) explicitly, because all three assume a single trusted operator (you) rather than a multi-agent enterprise system interacting with external parties. For Baseworks, guardrails are currently implicit in the human-always-present model. They become a design concern only if autonomous agents begin acting without session-by-session operator oversight.

Applicability to Baseworks

Directly applicable — as vocabulary:

The IBM framework provides the vocabulary for what the Baseworks OS has to provide. Each time a design decision comes up during implementation, the six component names clarify which layer is being addressed:

“Should the session start hook load the inbox?” → Memory Manager question
“Which skills can the VPS cron run unsupervised?” → Scheduler + Guardrails question
“How does Claude know which operator is running the session?” → Identity Manager question
“How do we trace what happened in a session after the fact?” → Observability question

Not directly applicable — implementation:

IBM gives no implementation guidance. No folder structures, no file formats, no specific tools. For implementation, the Milo / Simon Scrapes / Chase models are the reference.

The governance gap:

If Baseworks eventually moves toward autonomous multi-agent execution (e.g., scheduled agents that interact with practitioners or external contacts without Patrick or Asia in the loop), the Guardrails + Governance component becomes relevant. Not a current concern; worth flagging as a future design consideration.

AI OS Rework — Synthesis
Nick Milo AI OS — companion source
Simon Scrapes Agentic OS — companion source
Chase AI Agentic OS — companion source
Plans Index
Source video: https://www.youtube.com/watch?v=IVGjBxqygmI