JarvisCore 1.0.0: An Open Source Multi-Agent Framework Built for What Agents Actually Do

JarvisCore 1.0.0 is now available under the Apache 2.0 license. This is the first stable release, and it is the right moment to explain what we built and why.

The multi-agent space is crowded. CrewAI, AutoGen, LangGraph, OpenAI Swarm, and others are all serious projects with real users. We did not build JarvisCore because those frameworks are bad. We built it because we had a clear set of opinions about what multi-agent systems should look like in production, and those opinions drove us to different architectural decisions in almost every layer of the stack.

This post explains five of them.

Agents Should Be Peers, Not Workers

Most orchestration frameworks model agents as workers that execute tasks assigned by a central orchestrator. There is a workflow graph, a task queue, or a step-by-step plan. Agents receive work, do it, and report back. The orchestrator is always in control.

This model is fine for many workflows. But it breaks down when you want agents that respond to events, initiate tasks based on what they observe, or coordinate directly with other agents without routing everything through a central process. Reactive workers cannot do that. Proactive peers can.

JarvisCore's P2P layer is built on the SWIM protocol for gossip-based membership and ZMQ for message transport. Every node in a JarvisCore mesh runs a P2PCoordinator that manages discovery, capability announcements, and message routing. Agents do not just wait for instructions. They can discover who else is on the mesh, query peers for their capabilities, and initiate communication directly:

analysts = self.peers.discover(role="analyst", strategy="round_robin")
count = await self.peers.broadcast({"type": "new_data_available", "source": self.agent_id})

A ListenerAgent can run continuously in the background, handling incoming messages from the mesh without a workflow loop. JarvisLifespan integrates this with FastAPI in three lines. When an agent joins a cluster late, it calls request_peer_capabilities() to catch up on what it missed during mesh formation.
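To make the round-robin discovery idea concrete, here is a minimal sketch of a peer directory that rotates through the agents matching a role. It is an illustration only, not JarvisCore's implementation: `PeerDirectory`, `announce`, and this single-result `discover` are hypothetical names, and the real mesh learns about peers through gossip rather than a local dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PeerDirectory:
    """Hypothetical in-memory stand-in for a mesh's view of its peers."""

    peers: dict[str, str] = field(default_factory=dict)    # agent_id -> role
    _cursor: dict[str, int] = field(default_factory=dict)  # role -> next index

    def announce(self, agent_id: str, role: str) -> None:
        """Record (or update) a peer's advertised role."""
        self.peers[agent_id] = role

    def discover(self, role: str) -> str | None:
        """Return the next peer with this role, rotating on every call."""
        matches = sorted(a for a, r in self.peers.items() if r == role)
        if not matches:
            return None
        index = self._cursor.get(role, 0) % len(matches)
        self._cursor[role] = index + 1
        return matches[index]


directory = PeerDirectory()
directory.announce("analyst-1", "analyst")
directory.announce("analyst-2", "analyst")
directory.announce("writer-1", "writer")
picks = [directory.discover("analyst") for _ in range(3)]
```

Sorting the matches before indexing keeps the rotation stable even if peers announce themselves in a different order on different nodes.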

The cognitive context system extends this further. Agents can call get_cognitive_context() to receive a live view of what peers are doing, which lets them adapt their own behaviour without being explicitly told anything changed.

Tools That Write, Verify, and Graduate Themselves

The industry has converged on MCP as the standard transport between AI clients and the tools agents use. MCP servers for Slack, GitHub, Salesforce, Google Calendar, and hundreds of other providers are being written today. It is a reasonable choice. We made a different one.

The complaint about MCP that we keep hearing is context bloat. Every MCP server exposes a schema to the agent, and agents have to understand what those schemas mean before they can use them. In a multi-agent system with many connected providers, those schemas consume a significant share of the context window before the agent has done any work. For proactive agents that need to act on changing conditions, static tool registrations are also a constraint: the tool has to exist before the agent can use it.

JarvisCore uses a FunctionRegistry with three concepts: atoms, bundles, and graduation stages.

An atom is a versioned Python function stored on disk with a SHA256 integrity hash. It has metadata describing the system it belongs to, its capabilities, and its execution history. A bundle is a {System}Capabilities class that groups related atoms for a provider into a single interface.
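The integrity hash can be pictured with a short sketch. The `Atom` dataclass, `make_atom`, and `verify_atom` below are hypothetical names chosen for illustration, and hashing the UTF-8 source text is an assumption about what the digest covers; the registry's actual on-disk format may differ.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    """Hypothetical record for a versioned function and its integrity hash."""

    name: str
    version: str
    source: str   # the Python source of the function
    sha256: str   # hex digest recorded at registration time


def make_atom(name: str, version: str, source: str) -> Atom:
    """Register source code together with the SHA-256 digest of its text."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return Atom(name=name, version=version, source=source, sha256=digest)


def verify_atom(atom: Atom) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return hashlib.sha256(atom.source.encode("utf-8")).hexdigest() == atom.sha256


atom = make_atom("send_message", "1.0.0", "def send_message(text):\n    return text\n")
tampered = replace(atom, source=atom.source + "# edited on disk\n")
```

Here `verify_atom(atom)` holds, while `verify_atom(tampered)` fails because the source no longer matches the recorded digest.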

Functions move through three graduation stages based on their execution history:

Candidate: just generated or registered, no execution history yet.
Verified: has at least one successful execution.
Golden: has at least five successful executions and is considered production-ready.
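The thresholds above map directly onto a small function. This is a sketch of the stated rule only: `graduation_stage` is a hypothetical name, and it deliberately ignores anything else the registry might consider, such as failures or demotion.

```python
def graduation_stage(successful_executions: int) -> str:
    """Map a count of successful executions to the stage described above."""
    if successful_executions < 0:
        raise ValueError("execution count cannot be negative")
    if successful_executions >= 5:
        return "golden"
    if successful_executions >= 1:
        return "verified"
    return "candidate"


stages = [graduation_stage(n) for n in (0, 1, 4, 5, 12)]
```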

The FunctionRegistry ships with 47 connected-app bundles covering approximately 310 atoms, spanning communication tools, development platforms, CRM systems, analytics, and infrastructure providers. These were seeded from production function registries and carry real execution history.

When an agent needs a function that does not exist yet, the CoderSubAgent generates it at request time using the REACTPP code generation discipline: identify the provider and action, validate prerequisites, write code that stores its result in a canonical variable, and self-repair on failures. Generated functions enter the registry as candidates and graduate through verified and golden as they accumulate successful executions.

When an existing function's API changes and the agent detects a failure, it can regenerate the atom, validate the new version, and promote it in the registry. The tools stay current without manual intervention from the developer.

Human-in-the-Loop as a Framework Primitive

Most frameworks treat human oversight as something you add to a workflow. You write a step that pauses execution and sends a notification somewhere, then resume when you get a response. This works, but it means every team building agents has to invent their own escalation logic.

In JarvisCore, HITL is part of the framework. Every agent gets a self.hitl object injected at start time by the Mesh. From inside any agent, the call looks like this:

item_id = await self.hitl.request(
    title="Approve outbound customer email",
    content=email_draft,
    urgency="high",
    context={"customer_id": customer_id, "step": "draft_review"},
)
resolution = await self.hitl.wait(item_id, timeout=3600)

The HITLQueue persists requests to both a Redis store and flat JSON files simultaneously, so file-based dashboards can poll for new items without a Redis dependency. The contracts layer defines typed HITLRequest and HITLResolution models with a fixed decision vocabulary: approve, reject, defer, escalate. The kernel validates decisions against this schema rather than accepting arbitrary strings.
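A fixed decision vocabulary is straightforward to enforce with an enum. The sketch below is not the JarvisCore contracts layer: `Decision` and `parse_decision` are hypothetical names, and normalising case and whitespace before validation is an assumption.

```python
from enum import Enum


class Decision(str, Enum):
    """The four decisions named above, as a closed vocabulary."""

    APPROVE = "approve"
    REJECT = "reject"
    DEFER = "defer"
    ESCALATE = "escalate"


def parse_decision(raw: str) -> Decision:
    """Validate a reviewer's decision instead of accepting arbitrary strings."""
    try:
        return Decision(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(d.value for d in Decision)
        raise ValueError(f"unknown decision {raw!r}; expected one of: {allowed}") from None


decision = parse_decision(" Approve ")
```

Anything outside the four values, such as `"maybe"`, raises a `ValueError` rather than flowing into the workflow as an unrecognised string.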

Content guards are enforced at the framework level. HITL payloads are capped at 2,000 characters for the review content and 1,000 characters for the structured context field. Agents cannot send multi-kilobyte data dumps to the review queue. What reviewers see is always a human-readable summary, not a raw data export.
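As a rough illustration of the caps, the sketch below rejects oversized payloads before they reach a queue. Two details are assumptions rather than documented behaviour: the context is measured by the length of its JSON serialisation, and oversized input is rejected rather than truncated. `check_hitl_payload` is a hypothetical helper.

```python
import json

MAX_CONTENT_CHARS = 2000  # cap on review content, from the limits above
MAX_CONTEXT_CHARS = 1000  # cap on structured context, from the limits above


def check_hitl_payload(content: str, context: dict) -> None:
    """Raise ValueError if either field exceeds its character cap."""
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(
            f"content is {len(content)} characters; limit is {MAX_CONTENT_CHARS}"
        )
    # Assumption: the context limit applies to its JSON-serialised form.
    encoded = json.dumps(context, sort_keys=True)
    if len(encoded) > MAX_CONTEXT_CHARS:
        raise ValueError(
            f"context is {len(encoded)} characters; limit is {MAX_CONTEXT_CHARS}"
        )


check_hitl_payload("Draft reply to customer", {"customer_id": "c-42", "step": "draft_review"})
```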

This connects to something in the agent profile system worth naming. Each agent in JarvisCore can load a typed YAML profile that declares, among other things, an escalates_to list and a sops list. The escalates_to field names the people or roles to contact via HITL when the agent is blocked. It is the same data the HITL queue uses to route escalations, so there is no separate configuration for who gets notified. The sops field is a list of standing operating procedures rendered as numbered instructions, with the directive to follow them autonomously, without being asked. This is not personality framing in the way that CrewAI's backstory or AutoGen's system message is. An SOP like "always cross-reference findings across at least three independent sources before asserting a claim" is a procedural constraint that is in place before any task instruction arrives. The agent does not need to be told. It is already in the profile.
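Rendering a sops list as numbered instructions might look like the sketch below. The function name `render_sops` and the exact wording of the directive line are assumptions made for illustration; only the idea of numbered, always-on procedures comes from the profile system described above.

```python
def render_sops(sops: list[str]) -> str:
    """Render procedures as a numbered block to place ahead of task instructions."""
    if not sops:
        return ""
    # Assumption: the directive wording here is illustrative, not the framework's.
    header = "Follow these standing procedures autonomously, without being asked:"
    numbered = [f"{i}. {sop}" for i, sop in enumerate(sops, start=1)]
    return "\n".join([header, *numbered])


profile_sops = [
    "Always cross-reference findings across at least three independent sources before asserting a claim.",
    "Escalate via HITL when blocked.",
]
rendered = render_sops(profile_sops)
```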

Memory Is a Hard Problem and We Have Taken It Seriously

Agent memory is one of the most difficult open problems in this space, and anyone who tells you they have fully solved it is overstating things. We have not solved it either. What we have done is build a memory architecture that is more structured than what most frameworks provide, and we are actively researching the next layer.

JarvisCore's memory system is organised into four tiers via UnifiedMemory, which gives the kernel a single interface across all of them:

Working scratchpad: blob-backed intermediate storage for the current OODA loop turn. Lives only as long as the step.
Episodic ledger: Redis-backed log of every OODA turn across a workflow. The kernel uses this to rehydrate context after a crash or cold start.
Long-term memory: checkpoints and cross-session artifacts that survive beyond individual workflow runs. Requires both Redis and blob storage.
Athena MemOS (optional): a fourth tier that bridges to Athena, Prescott Data's internal memory research system. When configured, every turn is also written to Athena's short-term memory as a typed event, and the kernel can pull semantic chains from Athena's medium-term memory during context rehydration. This gives agents cross-session, semantically searchable memory rather than just the raw Redis event stream.

Every tier is optional and the system degrades gracefully. A pure in-memory run with no persistence backends works for development and testing. A fully configured deployment with Redis, blob storage, and Athena gives agents genuinely long-horizon memory.
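Graceful degradation can be sketched as a facade that always has an in-process scratchpad and uses persistent tiers only when they are supplied. This is a simplified illustration and not the UnifiedMemory API: `TieredMemory`, `Store`, and `DictStore` are hypothetical names, and writing every value to every configured tier is a simplification of how the real tiers divide responsibility.

```python
from __future__ import annotations

from typing import Optional, Protocol


class Store(Protocol):
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class DictStore:
    """In-process stand-in for a real backend such as Redis or blob storage."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class TieredMemory:
    """Hypothetical facade: scratchpad always present, other tiers optional."""

    def __init__(self, episodic: Optional[Store] = None,
                 long_term: Optional[Store] = None) -> None:
        self._scratch: dict[str, str] = {}
        candidates = (("episodic", episodic), ("long_term", long_term))
        self._backends = {name: s for name, s in candidates if s is not None}

    @property
    def active_tiers(self) -> list[str]:
        return ["scratchpad", *self._backends]

    def write(self, key: str, value: str) -> None:
        self._scratch[key] = value
        for backend in self._backends.values():
            backend.put(key, value)

    def read(self, key: str) -> Optional[str]:
        if key in self._scratch:
            return self._scratch[key]
        # Fall back to persistent tiers, e.g. after a restart.
        for backend in self._backends.values():
            value = backend.get(key)
            if value is not None:
                return value
        return None


dev_memory = TieredMemory()                 # no persistence configured
dev_memory.write("turn-1", "observed spike")

shared = DictStore()
TieredMemory(episodic=shared).write("turn-1", "observed spike")
restarted = TieredMemory(episodic=shared)   # fresh scratchpad, same backend
```

The development instance works with no backends at all, while `restarted` can still read `turn-1` because the episodic store outlived the first instance.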

We are planning to open source the Athena memory system separately. When that happens, any JarvisCore agent can be wired into a four-tier memory architecture with a single configuration change.

Agents Should Never Hold Credentials

When an agent integrates with an external system such as Slack, GitHub, Salesforce, or a payment processor, it needs credentials to authenticate. The standard approach is to put those credentials in the agent's environment: environment variables, a secrets manager that the agent queries at runtime, or a configuration file the agent reads on startup. The agent holds the credentials while it runs.

We think this is wrong. Credentials in an agent's environment are credentials in the LLM's execution context. If the agent produces a tool call that is not what you intended, or if there is a prompt injection in external content the agent processed, those credentials are reachable. The blast radius of a compromised agent is the full scope of everything it is authenticated to.

JarvisCore's integration with Nexus changes this boundary. Agents request tokens through the Nexus credential proxy. They never see the underlying client ID, client secret, or API key. Credentials are stored in a local encrypted store using AES-256-GCM, with the encryption key derived using PBKDF2-HMAC-SHA256. They are never written in plaintext and are never passed to the agent process. If the Nexus gateway is configured, agents authenticate through the full Nexus Broker and Bridge stack, where the OAuth flow, token refresh, and scope enforcement all happen outside the agent's process boundary.
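The key-derivation step can be illustrated with the Python standard library alone. This sketch covers only PBKDF2-HMAC-SHA256 producing a 32-byte key suitable for AES-256; the AES-GCM encryption itself is omitted because it needs a third-party library. The function name, the 16-byte salt, and the iteration count are assumptions, not values taken from Nexus.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (256-bit) key with PBKDF2-HMAC-SHA256.

    The default iteration count is an assumed value for illustration.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        iterations,
        dklen=32,
    )


salt = os.urandom(16)  # stored alongside the ciphertext; it is not secret
key = derive_key("correct horse battery staple", salt, iterations=10_000)
```

The same passphrase and salt always yield the same key, so only the salt needs to be stored with the encrypted data, never the key itself.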

A compromised agent can make API calls on behalf of the user, within the scopes that were granted. It cannot exfiltrate the underlying credentials, because it never had them.

The 1.0.0 Stable API

The full infrastructure that shipped in the 0.x series (the Kernel OODA loop, UnifiedMemory, the distributed WorkflowEngine, MailboxManager, FunctionRegistry, and Nexus authentication) is now at a stable API surface. Breaking changes before 2.0.0 require a deprecation cycle.

The license switches from MIT to Apache 2.0 with this release. Apache 2.0 includes an explicit patent grant, which matters for enterprise legal review. CLA documents for individual and corporate contributors are in the repository.

1.0.0 also ships the investment committee reference implementation: a seven-agent system running a full investment committee workflow with parallel step execution, a FastAPI dashboard on port 8004, and long-term institutional memory via UnifiedMemory. It is the most complete demonstration of AutoAgent and CustomAgent working together in a production-grade pipeline.

Get Started

pip install jarviscore==1.0.0

The full documentation, guides, and examples are at the JarvisCore documentation site. The source is on GitHub.