v1.1 — LangGraph-Inspired Improvements

Goal: Adopt key patterns from LangGraph analysis to harden the orchestrator before scaling.

Source: Deep analysis of langchain-ai/langgraph — 30 markdown files covering core engine, checkpoint system, prebuilt agents, SDK, CLI, and internals.

Analysis files: analysis/langgraph/

Key Findings

| What LangGraph Does Better | What We Do Better |
| --- | --- |
| Channel-based state with typed reducers | True provider-agnostic ABC (swap Claude/GPT/Gemini/local) |
| First-class interrupt/resume (HITL) | Cost-aware routing (6 strategies) |
| Content-addressed checkpoint blobs | Agent cooperation protocols (delegation, conflict resolution) |
| 7 stream modes with SSE reconnection | Budget enforcement (per task/session/day) |
| Task-level result caching | Provider health monitoring with auto-failover |
| Conformance test suite for checkpointers | |

Full comparison: analysis/langgraph/28-comparison.md

Sprint 1: State & Caching ✅

| Task | Inspired By | Status |
| --- | --- | --- |
| Channel-based state with reducers | 03-channels | ✅ `core/channels.py` |
| Task-level result caching | 15-cache | ✅ `core/cache.py` |
| Conformance test suite | 16-conformance-tests | ✅ `core/conformance.py` |

Channel-Based State

Each state field maps to a typed channel with explicit concurrency semantics:

LastValue — single writer per step (error on conflict)
BinaryOperatorAggregate — fold concurrent writes via reducer (e.g., operator.add)
Topic — append all writes (pubsub)

This solves concurrent agent writes to shared state — our current biggest gap.
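The three channel semantics listed above can be sketched in a few lines. This is a minimal illustration, not the contents of `core/channels.py`: the `update(writes)` method, the `InvalidUpdateError` name, and the choice to keep `Topic` values across steps are all assumptions made for the example.

```python
import operator
from typing import Any, Callable, List, Sequence


class InvalidUpdateError(Exception):
    """Hypothetical error raised when a channel rejects a step's writes."""


class LastValue:
    """Holds one value; at most one write is accepted per step."""

    def __init__(self) -> None:
        self.value: Any = None

    def update(self, writes: Sequence[Any]) -> None:
        if len(writes) > 1:
            raise InvalidUpdateError("LastValue accepts a single write per step")
        if writes:
            self.value = writes[0]


class BinaryOperatorAggregate:
    """Folds every write of a step into the current value via a reducer."""

    def __init__(self, reducer: Callable[[Any, Any], Any], initial: Any) -> None:
        self.reducer = reducer
        self.value = initial

    def update(self, writes: Sequence[Any]) -> None:
        for write in writes:
            self.value = self.reducer(self.value, write)


class Topic:
    """Appends all writes, pubsub style (assumed here to accumulate across steps)."""

    def __init__(self) -> None:
        self.value: List[Any] = []

    def update(self, writes: Sequence[Any]) -> None:
        self.value.extend(writes)


# Three agents write to the same field in one step; the reducer merges them.
token_count = BinaryOperatorAggregate(operator.add, 0)
token_count.update([120, 80, 45])
```

The point of the design is that the conflict policy lives in the channel type, so two agents writing the same field in one step either merge deterministically or fail loudly instead of silently overwriting each other.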

Task-Level Caching

Cache skill and node results under a hash of their inputs, with a CachePolicy set per skill. The first backend is InMemory; Redis follows later. A cache hit skips re-execution entirely. Target: reduce redundant LLM calls by 30%+.
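A rough sketch of input-hash caching follows. The `ttl_seconds` field on `CachePolicy`, the `run` method, and the hit/miss counters are hypothetical names chosen for illustration; the real `core/cache.py` may be shaped differently.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class CachePolicy:
    # Assumed field: None means the entry never expires.
    ttl_seconds: Optional[float] = None


class InMemoryCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Optional[float], Any]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(skill_name: str, inputs: Dict[str, Any]) -> str:
        # Sorting keys makes the hash independent of argument order.
        payload = json.dumps(inputs, sort_keys=True, default=str)
        return hashlib.sha256(f"{skill_name}:{payload}".encode()).hexdigest()

    def run(self, skill_name: str, fn: Callable[..., Any],
            inputs: Dict[str, Any], policy: CachePolicy) -> Any:
        key = self.key_for(skill_name, inputs)
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None:
            expires_at, result = entry
            if expires_at is None or now < expires_at:
                self.hits += 1
                return result  # cache hit: the skill is not re-executed
        self.misses += 1
        result = fn(**inputs)
        expires_at = None if policy.ttl_seconds is None else now + policy.ttl_seconds
        self._entries[key] = (expires_at, result)
        return result


llm_calls = []

def summarize(text: str) -> str:
    # Stand-in for an expensive LLM call.
    llm_calls.append(text)
    return text[:10]

cache = InMemoryCache()
first = cache.run("summarize", summarize, {"text": "hello world, again"}, CachePolicy())
second = cache.run("summarize", summarize, {"text": "hello world, again"}, CachePolicy())
```

Tracking hits and misses on the cache itself is one way to measure progress against the 30%+ target.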

Conformance Tests

Capability-based test harness for Provider and Checkpoint interfaces. Any new implementation runs against it automatically. LangGraph's suite covers 47+ test cases across 8 capabilities.
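One way a capability-based harness could look is sketched below. The capability names (`put_get`, `delete`), the `DictCheckpointer` class, and the `run_conformance` function are invented for this example and are not taken from `core/conformance.py` or from LangGraph's suite.

```python
from typing import Callable, Dict, List, Optional, Set


class DictCheckpointer:
    """Toy checkpointer that supports storing and reading, but not deleting."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, thread_id: str, checkpoint: dict) -> None:
        self._data[thread_id] = dict(checkpoint)

    def get(self, thread_id: str) -> Optional[dict]:
        return self._data.get(thread_id)


def check_put_get(cp) -> None:
    cp.put("t1", {"step": 1})
    assert cp.get("t1") == {"step": 1}


def check_delete(cp) -> None:
    cp.put("t1", {"step": 1})
    cp.delete("t1")
    assert cp.get("t1") is None


# Each capability maps to the checks an implementation must pass to claim it.
CHECKS: Dict[str, List[Callable]] = {
    "put_get": [check_put_get],
    "delete": [check_delete],
}


def run_conformance(factory: Callable[[], object], capabilities: Set[str]) -> Dict[str, str]:
    """Run checks for declared capabilities; skip the rest."""
    report: Dict[str, str] = {}
    for capability, checks in CHECKS.items():
        if capability not in capabilities:
            report[capability] = "skipped"
            continue
        try:
            for check in checks:
                check(factory())  # fresh instance per check
            report[capability] = "passed"
        except AssertionError:
            report[capability] = "failed"
    return report


report = run_conformance(DictCheckpointer, {"put_get"})
```

Declaring capabilities up front lets a partial implementation pass the suite honestly: unsupported features are reported as skipped, not as failures.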

Sprint 2: HITL & Memory ✅

| Task | Inspired By | Status |
| --- | --- | --- |
| Interrupt/resume (HITL) | 19-human-in-the-loop | ✅ `core/graph.py` |
| Store abstraction | 14-store | ✅ `core/store.py` |
| Skill middleware pattern | 18-tool-node | ✅ `core/skill.py` |

Interrupt/Resume

GraphInterrupt pauses graph execution and persists the current state to a checkpoint. To resume, pass the checkpoint ID as resume_from together with a human_input dict, which is merged into the saved state. An interrupt is control flow, not an error.

Supports: HUMAN_INPUT, APPROVAL, CUSTOM interrupt types. Works in both single-node and parallel execution.
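The pause-and-resume cycle can be illustrated with a toy sequential runner. Only `GraphInterrupt`, `resume_from`, `human_input`, and the three interrupt types come from the text above; the `run_graph` function, the in-memory `CHECKPOINTS` dict, and the rule that the interrupted node re-runs from its start on resume are assumptions of this sketch.

```python
import uuid
from enum import Enum
from typing import Callable, Dict, List, Optional


class InterruptType(Enum):
    HUMAN_INPUT = "human_input"
    APPROVAL = "approval"
    CUSTOM = "custom"


class GraphInterrupt(Exception):
    """Carries a pause request up to the runner; it is not reported as a failure."""

    def __init__(self, kind: InterruptType, prompt: str) -> None:
        super().__init__(prompt)
        self.kind = kind
        self.prompt = prompt


CHECKPOINTS: Dict[str, dict] = {}  # stand-in for a real checkpoint backend


def run_graph(nodes: List[Callable[[dict], dict]], state: dict,
              resume_from: Optional[str] = None,
              human_input: Optional[dict] = None) -> dict:
    start = 0
    if resume_from is not None:
        saved = CHECKPOINTS[resume_from]
        # Merge the human's answer into the state saved at the interrupt.
        state = {**saved["state"], **(human_input or {})}
        start = saved["node_index"]
    for index in range(start, len(nodes)):
        try:
            state = nodes[index](state)
        except GraphInterrupt as interrupt:
            checkpoint_id = str(uuid.uuid4())
            CHECKPOINTS[checkpoint_id] = {"state": dict(state), "node_index": index}
            return {"status": "interrupted", "checkpoint_id": checkpoint_id,
                    "kind": interrupt.kind, "prompt": interrupt.prompt}
    return {"status": "done", "state": state}


def draft(state: dict) -> dict:
    return {**state, "draft": "v1"}


def approve(state: dict) -> dict:
    if "approved" not in state:
        raise GraphInterrupt(InterruptType.APPROVAL, "Ship draft v1?")
    return state


paused = run_graph([draft, approve], {})
resumed = run_graph([draft, approve], {}, resume_from=paused["checkpoint_id"],
                    human_input={"approved": True})
```

The runner returns an `interrupted` status instead of propagating the exception, which is what keeps the interrupt on the control-flow path and out of error handling.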

Store (Cross-Agent Memory)

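BaseStore organizes items in a namespace hierarchy, with namespaces written as tuples such as ("users", "alice"). It exposes aget, aput, adelete, asearch, and alist_namespaces. Search filters support the operators $eq, $ne, $gt, $gte, $lt, and $lte. Items can carry a TTL and are expired lazily.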

InMemoryStore implementation with 13 conformance tests. Use cases: user profiles, shared knowledge base, agent learning.
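The namespace, filter, and lazy-TTL behaviour can be sketched as below. The method names follow the text, but the parameter lists (`ttl` in seconds, `filter` as a dict of field conditions, prefix matching in `asearch`) are guesses for illustration and may not match `core/store.py`. The class is named `SketchStore` to avoid implying it is the real `InMemoryStore`.

```python
import asyncio
import operator
import time
from typing import Any, Dict, List, Optional, Tuple

Namespace = Tuple[str, ...]

_OPS = {"$eq": operator.eq, "$ne": operator.ne, "$gt": operator.gt,
        "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def _matches(value: Dict[str, Any], conditions: Dict[str, Any]) -> bool:
    for field, cond in conditions.items():
        if not isinstance(cond, dict):
            cond = {"$eq": cond}  # bare value is shorthand for equality
        if field not in value:
            return False
        if not all(_OPS[op](value[field], expected) for op, expected in cond.items()):
            return False
    return True


class SketchStore:
    def __init__(self) -> None:
        self._items: Dict[Tuple[Namespace, str], Tuple[dict, Optional[float]]] = {}

    async def aput(self, namespace: Namespace, key: str, value: dict,
                   ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._items[(namespace, key)] = (value, expires_at)

    async def aget(self, namespace: Namespace, key: str) -> Optional[dict]:
        entry = self._items.get((namespace, key))
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Lazy expiration: the item is removed only when someone reads it.
            del self._items[(namespace, key)]
            return None
        return value

    async def asearch(self, prefix: Namespace,
                      filter: Optional[Dict[str, Any]] = None) -> List[dict]:
        results = []
        for namespace, key in list(self._items):
            if namespace[:len(prefix)] != prefix:
                continue
            value = await self.aget(namespace, key)
            if value is not None and _matches(value, filter or {}):
                results.append(value)
        return results


async def demo() -> List[dict]:
    store = SketchStore()
    await store.aput(("users", "alice"), "profile", {"name": "alice", "age": 34})
    await store.aput(("users", "bob"), "profile", {"name": "bob", "age": 19})
    return await store.asearch(("users",), filter={"age": {"$gte": 21}})

adults = asyncio.run(demo())
```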

Skill Middleware

SkillMiddleware(request, next_fn) -> result pattern with immutable SkillRequest and override() for non-destructive modification. Middlewares compose in registration order (first = outermost).

Built-in middlewares: logging_middleware, retry_middleware, timeout_middleware.
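The composition rule (first registered is outermost) and the non-destructive `override()` can be shown with a small sketch. The `SkillRequest` fields, the `compose` helper, and the two example middlewares are hypothetical; they are not the built-in middlewares named above.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, List, Mapping


@dataclass(frozen=True)
class SkillRequest:
    # Assumed fields for illustration.
    skill: str
    args: Mapping[str, Any]

    def override(self, **changes: Any) -> "SkillRequest":
        # Returns a modified copy; the original request is left untouched.
        return replace(self, **changes)


Handler = Callable[[SkillRequest], Any]
Middleware = Callable[[SkillRequest, Handler], Any]


def compose(middlewares: List[Middleware], handler: Handler) -> Handler:
    """Wrap the handler so that middlewares[0] ends up outermost."""
    wrapped = handler
    for middleware in reversed(middlewares):
        def layer(request: SkillRequest, _mw=middleware, _next=wrapped) -> Any:
            return _mw(request, _next)
        wrapped = layer
    return wrapped


trace: List[str] = []


def trace_middleware(request: SkillRequest, next_fn: Handler) -> Any:
    trace.append(f"before:{request.skill}")
    result = next_fn(request)
    trace.append("after")
    return result


def uppercase_middleware(request: SkillRequest, next_fn: Handler) -> Any:
    trace.append("uppercase")
    new_args = {**request.args, "text": request.args["text"].upper()}
    return next_fn(request.override(args=new_args))


def echo_skill(request: SkillRequest) -> str:
    return request.args["text"]


original = SkillRequest("echo", {"text": "hi"})
pipeline = compose([trace_middleware, uppercase_middleware], echo_skill)
result = pipeline(original)
```

Because the request is frozen, a middleware that wants to change arguments has to pass a copy downstream, so outer layers always see the request they were given.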

Sprint 3: Persistence & Streaming

| Task | Inspired By | Priority |
| --- | --- | --- |
| Content-addressed checkpoint blobs | 13-checkpoint-postgres | Medium |
| Anti-stall via managed values | 09-managed-values | Medium |
| Encrypted serialization | 11-checkpoint-serialization | Low |
| SSE streaming improvements | 27-streaming | Low |

Content-Addressed Blobs

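Split complex checkpoint values into a checkpoint_blobs table keyed by (thread, ns, channel, version). Inserts use ON CONFLICT DO NOTHING, so a blob whose channel version has not changed is written once and shared by every checkpoint that references it. For long-running agents this avoids re-storing unchanged channel values at every step.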
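The sharing mechanism can be tried out with SQLite standing in for Postgres. The column names, the `checkpoints` table layout, and the `save_checkpoint` helper are assumptions of this sketch; only the `checkpoint_blobs` key and the `ON CONFLICT DO NOTHING` insert come from the description above. The upsert syntax needs SQLite 3.24 or newer.

```python
import json
import sqlite3
from typing import Dict, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE checkpoint_blobs (
        thread_id TEXT NOT NULL,
        ns        TEXT NOT NULL,
        channel   TEXT NOT NULL,
        version   TEXT NOT NULL,
        blob      BLOB NOT NULL,
        PRIMARY KEY (thread_id, ns, channel, version)
    )
""")
conn.execute("""
    CREATE TABLE checkpoints (
        thread_id        TEXT NOT NULL,
        ns               TEXT NOT NULL,
        checkpoint_id    TEXT NOT NULL,
        channel_versions TEXT NOT NULL,  -- JSON map: channel -> version
        PRIMARY KEY (thread_id, ns, checkpoint_id)
    )
""")


def save_checkpoint(thread_id: str, ns: str, checkpoint_id: str,
                    channels: Dict[str, Tuple[str, bytes]]) -> None:
    """Store each channel blob once per version; the checkpoint row only references versions."""
    for channel, (version, blob) in channels.items():
        conn.execute(
            "INSERT INTO checkpoint_blobs (thread_id, ns, channel, version, blob) "
            "VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT (thread_id, ns, channel, version) DO NOTHING",
            (thread_id, ns, channel, version, blob),
        )
    versions = {channel: version for channel, (version, _) in channels.items()}
    conn.execute(
        "INSERT INTO checkpoints (thread_id, ns, checkpoint_id, channel_versions) "
        "VALUES (?, ?, ?, ?)",
        (thread_id, ns, checkpoint_id, json.dumps(versions)),
    )


history = b"...large message history..."
# The second checkpoint changes only the small 'step' channel.
save_checkpoint("t1", "", "cp-1", {"messages": ("1", history), "step": ("1", b"1")})
save_checkpoint("t1", "", "cp-2", {"messages": ("1", history), "step": ("2", b"2")})

blob_rows = conn.execute("SELECT COUNT(*) FROM checkpoint_blobs").fetchone()[0]
```

Two checkpoints with two channels each produce three blob rows, not four, because the unchanged `messages` blob is stored once.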

Managed Values

Inject RemainingSteps / IsLastStep into agents as computed, read-only state. Enables graceful degradation instead of hard recursion limit errors.
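A minimal sketch of the idea, assuming a simple step loop: `ManagedValues`, `agent_node`, and `run` are hypothetical names, and the property names mirror RemainingSteps and IsLastStep from the text. The values are derived from the runner's step counter, so an agent can read them but cannot write them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ManagedValues:
    """Computed by the runner at each step; frozen so agents cannot mutate it."""
    step: int              # zero-based index of the current step
    recursion_limit: int

    @property
    def remaining_steps(self) -> int:
        return self.recursion_limit - self.step

    @property
    def is_last_step(self) -> bool:
        return self.remaining_steps <= 1


def agent_node(state: Dict, managed: ManagedValues) -> Dict:
    if managed.is_last_step:
        # Out of budget: return the best partial answer instead of looping again.
        summary = f"partial answer from {len(state['notes'])} notes"
        return {**state, "answer": summary, "done": True}
    return {**state, "notes": state["notes"] + [f"step {managed.step}"]}


def run(recursion_limit: int) -> Dict:
    state: Dict = {"notes": []}
    for step in range(recursion_limit):
        state = agent_node(state, ManagedValues(step, recursion_limit))
        if state.get("done"):
            break
    return state


final_state = run(recursion_limit=3)
```

With a limit of 3 the agent takes two working steps and uses the third to wrap up, so the caller gets a partial answer where it would otherwise hit a recursion-limit error.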

KPIs

Channel-based state operational with reducer tests
HITL interrupt/resume working end-to-end
Conformance suite passing for all providers and checkpointers
Task caching reducing redundant LLM calls by 30%+
Store abstraction with namespace-based cross-agent memory

Analysis Reference

The full LangGraph analysis is available in analysis/langgraph/:

| Section | Files | Topics |
| --- | --- | --- |
| Core Engine | 00-09 | StateGraph, channels, Pregel BSP, routing, functional API |
| Persistence | 10-16 | Checkpoint, serialization, SQLite/Postgres, Store, cache |
| Prebuilt & SDK | 17-24 | create_react_agent, ToolNode, HITL, SDK, auth, CLI |
| Insights | 25-29 | Internals, retry/errors, streaming, comparison, lessons |
