v1.1 — LangGraph-Inspired Improvements
Goal: Adopt key patterns from LangGraph analysis to harden the orchestrator before scaling.
Source: Deep analysis of langchain-ai/langgraph — 30 markdown files covering core engine, checkpoint system, prebuilt agents, SDK, CLI, and internals.
Analysis files: analysis/langgraph/
Key Findings
| What LangGraph Does Better | What We Do Better |
|---|---|
| Channel-based state with typed reducers | True provider-agnostic ABC (swap Claude/GPT/Gemini/local) |
| First-class interrupt/resume (HITL) | Cost-aware routing (6 strategies) |
| Content-addressed checkpoint blobs | Agent cooperation protocols (delegation, conflict resolution) |
| 7 stream modes with SSE reconnection | Budget enforcement (per task/session/day) |
| Task-level result caching | Provider health monitoring with auto-failover |
| Conformance test suite for checkpointers |
Full comparison: analysis/langgraph/28-comparison.md
Sprint 1: State & Caching ✅
| Task | Inspired By | Status |
|---|---|---|
| Channel-based state with reducers | 03-channels | ✅ core/channels.py |
| Task-level result caching | 15-cache | ✅ core/cache.py |
| Conformance test suite | 16-conformance-tests | ✅ core/conformance.py |
Channel-Based State
Each state field maps to a typed channel with explicit concurrency semantics:
LastValue— single writer per step (error on conflict)BinaryOperatorAggregate— fold concurrent writes via reducer (e.g.,operator.add)Topic— append all writes (pubsub)
This solves concurrent agent writes to shared state — our current biggest gap.
Task-Level Caching
Cache skill/node results by input hash. CachePolicy per skill. InMemory backend first, Redis later. Skip re-execution on cache hit. Expected to reduce redundant LLM calls by 30%+.
Conformance Tests
Capability-based test harness for Provider and Checkpoint interfaces. Any new implementation runs against it automatically. LangGraph's suite covers 47+ test cases across 8 capabilities.
Sprint 2: HITL & Memory ✅
| Task | Inspired By | Status |
|---|---|---|
| Interrupt/resume (HITL) | 19-human-in-the-loop | ✅ core/graph.py |
| Store abstraction | 14-store | ✅ core/store.py |
| Skill middleware pattern | 18-tool-node | ✅ core/skill.py |
Interrupt/Resume
GraphInterrupt pauses graph execution, persists state to checkpoint. Resume with resume_from checkpoint ID + human_input dict merged into state. Interrupt is control flow, not an error.
Supports: HUMAN_INPUT, APPROVAL, CUSTOM interrupt types. Works in both single-node and parallel execution.
Store (Cross-Agent Memory)
BaseStore with namespace-based hierarchy (("users", "alice")), aget/aput/adelete/asearch/alist_namespaces. Filter operators: $eq, $ne, $gt, $gte, $lt, $lte. TTL support with lazy expiration.
InMemoryStore implementation with 13 conformance tests. Use cases: user profiles, shared knowledge base, agent learning.
Skill Middleware
SkillMiddleware(request, next_fn) -> result pattern with immutable SkillRequest and override() for non-destructive modification. Middlewares compose in registration order (first = outermost).
Built-in middlewares: logging_middleware, retry_middleware, timeout_middleware.
Sprint 3: Persistence & Streaming
| Task | Inspired By | Priority |
|---|---|---|
| Content-addressed checkpoint blobs | 13-checkpoint-postgres | Medium |
| Anti-stall via managed values | 09-managed-values | Medium |
| Encrypted serialization | 11-checkpoint-serialization | Low |
| SSE streaming improvements | 27-streaming | Low |
Content-Addressed Blobs
Split complex checkpoint values into a checkpoint_blobs table keyed by (thread, ns, channel, version). Same blob shared across checkpoints via ON CONFLICT DO NOTHING. Massive storage savings for long-running agents.
Managed Values
Inject RemainingSteps / IsLastStep into agents as computed, read-only state. Enables graceful degradation instead of hard recursion limit errors.
KPIs
- Channel-based state operational with reducer tests
- HITL interrupt/resume working end-to-end
- Conformance suite passing for all providers and checkpointers
- Task caching reducing redundant LLM calls by 30%+
- Store abstraction with namespace-based cross-agent memory
Analysis Reference
The full LangGraph analysis is available in analysis/langgraph/:
| Section | Files | Topics |
|---|---|---|
| Core Engine | 00-09 | StateGraph, channels, Pregel BSP, routing, functional API |
| Persistence | 10-16 | Checkpoint, serialization, SQLite/Postgres, Store, cache |
| Prebuilt & SDK | 17-24 | create_react_agent, ToolNode, HITL, SDK, auth, CLI |
| Insights | 25-29 | Internals, retry/errors, streaming, comparison, lessons |