
v1.1 — LangGraph-Inspired Improvements

Goal: Adopt key patterns from LangGraph analysis to harden the orchestrator before scaling.

Source: Deep analysis of langchain-ai/langgraph — 30 markdown files covering core engine, checkpoint system, prebuilt agents, SDK, CLI, and internals.

Analysis files: analysis/langgraph/

Key Findings

| What LangGraph Does Better | What We Do Better |
| --- | --- |
| Channel-based state with typed reducers | True provider-agnostic ABC (swap Claude/GPT/Gemini/local) |
| First-class interrupt/resume (HITL) | Cost-aware routing (6 strategies) |
| Content-addressed checkpoint blobs | Agent cooperation protocols (delegation, conflict resolution) |
| 7 stream modes with SSE reconnection | Budget enforcement (per task/session/day) |
| Task-level result caching | Provider health monitoring with auto-failover |
| Conformance test suite for checkpointers | |

Full comparison: analysis/langgraph/28-comparison.md


Sprint 1: State & Caching ✅

| Task | Inspired By | Status |
| --- | --- | --- |
| Channel-based state with reducers | 03-channels | core/channels.py |
| Task-level result caching | 15-cache | core/cache.py |
| Conformance test suite | 16-conformance-tests | core/conformance.py |

Channel-Based State

Each state field maps to a typed channel with explicit concurrency semantics:

  • LastValue — single writer per step (error on conflict)
  • BinaryOperatorAggregate — fold concurrent writes via reducer (e.g., operator.add)
  • Topic — append all writes (pubsub)

This solves concurrent agent writes to shared state — our current biggest gap.
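The three channel semantics can be sketched in a few lines. This is a simplified illustration, not the contents of core/channels.py: the `update(writes)` method, the `InvalidUpdateError` name, and the constructor arguments are assumptions made for the example.

```python
import operator
from typing import Any, Callable, List


class InvalidUpdateError(Exception):
    """Hypothetical error raised when a channel cannot reconcile its writes."""


class LastValue:
    """Holds one value; at most one write is accepted per step."""

    def __init__(self) -> None:
        self.value: Any = None

    def update(self, writes: List[Any]) -> None:
        if len(writes) > 1:
            raise InvalidUpdateError("LastValue received multiple writes in one step")
        if writes:
            self.value = writes[0]


class BinaryOperatorAggregate:
    """Folds every write of a step into the current value with a reducer."""

    def __init__(self, reducer: Callable[[Any, Any], Any], initial: Any) -> None:
        self.reducer = reducer
        self.value = initial

    def update(self, writes: List[Any]) -> None:
        for write in writes:
            self.value = self.reducer(self.value, write)


class Topic:
    """Appends every write, keeping all of them in arrival order."""

    def __init__(self) -> None:
        self.value: List[Any] = []

    def update(self, writes: List[Any]) -> None:
        self.value.extend(writes)


# Two agents write to the same fields during one step.
total = BinaryOperatorAggregate(operator.add, 0)
total.update([3, 4])  # both writes are folded into the running sum

events = Topic()
events.update(["agent_a:done", "agent_b:done"])  # both writes are kept

owner = LastValue()
owner.update(["agent_a"])  # a second write in the same step would raise
```

Passing two writes to `owner.update` in a single step raises `InvalidUpdateError`, which is the conflict case the list above describes for LastValue.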

Task-Level Caching

Cache skill and node results keyed by a hash of their inputs, with a CachePolicy configured per skill. The first backend is in-memory; Redis comes later. On a cache hit the skill is not re-executed. Expected to reduce redundant LLM calls by 30%+.
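A minimal sketch of input-hash caching follows. The `CachePolicy` fields, the `InMemoryCache` methods, and the `cached_call` and `input_hash` helpers are hypothetical shapes chosen for illustration; the real core/cache.py may differ.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CachePolicy:
    """Per-skill cache settings (hypothetical shape)."""
    enabled: bool = True


class InMemoryCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def lookup(self, key: str) -> Tuple[bool, Any]:
        """Return (hit, value) so that a cached None is still a hit."""
        if key in self._entries:
            return True, self._entries[key]
        return False, None

    def store(self, key: str, value: Any) -> None:
        self._entries[key] = value


def input_hash(skill_name: str, payload: Dict[str, Any]) -> str:
    """Hash the skill name plus a canonical JSON form of its input."""
    canonical = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(f"{skill_name}:{canonical}".encode("utf-8")).hexdigest()


def cached_call(skill_name: str, skill_fn: Callable[[Dict[str, Any]], Any],
                payload: Dict[str, Any], cache: InMemoryCache,
                policy: CachePolicy) -> Any:
    if not policy.enabled:
        return skill_fn(payload)
    key = input_hash(skill_name, payload)
    hit, value = cache.lookup(key)
    if hit:
        return value  # skip re-execution on a cache hit
    value = skill_fn(payload)
    cache.store(key, value)
    return value


calls: List[Dict[str, Any]] = []


def summarize(payload: Dict[str, Any]) -> str:
    calls.append(payload)  # stands in for an expensive LLM call
    return f"summary of {payload['doc']}"


cache = InMemoryCache()
policy = CachePolicy()
first = cached_call("summarize", summarize, {"doc": "a.md", "style": "short"}, cache, policy)
# Same input with a different key order hashes to the same cache key.
second = cached_call("summarize", summarize, {"style": "short", "doc": "a.md"}, cache, policy)
```

Sorting keys before hashing is the design choice that matters here: without it, two logically identical inputs could miss the cache.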

Conformance Tests

Capability-based test harness for Provider and Checkpoint interfaces. Any new implementation runs against it automatically. LangGraph's suite covers 47+ test cases across 8 capabilities.
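One way to read "capability-based" is that each check is registered under a capability and runs only against implementations that declare it. The sketch below assumes a toy checkpointer interface (`put`, `get`, `delete`) and a `capabilities` attribute; none of these names are taken from core/conformance.py.

```python
from typing import Any, Callable, Dict, List, Optional, Set

# capability name -> checks that exercise it
CHECKS: Dict[str, List[Callable[[Any], None]]] = {}


def check(capability: str) -> Callable:
    """Register a conformance check under a capability name."""
    def register(fn: Callable[[Any], None]) -> Callable[[Any], None]:
        CHECKS.setdefault(capability, []).append(fn)
        return fn
    return register


@check("put_get")
def roundtrip(checkpointer: Any) -> None:
    checkpointer.put("thread-1", {"step": 1})
    assert checkpointer.get("thread-1") == {"step": 1}


@check("delete")
def delete_removes(checkpointer: Any) -> None:
    checkpointer.put("thread-1", {"step": 1})
    checkpointer.delete("thread-1")
    assert checkpointer.get("thread-1") is None


def run_conformance(factory: Callable[[], Any], capabilities: Set[str]) -> Dict[str, str]:
    """Run every registered check, skipping capabilities that are not declared."""
    results: Dict[str, str] = {}
    for capability, checks in CHECKS.items():
        for fn in checks:
            name = f"{capability}.{fn.__name__}"
            if capability not in capabilities:
                results[name] = "skipped"
                continue
            try:
                fn(factory())  # fresh instance per check
                results[name] = "passed"
            except AssertionError:
                results[name] = "failed"
    return results


class DictCheckpointer:
    """Toy implementation that declares only the put_get capability."""
    capabilities = {"put_get"}

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def put(self, thread_id: str, state: Dict[str, Any]) -> None:
        self._data[thread_id] = dict(state)

    def get(self, thread_id: str) -> Optional[Dict[str, Any]]:
        return self._data.get(thread_id)


results = run_conformance(DictCheckpointer, DictCheckpointer.capabilities)
```

A new implementation only has to supply a factory and its capability set; the checks themselves never change.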


Sprint 2: HITL & Memory ✅

| Task | Inspired By | Status |
| --- | --- | --- |
| Interrupt/resume (HITL) | 19-human-in-the-loop | core/graph.py |
| Store abstraction | 14-store | core/store.py |
| Skill middleware pattern | 18-tool-node | core/skill.py |

Interrupt/Resume

GraphInterrupt pauses graph execution and persists state to a checkpoint. To resume, pass the checkpoint ID as resume_from together with a human_input dict, which is merged into state. An interrupt is control flow, not an error.

Supports: HUMAN_INPUT, APPROVAL, CUSTOM interrupt types. Works in both single-node and parallel execution.
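The pause-and-resume flow can be sketched with a linear list of nodes. Everything beyond the names `GraphInterrupt`, `resume_from`, `human_input`, and the three interrupt types is an assumption: the `run` function, the in-memory `CHECKPOINTS` dict, and the choice to re-run the interrupted node on resume are illustrative only.

```python
import uuid
from enum import Enum
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


class InterruptType(Enum):
    HUMAN_INPUT = "human_input"
    APPROVAL = "approval"
    CUSTOM = "custom"


class GraphInterrupt(Exception):
    """Control-flow signal: pause here and wait for a human."""

    def __init__(self, kind: InterruptType, prompt: str) -> None:
        super().__init__(prompt)
        self.kind = kind
        self.prompt = prompt


# Hypothetical in-memory checkpoint storage: checkpoint ID -> saved run.
CHECKPOINTS: Dict[str, Dict[str, Any]] = {}


def run(nodes: List[Callable[[State], State]], state: Optional[State] = None,
        resume_from: Optional[str] = None,
        human_input: Optional[State] = None) -> Dict[str, Any]:
    start = 0
    if resume_from is not None:
        saved = CHECKPOINTS.pop(resume_from)
        # Merge the human's answer into the persisted state.
        state = {**saved["state"], **(human_input or {})}
        start = saved["next_node"]
    state = dict(state or {})
    for index in range(start, len(nodes)):
        try:
            state = nodes[index](state)
        except GraphInterrupt as interrupt:
            # Caught as a pause, not reported as a failure.
            checkpoint_id = uuid.uuid4().hex
            CHECKPOINTS[checkpoint_id] = {"state": dict(state), "next_node": index}
            return {"status": "interrupted", "checkpoint_id": checkpoint_id,
                    "kind": interrupt.kind, "prompt": interrupt.prompt}
    return {"status": "done", "state": state}


def draft(state: State) -> State:
    return {**state, "draft": f"Post about {state['topic']}"}


def approve(state: State) -> State:
    if "approved" not in state:
        raise GraphInterrupt(InterruptType.APPROVAL, "Publish this draft?")
    return state


def publish(state: State) -> State:
    return {**state, "published": state["approved"]}


nodes = [draft, approve, publish]
paused = run(nodes, {"topic": "checkpoints"})
finished = run(nodes, resume_from=paused["checkpoint_id"], human_input={"approved": True})
```

In this sketch the interrupted node runs again after the merge, so it sees the human input and continues; a real implementation might instead return the input from the interrupt call itself.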

Store (Cross-Agent Memory)

BaseStore organizes items in a namespace hierarchy (for example ("users", "alice")) and exposes aget/aput/adelete/asearch/alist_namespaces. Filter operators: $eq, $ne, $gt, $gte, $lt, $lte. TTL support with lazy expiration.

InMemoryStore implementation with 13 conformance tests. Use cases: user profiles, shared knowledge base, agent learning.
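The sketch below illustrates namespace prefixes, filter operators, and lazy TTL expiration on a reduced store that implements only `aput`, `aget`, and `asearch`. The method signatures, the `ttl` and `clock` parameters, and the prefix-matching rule are assumptions, not the actual BaseStore interface.

```python
import asyncio
import operator
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Namespace = Tuple[str, ...]

OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def matches(value: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    for field, condition in filters.items():
        if field not in value:
            return False
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # a bare value means equality
        for op, expected in condition.items():
            if not OPERATORS[op](value[field], expected):
                return False
    return True


class InMemoryStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._items: Dict[Tuple[Namespace, str], Tuple[Dict[str, Any], Optional[float]]] = {}
        self._clock = clock

    async def aput(self, namespace: Namespace, key: str, value: Dict[str, Any],
                   ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[(namespace, key)] = (value, expires_at)

    async def aget(self, namespace: Namespace, key: str) -> Optional[Dict[str, Any]]:
        entry = self._items.get((namespace, key))
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[(namespace, key)]  # lazy expiration: evict on read
            return None
        return value

    async def asearch(self, prefix: Namespace,
                      filters: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
        found = []
        for namespace, key in list(self._items):
            if namespace[:len(prefix)] != prefix:
                continue
            value = await self.aget(namespace, key)
            if value is not None and matches(value, filters or {}):
                found.append(value)
        return found


now = [0.0]  # fake clock so the TTL demo does not need to sleep
store = InMemoryStore(clock=lambda: now[0])


async def demo() -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    await store.aput(("users", "alice"), "profile", {"age": 30})
    await store.aput(("users", "bob"), "profile", {"age": 17})
    await store.aput(("users", "alice"), "session", {"age": 30, "tmp": True}, ttl=10)
    adults = await store.asearch(("users",), {"age": {"$gte": 18}})
    now[0] = 11.0  # advance past the session TTL
    after_expiry = await store.asearch(("users", "alice"))
    return adults, after_expiry


adults, after_expiry = asyncio.run(demo())
```

Lazy expiration means no background sweeper is needed: an expired item is only removed when a read touches it.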

Skill Middleware

SkillMiddleware(request, next_fn) -> result pattern with immutable SkillRequest and override() for non-destructive modification. Middlewares compose in registration order (first = outermost).

Built-in middlewares: logging_middleware, retry_middleware, timeout_middleware.
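The composition order can be illustrated as follows. `SkillRequest` and `override()` follow the description above, but the field names, the `compose` helper, and both example middlewares (including this stand-in `logging_middleware`) are assumptions rather than the code in core/skill.py.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class SkillRequest:
    """Immutable request; fields are hypothetical."""
    skill: str
    args: Dict[str, Any] = field(default_factory=dict)

    def override(self, **changes: Any) -> "SkillRequest":
        """Return a modified copy, leaving this request untouched."""
        return replace(self, **changes)


Handler = Callable[[SkillRequest], Any]
Middleware = Callable[[SkillRequest, Handler], Any]


def compose(middlewares: List[Middleware], handler: Handler) -> Handler:
    """Wrap the handler so the first registered middleware is outermost."""
    def bind(middleware: Middleware, next_fn: Handler) -> Handler:
        return lambda request: middleware(request, next_fn)

    pipeline = handler
    for middleware in reversed(middlewares):
        pipeline = bind(middleware, pipeline)
    return pipeline


trace: List[str] = []


def logging_middleware(request: SkillRequest, next_fn: Handler) -> Any:
    trace.append(f"log:before:{request.skill}")
    result = next_fn(request)
    trace.append("log:after")
    return result


def default_args_middleware(request: SkillRequest, next_fn: Handler) -> Any:
    trace.append("defaults")
    # Build a new args dict instead of mutating the caller's request.
    return next_fn(request.override(args={"lang": "en", **request.args}))


def translate(request: SkillRequest) -> str:
    trace.append("handler")
    return f"{request.skill}:{request.args['lang']}:{request.args['text']}"


original = SkillRequest("translate", {"text": "hola"})
pipeline = compose([logging_middleware, default_args_middleware], translate)
result = pipeline(original)
```

Because `logging_middleware` is registered first, it sees the call before and after every other middleware, while `original` keeps its initial arguments.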


Sprint 3: Persistence & Streaming

| Task | Inspired By | Priority |
| --- | --- | --- |
| Content-addressed checkpoint blobs | 13-checkpoint-postgres | Medium |
| Anti-stall via managed values | 09-managed-values | Medium |
| Encrypted serialization | 11-checkpoint-serialization | Low |
| SSE streaming improvements | 27-streaming | Low |

Content-Addressed Blobs

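Split complex checkpoint values into a checkpoint_blobs table keyed by (thread, ns, channel, version). A value that is unchanged between checkpoints keeps the same key, so it is stored once and shared: the insert uses ON CONFLICT DO NOTHING. For long-running agents, where most channels do not change on every step, this should save substantial storage.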
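The sharing behaviour can be tried out with SQLite standing in for Postgres. The column names, the `checkpoints` table layout, and the `save_checkpoint` helper are assumptions for illustration; the `ON CONFLICT ... DO NOTHING` clause needs SQLite 3.24 or newer.

```python
import json
import sqlite3
from typing import Any, Dict, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE checkpoint_blobs (
        thread_id     TEXT NOT NULL,
        checkpoint_ns TEXT NOT NULL,
        channel       TEXT NOT NULL,
        version       TEXT NOT NULL,
        blob          BLOB NOT NULL,
        PRIMARY KEY (thread_id, checkpoint_ns, channel, version)
    )
""")
conn.execute("""
    CREATE TABLE checkpoints (
        thread_id        TEXT NOT NULL,
        checkpoint_ns    TEXT NOT NULL,
        checkpoint_id    TEXT NOT NULL,
        channel_versions TEXT NOT NULL,
        PRIMARY KEY (thread_id, checkpoint_ns, checkpoint_id)
    )
""")


def save_checkpoint(thread_id: str, ns: str, checkpoint_id: str,
                    channels: Dict[str, Tuple[str, Any]]) -> None:
    """Store a checkpoint; `channels` maps channel name -> (version, value)."""
    for channel, (version, value) in channels.items():
        # An unchanged channel keeps its version, so this insert is a no-op.
        conn.execute(
            "INSERT INTO checkpoint_blobs VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT (thread_id, checkpoint_ns, channel, version) DO NOTHING",
            (thread_id, ns, channel, version, json.dumps(value).encode("utf-8")),
        )
    # The checkpoint row itself only records which blob versions it points at.
    versions = {channel: version for channel, (version, _) in channels.items()}
    conn.execute(
        "INSERT INTO checkpoints VALUES (?, ?, ?, ?)",
        (thread_id, ns, checkpoint_id, json.dumps(versions)),
    )


history = ["hello", "how can I help?"]
save_checkpoint("t1", "", "cp-1", {"messages": ("1", history), "step": ("1", 1)})
# Only `step` changed, so `messages` reuses the blob written for cp-1.
save_checkpoint("t1", "", "cp-2", {"messages": ("1", history), "step": ("2", 2)})

blob_count = conn.execute("SELECT COUNT(*) FROM checkpoint_blobs").fetchone()[0]
checkpoint_count = conn.execute("SELECT COUNT(*) FROM checkpoints").fetchone()[0]
```

Two checkpoints with two channels each produce three blob rows rather than four, because the large `messages` value is written once.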

Managed Values

Inject RemainingSteps / IsLastStep into agents as computed, read-only state. Enables graceful degradation instead of hard recursion limit errors.
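A small sketch of how an agent might use these values to wind down early. The `ManagedValues` dataclass, the `run_loop` function, and the way the values are passed as a second argument are assumptions; only the RemainingSteps and IsLastStep concepts come from the plan above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass(frozen=True)
class ManagedValues:
    """Read-only values computed by the runtime, never written by agents."""
    remaining_steps: int

    @property
    def is_last_step(self) -> bool:
        return self.remaining_steps <= 1


def run_loop(agent: Callable[[State, ManagedValues], State],
             state: State, max_steps: int) -> State:
    for step in range(max_steps):
        # Recomputed on every step from the loop counter.
        managed = ManagedValues(remaining_steps=max_steps - step)
        state = agent(state, managed)
        if state.get("done"):
            break
    return state


def research_agent(state: State, managed: ManagedValues) -> State:
    if managed.is_last_step:
        # Degrade gracefully: return what we have instead of hitting the limit.
        return {**state, "answer": f"partial after {state['searches']} searches", "done": True}
    return {**state, "searches": state["searches"] + 1}


final = run_loop(research_agent, {"searches": 0}, max_steps=4)
```

With a budget of four steps the agent searches three times and spends its final step producing a partial answer, rather than being cut off by the step limit.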


KPIs

  • Channel-based state operational with reducer tests
  • HITL interrupt/resume working end-to-end
  • Conformance suite passing for all providers and checkpointers
  • Task caching reducing redundant LLM calls by 30%+
  • Store abstraction with namespace-based cross-agent memory

Analysis Reference

The full LangGraph analysis is available in analysis/langgraph/:

| Section | Files | Topics |
| --- | --- | --- |
| Core Engine | 00-09 | StateGraph, channels, Pregel BSP, routing, functional API |
| Persistence | 10-16 | Checkpoint, serialization, SQLite/Postgres, Store, cache |
| Prebuilt & SDK | 17-24 | create_react_agent, ToolNode, HITL, SDK, auth, CLI |
| Insights | 25-29 | Internals, retry/errors, streaming, comparison, lessons |