Solution Factory. Idea to vetted architecture.
An autonomous solution architect built on LangGraph. Hand it a plain-English description of a business process you want to automate. A panel of expert AI personas interrogates the problem, a corpus-grounded architect designs the solution, a second panel attacks that design until no blocking objection survives, and the output is a build-ready spec. Two human gates. Everything between them runs on its own.
Honest status: this is a local dev tool, built and verified end to end on real runs, not a deployed product. The pipeline through the approved build spec is functional. Autonomous implementation against a real repo (v2) is not built. The baseline run completed across two threads with three bugs fixed mid-run, so one clean single-thread run has not yet been demonstrated. On novel briefs the panel frequently caps and escalates to the human gate by design.
A LangGraph StateGraph of 14 nodes: Send fan-out for the parallel critic panel, interrupt() for the human pause points, a Postgres checkpointer for durable cross-process pause and resume, and a deterministic zero-blocker consensus rule. The architect that designs is structurally forbidden from grading its own work.
What the Solution Factory Is.
The Solution Factory turns a sentence about something you want to automate into a defensible architecture. You hand it a plain-English description of a business process. A panel of expert AI personas interrogates the problem the way a sharp consultant would, surfacing the requirements you did not think to name. You approve a refined proposal. A corpus-grounded architect then designs the solution, citing a curated library of agentic-AI patterns rather than inventing from thin air. A second panel attacks that design in parallel and in isolation, each persona hunting for the flaw that sinks it in month three. The architect revises until no blocking objection survives.
It is a ground-up rebuild on LangGraph and LangChain. The earlier Claude Code version is retired. The pipeline is a single LangGraph StateGraph: the critic panel fans out with Send, the human gates are durable interrupt() pauses backed by a Postgres checkpointer, and the whole revision loop is governed by a five-line router that does pure arithmetic over the findings. Consensus is not a feeling. It is a countable condition: zero remaining open BLOCKER findings. A panel that returns no findings is treated as a failure, not a win.
It is general-purpose, not tied to FORGE or any one stack. The knowledge corpus is a resource it cites, not a constraint it obeys. v1 stops at an approved build spec, a planning roadmap with per-phase verification criteria that an engineer can build from. I have driven the same brief through it three times, with different architect modes and panels, and watched it land on three different terminal paths. Every one of them ended safely at a human gate with an honest, reconciled finding count, never a false green.
One Graph. Two Gates.
The pipeline is a LangGraph StateGraph. Every phase writes a markdown artifact first, then emits structured rows into a Postgres ledger that is a rebuildable projection of the markdown. v1 stops at an approved build spec.
PANEL & INTAKE
A vague idea becomes a real problem statement.
The setup node reads your one sentence and chooses which experts belong in the room (a security reviewer, a performance reviewer, a systems architect, and so on), picked to fit the problem. It opens a ledger row to track the run. A discussion panel then reads the brief and generates the questions a good consultant would ask before quoting work: what volume, what defines success, what happens to failures, how fast it must run. It puts roughly a dozen of these to you and pauses at the first interrupt().
You answer in free text. From your answers the system writes a refined, sharpened statement of the actual problem and its constraints. Nothing expensive has happened yet. The Socratic front end is a deliberate brake on a model that would otherwise start generating immediately, because the most expensive mistake in any build is solving the wrong problem precisely.
- You read and approve the refined proposal before any design work starts
- You can approve, reject (the run ends), or send it back for another round of questions
- This gate sits before the expensive design work, so a misunderstanding costs a few intake calls, not a full siege
ARCHITECT DRAFT
A corpus-grounded architect designs the solution.
Once you approve, a single architect agent goes to work. It queries the curated agentic-knowledge corpus, a read-only vault of field-harvested atoms, pulling the patterns, failure modes, and reference designs relevant to your problem. It drafts an end-to-end schematic: components, data flow, infrastructure, cost notes, alternatives, and explicit gap flags where it is guessing. Every claim it can ground, it cites as a wikilink to a real atom. Where the corpus is silent, it says so and tags a web fallback.
Grounding is by-domain retrieval over the markdown corpus, not a vector store. The default path prefetches a fixed top-N of atoms, then designs. An opt-in agentic mode runs the architect as a create_react_agent that queries the corpus itself through tools, iteratively, then designs. On the verified runs the agentic architect made 30 to 32 live corpus tool calls and every citation in the draft resolved to a real atom. Critically, this architect never grades its own work.
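The "every citation resolved to a real atom" check can be pictured as a small script. This is a minimal sketch, not the project's actual code: the function name, the regex, and the sample atom names are all hypothetical, and it assumes citations use the common `[[name]]`, `[[name|alias]]`, or `[[name#section]]` wikilink forms.

```python
import re

# Assumed wikilink syntax: [[atom]], [[atom|alias]] or [[atom#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def unresolved_citations(draft: str, atom_names: set[str]) -> list[str]:
    """Return the cited atom names that do not exist in the corpus."""
    cited = {match.group(1).strip() for match in WIKILINK.finditer(draft)}
    return sorted(cited - atom_names)


# Hypothetical corpus and draft, for illustration only.
corpus = {"retry-with-backoff", "dead-letter-queue"}
draft = (
    "Failed jobs use [[retry-with-backoff]] and land in a "
    "[[dead-letter-queue|DLQ]]; routing follows [[made-up-pattern]]."
)
missing = unresolved_citations(draft, corpus)
```

An empty list means every citation resolved; anything else names the links that should have been tagged as a gap instead.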
ADVERSARIAL PANEL
The panel fans out and attacks.
The panel assembled at the start now fans out with a LangGraph Send: each persona reviews the schematic in parallel and in isolation, seeing only the design and its own lens, never the architect reasoning that produced it. Each returns findings classified BLOCKER (cannot ship as drawn) or WARNING (a risk worth recording). A security persona might flag an unauthenticated webhook as a BLOCKER; a performance persona might warn the classifier will miss the latency you specified. A deferred reduce node barriers on all critics before merging their findings.
The separation is the single most important quality mechanism. A design is assumed guilty until a hostile panel fails to convict it. Critics receive only the schematic and their lens as a self-contained payload, never the architect state, so the review cannot collude with the design. The rigid taxonomy is what makes the next phase countable: consensus is defined as zero remaining open blockers, which is only measurable because every finding carries a severity.
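The isolation rule is easier to see in code. The sketch below is a standard-library stand-in, not LangGraph's Send API: a thread pool plays the role of the fan-out, and `CriticPayload`, `Finding`, `build_payloads`, `stub_critic`, and `run_panel` are hypothetical names. The stub critic replaces what would be a model call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticPayload:
    schematic: str  # the design under review
    lens: str       # the persona's angle of attack


@dataclass(frozen=True)
class Finding:
    persona: str
    severity: str   # "BLOCKER" or "WARNING"
    summary: str


def build_payloads(architect_state: dict, personas: list[str]) -> list[CriticPayload]:
    # Only the schematic crosses the boundary; the architect's reasoning
    # and any other state keys are never copied into the payload.
    return [CriticPayload(architect_state["schematic"], lens) for lens in personas]


def stub_critic(payload: CriticPayload) -> list[Finding]:
    # Stand-in for a model call: sees nothing but its own payload.
    if payload.lens == "security" and "webhook" in payload.schematic:
        return [Finding("security", "BLOCKER", "webhook has no authentication")]
    return [Finding(payload.lens, "WARNING", "routine risk, recorded for the log")]


def run_panel(architect_state: dict, personas: list[str], critic=stub_critic) -> list[Finding]:
    payloads = build_payloads(architect_state, personas)
    with ThreadPoolExecutor(max_workers=max(1, len(payloads))) as pool:
        # list() consumes every result, so nothing is merged until
        # all critics have returned (the barrier before the reduce).
        per_critic = list(pool.map(critic, payloads))
    return [finding for findings in per_critic for finding in findings]


state = {"schematic": "public webhook -> classifier -> queue", "reasoning": "private notes"}
findings = run_panel(state, ["security", "performance"])
```

Because the payload type has no field for the architect's reasoning, a critic cannot read it even by accident.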
BOUNDED REVISION
Revise, re-attack, or escalate. Pure arithmetic.
After every critique round a router decides in pure Python. Zero open blockers means consensus: the design survived. If blockers remain but the count is still strictly decreasing and the run is under the iteration cap, the reviser goes back to address every open finding and the panel re-attacks the new draft. A stall (blockers stopped strictly decreasing), the cap, or a panel that returned nothing routes to escalation. Zero findings from an adversarial panel is treated as a failure, a parser or dispatch bug, never as success.
The loop runs at most three real revision rounds. If it cannot converge, it pauses at a third interrupt() and tells you why: the reason, how many blockers remain, which iteration it died on. You choose to accept the residual risk, force one more round, or abandon. This is by design. Hard problems are supposed to surface a human, not get rubber-stamped. On genuinely novel briefs the panel frequently does not converge organically, so the human gate is load-bearing, not decorative.
- No findings this round: escalate (the panel said nothing, treat as a bug)
- Zero open blockers: consensus (the design survived)
- Iteration cap hit: escalate (backstop)
- Blockers not strictly decreasing: escalate (stall)
- Otherwise: revise (shrinking, under cap)
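The five rules above translate almost line for line into a function. This is a sketch under assumptions, not the project's router: the function name, argument names, and the convention that `iteration` counts completed revision rounds are all hypothetical. Only the rule order and the cap of three come from the text.

```python
from typing import Optional

MAX_ROUNDS = 3  # cap stated in the text


def route(
    findings_this_round: int,
    open_blockers: int,
    previous_open_blockers: Optional[int],
    iteration: int,
    max_rounds: int = MAX_ROUNDS,
) -> str:
    """Return 'consensus', 'revise', or 'escalate' from counts alone."""
    if findings_this_round == 0:
        return "escalate"   # silent panel: treat as a bug
    if open_blockers == 0:
        return "consensus"  # the design survived
    if iteration >= max_rounds:
        return "escalate"   # backstop
    if previous_open_blockers is not None and open_blockers >= previous_open_blockers:
        return "escalate"   # stall: not strictly decreasing
    return "revise"         # shrinking, under cap
```

The order matters: the zero-findings check runs before the zero-blockers check, so a panel that returned nothing can never be mistaken for consensus.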
CONSENSUS & BUILD SPEC
Zero blockers becomes an executable plan.
When zero BLOCKERs remain, organically or by your accept-risk call, the system writes the consensus document: the surviving schematic, the full findings table with every objection and how it was resolved, and the open-blocker curve across rounds. A deterministic reconciliation guard then asserts that the blocker count in the consensus document equals the finding rows in the ledger, catching any silent finding loss between the markdown source of truth and its projection.
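A reconciliation guard of this kind can be very small. The sketch below assumes a findings table in the consensus markdown with the severity in the second column and a ledger exposed as a list of row dicts; that layout, the function names, and the `ReconciliationError` type are hypothetical.

```python
class ReconciliationError(RuntimeError):
    """Raised when the markdown and the ledger disagree."""


def count_blockers_in_markdown(consensus_md: str) -> int:
    # Assumed table layout: | id | severity | status | summary |
    count = 0
    for line in consensus_md.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[1] == "BLOCKER":
            count += 1
    return count


def reconcile(consensus_md: str, ledger_rows: list[dict]) -> int:
    md_count = count_blockers_in_markdown(consensus_md)
    ledger_count = sum(1 for row in ledger_rows if row["severity"] == "BLOCKER")
    if md_count != ledger_count:
        raise ReconciliationError(
            f"markdown has {md_count} blockers, ledger has {ledger_count}"
        )
    return md_count


# Hypothetical consensus table and ledger rows, for illustration only.
consensus = """\
| id | severity | status   | summary                  |
|----|----------|----------|--------------------------|
| F1 | BLOCKER  | resolved | unauthenticated webhook  |
| F2 | WARNING  | open     | classifier latency       |
| F3 | BLOCKER  | resolved | no retry on queue writes |
"""
ledger = [
    {"id": "F1", "severity": "BLOCKER"},
    {"id": "F2", "severity": "WARNING"},
    {"id": "F3", "severity": "BLOCKER"},
]
blockers = reconcile(consensus, ledger)
```

Raising instead of logging is the point of the guard: a dropped finding stops the run rather than passing through silently.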
Consensus is translated into a planning roadmap: per-phase plan files, each with its own verification criteria. On the verified runs the build-spec tasks carried decision-ID provenance back to the consensus design, and the verification criteria were concrete and executable, specific inputs with asserted outputs, not generic boilerplate. This is the thing an engineer, or a future autonomous implementer, can actually build from.
- You read and approve a defensible, build-ready specification
- Autonomous implementation against a real repo (v2) is not built; the deliverable is the approved spec, full stop
- The spec is the last cheap place to catch a wrong turn before real code gets written
LangGraph for the Spine.
A ground-up rebuild on LangGraph and LangChain. The earlier Claude Code version is retired. This is the basis for any LangGraph or LangChain claim: substantial, load-bearing usage, not a token import.
The Design Principles.
A pipeline is just plumbing. What makes one trustworthy is the judgment behind it. These are the decisions that shaped the Solution Factory.
The architect cannot grade its own work.
The agent that designs is structurally forbidden from reviewing the design. Critics receive only the schematic and their own lens as a self-contained payload, in parallel and in isolation, never the architect state or reasoning. A design is assumed guilty until a hostile panel fails to convict it. Even on the agentic path the architect only reads the corpus; the tool-calling is grounding, not collusion.
Consensus is a count, not a feeling.
Consensus has exactly one definition: zero remaining open BLOCKER findings. Not a vibe that the panel feels good. A deterministic count, enforced by a five-line router and verified by a reconciliation guard that asserts the consensus blocker count equals the ledger rows. A panel that returns no findings is an alarm, not an all-clear, treated as a parser or dispatch bug until a human says otherwise.
Ground every claim, flag every gap.
The architect cites real atoms a practitioner shipped, written as wikilinks, or it openly tags what it cannot ground with a web fallback and a gap flag. Provenance is recorded so you can audit where a design decision came from. Grounding is by-domain retrieval over a curated markdown corpus, not a vector database. On the verified runs every citation in the draft resolved to a real atom.
Spend tokens where they change the outcome.
A run is dominated by panel volume, up to roughly a dozen cheap critics across up to three rounds. Those run on the flat-rate Claude Max subscription, so the high-cardinality cost is effectively fixed. The few calls that actually steer the design run on the metered native API, where the model-layer leverage is highest. On the baseline this was $0.71 real versus $2.97 if every call were metered, about 76 percent saved. It pairs directly with FORGE model routing as a documented through-line: spend the expensive dollar only where quality compounds.
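The savings figure follows directly from the two dollar amounts quoted above. The helper name is hypothetical; the numbers are the ones from the baseline run.

```python
def savings_fraction(actual_cost: float, all_metered_cost: float) -> float:
    """Share of the all-metered cost that was avoided."""
    return 1 - actual_cost / all_metered_cost


saved = savings_fraction(0.71, 2.97)
print(f"{saved:.0%}")  # prints "76%"
```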
Bound every loop.
The revision loop has a hard cap of three rounds, a strictly-decreasing stall check, and a zero-findings anomaly guard. The machine would rather stop and ask a human than loop forever or declare a false victory. On genuinely novel briefs it frequently caps and escalates by design, which means the human gate is load-bearing on hard problems, not decorative. Budget for being called in.
Markdown is the source of truth, the ledger is a projection.
Every artifact is written to disk first. Then structured rows are written to a 9-table Postgres ledger at each phase boundary. The ledger is rebuildable from the markdown and is never the only copy, so a failed database write never loses the real record. This mirrors FORGE's own discipline: structured data for retrieval, but the working medium stays where the agents natively operate.
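The write-markdown-first, project-second discipline can be sketched with the standard library. Everything here is an illustrative assumption: an in-memory SQLite database stands in for Postgres, a single `findings` table stands in for the 9-table ledger, and the `- [ID] SEVERITY: summary` line format is invented for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def parse_findings(markdown: str) -> list[tuple[str, str, str]]:
    # Assumed line format: "- [F1] BLOCKER: summary text"
    rows = []
    for line in markdown.splitlines():
        if line.startswith("- ["):
            head, _, summary = line[2:].partition(": ")
            finding_id, _, severity = head.partition("] ")
            rows.append((finding_id.lstrip("["), severity, summary))
    return rows


def write_artifact(run_dir: Path, name: str, markdown: str) -> Path:
    # The markdown hits disk before any database write is attempted.
    path = run_dir / name
    path.write_text(markdown, encoding="utf-8")
    return path


def rebuild_ledger(conn: sqlite3.Connection, run_dir: Path) -> int:
    # Drop and re-derive: the table is a projection, never the only copy.
    conn.execute("DROP TABLE IF EXISTS findings")
    conn.execute(
        "CREATE TABLE findings (artifact TEXT, finding_id TEXT, severity TEXT, summary TEXT)"
    )
    for path in sorted(run_dir.glob("*.md")):
        rows = parse_findings(path.read_text(encoding="utf-8"))
        conn.executemany(
            "INSERT INTO findings VALUES (?, ?, ?, ?)",
            [(path.name, *row) for row in rows],
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM findings").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    write_artifact(
        run_dir,
        "critique-round-1.md",
        "# Round 1\n- [F1] BLOCKER: webhook has no authentication\n- [F2] WARNING: classifier latency\n",
    )
    conn = sqlite3.connect(":memory:")
    first = rebuild_ledger(conn, run_dir)
    second = rebuild_ledger(conn, run_dir)  # rebuilding is idempotent
    conn.close()
```

Because the rebuild starts from the files on disk, losing or corrupting the database costs only the time it takes to run it again.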
Interrogate before you design.
The first phase asks you questions instead of assuming answers, because the most expensive failure is solving the wrong problem well. The Socratic front end is a deliberate brake on a model's tendency to start generating immediately. Catching a misframed problem at the first gate costs a conversation; catching it after the architecture is drawn costs the whole back half. Two human gates, and v1 stops at the approved spec, so the tool earns trust before it earns autonomy.
The Top of the Loop.
FORGE ships websites. The knowledge corpus accumulates the frontier of what is known about building agentic systems. The Solution Factory reasons over that corpus to turn knowledge into architecture.
The systems form a loop. FORGE ships websites and runs the daily work. A curated corpus accumulates the frontier of agentic-engineering patterns as typed, field-harvested atoms. The Solution Factory sits at the top: it reasons over that corpus to turn knowledge into a defensible, build-ready architecture, with the major objections already surfaced and resolved. It is general-purpose, so the same machinery designs whatever the problem and the evidence call for, infrastructure included.
The value is not a fixed answer. Across three real runs the same brief ended three different ways: one converged organically, one hit the iteration cap, and one stalled. That non-determinism is inherent to LLM design plus LLM critique. The point is that every terminal path is safe: cap, stall, and organic all end at a human gate with an honest, reconciled finding count, never a false green. Where you already know exactly what to build, this is overhead; it is a design tool, not a code generator. Where the cost of a bad design is high enough to justify an adversarial review and two human checkpoints, it earns its keep.
There's More.
The Solution Factory is one of the AI systems on my resume page. FORGE (the centerpiece) and the Founders Circle council are from-scratch builds with full deep-dives of their own.