QA Orchestra
Written by Anass R. · Sr. QA Automation Engineer · DoQALand
How a live training session turned into an open-source multi-agent QA toolkit.
It started with a training problem
I run a free live training called the Diff-First Method. The idea is simple: teach QA engineers how to use AI to review diffs, map functional impact, and focus testing effort where it actually matters.
The training covers four pillars: multi-repo workspace setup, Chrome MCP browser automation, diff-based functional review, and AI-assisted test generation. Each session, I would walk attendees through the workflow live, showing how AI can read a release diff, cross-reference acceptance criteria, and produce a prioritized QA brief.
It worked. People got it. But then the same question kept coming back after every session:
"This is great. But how do I set this up on my own project without spending a week writing prompts?"
Fair question. The training teaches the methodology. But the methodology has moving parts: context files, prompt patterns, output formats, chaining logic. Every attendee was going home and rebuilding the same pieces from scratch. Some succeeded. Many got stuck at the gap between understanding the concept and having a working setup.
That gap is what QA Orchestra closes.
What QA Orchestra is
QA Orchestra is an open-source, multi-agent QA toolkit for Claude Code. It packages everything from the training into 10 specialized AI agents, each designed to answer a specific question about your pull requests and features.
No SaaS. No proprietary API keys. No new tool to learn. It runs inside Claude Code, the environment most attendees were already using during the training. You install it, point it at your project, and start asking questions.
Each agent reads your project context, analyzes the relevant data, and writes structured Markdown output. The next agent in the chain picks up that file automatically. No copy-pasting between tools.
From training to toolkit: the architecture
The training teaches a mindset: give AI the full context of your project and let it do the analysis work while you make the decisions. QA Orchestra encodes that mindset into two layers.
The expertise layer
Ten agents, each with deep domain knowledge. They are just Markdown files in .claude/agents/. No SDK, no build step. Each file defines a role, instructions, input expectations, and output format. If you attended the training, you will recognize the prompt patterns.
The data layer
MCP servers that fetch real project data: GitHub or GitLab diffs, Chrome browser sessions, Jira tickets. The same integrations we set up live during Module 1 of the training.
The bridge between them
A single file called CONTEXT.md that describes your entire stack: application URLs, repository paths, environment setup steps, health check signals, testing frameworks, severity definitions. Every agent reads it. Your QA lead updates terminology or AC formats in one place, and all 10 agents adjust their behavior.
```markdown
# Project Context

Application URL: http://localhost:3000
Frontend repo: ./frontend
Backend repo: ./backend
Start command: docker compose up -d
Health check: curl http://localhost:3000/health
Test framework: Playwright
Bug severity: Critical / Major / Minor / Cosmetic
```
This is the "context as code" approach from the training. Instead of every prompt containing project-specific details, the context lives in one file and agents reference it at runtime.
What happens when you run it
Here is the simplest workflow. You have a pull request. You want to know if it matches the acceptance criteria.
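In Claude Code, that is a single agent invocation. A hypothetical phrasing (the exact wording is up to you; only the agent name comes from the toolkit):

```text
@functional-reviewer Review the current branch's diff against the
acceptance criteria for this feature.
```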
One command. The agent reads the diff, compares it against your acceptance criteria, identifies gaps, and writes a structured review. You read the output and decide: ship, test more, or block.
Now here is the full pipeline, the one we demonstrate in Module 4 of the training:
```text
# 1. Set up the environment
environment-manager     → checkout branch, start app, health check

# 2. Analyze in parallel
functional-reviewer     → AC compliance report
test-scenario-designer  → test scenarios

# 3. Validate in a real browser
browser-validator       → reads scenarios, clicks through the app

# 4. Report (if gaps found)
bug-reporter            → structured bug reports
automation-writer       → runnable Playwright tests
```
Each agent reads from qa-output/ and writes to qa-output/. The orchestrator knows which agents can run in parallel. Missing tools don't break the pipeline; agents note the gap and skip gracefully.
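The handoff convention is simple enough to sketch. Below is a hypothetical model of it in Python; the real orchestration happens inside Claude Code, so `run_agent`, the dependency lists, and the file names here are illustrative assumptions, not the toolkit's implementation:

```python
from pathlib import Path

# Hypothetical sketch of the qa-output/ handoff: each step declares the
# input files it needs and the file it writes. Steps with missing inputs
# are noted and skipped instead of breaking the pipeline.
OUTPUT_DIR = Path("qa-output")

PIPELINE = [
    # (agent name, required qa-output/ inputs, output file) -- illustrative
    ("functional-reviewer",    [],                      "functional-review.md"),
    ("test-scenario-designer", [],                      "test-scenarios.md"),
    ("browser-validator",      ["test-scenarios.md"],   "browser-findings.md"),
    ("bug-reporter",           ["browser-findings.md"], "bug-reports.md"),
]

def run_pipeline(run_agent, pipeline=PIPELINE):
    """run_agent(name) -> str stands in for invoking a Claude Code agent."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    results = {}
    for name, inputs, output in pipeline:
        missing = [f for f in inputs if not (OUTPUT_DIR / f).exists()]
        if missing:
            # Note the gap and skip gracefully rather than failing.
            results[name] = f"skipped: missing {', '.join(missing)}"
            continue
        (OUTPUT_DIR / output).write_text(run_agent(name))
        results[name] = f"wrote {output}"
    return results
```

The useful property is that each agent's contract is just a file path, so any step can also be run alone and inspected by hand.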
The part the training could not teach
The training teaches you to think in diffs, to give AI the right context, to chain prompts into a workflow. But there is one thing a live session cannot give you: live browser validation.
In the training, we demonstrate Chrome MCP controlling a browser. People get excited. But in practice, setting up a consistent browser validation pipeline that chains with diff analysis is hard. The environment needs to be running, the scenarios need to be structured in a way the browser agent can execute, and the findings need to feed back into the functional review.
QA Orchestra solves this chain. The environment-manager starts your app. The test-scenario-designer produces structured scenarios. The browser-validator reads those scenarios and walks through the app step by step using Chrome DevTools MCP: clicking buttons, filling forms, checking that text appears, screenshotting failures.
The browser is ground truth. A diff might look correct but the API call might fail, the UI might not update, or there might be a race condition. Browser validation turns theoretical risks into confirmed findings. Bug reports based on observed failures, not guesses.
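Why does scenario structure matter so much here? Because every line has to map to one deterministic browser action. A minimal sketch of that idea in Python (the real scenario format is defined by the toolkit's agent files; the action names and syntax below are assumptions for illustration):

```python
import re

# Hypothetical structured scenario: each numbered step is one browser action
# the validator can execute deterministically.
SCENARIO = """\
## Scenario: Checkout with empty cart
1. goto /cart
2. click "Checkout"
3. expect_text "Your cart is empty"
"""

# Assumed action vocabulary -- not the toolkit's actual grammar.
STEP_RE = re.compile(r'^\d+\.\s+(goto|click|fill|expect_text)\s+(.*)$')

def parse_scenario(text):
    """Turn a numbered scenario into (action, argument) pairs."""
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line.strip())
        if m:
            steps.append((m.group(1), m.group(2).strip('"')))
    return steps
```

A free-prose scenario ("check that checkout handles empty carts") gives the browser agent nothing executable; a step list like this does.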
Agents are just Markdown
This is the part that surprises people. There is no framework. No SDK to learn. No registration step. Each agent is a Markdown file with YAML frontmatter:
```markdown
---
name: functional-reviewer
description: Compares code diff against acceptance criteria
model: opus
tools: Read, Glob, Grep, Bash, Agent
---

# Role
You are a senior QA engineer reviewing a pull request. Compare the diff
against the acceptance criteria. Identify functional gaps, regression
risks, missing edge cases...

# Output
Write your review to qa-output/functional-review.md
```
Want a compliance reviewer? A performance reviewer? An accessibility auditor? Copy the closest existing agent, edit the Role section, and it works. The training teaches the prompt patterns. QA Orchestra gives you the starting files.
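For example, a hypothetical accessibility auditor could start as a copy of functional-reviewer with a rewritten Role section (the content below is illustrative, not a file shipped with the toolkit):

```markdown
---
name: accessibility-auditor
description: Reviews a diff for accessibility regressions
model: opus
tools: Read, Glob, Grep, Bash, Agent
---

# Role
You are an accessibility specialist reviewing a pull request. Check the
diff for missing labels, keyboard traps, and contrast regressions...

# Output
Write your findings to qa-output/accessibility-review.md
```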
What it does not do
QA Orchestra is scoped to functional correctness against acceptance criteria. That focus is intentional.
It does not do code quality review, linting, or formatting feedback. It does not do security scanning. It does not do performance profiling. It does not generate unit tests. It does not do static type checking.
There are good tools for all of those things. QA Orchestra fills the gap that none of them cover: does this feature actually work the way the requirements say it should?
Getting started
Three options, depending on how you work:
- Plugin install: install directly into Claude Code. Check the documentation for setup instructions. Agents load automatically.
- Global agents: clone the repo and copy agent files to ~/.claude/agents/. Available in every project.
- Workspace clone: clone into your project workspace. Fill in context/CONTEXT.md. Agents auto-load.
Then pick your use case:
- Review a PR for AC compliance → @functional-reviewer
- Generate test scenarios from a ticket → @test-scenario-designer
- Find which tests a diff affects → @smart-test-selector
- Validate in a real browser → @environment-manager then @browser-validator
- Turn findings into bug reports → @bug-reporter
The README has the full recipe list and a completed example context for an e-commerce store.
From methodology to muscle memory
The training teaches the thinking. QA Orchestra encodes it.
Every concept from the sessions maps directly to a component in the toolkit: the multi-repo workspace is the CONTEXT.md file. The Chrome MCP demo is the browser-validator agent. The diff-first review is the functional-reviewer. The prompt patterns are the agent definitions.
If you attended the training, you already know why each piece exists. QA Orchestra means you don't have to rebuild them. If you haven't attended, the agents still work. But the training helps you understand when to trust the output and when to push back. That judgment is the part AI can't automate.
AI handles the analysis. You handle the judgment. That is QA orchestration.
Open source. 10 agents. Works with any stack. Install it in Claude Code and run your first functional review in under 5 minutes.
Documentation → View on GitHub → Join the training →