QA Orchestra

Written by Anass R. · Sr. QA Automation Engineer · DoQALand

How a live training session turned into an open-source multi-agent QA toolkit.

It started with a training problem

I run a free live training called the Diff-First Method. The idea is simple: teach QA engineers how to use AI to review diffs, map functional impact, and focus testing effort where it actually matters.

The training covers four pillars: multi-repo workspace setup, Chrome MCP browser automation, diff-based functional review, and AI-assisted test generation. Each session, I would walk attendees through the workflow live, showing how AI can read a release diff, cross-reference acceptance criteria, and produce a prioritized QA brief.

It worked. People got it. But then the same question kept coming back after every session:

"This is great. But how do I set this up on my own project without spending a week writing prompts?"

Fair question. The training teaches the methodology. But the methodology has moving parts: context files, prompt patterns, output formats, chaining logic. Every attendee was going home and rebuilding the same pieces from scratch. Some succeeded. Many got stuck at the gap between understanding the concept and having a working setup.

That gap is what QA Orchestra closes.

What QA Orchestra is

QA Orchestra is an open-source, multi-agent QA toolkit for Claude Code. It packages everything from the training into 10 specialized AI agents, each designed to answer a specific question about your pull requests and features.

No SaaS. No proprietary API keys. No new tool to learn. It runs inside Claude Code, the environment most attendees were already using during the training. You install it, point it at your project, and start asking questions.

functional-reviewer: Does this diff implement the acceptance criteria? Where are the gaps?
test-scenario-designer: What test scenarios cover this feature? Happy path, negative, boundary, edge cases.
browser-validator: Navigate the running app, execute scenarios, verify results in a real browser.
smart-test-selector: Which existing tests does this diff affect? What might break?
bug-reporter: Turn findings into structured, developer-ready bug reports.
environment-manager: Check out the PR branch, start the app, verify health before testing.
automation-writer: Convert test scenarios into runnable Playwright, Cypress, or Gherkin code.
release-analyzer: Multi-repo release diffs, cross-repo impact, deployment risks.
orchestrator: Read a ticket, decide which agents to run and in what order.
manual-validator: Guide manual test execution scenario by scenario, produce validation reports.

Each agent reads your project context, analyzes the relevant data, and writes structured Markdown output. The next agent in the chain picks up that file automatically. No copy-pasting between tools.
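That handoff is just files on disk. As a rough sketch of the convention (the exact file names each agent uses are an assumption here), one agent writes a Markdown report into qa-output/ and the next agent reads it back:

```python
from pathlib import Path

# Sketch of the file-based handoff between agents. The qa-output/
# directory matches the convention described above; the specific
# file name is an assumption for illustration.
OUT = Path("qa-output")

def write_report(name: str, body: str) -> Path:
    """An agent writes its structured Markdown output."""
    OUT.mkdir(exist_ok=True)
    path = OUT / name
    path.write_text(body, encoding="utf-8")
    return path

def read_upstream(name: str) -> str:
    """The next agent in the chain picks up the same file."""
    return (OUT / name).read_text(encoding="utf-8")

# functional-reviewer writes; bug-reporter later reads the same file.
write_report("functional-review.md", "# Functional Review\n\n- Gap: AC-3 not covered\n")
print(read_upstream("functional-review.md").splitlines()[0])  # → # Functional Review
```

No message bus, no database: the shared directory is the whole integration surface.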

From training to toolkit: the architecture

The training teaches a mindset: give AI the full context of your project and let it do the analysis work while you make the decisions. QA Orchestra encodes that mindset into two layers.

The expertise layer

Ten agents, each with deep domain knowledge. They are just Markdown files in .claude/agents/. No SDK, no build step. Each file defines a role, instructions, input expectations, and output format. If you attended the training, you will recognize the prompt patterns.

The data layer

MCP servers that fetch real project data: GitHub or GitLab diffs, Chrome browser sessions, Jira tickets. The same integrations we set up live during Module 1 of the training.

The bridge between them

A single file called CONTEXT.md that describes your entire stack: application URLs, repository paths, environment setup steps, health check signals, testing frameworks, severity definitions. Every agent reads it. Your QA lead updates terminology or AC formats in one place, and all 10 agents adjust their behavior.

context/CONTEXT.md
# Project Context

Application URL:  http://localhost:3000
Frontend repo:   ./frontend
Backend repo:    ./backend
Start command:   docker compose up -d
Health check:    curl http://localhost:3000/health
Test framework:  Playwright
Bug severity:    Critical / Major / Minor / Cosmetic

This is the "context as code" approach from the training. Instead of every prompt containing project-specific details, the context lives in one file and agents reference it at runtime.
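In practice each agent simply reads CONTEXT.md as text, but the "context as code" idea can be sketched as a tiny key/value lookup over the file format shown above (this parser is illustrative, not part of the toolkit):

```python
# Illustrative only: agents read CONTEXT.md directly, but the
# "one file, many readers" pattern reduces to a key/value lookup.
def load_context(text: str) -> dict[str, str]:
    ctx = {}
    for line in text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            ctx[key.strip()] = value.strip()
    return ctx

context = load_context("""\
# Project Context
Application URL:  http://localhost:3000
Test framework:   Playwright
""")
print(context["Test framework"])  # → Playwright
```

Change a value once and every consumer sees the new value on its next read, which is exactly the property the agents rely on.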

What happens when you run it

Here is the simplest workflow. You have a pull request. You want to know if it matches the acceptance criteria.

You: @functional-reviewer → Agent reads diff + ACs → qa-output/functional-review.md

One command. The agent reads the diff, compares it against your acceptance criteria, identifies gaps, and writes a structured review. You read the output and decide: ship, test more, or block.

Now here is the full pipeline, the one we demonstrate in Module 4 of the training:

Full multi-agent pipeline
# 1. Set up the environment
environment-manager → checkout branch, start app, health check

# 2. Analyze in parallel
functional-reviewer    → AC compliance report
test-scenario-designer → test scenarios

# 3. Validate in a real browser
browser-validator → reads scenarios, clicks through the app

# 4. Report (if gaps found)
bug-reporter      → structured bug reports
automation-writer → runnable Playwright tests

Each agent reads from qa-output/ and writes to qa-output/. The orchestrator knows which agents can run in parallel. Missing tools don't break the pipeline; agents note the gap and skip gracefully.
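The staged scheduling above can be sketched as a list of stages, where each stage is a set of agents allowed to run concurrently. The agent names come from the pipeline; the scheduling code and the run_agent stub are assumptions standing in for real Claude Code invocations:

```python
from concurrent.futures import ThreadPoolExecutor

# Agents in the same stage may run in parallel; each stage waits for
# the previous one to finish. run_agent is a stub for a real agent call.
PIPELINE = [
    ["environment-manager"],                            # 1. setup
    ["functional-reviewer", "test-scenario-designer"],  # 2. analyze in parallel
    ["browser-validator"],                              # 3. validate
    ["bug-reporter", "automation-writer"],              # 4. report
]

def run_agent(name: str) -> str:
    return f"{name}: done"

def run_pipeline(pipeline):
    results = []
    for stage in pipeline:
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            results.extend(pool.map(run_agent, stage))
    return results

print(run_pipeline(PIPELINE)[0])  # → environment-manager: done
```

The stage boundaries are what let the orchestrator overlap independent analysis work while still guaranteeing the browser-validator only runs once scenarios exist.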

The part the training could not teach

The training teaches you to think in diffs, to give AI the right context, to chain prompts into a workflow. But there is one thing a live session cannot hand you: a working browser validation setup of your own.

In the training, we demonstrate Chrome MCP controlling a browser. People get excited. But in practice, setting up a consistent browser validation pipeline that chains with diff analysis is hard. The environment needs to be running, the scenarios need to be structured in a way the browser agent can execute, and the findings need to feed back into the functional review.

QA Orchestra solves this chain. The environment-manager starts your app. The test-scenario-designer produces structured scenarios. The browser-validator reads those scenarios and walks through the app step by step using Chrome DevTools MCP: clicking buttons, filling forms, checking that text appears, screenshotting failures.
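For the browser-validator to execute scenarios, they have to be structured, not free-form prose. One possible shape (the click/fill/expect_text step vocabulary here is a hypothetical illustration, not the toolkit's actual schema):

```python
# Hypothetical scenario structure: the step vocabulary (click, fill,
# expect_text) is an illustration, not QA Orchestra's actual schema.
scenario = {
    "name": "Login happy path",
    "steps": [
        {"action": "fill", "selector": "#email", "value": "qa@example.com"},
        {"action": "fill", "selector": "#password", "value": "secret"},
        {"action": "click", "selector": "button[type=submit]"},
        {"action": "expect_text", "selector": "h1", "value": "Dashboard"},
    ],
}

VALID_ACTIONS = {"click", "fill", "expect_text"}

def validate(scenario: dict) -> list[str]:
    """Check that every step is something a browser agent could execute."""
    errors = []
    for i, step in enumerate(scenario["steps"]):
        if step["action"] not in VALID_ACTIONS:
            errors.append(f"step {i}: unknown action {step['action']!r}")
    return errors

print(validate(scenario))  # → []
```

The point is the contract: the scenario designer emits steps from a small, known vocabulary, so the browser agent can walk them mechanically.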

The browser is ground truth. A diff might look correct but the API call might fail, the UI might not update, or there might be a race condition. Browser validation turns theoretical risks into confirmed findings. Bug reports based on observed failures, not guesses.

Agents are just Markdown

This is the part that surprises people. There is no framework. No SDK to learn. No registration step. Each agent is a Markdown file with YAML frontmatter:

.claude/agents/functional-reviewer.md
---
name: functional-reviewer
description: Compares code diff against acceptance criteria
model: opus
tools: Read, Glob, Grep, Bash, Agent
---

# Role
You are a senior QA engineer reviewing a pull request.
Compare the diff against the acceptance criteria.
Identify functional gaps, regression risks, missing
edge cases...

# Output
Write your review to qa-output/functional-review.md

Want a compliance reviewer? A performance reviewer? An accessibility auditor? Copy the closest existing agent, edit the Role section, and it works. The training teaches the prompt patterns. QA Orchestra gives you the starting files.
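Since an agent is just a file, stamping out a new one can be scripted. A sketch, with frontmatter fields mirroring the example above (the accessibility-auditor name and role text are made up for illustration):

```python
from pathlib import Path

# Sketch: a new agent is a new Markdown file with YAML frontmatter.
# The accessibility-auditor example below is hypothetical.
TEMPLATE = """---
name: {name}
description: {description}
model: opus
tools: Read, Glob, Grep, Bash, Agent
---

# Role
{role}

# Output
Write your review to qa-output/{name}.md
"""

def new_agent(name: str, description: str, role: str,
              root: Path = Path(".claude/agents")) -> Path:
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.md"
    path.write_text(TEMPLATE.format(name=name, description=description, role=role))
    return path

path = new_agent(
    "accessibility-auditor",
    "Audits the diff for accessibility regressions",
    "You are an accessibility specialist reviewing a pull request.",
)
print(path.name)  # → accessibility-auditor.md
```

Drop the generated file into .claude/agents/ and Claude Code picks it up like any other agent definition.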

What it does not do

QA Orchestra is scoped to functional correctness against acceptance criteria. That focus is intentional.

It does not do code quality review, linting, or formatting feedback. It does not do security scanning. It does not do performance profiling. It does not generate unit tests. It does not do static type checking.

There are good tools for all of those things. QA Orchestra fills the gap that none of them cover: does this feature actually work the way the requirements say it should?

Getting started

Three options, depending on how you work:

  1. Plugin install: install directly into Claude Code. Check the documentation for setup instructions. Agents load automatically.
  2. Global agents: clone the repo and copy agent files to ~/.claude/agents/. Available in every project.
  3. Workspace clone: clone into your project workspace. Fill in context/CONTEXT.md. Agents auto-load.

Then pick your use case: the README has the full recipe list and a completed example context for an e-commerce store.

From methodology to muscle memory

The training teaches the thinking. QA Orchestra encodes it.

Every concept from the sessions maps directly to a component in the toolkit: the multi-repo workspace is the CONTEXT.md file. The Chrome MCP demo is the browser-validator agent. The diff-first review is the functional-reviewer. The prompt patterns are the agent definitions.

If you attended the training, you already know why each piece exists. QA Orchestra means you don't have to rebuild them. If you haven't attended, the agents still work. But the training helps you understand when to trust the output and when to push back. That judgment is the part AI can't automate.

AI handles the analysis. You handle the judgment. That is QA orchestration.

Try QA Orchestra

Open source. 10 agents. Works with any stack. Install it in Claude Code and run your first functional review in under 5 minutes.

Documentation → View on GitHub → Join the training →