A new MCP server proposal introduces AI Watch Tester (AWT) — a framework for AI-powered end-to-end testing of web applications and agent workflows. It is the first testing-focused community server submission.
As AI agents become more capable of operating autonomously — browsing the web, executing code, interacting with APIs — testing these workflows becomes increasingly critical. Traditional E2E testing tools like Playwright or Cypress were designed for deterministic human-authored test cases. Agent workflows are probabilistic and context-dependent.
AWT proposes to solve this by using AI itself to generate, execute, and validate test scenarios. The key insight: if an AI agent can perform a task, another AI (or the same one with different instructions) can verify the task was completed correctly.
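The verify-with-a-second-pass idea can be sketched as a minimal loop: run the task, capture evidence, and have a verifier judge it against success criteria. Everything below is a hypothetical illustration — the proposal does not specify these function or field names, and a real verifier would be a second model pass rather than substring matching.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    task: str
    evidence: str  # captured result of the agent's run: final page text, logs, etc.

def validate_outcome(outcome: Outcome, criteria: list[str]) -> dict:
    # Stand-in for an AI verifier: in AWT the check would be a second
    # model (or the same one with different instructions); here we
    # approximate it with case-insensitive substring matching.
    missing = [c for c in criteria if c.lower() not in outcome.evidence.lower()]
    return {"passed": not missing, "missing": missing}

result = validate_outcome(
    Outcome(task="complete checkout",
            evidence="Order #1234 confirmed. Receipt emailed."),
    ["confirmed", "receipt"],
)
# All criteria appear in the evidence, so the check passes.
```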
The AI testing market is projected to grow from $0.5B in 2025 to $4.2B by 2030 (Allied Market Research). Early tooling in this space will have significant influence on how agents are validated in production.
PR #3766 proposes adding a new MCP server that exposes the following tools:
```json
{
  "tools": [
    "awt_record_flow",
    "awt_generate_test",
    "awt_run_test",
    "awt_validate_outcome",
    "awt_compare_baseline",
    "awt_heal_selectors"
  ]
}
```
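As an MCP server, these tools would be invoked through the standard `tools/call` request. A hypothetical invocation might look like the following (the `test_id` argument name is illustrative, not taken from the proposal):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "awt_run_test",
    "arguments": { "test_id": "checkout-happy-path" }
  }
}
```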
AWT sits between the orchestrating AI and a browser automation layer (Playwright or Puppeteer). During recording, it captures each interaction along with a semantic description of the target element and its surrounding context.
During playback, the AI can use semantic descriptions rather than brittle CSS selectors. If a button's class changes from .btn-primary to .action-button, AWT uses the semantic description ("the blue Submit button in the checkout form") to locate it dynamically.
Unlike traditional E2E frameworks that fail immediately on selector changes, AWT's AI-powered healing can maintain test stability across UI refactors — reducing the maintenance burden that makes E2E testing costly at scale.
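The healing step can be approximated as ranking candidate elements against the semantic description. This is a rough sketch under stated assumptions: the proposal presumably uses an LLM or embeddings for the match, while the version below uses simple token overlap, and the candidate-element shape is invented for illustration.

```python
def heal_selector(description: str, candidates: list[dict]) -> dict:
    # Score each candidate element by how many tokens of the semantic
    # description appear in its visible text and attributes. A real
    # implementation would use an LLM or embedding similarity instead.
    tokens = set(description.lower().split())

    def score(el: dict) -> int:
        haystack = " ".join(
            [el.get("text", ""), el.get("class", ""), el.get("role", "")]
        ).lower()
        return sum(1 for t in tokens if t in haystack)

    return max(candidates, key=score)

# The class changed from .btn-primary to .action-button, but the
# semantic description still resolves to the right element.
button = heal_selector(
    "the blue Submit button in the checkout form",
    [
        {"selector": ".nav-link", "text": "Home", "class": "nav-link"},
        {"selector": ".action-button", "text": "Submit",
         "class": "action-button checkout"},
    ],
)
```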
AWT is designed to work alongside other MCP servers, such as the existing browser-automation servers for Playwright and Puppeteer.
This composability means AWT doesn't need to reinvent browser control — it focuses on the testing intelligence layer while delegating execution to existing, battle-tested servers.
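Delegation could look like translating a recorded semantic step into a tool call for a downstream browser-automation server. The tool names (`browser_click`, `browser_type`) and step shape here are hypothetical stand-ins, not names confirmed by the proposal or by any specific browser server:

```python
def delegate_step(step: dict) -> dict:
    # Map an AWT semantic step onto a tool call for a downstream
    # browser-automation MCP server. Tool and argument names are
    # illustrative only.
    if step["action"] == "click":
        return {"name": "browser_click",
                "arguments": {"description": step["target"]}}
    if step["action"] == "type":
        return {"name": "browser_type",
                "arguments": {"description": step["target"],
                              "text": step["text"]}}
    raise ValueError(f"unsupported action: {step['action']}")

call = delegate_step({"action": "click",
                      "target": "the blue Submit button in the checkout form"})
```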
Agent Workflow Validation: Before deploying an agent that books flights, run AWT to verify the agent correctly handles edge cases (no flights available, session timeout, payment failure).
UI Regression Testing: Capture baseline recordings of critical user flows. AWT alerts when visual or behavioral regressions occur.
Compliance Verification: For regulated industries, AWT can validate that agent actions remain within defined boundaries — useful for auditing AI systems that handle financial transactions or personal data.
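For the regression use case, baseline comparison reduces to diffing a recorded flow against a fresh run step by step. The sketch below assumes a simple list-of-steps representation; the actual `awt_compare_baseline` semantics are not specified in the proposal, and a production version would need to tolerate benign differences such as timestamps and dynamic IDs.

```python
def compare_baseline(baseline: list[dict], current: list[dict]) -> list[dict]:
    # Flag every step whose observed outcome differs from the
    # baseline recording.
    regressions = []
    for i, (expected, actual) in enumerate(zip(baseline, current)):
        if expected != actual:
            regressions.append({"step": i, "expected": expected, "actual": actual})
    return regressions

regressions = compare_baseline(
    [{"action": "click", "result": "cart opened"},
     {"action": "pay", "result": "order confirmed"}],
    [{"action": "click", "result": "cart opened"},
     {"action": "pay", "result": "error 502"}],
)
# Only the second step diverged from the baseline.
```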
AWT addresses a real gap in the MCP ecosystem. As agents move from demos to production, testing infrastructure becomes essential. This proposal is the first serious attempt to bring AI-native testing to MCP.