I’m tired of Claude saying it fixed something when the fix didn’t actually work. You probably know this frustration too: Claude says the task is done, but when you check the app, nothing has changed. Then you have to send screenshots and explain what went wrong.
My Solution: Auto-Testing with Playwright
I built a system that makes Claude check its own work automatically. Here’s how it works:
- Claude finishes a task, and a Stop hook triggers a validation script
- Playwright opens the browser and visits your pages
- It takes screenshots and checks for errors
- Claude looks at the results and fixes issues if needed
Setup Steps:
Install Playwright first:
npm install @playwright/test
npx playwright install
Create hook config (.claude/settings.json):
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "node validation/check-work.js"
          }
        ]
      }
    ]
  }
}
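One thing to watch with a Stop hook: as far as I understand the hooks docs, Claude Code sends the hook command a JSON payload on stdin, and for Stop hooks that payload includes a `stop_hook_active` flag that is true when Claude is already continuing because a Stop hook sent it back to work. Checking the flag keeps a hook that blocks on failure from looping forever. Here is a minimal sketch of that check. `shouldValidate` is a hypothetical helper name, and the payload shape is my reading of the docs, so verify it against your version.

```javascript
// Hypothetical helper: decide whether to run validation for this Stop event.
// Assumption: the hook input is a JSON object on stdin, and `stop_hook_active`
// is true when Claude is already continuing because of an earlier Stop hook.
function shouldValidate(rawInput) {
  // No input at all (for example, the script was run by hand): validate.
  if (!rawInput || !rawInput.trim()) return true;
  try {
    const input = JSON.parse(rawInput);
    return input.stop_hook_active !== true;
  } catch (err) {
    // Unparseable input: fall back to validating.
    return true;
  }
}

// Usage sketch near the top of check-work.js (only when stdin is piped):
//   const raw = process.stdin.isTTY ? '' : fs.readFileSync(0, 'utf8');
//   if (!shouldValidate(raw)) process.exit(0);
```

The trade-off is that the second stop attempt goes through without re-running the checks, so this caps the hook at one forced fix round per task.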
The validation script:
const { chromium } = require('@playwright/test');
const fs = require('fs');
const path = require('path');

// Pages to check and the selectors each one must contain
const SETTINGS = {
  serverUrl: 'http://localhost:3000',
  outputDir: './test-results',
  testPages: [
    {
      url: '/',
      title: 'home',
      checkElements: ['h1', 'nav', 'main']
    },
    {
      url: '/dashboard',
      title: 'dashboard',
      checkElements: ['header', '.sidebar', '.content']
    }
  ]
};

// Load one page, screenshot it, and record missing elements and console errors
async function testPage(browser, pageInfo) {
  const testResult = {
    pageName: pageInfo.title,
    passed: true,
    issues: [],
    responseTime: 0
  };
  console.log(`Testing ${pageInfo.title} page...`);
  const browserPage = await browser.newPage();

  // Collect console errors emitted while the page loads
  const errorLogs = [];
  browserPage.on('console', message => {
    if (message.type() === 'error') {
      errorLogs.push(message.text());
    }
  });

  try {
    const startTime = Date.now();
    await browserPage.goto(`${SETTINGS.serverUrl}${pageInfo.url}`, {
      waitUntil: 'domcontentloaded',
      timeout: 8000
    });
    testResult.responseTime = Date.now() - startTime;

    // Save a full-page screenshot for later review
    if (!fs.existsSync(SETTINGS.outputDir)) {
      fs.mkdirSync(SETTINGS.outputDir, { recursive: true });
    }
    await browserPage.screenshot({
      path: path.join(SETTINGS.outputDir, `${pageInfo.title}-screenshot.png`),
      fullPage: true
    });

    // Each required selector must appear within 2 seconds
    for (const element of pageInfo.checkElements) {
      try {
        await browserPage.waitForSelector(element, { timeout: 2000 });
        console.log(`Found element: ${element}`);
      } catch (err) {
        testResult.issues.push(`Element not found: ${element}`);
        testResult.passed = false;
        console.log(`Missing element: ${element}`);
      }
    }

    // Any console error fails the page
    if (errorLogs.length > 0) {
      testResult.issues.push(...errorLogs.map(log => `Browser error: ${log}`));
      testResult.passed = false;
    }
  } catch (err) {
    testResult.issues.push(`Page load failed: ${err.message}`);
    testResult.passed = false;
  }

  await browserPage.close();
  return testResult;
}

// Run every configured page through testPage and print a summary
async function runTests() {
  console.log('Starting automated validation...');
  const browser = await chromium.launch({ headless: true });
  const testResults = [];
  for (const pageConfig of SETTINGS.testPages) {
    const result = await testPage(browser, pageConfig);
    testResults.push(result);
  }
  await browser.close();

  const successCount = testResults.filter(r => r.passed).length;
  const totalCount = testResults.length;
  console.log(`Test Results: ${successCount}/${totalCount} pages passed`);
  if (successCount !== totalCount) {
    console.log('Found issues:');
    testResults.forEach(result => {
      if (!result.passed) {
        console.log(`${result.pageName}: ${result.issues.join(', ')}`);
      }
    });
  }

  // Exit non-zero when any page failed so the caller can detect the failure
  process.exit(successCount === totalCount ? 0 : 1);
}

// A crash in the runner itself also counts as a failure
runTests().catch(err => {
  console.error(err);
  process.exit(1);
});
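The script assumes something is already serving `http://localhost:3000`. If the dev server is down, every page just reports a load failure, which tells Claude nothing useful. Here is a minimal sketch of a pre-check, assuming Node 18+ for the built-in `fetch` and `AbortSignal.timeout`. `waitForServer` and its option names are made up for illustration.

```javascript
// Hypothetical pre-check: poll a URL until the server answers or time runs out.
// Assumes Node 18+ (global fetch and AbortSignal.timeout).
async function waitForServer(url, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      // Any HTTP response, even a 404 or 500, means the server is listening.
      const res = await fetch(url, { signal: AbortSignal.timeout(2000) });
      if (res.body) await res.body.cancel(); // release the connection
      return true;
    } catch (err) {
      // Connection refused or timed out: wait a bit and retry.
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  }
  return false;
}

// Usage sketch at the top of runTests():
//   if (!(await waitForServer(SETTINGS.serverUrl))) {
//     console.error(`Dev server not reachable at ${SETTINGS.serverUrl}`);
//     process.exit(1);
//   }
```

Failing early with one clear message is easier for Claude to act on than a list of identical timeouts, one per page.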
Add to your CLAUDE.md file:
## Task Validation Requirements
For any code changes, always:
1. Run: `node validation/check-work.js`
2. Review screenshots in the `test-results` folder
3. Fix any issues found before marking task complete
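
These instructions rely on Claude choosing to follow them. The Stop hook can enforce the same rule. As far as I understand the hooks docs, a Stop hook that exits with code 2 blocks Claude from finishing and passes the hook's stderr back to Claude, while exit code 0 lets it stop. Here is a minimal sketch of mapping the test results onto that contract. `hookOutcome` is a hypothetical helper, and the exit-code behavior is my reading of the docs, so confirm it for your version.

```javascript
// Hypothetical helper, not part of the original script.
// Assumption: for a Stop hook, exit code 2 blocks the stop and sends stderr
// to Claude; exit code 0 lets Claude stop normally.
function hookOutcome(testResults) {
  const failed = testResults.filter(r => !r.passed);
  if (failed.length === 0) {
    return { exitCode: 0, message: '' };
  }
  // One line per failing page, in the same format the script logs
  const lines = failed.map(r => `${r.pageName}: ${r.issues.join(', ')}`);
  return {
    exitCode: 2,
    message: `Validation failed on ${failed.length} page(s):\n${lines.join('\n')}`
  };
}

// Usage sketch at the end of runTests():
//   const { exitCode, message } = hookOutcome(testResults);
//   if (message) console.error(message); // stderr is what Claude would see
//   process.exit(exitCode);
```

If you go this route, check `stop_hook_active` in the hook's stdin input first, otherwise a failure Claude cannot fix could keep it from ever stopping.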
This has saved me tons of time. Claude now catches its own mistakes before I have to point them out. What kind of validation checks would help with your projects?