I’ve been struggling to keep our Playwright tests maintainable as the app evolves. Every time the design team tweaks the UI, selectors break and we’re stuck rewriting half the test suite. I started experimenting with describing what I want to test in plain language instead of hardcoding selectors everywhere.
The idea is simple: write something like “verify user can log in with email and password, then check the dashboard loads” and have it translate into actual working Playwright code. Sounds like magic, but I wanted to see if it actually holds up when CSS classes change or elements get reorganized.
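For context, this is roughly the shape of what comes out of a description like that. The URL, field labels, and button text are placeholders standing in for my app, so treat it as a sketch of the pattern rather than exact generated output:

```ts
import { test, expect } from '@playwright/test';

// Generated from: "verify user can log in with email and password,
// then check the dashboard loads"
test('user can log in and see the dashboard', async ({ page }) => {
  // NOTE: URL, labels, and credentials below are placeholders for illustration
  await page.goto('https://example.com/login');

  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('secret-password');
  await page.getByRole('button', { name: 'Log in' }).click();

  // Assert on the URL and a visible heading rather than on CSS classes
  await expect(page).toHaveURL(/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```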
It turned out that when the AI generates the workflow from that plain description, it captures the intent rather than anchoring on whatever CSS classes happen to match at the moment. Instead of breaking on every style update, the tests find elements by role, text content, or other attributes that tend to survive design refreshes.
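To make that concrete, here’s the kind of locator swap I’m talking about. The class name and URL are made up for illustration, but the before/after pattern is the one the generated tests lean on:

```ts
import { test, expect } from '@playwright/test';

test('dashboard heading is visible after a restyle', async ({ page }) => {
  await page.goto('https://example.com/dashboard'); // placeholder URL

  // What I used to write: brittle, breaks as soon as the class names change
  // await expect(page.locator('.dash-header__title--v3')).toBeVisible();

  // What the generated tests use: role + visible text, which survives a restyle
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```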
I’ve tested this with login flows, form submissions, and navigation patterns. The generated code is readable, not some unstructured mess. What’s more, when I need to tweak it, the generated workflow gives me a solid foundation instead of starting from scratch.
Has anyone else tried this approach? Are you finding that AI-generated Playwright workflows actually stay stable across UI changes, or does it still fall apart after a couple of weeks?