Going from description to working automation—how often do you actually get it right on the first try?

I’ve been curious about the AI Copilot workflow generation feature. The idea is you describe what you need in plain English, and the AI generates a ready-to-run workflow. For something like testing a checkout flow, that sounds amazing in principle.

I tried it. Described the task: “Test a checkout flow by navigating to the product page, adding items to cart, going to checkout, filling in payment info, and verifying the order confirmation.”

The AI generated a workflow that was… honestly, pretty close. It had the right steps in the right order. But there were issues. It missed some specifics about how to handle certain page interactions. The form filling had generic field selectors that didn’t match my site. The validation logic at the end was checking for the wrong confirmation message.

So I’m wondering: is this feature meant to give you a starting point that you then refine, or should a well-described task actually work without tweaking? Because if it’s the former, I’m not sure how much time it actually saves versus just building it from scratch. What’s been your experience?

A starting point that you then refine is exactly how it works in practice. The AI is good at understanding the sequence of actions and setting up the basic flow, but it can’t possibly know the specific details of your site: field names, selectors, exact text on buttons, validation messages. Those details are always going to need your input.
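One way to picture that split is the rough sketch below. It is not the Copilot's actual output format: the step dictionaries, the placeholder names, the URL, and every selector are made up for illustration. The point is only that the generated structure can stay untouched while the site-specific values get filled in from one place, and that a forgotten value fails loudly instead of silently.

```python
import re

# Hypothetical AI-generated skeleton: the order of steps is right,
# but site-specific values are left as {placeholders}.
GENERATED_STEPS = [
    {"action": "goto", "target": "{product_url}"},
    {"action": "click", "target": "{add_to_cart_button}"},
    {"action": "click", "target": "{checkout_link}"},
    {"action": "fill", "target": "{email_field}", "value": "{email}"},
    {"action": "fill", "target": "{card_field}", "value": "{card_number}"},
    {"action": "click", "target": "{submit_button}"},
    {"action": "expect_text", "target": "{confirmation_text}"},
]

# Details only you know about your site (all values here are invented).
SITE_DETAILS = {
    "product_url": "https://shop.example.com/products/1",
    "add_to_cart_button": "#add-to-cart",
    "checkout_link": "a[href='/checkout']",
    "email_field": "input[name='email']",
    "email": "test@example.com",
    "card_field": "input[name='cardnumber']",
    "card_number": "4111111111111111",
    "submit_button": "button[type='submit']",
    "confirmation_text": "Thank you for your order",
}

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def bind_details(steps, details):
    """Return a copy of steps with placeholders filled in; raise if any detail is missing."""
    missing = sorted({
        name
        for step in steps
        for value in step.values()
        for name in PLACEHOLDER.findall(value)
        if name not in details
    })
    if missing:
        raise KeyError(f"missing site details: {', '.join(missing)}")
    return [
        {key: PLACEHOLDER.sub(lambda m: details[m.group(1)], value)
         for key, value in step.items()}
        for step in steps
    ]


bound = bind_details(GENERATED_STEPS, SITE_DETAILS)
for step in bound:
    print(step)
```

Keeping the details in a single mapping also means a selector change on the site becomes a one-line fix rather than a hunt through the workflow.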

What I’ve found is that the AI-generated workflow cuts out about 40-50% of the manual work for typical browser tasks. You don’t have to structure the whole thing; you just have to fill in the details. For testing a checkout flow specifically, the AI gets the happy path right most of the time, which is the hardest part to structure correctly. Then you add the edge cases and site-specific tweaks.
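A minimal sketch of that "happy path first, edge cases after" approach, assuming the test inputs live in plain dictionaries. The field names, the edge cases, and the expected messages are all hypothetical and would need to match whatever your site actually shows.

```python
# Inputs for the happy path the AI usually gets right (values are invented).
HAPPY_PATH = {
    "email": "test@example.com",
    "card_number": "4111111111111111",
    "expect": "Thank you for your order",
}

# Edge cases you add by hand: each one overrides only what differs.
EDGE_CASES = {
    "empty email": {"email": "", "expect": "Email is required"},
    "short card number": {"card_number": "4111", "expect": "Invalid card number"},
}


def build_cases(happy, overrides):
    """Return all test cases, each a full copy of the happy path plus its overrides."""
    cases = {"happy path": dict(happy)}
    for name, change in overrides.items():
        cases[name] = {**happy, **change}
    return cases


for name, data in build_cases(HAPPY_PATH, EDGE_CASES).items():
    print(f"{name}: {data}")
```

Each edge case reuses the same generated flow and changes only its inputs and the expected message, which is why refining tends to be cheaper than rebuilding.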

So yeah, it’s a starting point, not a complete solution. But it’s a really good starting point that saves actual time.

Never first try. Always need tweaks for site-specific details. Selectors, field IDs, exact text—the AI can’t know those. But the structure and logic are usually spot on, so you’re refining, not rebuilding. Saves time.

The pattern I’ve found is that the AI handles structural complexity well but struggles with specificity. For a checkout flow, it understands the sequence itself (navigate, add items, fill form, confirm) but gets lost on details like which API calls are made behind the scenes, how form validation errors are handled, and what the exact selectors should be. The description needs to be fairly detailed to get better results. “Fill in payment info” is vague; “fill email field with [email protected], card number with 4111111111111111” gives better output. It’s a tool for accelerating the work, not eliminating it.
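If you write these descriptions often, one option is to keep the specifics as data and generate the detailed wording from it. This is only a sketch: `StepSpec` and `describe` are hypothetical helpers, not part of any tool, and the email address and confirmation text are placeholder values.

```python
from dataclasses import dataclass, field


@dataclass
class StepSpec:
    """One step of the task plus the concrete values the AI cannot guess."""
    action: str
    details: dict[str, str] = field(default_factory=dict)


def describe(steps):
    """Render the steps as a numbered, detailed task description."""
    lines = []
    for number, step in enumerate(steps, start=1):
        if step.details:
            detail_text = ", ".join(
                f"{name} = {value!r}" for name, value in step.details.items()
            )
            lines.append(f"{number}. {step.action} ({detail_text})")
        else:
            lines.append(f"{number}. {step.action}")
    return "\n".join(lines)


checkout = [
    StepSpec("Navigate to the product page"),
    StepSpec("Add the item to the cart"),
    StepSpec("Go to checkout"),
    StepSpec("Fill in payment info", {
        "email field": "test@example.com",
        "card number": "4111111111111111",
    }),
    StepSpec("Verify the order confirmation", {
        "expected text": "Thank you for your order",
    }),
]

print(describe(checkout))
```

The rendered text is what you would paste in as the description, so the vague version and the detailed version come from the same list of steps.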