Former AI doubter tried building a complete agent-based development system - here's what I learned

TL;DR

Setting up an agent workflow feels like coding with an extra layer of unnecessary complexity. If you go all-in on prompt engineering, the effort rarely pays for itself.

Background

My workplace doesn’t have AI mandates or licenses. We can’t use Copilot or paste company code into ChatGPT. I mainly use ChatGPT as an enhanced search engine and for AWS template generation.

Because of FOMO, I decided to test a complete agent workflow on a personal project. Main goal was figuring out if this would actually make me faster or more productive.

The Project

  • Swift iOS app with SwiftUI (never did mobile dev before)
  • Python backend using Flask/FastAPI
  • CI/CD with GitHub Actions, Docker, bash scripts
  • Deployed on DigitalOcean server

Agent Setup

  • Cursor IDE (normally use basic text editor)
  • ChatGPT Plus subscription + API integration
  • Budget-focused approach

My Workflow Structure

I organized everything into 4 main folders:

  • templates/ - reusable prompts like auth-login-handler.md
  • examples/ - sample functions, tests, validation patterns in my coding style
  • schemas/ - API contracts, data models, business rules
  • history/ - tracking log of agent modifications

ChatGPT suggested this structure after some discussion.
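If it helps to picture it, the whole setup is just four directories sitting next to the project code. A rough bootstrap sketch (the filenames are the ones mentioned above):

```shell
# Hypothetical one-time setup of the prompt-context layout described above
mkdir -p templates examples schemas history
touch templates/auth-login-handler.md
touch examples/validation_schema.py examples/api_endpoint.py
touch schemas/login_contract.json
touch history/changes.md
```

Nothing fancy; the point is that every prompt can reference these paths by name.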

What Worked Well

  • Cursor can reference documentation directly. You can ask “based on @framework-docs, what does this function return?”
  • Generates massive amounts of code super fast
  • Works decently when you accept 80-90% quality output
  • Excellent for reviewing unfamiliar code (Swift was new to me)
  • Great at answering “does this follow best practices per @language-docs?”
  • Helpful for translation like “convert this Python logic to Swift”

What Didn’t Work

  • Creating detailed schemas and prompts means you’ve already done the hard thinking
  • System design, architecture, API structure still requires human brain power
  • Huge review overhead since I didn’t write the generated code
  • A human still has to write either the tests or the code; if the agent writes both, nothing independent checks its output
  • Ignores style guides and does whatever it wants anyway
  • Regenerates entire files instead of targeted changes, breaking working code

Key Realization

Writing code isn’t actually my bottleneck. This became super obvious when drowning in tons of generated code that wasn’t particularly useful.

Better Use Cases

  • Library/language questions
  • Specific isolated tasks like “create a CloudFormation template for a Lambda function”
  • Code review for unfamiliar languages
  • System design feedback

Example Template

Using conventions from `templates/code_style.md`
Following patterns in:
- `examples/validation_schema.py` for validation setup
- `examples/api_endpoint.py` for route structure

Build a Flask endpoint for login at `/authenticate`

### Specs:
**Validation:**
- Use validation schema matching `schemas/login_contract.json`

**Endpoint:**
- `/authenticate` POST only
- Validate request with schema
- Success: use `db.commit()`, return 200 with token
- User not found: raise `AuthenticationError`, return 401
- Add `@swagger` decorator for docs
- No session commit on validation failures

### Style Requirements:
- Match reference file patterns exactly
- Keep code clean per style guide

### Logging:
Add entry to `history/changes.md` with today's date summarizing additions

Schema Example

{
    "title": "LoginCredentials",
    "type": "object",
    "properties": {
        "email": {
            "type": "string",
            "format": "email"
        },
        "password": {
            "type": "string",
            "minLength": 6
        }
    },
    "required": ["email", "password"],
    "additionalProperties": false
}
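As a side note, a contract this small can be enforced even without a schema library. Here’s a rough, dependency-free sketch of the same rules (the email regex is a deliberate simplification, and `validate_login` is just an illustrative name, not something from the project):

```python
import re

# Rules mirroring schemas/login_contract.json above
REQUIRED = ("email", "password")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified, not RFC-complete

def validate_login(payload):
    """Return a list of error messages; an empty list means the payload is valid."""
    errors = []
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    for field in REQUIRED:
        if field not in payload:
            errors.append(f"missing required field: {field}")
    extra = set(payload) - set(REQUIRED)
    if extra:  # additionalProperties: false
        errors.append(f"unexpected fields: {sorted(extra)}")
    email = payload.get("email")
    if email is not None and (not isinstance(email, str) or not EMAIL_RE.match(email)):
        errors.append("email must be a valid email address")
    password = payload.get("password")
    if password is not None and (not isinstance(password, str) or len(password) < 6):
        errors.append("password must be a string of at least 6 characters")
    return errors
```

In the real project a library like `jsonschema` would load `schemas/login_contract.json` directly instead of hand-rolling the checks; the sketch just shows what the contract actually constrains.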

Honestly, it felt more frustrating than helpful. When prompts need pseudo-code levels of detail and the output needs extensive review on top, the whole setup becomes extra work rather than a productivity boost.

I get what you mean! Tried streamlining my workflow but ended up with a mess of generated code to sort through. Quick searches and simple snippets worked way better than that whole setup.

Yeah, that schema-heavy approach hits home. I went through the same thing moving from traditional dev to AI workflows - ended up writing specs that were basically as detailed as the final code.

Game changer for me was treating AI like a junior dev, not some magic wand. I ditched the elaborate prompt structures and started using it for small improvements on code I’d already written. With new frameworks, I’ll bang out a basic implementation first, then ask for optimization tips or best practices.

That history folder sounds useful. I keep something way simpler - just a running log of what worked and what bombed. Saves me from making the same prompt mistakes twice.

You’re dead right about the review overhead. AI code looks solid at first glance but has these sneaky issues that only pop up during testing. All that time you saved upfront gets burned debugging edge cases that wouldn’t exist if you’d just written it yourself.

Same here, but I went the other way. Started with basic ChatGPT stuff and kept adding layers until I hit that wall too. The moment I realized I’d spent three hours perfecting a prompt for something I could code in forty minutes - that’s when it clicked.

The worst part? You lose your gut instinct about whether the code is actually good or trash when it spits out a hundred lines instantly. When I write it myself, I know exactly where the sketchy parts are.

AI works great for prototyping or when you’re learning something new, but for production code you’ll need to maintain and debug later? Not worth the headache. I kept it around for documentation help and dropped the rest. Way less frustrating and honestly more productive.

Yeah, I’ve seen this exact thing at my company when teams jump into AI workflows. Everyone underestimates the setup overhead.

Start way smaller. I just keep a simple prompts.txt file with 5-6 templates I actually use. If you’re maintaining more prompt templates than code files, you’ve gone too far.

For mobile dev, AI’s better at explaining platform quirks than generating whole components. “Why doesn’t this SwiftUI view update” works way better than “build me a login screen.”

The review overhead kills it though. I spend more time reading generated code than I would’ve spent writing it myself. Only worth it when I’m learning a new language or framework.

My approach now is boring but actually works. I use AI for research, debugging weird errors, and “how do I do X in language Y” questions. The real productivity boost comes from getting unstuck faster, not generating tons of code.