Can AI agents actually detect and refactor duplicate code across microservices?

I’m struggling with a common monorepo problem that’s driving me nuts. We have about 15 TypeScript microservices that have evolved over time, and there’s a ton of duplicated business logic across them. Some of it is subtle - similar validation logic, data transformation functions, error handling, etc.

Manually finding and refactoring all of this into shared modules would take weeks, and I’m wondering if anyone has successfully used AI to automate this process.

I’ve been experimenting with Latenode’s autonomous AI agents to analyze our codebase and identify patterns of duplication. The idea is to set up a workflow where one agent scans the code, another identifies duplicate patterns, and a third generates shared modules that can be imported across services.

So far the results are promising - it’s found several chunks of nearly identical code that we didn’t even realize were duplicated. But I’m concerned about edge cases and whether the AI-generated shared modules will actually work correctly across all services.

Has anyone tackled this problem at scale? Any tips for making this process more reliable?

I’ve successfully used Latenode’s AI agents to solve this exact problem at my company. We had 23 microservices with tons of duplicated logic - everything from validation rules to data transformations.

What worked well was creating a multi-stage workflow:

  1. First agent does a deep scan using code embeddings to find semantic similarities (not just identical code) - there’s a rough sketch of this step after the list

  2. Second agent groups these into potential shared modules

  3. Third agent generates the refactored code

  4. Fourth agent creates unit tests for the new shared modules
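
If it helps to see what step 1 boils down to outside of Latenode, here’s a minimal TypeScript sketch. It assumes the OpenAI Node SDK for embeddings; the snippet shape and the 0.9 threshold are just illustrative, and inside Latenode this logic lives in an agent node rather than hand-written code:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface FunctionSnippet {
  service: string;
  name: string;
  source: string;
}

// Plain cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed every function body in one batch, then flag pairs whose
// embeddings are close enough to count as near-duplicates.
async function findNearDuplicates(
  snippets: FunctionSnippet[],
  threshold = 0.9, // tune per codebase
): Promise<Array<[FunctionSnippet, FunctionSnippet]>> {
  const { data } = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: snippets.map((s) => s.source),
  });
  const pairs: Array<[FunctionSnippet, FunctionSnippet]> = [];
  for (let i = 0; i < snippets.length; i++) {
    for (let j = i + 1; j < snippets.length; j++) {
      if (cosine(data[i].embedding, data[j].embedding) >= threshold) {
        pairs.push([snippets[i], snippets[j]]);
      }
    }
  }
  return pairs;
}
```

The pairwise comparison is O(n²), which was fine at our scale; for much bigger codebases you’d want a vector index instead.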

The key was having access to multiple AI models through Latenode’s subscription. We used Claude for the initial analysis since it handles large context windows well, then GPT-4 for the actual code generation since it tends to produce better TypeScript.

We identified over 30 opportunities for shared modules and successfully refactored about 80% of them automatically. The other 20% needed human review.

Start small with non-critical code paths before tackling core business logic. This approach saved us hundreds of engineering hours.

I led a similar refactoring effort for our monorepo last year, though we used a combination of automated tools and manual review rather than pure AI.

Our approach was to first use static analysis tools like jscpd and SonarQube to identify duplicate and similar code blocks. This gave us a good starting point but missed semantic duplications (code that does the same thing but looks different).
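
For reference, our jscpd setup was close to this `.jscpd.json` (field names come from jscpd’s documented options; exact values will depend on your codebase and jscpd version):

```json
{
  "minTokens": 50,
  "minLines": 5,
  "ignore": ["**/node_modules/**", "**/dist/**", "**/*.spec.ts"],
  "reporters": ["console", "json"],
  "output": "./reports/jscpd"
}
```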

We then used LLM-based tools to analyze the functions identified by the static analysis and group them into potential shared modules based on their purpose, not just their syntax.
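
The LLM pass was conceptually simple. A hedged sketch, assuming the OpenAI Node SDK - our actual tooling was internal, and the prompt and response shape here are illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Ask the model to bucket functions by purpose rather than syntax.
// The model is instructed to return { groups: [{ purpose, functionIds }] }.
async function groupByPurpose(functions: { id: string; source: string }[]) {
  const prompt = [
    "Group these TypeScript functions by what they accomplish,",
    "not by how they are written. Respond with JSON of the form",
    '{ "groups": [{ "purpose": string, "functionIds": string[] }] }.',
    "",
    ...functions.map((f) => `// id: ${f.id}\n${f.source}`),
  ].join("\n");

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [{ role: "user", content: prompt }],
  });
  return JSON.parse(response.choices[0].message.content ?? "{}");
}
```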

The most challenging part wasn’t finding the duplicates but designing shared modules that were actually usable across services with different needs. We found that blindly extracting code into shared libs sometimes created more problems than it solved.

My advice is to focus on high-value targets first - complex business logic that changes frequently and is used in multiple places. Don’t try to deduplicate everything at once.

I recently tackled this problem for a fintech company with 30+ microservices. We had duplicate validation logic, API client wrappers, and data transformation functions scattered throughout our codebase.

Our approach combined automated tools with human oversight. We used a custom tool that generated embeddings for each function in our codebase and clustered similar functions together. This gave us groups of potentially duplicated code.
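
The extraction half of that tool is easy to approximate with ts-morph. A sketch, assuming your services live under a services/ directory (the glob and snippet shape are illustrative):

```typescript
import { Project } from "ts-morph";

// Collect every top-level function in the monorepo so each one can be
// embedded and clustered in a later pass.
const project = new Project();
project.addSourceFilesAtPaths("services/**/src/**/*.ts");

const snippets = project.getSourceFiles().flatMap((file) =>
  file.getFunctions().map((fn) => ({
    file: file.getFilePath(),
    name: fn.getName() ?? "<anonymous>",
    source: fn.getText(),
  })),
);
```

Note that getFunctions() only returns function declarations; arrow functions and class methods need their own traversal.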

Then we had engineers review each cluster to determine if it made sense to extract to a shared module. Not all duplication should be eliminated - sometimes services have slightly different requirements that justify separate implementations.

For the code that we did decide to refactor, we used AI to generate initial versions of the shared modules, but we always had human review before deployment. This hybrid approach worked well - we reduced our codebase size by about 15% and improved consistency across services.

I’ve implemented large-scale code deduplication projects across monorepos with 50+ services. The challenge is not just finding duplicate code but identifying which duplications should actually be unified.

AI can be extremely effective at detecting patterns, but you need a structured process:

  1. Begin with semantic analysis to group code by what it does, not just by textual similarity. This is where AI excels.

  2. Prioritize based on complexity, usage frequency, and maintenance cost. Focus on business logic that changes frequently.

  3. When refactoring, create versatile shared modules with clear interfaces and appropriate configurability. Resist the urge to add every edge case (see the interface sketch after this list).

  4. Implement comprehensive tests for shared modules to ensure they work across all consumption patterns.
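
To make point 3 concrete, the shared modules that survived looked roughly like this - a small, explicit options surface instead of a flag for every consumer (names here are hypothetical):

```typescript
// Shared validator with a deliberately small options surface.
export interface EmailValidationOptions {
  allowSubaddressing?: boolean; // e.g. user+tag@example.com
  allowedDomains?: string[];    // restrict to specific domains if set
}

export function isValidEmail(
  value: string,
  opts: EmailValidationOptions = {},
): boolean {
  // Coarse structural check; stricter parsing is out of scope here.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)) return false;
  const [local, domain] = value.split("@");
  if (!opts.allowSubaddressing && local.includes("+")) return false;
  if (opts.allowedDomains && !opts.allowedDomains.includes(domain)) {
    return false;
  }
  return true;
}
```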

The most successful projects I’ve seen use AI to identify candidates and propose initial refactorings, with engineers making final decisions about architecture and implementation details.

Tried this last month. Worked surprisingly well, but you need good test coverage first. The AI missed some subtle dependencies between functions, so we had regression bugs.
