That subquery works but gets messy with larger datasets or regular use.
Why not automate this instead of writing manual queries each time? Set up a workflow that monitors your database and automatically pulls duplicate category records when they update.
I built this exact thing for our analytics team last month. They kept running queries to find duplicate survey responses. Now it runs automatically and spits out clean reports.
Best part? You can add filtering, scheduling, and push results to other systems without touching SQL again. If your data structure changes, update the workflow once.
Beats memorizing complex JOIN syntax every time you need this data.
You could also use EXISTS - it’s readable and performs well: SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id WHERE EXISTS (SELECT 1 FROM survey s2 WHERE s2.category = s.category AND s2.survey_id != s.survey_id). The EXISTS clause just checks if there’s another survey with the same category but different ID. I find it way more intuitive than window functions when I’m explaining stuff to other devs. Performance is similar to subqueries but can be faster depending on your indexes and optimizer. Test both on your actual data to see what works better.
hey, give this a shot: SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id WHERE s.category IN (SELECT category FROM survey GROUP BY category HAVING COUNT(*) > 1) - should do the trick!
quick alternative if your db supports CTEs: WITH multi_cats AS (SELECT category FROM survey GROUP BY category HAVING COUNT(*) > 1) SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id JOIN multi_cats mc ON s.category = mc.category. way cleaner than nested subqueries.
You can use a window function here - it’ll probably perform better than subqueries. Try this: SELECT r.question_ref, r.response_text FROM (SELECT r.question_ref, r.response_text, COUNT(*) OVER (PARTITION BY s.category) as cat_count FROM responses r JOIN survey s ON r.question_ref = s.survey_id) subq WHERE cat_count > 1. The window function counts per category without needing a separate subquery. This scales way better with large tables since it processes everything in one pass instead of running the subquery multiple times. Just tested it on a similar dataset and consistently beat the HAVING clause method.