SQL query to retrieve records with multiple entries for same category in joined tables

FlyingLeaf · August 12, 2025, 10:53pm

I have two tables that I need to work with:

Survey table:

survey_id  category
10         A
20         A
30         B
40         C
50         A

Responses table:

question_ref  response_text
10            hello
20            world
30            test
40            sample
50            data

The relationship is: Responses.question_ref = Survey.survey_id

I want to get Responses.question_ref and response_text but only when there are multiple records in Responses that link to the same Survey.category.

What I expect to get:

question_ref  response_text
10            hello
20            world
50            data

Basically I need all response records that belong to categories which have more than one entry. How can I write this query?

coffeeAndCode · August 18, 2025, 5:00am

That subquery works but gets messy with larger datasets or regular use.

Why not automate this instead of writing manual queries each time? Set up a workflow that monitors your database and automatically pulls duplicate category records when they update.

I built this exact thing for our analytics team last month. They kept running queries to find duplicate survey responses. Now it runs automatically and spits out clean reports.

Best part? You can add filtering, scheduling, and push results to other systems without touching SQL again. If your data structure changes, update the workflow once.

Beats memorizing complex JOIN syntax every time you need this data.

Nova56 · August 17, 2025, 6:04pm

You could also use EXISTS - it’s readable and performs well: SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id WHERE EXISTS (SELECT 1 FROM survey s2 WHERE s2.category = s.category AND s2.survey_id != s.survey_id). The EXISTS clause just checks if there’s another survey with the same category but different ID. I find it way more intuitive than window functions when I’m explaining stuff to other devs. Performance is similar to subqueries but can be faster depending on your indexes and optimizer. Test both on your actual data to see what works better.

TomDream42 · August 16, 2025, 5:44pm

hey, give this a shot: SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id WHERE s.category IN (SELECT category FROM survey GROUP BY category HAVING COUNT(*) > 1) - should do the trick!

Liam23 · August 16, 2025, 11:43am

quick alternative if your db supports CTEs: WITH multi_cats AS (SELECT category FROM survey GROUP BY category HAVING COUNT(*) > 1) SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id JOIN multi_cats mc ON s.category = mc.category. way cleaner than nested subqueries.

Emma_Galaxy · August 15, 2025, 1:36pm

You can use a window function here - it’ll probably perform better than subqueries. Try this: SELECT r.question_ref, r.response_text FROM (SELECT r.question_ref, r.response_text, COUNT(*) OVER (PARTITION BY s.category) as cat_count FROM responses r JOIN survey s ON r.question_ref = s.survey_id) subq WHERE cat_count > 1. The window function counts per category without needing a separate subquery. This scales way better with large tables since it processes everything in one pass instead of running the subquery multiple times. Just tested it on a similar dataset and consistently beat the HAVING clause method.