SQL query to retrieve records with multiple entries for same category in joined tables

I have two tables that I need to work with:

Survey table:

survey_id  category
10         A
20         A
30         B
40         C
50         A

Responses table:

question_ref  response_text
10            hello
20            world
30            test
40            sample
50            data

The relationship is: Responses.question_ref = Survey.survey_id

I want to get Responses.question_ref and response_text but only when there are multiple records in Responses that link to the same Survey.category.

What I expect to get:

question_ref  response_text
10            hello
20            world
50            data

Basically I need all response records that belong to categories which have more than one entry. How can I write this query?

That subquery works but gets messy with larger datasets or regular use.

Why not automate this instead of writing manual queries each time? Set up a workflow that monitors your database and automatically pulls duplicate category records when they update.

I built this exact thing for our analytics team last month. They kept running queries to find duplicate survey responses. Now it runs automatically and spits out clean reports.

Best part? You can add filtering, scheduling, and push results to other systems without touching SQL again. If your data structure changes, update the workflow once.

Beats memorizing complex JOIN syntax every time you need this data.

You could also use EXISTS - it’s readable and performs well: SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id WHERE EXISTS (SELECT 1 FROM survey s2 WHERE s2.category = s.category AND s2.survey_id != s.survey_id). The EXISTS clause just checks if there’s another survey with the same category but different ID. I find it way more intuitive than window functions when I’m explaining stuff to other devs. Performance is similar to subqueries but can be faster depending on your indexes and optimizer. Test both on your actual data to see what works better.

hey, give this a shot: SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id WHERE s.category IN (SELECT category FROM survey GROUP BY category HAVING COUNT(*) > 1) - should do the trick!

quick alternative if your db supports CTEs: WITH multi_cats AS (SELECT category FROM survey GROUP BY category HAVING COUNT(*) > 1) SELECT r.question_ref, r.response_text FROM responses r JOIN survey s ON r.question_ref = s.survey_id JOIN multi_cats mc ON s.category = mc.category. way cleaner than nested subqueries.

You can use a window function here - it’ll probably perform better than subqueries. Try this: SELECT r.question_ref, r.response_text FROM (SELECT r.question_ref, r.response_text, COUNT(*) OVER (PARTITION BY s.category) as cat_count FROM responses r JOIN survey s ON r.question_ref = s.survey_id) subq WHERE cat_count > 1. The window function counts per category without needing a separate subquery. This scales way better with large tables since it processes everything in one pass instead of running the subquery multiple times. Just tested it on a similar dataset and consistently beat the HAVING clause method.