I’m working on a database design and trying to figure out the right approach for indexing. Someone mentioned that adding indexes on columns used in JOIN operations can improve performance, but I want to understand this better.
I have a basic understanding that indexes work like a book’s table of contents, helping find data without scanning everything. However, I’m not sure about the practical implementation.
Here’s my scenario with three tables for a rating system:
MEMBERS table
member_id  member_name
1          john
2          mary
3          alex
4          lisa
5          mike

POSTS table
post_id  thread_id  message   author_id
1        1          great!    1
2        2          awesome   2
3        1          not good  3

RATINGS table
rating_id  score  post_id  rater_id
1          -1     1        5
2          1      1        4
3          1      3        1
4          -1     2        5
5          1      2        4
My query looks like this:
select m.member_id, m.member_name,
p.post_id, p.thread_id, p.message,
count(r.score) as total_ratings, sum(r.score > 0) as positive, sum(r.score < 0) as negative,
user_rating.score as my_rating
from posts p
join members m on m.member_id = p.author_id
left join ratings r on r.post_id = p.post_id
left join ratings user_rating on user_rating.post_id = p.post_id
and user_rating.rater_id = 1
where p.thread_id = 1
group by p.post_id, m.member_name, p.post_id, p.thread_id, my_rating;
With millions of records expected, should I create indexes on posts.author_id, ratings.rater_id, and ratings.post_id columns to optimize performance?
yeah, definitely add those indexes but don’t overdo it. each one slows down inserts/updates, so be selective. for your query, i’d prioritize posts(thread_id) first - that’s your main filter. then add ratings(post_id). the author_id index is useful but less critical unless you’re running lots of author-based queries.
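in MySQL that’d be something like this (index names are just examples):

```sql
CREATE INDEX idx_posts_thread_id ON posts (thread_id);   -- main WHERE filter
CREATE INDEX idx_ratings_post_id ON ratings (post_id);   -- join column for both ratings joins
-- only worth it if you run author-based queries often:
CREATE INDEX idx_posts_author_id ON posts (author_id);
```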
Yeah, you definitely need indexes with millions of records. Since you’re filtering by thread_id first, that’s your most selective condition - index that for sure. The join columns you picked are solid choices.
I spotted some redundancy in your GROUP BY that’s probably hurting performance. More importantly, create a covering index on ratings with (post_id, score, rater_id). This lets MySQL handle your joins and aggregation without hitting the actual table data.
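In MySQL syntax, that covering index would look something like this (the index name is illustrative):

```sql
-- post_id, score, and rater_id are the only ratings columns the query touches,
-- so both ratings joins and the COUNT/SUM aggregation can be served from the
-- index alone, without reading the underlying table rows
CREATE INDEX idx_ratings_covering ON ratings (post_id, score, rater_id);
```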
With your rating system, you might get better performance by denormalizing some aggregated data. Running COUNT and SUM operations on millions of rating records gets expensive, even with proper indexes. If this query runs a lot, consider keeping running totals in your posts table instead.
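A rough sketch of that denormalization, assuming MySQL and hypothetical column names:

```sql
-- hypothetical cache columns on posts; keep them in sync at write time
ALTER TABLE posts
  ADD COLUMN positive_count INT NOT NULL DEFAULT 0,
  ADD COLUMN negative_count INT NOT NULL DEFAULT 0;

-- example: when a new rating with score = 1 arrives for post 2
UPDATE posts
SET positive_count = positive_count + 1
WHERE post_id = 2;
```

The trade-off is extra bookkeeping on every write (in application code or a trigger) in exchange for a much cheaper read path.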
Your index suggestions look solid - you’re definitely heading in the right direction. With millions of records, indexing makes or breaks performance. For your setup, I’d create indexes on posts.author_id, ratings.post_id, and ratings.rater_id since those are your join conditions. But don’t skip posts.thread_id - you’re filtering on it in your WHERE clause, so it’s just as important.

Here’s something I’ve learned the hard way: compound indexes usually beat single-column ones. Try a composite index on ratings(post_id, rater_id) since you’re joining on both columns in your user_rating alias. One index handles both joins more efficiently.

Quick fix - your GROUP BY has p.post_id listed twice. Clean that up; it won’t change the results, but it’s redundant and makes the query harder to read.

Run EXPLAIN on your queries to check the execution plan before and after adding indexes. With your data volume, the performance difference should be huge.
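A sketch of that composite index, plus an EXPLAIN on a trimmed-down version of your query (index name is illustrative):

```sql
-- one index covers both the aggregate join and the user_rating join
CREATE INDEX idx_ratings_post_rater ON ratings (post_id, rater_id);

-- compare the execution plan before and after adding the indexes
EXPLAIN
SELECT p.post_id, COUNT(r.score) AS total_ratings
FROM posts p
LEFT JOIN ratings r ON r.post_id = p.post_id
WHERE p.thread_id = 1
GROUP BY p.post_id;
```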