Optimizing MySQL Performance for Grouped Query with Index Strategy

I’m working with a database table named route_data that contains approximately 1 million records. I need help optimizing a query that’s currently taking several seconds to execute.

My current query looks like this:

SELECT * FROM (
    SELECT * FROM route_data WHERE station_id='DEF' AND status=1 
    UNION 
    SELECT * FROM route_data WHERE station_id='GHI' 
    UNION 
    SELECT * FROM route_data WHERE station_id='JKL'
) AS combined_results 
GROUP BY combined_results.train_id 
ORDER BY departure_time

The problem is that this query keeps showing using temporary, using filesort in the execution plan. I’ve tried creating various indexes but nothing seems to help. The query combines multiple rows from the same table based on different station_id values, then groups by train_id.

Here’s my table structure:

CREATE TABLE `route_data` (
  `train_id` varchar(1000) NOT NULL,
  `sequence_num` int(11) NOT NULL,
  `station_id` varchar(1000) NOT NULL,
  `location_name` varchar(1000) NOT NULL,
  `arrival_time` time NOT NULL,
  `departure_time` time NOT NULL,
  `stop_duration` varchar(1000) NOT NULL,
  `platform` varchar(1000) NOT NULL,
  `travel_day` int(11) NOT NULL,
  `distance` varchar(1000) NOT NULL,
  `velocity` varchar(1000) NOT NULL,
  `altitude` varchar(1000) NOT NULL,
  `region` varchar(1000) NOT NULL,
  `location` varchar(1000) NOT NULL,
  `status` int(11) DEFAULT '1',
  KEY `composite_idx1` (`train_id`(767),`station_id`(767)),
  KEY `seq_idx` (`train_id`(767),`sequence_num`),
  KEY `status_time_idx` (`station_id`(767),`status`,`departure_time`),
  KEY `station_status_idx` (`station_id`(767),`status`),
  KEY `train_time_idx` (`train_id`(767),`departure_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

What would be the best approach to rewrite this query and create proper indexes to get millisecond response times?

Your table design is the problem - those varchar(1000) fields with 767 character index prefixes are killing performance. Your indexes aren’t working because of the complex UNION structure and you’re using SELECT *. Rewrite this as one query: SELECT * FROM route_data WHERE (station_id='DEF' AND status=1) OR (station_id='GHI') OR (station_id='JKL') GROUP BY train_id ORDER BY departure_time. This cuts out the subquery and UNION overhead. For indexes, build a covering index with just the columns you actually need instead of SELECT *. Do you really need train_id as varchar(1000)? That’s probably way too big and it’s forcing those index prefix limits. Even varchar(50) would boost performance significantly. You’ll probably still get filesort since you’re grouping by train_id but ordering by departure_time, but this rewrite should help a lot.

try WHERE IN instead of UNION - like SELECT * FROM route_data WHERE (station_id='DEF' AND status=1) OR (station_id IN ('GHI','JKL')) GROUP BY train_id ORDER BY departure_time. those varchar(1000) fields are killing perf too - shrink them if you can. the filesort with GROUP BY + ORDER BY on diff columns is prob unavoidable tho.