SQL LEFT JOIN with aggregation and date formatting

I’m dealing with two tables and I’m looking for assistance with a LEFT JOIN query to group the data by date. Here’s how my database is structured:

CREATE TABLE feedback (
    id INT(11) NOT NULL AUTO_INCREMENT,
    store_id INT(11),
    transaction_id CHAR(255),
    item_id CHAR(32),
    feedback_date INT(11),
    PRIMARY KEY (id)
);

INSERT INTO feedback (store_id, transaction_id, item_id, feedback_date)
VALUES
    (15, '200', '2000', 1466190000),
    (15, '200', '2000', 1466276400),
    (15, '200', '2000', 1466462800),
    (25, '900', '9000', 1466249200);

CREATE TABLE labels (
    id INT(11) NOT NULL AUTO_INCREMENT,
    store_id INT(11),
    transaction_id CHAR(255),
    item_id CHAR(32),
    label_date INT(11) NULL,
    PRIMARY KEY (id)
);

INSERT INTO labels (store_id, transaction_id, item_id, label_date)
VALUES
    (15, '200', '2000', 1466449200),
    (15, '200', '2000', NULL),
    (15, '200', '4000', NULL),
    (25, '900', '9000', 1469449200);

I want to generate a report reflecting daily data regarding total feedback counts, including numbers for labeled and unlabeled items. Here’s my current query:

SELECT
    DATE_FORMAT(FROM_UNIXTIME(f.feedback_date), "%d.%m.%Y") AS submission_date,
    f.store_id,
    COUNT(*) as total_items,
    COUNT(*) as labeled_items
FROM
    feedback AS f
LEFT JOIN labels as l
    ON f.store_id = l.store_id AND
    f.transaction_id = l.transaction_id AND
    f.item_id = l.item_id
WHERE
    l.label_date IS NOT NULL
GROUP BY f.store_id, f.transaction_id, f.item_id
ORDER BY submission_date ASC;

I expect the output to look like this:

| submission_date | store_id | total_items | labeled_items |
|-----------------|----------|-------------|---------------|
|      17.06.2016 |       15 |           3 |             1 |
|      18.06.2016 |       25 |           1 |             1 |

What adjustments do I need to make to this query so that I can obtain the accurate totals for both total and labeled items?

Your WHERE clause l.label_date IS NOT NULL is filtering out rows before counting happens - you’re only seeing items that already have labels. Move that condition into the SELECT using conditional aggregation instead.

Here’s the fix:

SELECT
    DATE_FORMAT(FROM_UNIXTIME(f.feedback_date), "%d.%m.%Y") AS submission_date,
    f.store_id,
    COUNT(*) as total_items,
    COUNT(CASE WHEN l.label_date IS NOT NULL THEN 1 END) as labeled_items
FROM
    feedback AS f
LEFT JOIN labels as l
    ON f.store_id = l.store_id AND
    f.transaction_id = l.transaction_id AND
    f.item_id = l.item_id
GROUP BY DATE_FORMAT(FROM_UNIXTIME(f.feedback_date), "%d.%m.%Y"), f.store_id
ORDER BY submission_date ASC;

I removed the WHERE clause and used COUNT(CASE WHEN l.label_date IS NOT NULL THEN 1 END) for labeled items. Also changed the GROUP BY to group by formatted date and store_id instead of individual items.

Your GROUP BY is the problem - it’s grouping by individual items instead of dates. You’re getting one row per item when you want one row per date. Plus your WHERE clause cuts out unlabeled records completely, so you can’t count total items properly.

Try this:

SELECT
    DATE_FORMAT(FROM_UNIXTIME(f.feedback_date), "%d.%m.%Y") AS submission_date,
    f.store_id,
    COUNT(DISTINCT CONCAT(f.store_id, '-', f.transaction_id, '-', f.item_id)) as total_items,
    COUNT(DISTINCT CASE WHEN l.label_date IS NOT NULL THEN CONCAT(f.store_id, '-', f.transaction_id, '-', f.item_id) END) as labeled_items
FROM
    feedback AS f
LEFT JOIN labels as l
    ON f.store_id = l.store_id AND
    f.transaction_id = l.transaction_id AND
    f.item_id = l.item_id
GROUP BY DATE_FORMAT(FROM_UNIXTIME(f.feedback_date), "%d.%m.%Y"), f.store_id
ORDER BY submission_date ASC;

The DISTINCT stops duplicate joins from inflating your counts when there are multiple label records for the same item. This’ll give you the daily breakdown you need.

Your LEFT JOIN is creating duplicate records and messing up your counts. When one feedback record matches multiple labels, your numbers get inflated. I’ve hit this same problem before - subqueries work way better than cramming everything into one query.

SELECT 
    DATE_FORMAT(FROM_UNIXTIME(feedback_date), "%d.%m.%Y") AS submission_date,
    store_id,
    COUNT(*) as total_items,
    SUM(CASE WHEN has_label = 1 THEN 1 ELSE 0 END) as labeled_items
FROM (
    SELECT DISTINCT
        f.feedback_date,
        f.store_id,
        f.transaction_id,
        f.item_id,
        CASE WHEN l.label_date IS NOT NULL THEN 1 ELSE 0 END as has_label
    FROM feedback f
    LEFT JOIN labels l ON f.store_id = l.store_id 
        AND f.transaction_id = l.transaction_id 
        AND f.item_id = l.item_id
) distinct_records
GROUP BY DATE_FORMAT(FROM_UNIXTIME(feedback_date), "%d.%m.%Y"), store_id
ORDER BY submission_date ASC;

The inner query deduplicates first, then the outer query handles aggregation without duplicate joins screwing up your totals.

Yeah, your grouping logic is the problem - you’re grouping by item level but need store/date level results. That WHERE clause also kills your left join since it filters out unlabeled records. Drop the WHERE and use SUM() with conditional logic instead of COUNT() to handle the duplicate joins.