Latest record per group

By David J. Lake
2007-03-29

How many times have you needed to get specific rows from your table only to find out you aren't sure how to go about retrieving them?

Here is a scenario that you will come across again and again, knowing how to retrieve the correct rows will lessen the load so you can concentrate on the rest of your query, whether it be to join other tables to your query or just move on to other work.

This article is on retrieving the latest row for each related group of rows.

Suppose our records involve cars, where there is a specific make, model and release year for each. The boss wants a report that should only list the last year each make and model was released.

These are the rows in our table:

CARS

MakeModel Year
Ford Taurus 1999
Ford Taurus 1998
Ford Taurus 1997
Ford Taurus 1996
Chevy S-10 2005
ChevyS-10 2004
Chevy S-10 2003
Chevy S-10 2002
Chevy S-10 2001
Chrysler Concorde 2003
Chrysler Concorde 2002
Chrysler Concorde 2000

The three rows that we want to retrieve from the table:

Ford Taurus 1999
Chevy S-10 2005
Chrysler Concorde 2003

What we are after is called a co-related sub-query. The sub-query compares information to the rows in the outer query. Let's see the query first and I'll give you an explanation afterwards.

SELECT
c1.make,
c1.model,
c1.year
FROM cars as c1
WHERE c1.year = (SELECT MAX(year)
        FROM cars
        WHERE make=c1.make
        AND model=c1.model)

Now I can't guarantee this explanation is exactly how the query is done, the processor would make whatever choice is necessary to do the query, but the important thing is to understand what you need to get the query to work.

The main part of the query will go through each make and model, when it comes to the year condition it runs into the where clause.

That where clause says to do a sub-query to come up with the year we are looking for. We want the maximum year for the make and model that the outer query is looking at.

Suppose the outer query right now is testing Chrysler Concorde. The year can only be the maximum for that make and model because of the conditions inside the sub-query. When those items all match up, the maximum year is returned to the main query.

The main query will then "store" the make, model and maximum year for that make and model. When all makes and models have been grabbed, with their corresponding years, all the data is fed out to your screen or your outfile as you specify.

Now the table you are working with likely doesn't match this example so here are the necessary items to look for when you are deciding which column belongs in the WHERE clause and which items are necessary inside the sub-query.

You are looking for your column(s) in your table that are repeating groups. Those items need to be inside your subquery as matching conditions to the outer query. The other column you want in the WHERE condition in the outer query will be the one you want to grab the maximum value for in each group.

Let's look at another example just so things are clear.

POSTS

userid title body dateposted
1NHLThe NHL is great 2007-03-25
2NHLI think it could be better 2007-03-26
1NHLHow would you change it then? 2007-03-26
2NHLI would take out the goons. 2007-03-27
1NHLI wouldn't go because I like the fights 2007-03-27

We want to check for the max(value) of our dateposted column so that belongs in our WHERE clause.

Inside our subquery though we need to check that the userid matches up. We want the latest post PER USER:

SELECT
p1.userid,
p1.title,
p1.body,
p1.dateposted
FROM POSTS as p1
WHERE p1.dateposted = (SELECT max(dateposted)
        FROM POSTS
        WHERE userid=p1.userid)

That query returns these two rows:

2 NHL I would take out the goons. 2007-03-27
1 NHL I wouldn't go because I like the fights 2007-03-27

There is one item I left out there on purpose, right now the title field is the same for each post so we don't need it. Of course your discussion forum would have all sorts of threads going on so we'll add a bit more data and then alter the query.

useridtitle body dateposted
1 NHL The NHL is great 2007-03-25
2 NHL I think it could be better 2007-03-26
1 NHL How would you change it then? 2007-03-26
1 NBA Raptors all the way baby! 2007-03-22
2 NHL I would take out the goons. 2007-03-27
1 NHL I wouldn't go because I like the fights 2007-03-27

Alter the query by adding in the title in the WHERE clause

SELECT
p1.userid,
p1.title,
p1.body,
p1.dateposted
FROM POSTS as p1
WHERE p1.dateposted = (SELECT max(dateposted)
        FROM POSTS
        WHERE userid=p1.userid
        AND title=p1.title)

and end up with these three rows:

2 NHL I would take out the goons. 2007-03-27
1 NHL I wouldn't go because I like the fights 2007-03-27
1 NBA Raptors all the way baby! 2007-03-22

Now you know how to retrieve the necessary rows per group. You can expand this by using it in a joined query as well. Just remember, concentrate first on working out the correct rows from this table before you add additional tables.