Calculating Top N items per Group in SQL

How to find the top / bottom "N" rows in each group?

This question is more commonly seen that you think:

  • Out of a list of twitter users, find their N recent most tweets
  • Out of all our salespeople, find the top N performing ones in each region
  • Out of all the cities, find the top N cities in each country by population

Conside a table cities with three columns country, city and population:

country city population
United States New York 8175133
United States Los Angeles 3792621
United States Chicago 2695598
France Paris 2181000
France Marseille 808000
France Lyon 422000
United Kingdom London 7825300
United Kingdom Birmingham 1016800
United Kingdom Leeds 770800

Top N without Grouping

We have 9 cities in our table, let's select the top 3 without grouping.

SELECT country, 
       city,
       population 
FROM cities 
ORDER BY population DESC 
LIMIT 3;
country city population
United States New York 8175133
United Kingdom London 7825300
United States Los Angeles 3792621

Top N with Grouping with row_number()

Fortunately, we have access to the window function row_number() that numbers the current row within its partition (group), counting at 1.

When we don't specify any partition:

SELECT country, 
       city,
       population, 
       row_number() OVER (ORDER BY population DESC) AS country_rank FROM cities;
country city population country_rank
United States New York 8175133 1
United Kingdom London 7825300 2
United States Los Angeles 3792621 3
United States Chicago 2695598 4
France Paris 2181000 5
United Kingdom Birmingham 1016800 6
France Marseille 808000 7
United Kingdom Leeds 770800 8
France Lyon 422000 9

When we create a partition for each country:

SELECT country, 
       city,
       population, 
       row_number() OVER (PARTITION BY country ORDER BY population DESC) AS country_rank FROM cities;
country city population country_rank
France Paris 2181000 1
France Marseille 808000 2
France Lyon 422000 3
United Kingdom London 7825300 1
United Kingdom Birmingham 1016800 2
United Kingdom Leeds 770800 3
United States New York 8175133 1
United States Los Angeles 3792621 2
United States Chicago 2695598 3

Almost there!

Now to select the top 2 cities from each country, we wrap the above query in another query and filter the row on the country_rank column.

SELECT * FROM (
    SELECT country, 
           city,
           population, 
           row_number() OVER (PARTITION BY country ORDER BY population DESC) AS country_rank FROM cities) ranks
WHERE country_rank <= 2;
country city population country_rank
France Paris 2181000 1
France Marseille 808000 2
United Kingdom London 7825300 1
United Kingdom Birmingham 1016800 2
United States New York 8175133 1
United States Los Angeles 3792621 2

No fuss, just SQL We are open sourcing everything from the experience working with our agency clients. They spend thousands of dollars to get this level of detailed analysis – which you can now get for free. You'll learn bucketing, finding outliers, calculating correlations, budgeting & forecasting, return on spend and investment, data smoothing and cleaning applied to freemium, transactional, advertising, two-sided marketplace and subscription based business models. We send one update every week. Join 400+ data analysts who are leveling up with our recipes. 👊

No spam, ever! Unsubscribe any time.