Calculating Top N items and Aggregating (sum) the remainder into "All Other"

The top / bottom N problem is often seen when ranking items in your database. We've previously written about the ranking problem in Calculating Top N items per Group.

Another requirement that co-exists with this question is to rollup all the other rows into a row called "All other". The reason for doing this is to provide more context when presenting data.


Example data

Consider a table with two columns: a salesperson's name and the amount of business they've closed:

Name Sales
Alice 8405837
Bob 3884307
Chris 2718782
Dan 2195914
Eve 1553165
Fae 1513367
George 1409019
Hadi 1355896
Ida 1257676
Justine 998537
... ...

1. Calculating simple top N items

Our first question is to list the top 10 salespeople by business closed, which can be accomplished with the following query. For more sophistication with window functions, see Calculating Top N items per Group:

select name,
    sales
from deals
order by 3 desc
limit 10
name sales
Alice 8405837
Bob 3884307
Chris 2718782
Dan 2195914
Eve 1553165
Fae 1513367
George 1409019
Hadi 1355896
Ida 1257676
Justine 998537

2. Aggregating into "All Other"

If we were to visualize the top salespeople as a pie chart, it'd look something like this:

The problem with the above chart is that we are missing data on how much business the other salespeople closed to put the visualization in context. We'll attempt to solve this question now.

What we are trying to accomplish is a final row appended to the results above that is the aggregation of the rest of the data, something like this:

name sales
Alice 8405837
Bob 3884307
Chris 2718782
... ...
Justine 998537
All Other 7833544

The following query accomplishes that:

with top10 as
    (select name,
            sales
    from deals
    order by 2 desc
    limit 10)
select *
from top10
union all
select 'All other' as name,
    sum(sales) as sales
from sales
where name not in
        (select name
        from top10)

How it works

We've organized the query into two queries and combined them with the union keyword. The first query is as before that finds the top 10 salespeople. The second builds upon this by excluding names that were previously found with the not in keyword. We've additionally "refactored" the core query as a CTE using the with keyword.


👋 No fuss, just SQL We are open sourcing everything from the experience working with our agency clients. They spend thousands of dollars to get this level of detailed analysis – which you can now get for free. We send one update every week. Join 400+ data analysts who are leveling up with our recipes. 👊

No spam, ever! Unsubscribe any time. See past emails here.

results matching ""

    No results matching ""