Concatenating Rows of String Values for Aggregation

What would an aggregation of rows of string columns look like? For numeric columns, we can calculate averages and sums over many rows to aggregate them. Rows with string columns require a different treatment, which we'll explore in this recipe.

This recipe is useful in many-to-many relationships, when you want to aggregate one side of a relationship for reporting purposes:

  • security management – a table of user and their corresponding roles in an organization
  • account management – a table of sales people / account managers and the customers they manage
  • inventory management – a table of stores and a list of products they hold or have sold
  • tag management – a table of tags applied to resources and you want to show a summary of tags against a resource

If you have a table of store locations and the products they hold along with their quantities:

location product quantity
New York Sports Collectibles 200
New York Electronics 1400
Seattle Shoes 1200
Seattle Electronics 1400
San Francisco Textbook Rentals 600
San Francisco Shoes 1000
... ... ...

Or, if your interested in the account management example, the table will look like this:

Account Manager Customer Contract Value
Alice Microsoft $1,000
Alice Apple $2,000
Bob Google $4,000
Bob Dell $1,200
Eve Netflix $600
... ... ...

Usually, best practices for data management will instruct you to normalize your data into three tables – a a store_location table, a product table and a product_quantity table. You can accomplish the first level of aggregation seen in the tables above with a couple of joins between them.

What we want our result to look like

location products_held
New York Electronics, Sports Collectibles
Seattle Electronics, Shoes
San Francisco Shoes, Textbook Rentals

We want to create a new aggregate column products_held on the product column from the original table that contains string values. Furthermore, we want to sort the aggregation by the quantity which would make the data more readily usable.

PostgreSQL's string_agg to the rescue

select location, 
       string_agg(product, ', ' order by quantity desc) 
from products
group by location;

PostgreSQL's aggregation functions are well-fleshed out. You can use other functions in the same class like array_agg and json_agg to accomplish similar results with array and json data types.


👋 No fuss, just SQL We are open sourcing everything from the experience working with our agency clients. They spend thousands of dollars to get this level of detailed analysis – which you can now get for free. We send one update every week. Join 400+ data analysts who are leveling up with our recipes. 👊

No spam, ever! Unsubscribe any time. See past emails here.

results matching ""

    No results matching ""