# Creating Pareto Charts to visualize the 80/20 principle in SQL

A Pareto chart highlights the most important factors amongst a large number of factors. A social scientist in Italy, Vilfredo Pareto discovered that roughly 20% of the population owned 80% of the wealth. From there, he noticed that these proportions described many other aspects of society. This led him to propose the 80-20 rule. The Pareto chart, named in his honor, shows the large contribution from a small proportion of the population.

You'll notice the 80-20 principle in action across your business:

- a small proportion of customers generate the most customer support load
- a small proportion of customers contribute the most to the over all revenue – financial statements usually list this as a risk factor
- a small proportion of web pages generate the most pageviews
- a small proportion of adwords keywords generate the most clicks and conversions, etc.

The Pareto chart has two components:

- a column chart where the height of the bar is the frequency of that particular category
- a line chart that shows the cumulative percentage contribution to the total

For this example, we'll be aggregating the top pages on our site by pageviews and a cumulative percentage contribution to the total number of page views in that period.

## 1. Frequency distribution using GROUP BY

To calculate the total number of pageviews, categorized by page title and sorted descending by pageviews:

```
SELECT title,
count(*) as pageviews
FROM visits
WHERE visits.dt > (now() - '2 weeks'::interval)
-- just calculate for the last two weeks
GROUP BY title
ORDER BY 2 DESC
```

The above query will give us a table like the following:

title | pageviews |
---|---|

Calculating Percentage (%) of Total Sum | 354 |

Calculating Month-over-Month Growth Rate | 156 |

Calculating Relationships with Correlation Matrices | 156 |

Forecasting, Identifying Trends with Linear Regression | 134 |

... | ... |

## 2. Cumulative sum using Window functions

We'll next have to calculate a cumulative, running total (read more about SQL Running Totals.) For this, we'll be making use of window functions:

```
SELECT title,
pageviews,
SUM(pageview) OVER (ORDER BY pageviews DESC) as cumsum
FROM (
// query as before
) tbl;
```

The result will now look like this:

title | pageviews | cumsum |
---|---|---|

Calculating Percentage (%) of Total Sum | 354 | 354 |

Calculating Month-over-Month Growth Rate | 156 | 510 |

Calculating Relationships with Correlation Matrices | 156 | 666 |

Forecasting, Identifying Trends with Linear Regression | 134 | 800 |

... | ... | ... |

The last remaining step is to divide the cumulative sum by the total number of pageviews to calculate a percentage. We can precalculate the total number of pageviews with a common table expression:

```
WITH total_pageviews AS
(
SELECT count(pageviews) AS cnt
FROM visits
WHERE visits.dt > (now() - '2 weeks'::interval)
)
SELECT title,
pageviews,
SUM(pageview) OVER (ORDER BY pageviews DESC) * 1.0 / total_pageviews.cnt as cumsum
FROM (
SELECT title,
count(*) as pageviews
FROM visits
WHERE visits.dt > (now() - '2 weeks'::interval)
-- just calculate for the last two weeks
GROUP BY title
ORDER BY 2 DESC
) tbl;
```

No spam, ever! Unsubscribe any time.