MySQL: Generate a sequential range of numbers for time series analysis

One of our favorite features in PostgreSQL is the generate_series function. Given start, stop and step interval, PostgreSQL can generate a series of values, from start to stop with a step size of step. It can even work with dates or timestamps:

select generate_series('2017-01-01'::date, '2017-05-01'::date, '1 week'::interval);

Unfortunately, MySQL does not have this very useful feature and so, we'll have to build an equivalent.

The common use-case for this function is to generate a sequential range of dates, and use a left join to figure out dates where you have no data. If you didn't do this, your timeseries will have gaps and your chart will be misleading.


Method 1: Create a table with sequential numbers

The simplest option is to create a table, for example, numbers and select from that. You can convert each number into the relevant date using MySQL's date manipulation functions:

select date_sub(date(now()), interval n day) from numbers;

Method 2: Create a CTE counter

If you don't want to create a table before hand, you can create one on the fly – using MySQL's Common Table Expressions. A CTE works like a temporary table that only exists during the execution of the query.

with digit as (
    select 0 as d union all 
    select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9        
),
seq as (
    select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
    from digit a
        cross join
        digit b
        cross join
        digit c
        cross join
        digit d
    order by 1        
)
select current_date - seq.num as "Date"
from seq;

The way this works is by first creating a temporary table digit with 10 rows containing values from 0 through 9. Next, we create all permutations of the digit table with the help of a couple of cross joins. By multiplying the digits by 1, 10, 100 and 1000, all values from 0 to 9999 are calculated.

We can finally use the sequential numbers to generate a continuous sequence of dates using MySQL's current_date and some simple manipulation.


👋 No fuss, just SQL We are open sourcing everything from the experience working with our agency clients. They spend thousands of dollars to get this level of detailed analysis – which you can now get for free. You'll learn bucketing, finding outliers, calculating correlations, budgeting & forecasting, return on spend and investment, data smoothing and cleaning applied to freemium, transactional, advertising, two-sided marketplace and subscription based business models. We send one update every week. Join 400+ data analysts who are leveling up with our recipes. 👊

No spam, ever! Unsubscribe any time.