Redshift: Generate a sequential range of numbers for time series analysis

One of our favorite features in PostgreSQL is the generate_series function. Given start, stop and step interval, PostgreSQL can generate a series of values, from start to stop with a step size of step. It can even work with dates or timestamps:

select generate_series('2017-01-01'::date, '2017-05-01'::date, '1 week'::interval);

Unfortunately, though Redshift supports the simpler variant to generate integer sequences, it does not support the date variant. We'll have to build an equivalent.

The common use-case for this function is to generate a sequential range of dates, and use a left join to figure out dates where you have no data. If you didn't do this, your timeseries will have gaps and your chart will be misleading.


Method 1: Create a table with sequential numbers

The simplest option is to create a table, for example, numbers and select from that. You can convert each number into the relevant date using Redshift's date manipulation functions:

select (getdate()::date - n)::date from numbers;

The numbers table can be created by using the row_number window function and the stl_connection_log system table that logs authentication attempts and connections/disconnections. For Redshift clusters with even the basic level of use, looping over the stl_connection_log table with a cross join should generate sufficient data:

insert into numbers
with x as (
    select 1 
    from stl_connection_log a, stl_connection_log b, stl_connection_log c
    -- limit 1000000
)
select row_number() over (order by 1) from x;

Method 2: Create a CTE counter

If you don't want to create a table before hand, you can create one on the fly – using Redshift's Common Table Expressions. A CTE works like a temporary table that only exists during the execution of the query.

with digit as (
    select 0 as d union all 
    select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9        
),
seq as (
    select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
    from digit a
        cross join
        digit b
        cross join
        digit c
        cross join
        digit d
    order by 1        
)
select (getdate()::date - seq.num)::date as "Date"
from seq;

The way this works is by first creating a temporary table digit with 10 rows containing values from 0 through 9. Next, we create all permutations of the digit table with the help of a couple of cross joins. By multiplying the digits by 1, 10, 100 and 1000, all values from 0 to 9999 are calculated.

We can finally use the sequential numbers to generate a continuous sequence of date using Redshift's date manipulation functions.


👋 No fuss, just SQL We are open sourcing everything from the experience working with our agency clients. They spend thousands of dollars to get this level of detailed analysis – which you can now get for free. We send one update every week. Join 400+ data analysts who are leveling up with our recipes. 👊

No spam, ever! Unsubscribe any time. See past emails here.

results matching ""

    No results matching ""