Redshift: Generate a sequential range of numbers for time series analysis
One of our favorite features in PostgreSQL is the
generate_series function. Given
step interval, PostgreSQL can generate a series of values, from
stop with a step size of
step. It can even work with dates or timestamps:
select generate_series('2017-01-01'::date, '2017-05-01'::date, '1 week'::interval);
Unfortunately, though Redshift supports the simpler variant to generate integer sequences, it does not support the date variant. We'll have to build an equivalent.
The common use-case for this function is to generate a sequential range of dates, and use a left join to figure out dates where you have no data. If you didn't do this, your timeseries will have gaps and your chart will be misleading.
Method 1: Create a table with sequential numbers
The simplest option is to create a table, for example,
numbers and select from that. You can convert each number into the relevant date using Redshift's date manipulation functions:
select (getdate()::date - n)::date from numbers;
numbers table can be created by using the
row_number window function and the
stl_connection_log system table that logs authentication attempts and connections/disconnections. For Redshift clusters with even the basic level of use, looping over the
stl_connection_log table with a cross join should generate sufficient data:
insert into numbers with x as ( select 1 from stl_connection_log a, stl_connection_log b, stl_connection_log c -- limit 1000000 ) select row_number() over (order by 1) from x;
Method 2: Create a CTE counter
If you don't want to create a table before hand, you can create one on the fly – using Redshift's Common Table Expressions. A CTE works like a temporary table that only exists during the execution of the query.
with digit as ( select 0 as d union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9 ), seq as ( select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num from digit a cross join digit b cross join digit c cross join digit d order by 1 ) select (getdate()::date - seq.num)::date as "Date" from seq;
The way this works is by first creating a temporary table
digit with 10 rows containing values from 0 through 9. Next, we create all permutations of the
digit table with the help of a couple of cross joins. By multiplying the digits by 1, 10, 100 and 1000, all values from 0 to 9999 are calculated.
We can finally use the sequential numbers to generate a continuous sequence of date using Redshift's date manipulation functions.
No spam, ever! Unsubscribe any time. See past emails here.