Gap Analysis to find missing values in a sequence
There are times when you want to find all gaps in a sequence of numbers, dates or data with a logically consecutive nature. A related scenario is to find ranges of unbroken sequences. There are a variety of use-cases of gap analysis:
- If you require your employees to checkin to a timesheet everyday, a gap analysis can show you days where the employee was absent.
- If you have a fleet of vehicles doing deliveries, a gap analysis can show periods of time when all vehicles are not being used. Useful for maintenance or downtime.
- If you have a service level agreement for 24/7 coverage, a gap analysis can show you contractual breaches.
For illustration purposes, we are going to consider a simple table of integers from 1 to 10, with the numbers 3, 4 and 7 missing:
create table gap (counter integer); insert into gap (counter) values (1); insert into gap (counter) values (2); insert into gap (counter) values (5); insert into gap (counter) values (6); insert into gap (counter) values (8); insert into gap (counter) values (9); insert into gap (counter) values (10);
select counter from gap;
The "counter" in the example could be the list of checks from your bank's transaction log, and you want to find missing checks. Other data types such as dates or timestamps can be generalized into this format by mapping the timestamp into an epoch time.
1. Finding Gaps with an exclusion join
You can find missing values by joining with a table of all possible values. If you don't have such a table handy, you can generate one on the fly using PostgreSQL's
generate_series (see alternatives for MySQL and Redshift). The
generate_series function returns a continuous series as multiple rows.
select generate_series(1, 10, 1);
left join our
gap table with the generated series:
select series, gap.counter from generate_series(1, 10, 1) series left join gap on series = gap.counter;
By filtering out the rows that have a
null for the joined value, we get what we want:
select series, gap.counter from generate_series(1, 10, 1) series left join gap on series = gap.counter where counter is null;
You'll obviously want to use
max(gap.counter) as parameters to the
generate_series function to generalize this solution.
2. Finding ranges of missing gaps
The above query gives you individual values that are missing. More useful is to present the missing values as a range. The desired result is:
To find the start of a gap, we left join the table with itself on a join key offset by 1. This is similar to the query above:
select gap.counter + 1 as start from gap left join gap r on gap.counter = r.counter - 1 where r.counter is null;
(there's an extra row with a value of 11 at the end that is detected as a gap, which we'll fix later)
To find the end of a gap, we'll use a cross join to enumerate all the values of the
gap table to find the start of the next valid range.
select min(fr.counter) - 1 as stop from gap left join gap fr on gap.counter < fr.counter where fr.counter is not null group by gap.counter;
Putting these two queries together:
select gap.counter + 1 as start, min(fr.counter) - 1 as stop from gap left join gap r on gap.counter = r.counter - 1 left join gap fr on gap.counter < fr.counter where r.counter is null and fr.counter is not null group by gap.counter, r.counter;
3. Finding ranges of continous values
This solution is for the opposite problem, finding ranges of continous values outside of gaps.
The desired result is:
To find the start of a valid range, we look for numbers that have no previous value:
select * from gap left join gap s on s.counter = gap.counter - 1;
Filtering out rows that do not have a match:
select gap.counter as start, s.counter from gap left join gap s on s.counter = gap.counter - 1 where s.counter is null;
To find the end of a valid range,
select gap.counter as stop, e.counter from gap left join gap e on e.counter = gap.counter + 1 where e.counter is null;
Combining the two queries, gives us the desired result:
select gap.counter as start, (select a.counter as counter from gap a left join gap b on b.counter = a.counter + 1 where b.counter is null and a.counter >= gap.counter limit 1) as stop from gap left join gap s on s.counter = gap.counter - 1 where s.counter is null;
4. Gap analysis using window functions
The above queries can be rewritten using the SQL window functions
LAG. We'll revisit this in the future.
No spam, ever! Unsubscribe any time. See past emails here.