Gap Analysis to find missing values in a sequence

There are times when you want to find all gaps in a sequence of numbers, dates or data with a logically consecutive nature. A related scenario is to find ranges of unbroken sequences. There are a variety of use-cases of gap analysis:

  • If you require your employees to checkin to a timesheet everyday, a gap analysis can show you days where the employee was absent.
  • If you have a fleet of vehicles doing deliveries, a gap analysis can show periods of time when all vehicles are not being used. Useful for maintenance or downtime.
  • If you have a service level agreement for 24/7 coverage, a gap analysis can show you contractual breaches.

Example Data

For illustration purposes, we are going to consider a simple table of integers from 1 to 10, with the numbers 3, 4 and 7 missing:

create table gap (counter integer);

insert into gap (counter) values (1);
insert into gap (counter) values (2);
insert into gap (counter) values (5);
insert into gap (counter) values (6);
insert into gap (counter) values (8);
insert into gap (counter) values (9);
insert into gap (counter) values (10);

and,

select counter from gap;
counter
1
2
5
6
8
9
10

The "counter" in the example could be the list of checks from your bank's transaction log, and you want to find missing checks. Other data types such as dates or timestamps can be generalized into this format by mapping the timestamp into an epoch time.

1. Finding Gaps with an exclusion join

You can find missing values by joining with a table of all possible values. If you don't have such a table handy, you can generate one on the fly using PostgreSQL's generate_series (see alternatives for MySQL and Redshift). The generate_series function returns a continuous series as multiple rows.

select generate_series(1, 10, 1);
generate_series
1
2
3
4
5
6
7
8
9
10

We then left join our gap table with the generated series:

select series,
       gap.counter 
from generate_series(1, 10, 1) series
left join gap on series = gap.counter;
series counter
1 1
2 2
3 null
4 null
5 5
6 6
7 null
8 8
9 9
10 10

By filtering out the rows that have a null for the joined value, we get what we want:

select series,
       gap.counter 
from generate_series(1, 10, 1) series
left join gap on series = gap.counter
where counter is null;
series counter
3 null
4 null
7 null

You'll obviously want to use min(gap.counter) and max(gap.counter) as parameters to the generate_series function to generalize this solution.

2. Finding ranges of missing gaps

The above query gives you individual values that are missing. More useful is to present the missing values as a range. The desired result is:

start end
3 4
7 7

To find the start of a gap, we left join the table with itself on a join key offset by 1. This is similar to the query above:

select gap.counter + 1 as start
from gap
left join gap r on gap.counter = r.counter - 1
where r.counter is null;
start
3
7
11

(there's an extra row with a value of 11 at the end that is detected as a gap, which we'll fix later)

To find the end of a gap, we'll use a cross join to enumerate all the values of the gap table to find the start of the next valid range.

select min(fr.counter) - 1 as stop
from gap
left join gap fr on gap.counter < fr.counter
where fr.counter is not null
group by gap.counter;

Putting these two queries together:

select gap.counter + 1 as start, 
       min(fr.counter) - 1 as stop
from gap
left join gap r on gap.counter = r.counter - 1
left join gap fr on gap.counter < fr.counter
where r.counter is null
      and fr.counter is not null
group by gap.counter,
         r.counter;
start end
3 4
7 7

3. Finding ranges of continous values

This solution is for the opposite problem, finding ranges of continous values outside of gaps.

The desired result is:

start end
1 2
5 6
8 10

To find the start of a valid range, we look for numbers that have no previous value:

select *
from gap 
left join gap s on s.counter = gap.counter - 1;
counter counter_1
1 null
2 1
5 null
6 5
8 null
9 8
10 9

Filtering out rows that do not have a match:

select gap.counter as start, 
       s.counter
from gap 
left join gap s on s.counter = gap.counter - 1
where s.counter is null;
start counter
1 null
5 null
8 null

To find the end of a valid range,

select gap.counter as stop, e.counter 
from gap 
left join gap e on e.counter = gap.counter + 1
where e.counter is null;
stop counter
2 null
6 null
10 null

Combining the two queries, gives us the desired result:

select gap.counter as start,
    (select a.counter as counter
    from gap a
    left join gap b on b.counter = a.counter + 1
    where b.counter is null
        and a.counter >= gap.counter
    limit 1) as stop
from gap
left join gap s on s.counter = gap.counter - 1
where s.counter is null;
start end
1 2
5 6
8 10

4. Gap analysis using window functions

The above queries can be rewritten using the SQL window functions LEAD and LAG. We'll revisit this in the future.


👋 No fuss, just SQL We are open sourcing everything from the experience working with our agency clients. They spend thousands of dollars to get this level of detailed analysis – which you can now get for free. We send one update every week. Join 400+ data analysts who are leveling up with our recipes. 👊

No spam, ever! Unsubscribe any time. See past emails here.

results matching ""

    No results matching ""