# Gap Analysis to find missing values in a sequence

There are times when you want to find all gaps in a sequence of numbers, dates or data with a logically consecutive nature. A related scenario is to find ranges of unbroken sequences. There are a variety of use-cases of gap analysis:

- If you require your employees to checkin to a timesheet everyday, a gap analysis can show you days where the employee was absent.
- If you have a fleet of vehicles doing deliveries, a gap analysis can show periods of time when all vehicles are not being used. Useful for maintenance or downtime.
- If you have a service level agreement for 24/7 coverage, a gap analysis can show you contractual breaches.

## Example Data

For illustration purposes, we are going to consider a simple table of integers from 1 to 10, with the numbers 3, 4 and 7 missing:

```
create table gap (counter integer);
insert into gap (counter) values (1);
insert into gap (counter) values (2);
insert into gap (counter) values (5);
insert into gap (counter) values (6);
insert into gap (counter) values (8);
insert into gap (counter) values (9);
insert into gap (counter) values (10);
```

and,

```
select counter from gap;
```

counter |
---|

1 |

2 |

5 |

6 |

8 |

9 |

10 |

The "counter" in the example could be the list of checks from your bank's transaction log, and you want to find missing checks. Other data types such as dates or timestamps can be generalized into this format by mapping the timestamp into an epoch time.

## 1. Finding Gaps with an exclusion join

You can find missing values by joining with a table of all possible values. If you don't have such a table handy, you can generate one on the fly using PostgreSQL's `generate_series`

(see alternatives for MySQL and Redshift). The `generate_series`

function returns a continuous series as multiple rows.

```
select generate_series(1, 10, 1);
```

generate_series |
---|

1 |

2 |

3 |

4 |

5 |

6 |

7 |

8 |

9 |

10 |

We then `left join`

our `gap`

table with the generated series:

```
select series,
gap.counter
from generate_series(1, 10, 1) series
left join gap on series = gap.counter;
```

series | counter |
---|---|

1 | 1 |

2 | 2 |

3 | null |

4 | null |

5 | 5 |

6 | 6 |

7 | null |

8 | 8 |

9 | 9 |

10 | 10 |

By filtering out the rows that have a `null`

for the joined value, we get what we want:

```
select series,
gap.counter
from generate_series(1, 10, 1) series
left join gap on series = gap.counter
where counter is null;
```

series | counter |
---|---|

3 | null |

4 | null |

7 | null |

You'll obviously want to use `min(gap.counter)`

and `max(gap.counter)`

as parameters to the `generate_series`

function to generalize this solution.

## 2. Finding ranges of missing gaps

The above query gives you individual values that are missing. More useful is to present the missing values as a range. The desired result is:

start | end |
---|---|

3 | 4 |

7 | 7 |

To find the start of a gap, we left join the table with itself on a join key offset by 1. This is similar to the query above:

```
select gap.counter + 1 as start
from gap
left join gap r on gap.counter = r.counter - 1
where r.counter is null;
```

start |
---|

3 |

7 |

11 |

(there's an extra row with a value of 11 at the end that is detected as a gap, which we'll fix later)

To find the end of a gap, we'll use a cross join to enumerate all the values of the `gap`

table to find the start of the next valid range.

```
select min(fr.counter) - 1 as stop
from gap
left join gap fr on gap.counter < fr.counter
where fr.counter is not null
group by gap.counter;
```

Putting these two queries together:

```
select gap.counter + 1 as start,
min(fr.counter) - 1 as stop
from gap
left join gap r on gap.counter = r.counter - 1
left join gap fr on gap.counter < fr.counter
where r.counter is null
and fr.counter is not null
group by gap.counter,
r.counter;
```

start | end |
---|---|

3 | 4 |

7 | 7 |

## 3. Finding ranges of continous values

This solution is for the opposite problem, finding ranges of continous values outside of gaps.

The desired result is:

start | end |
---|---|

1 | 2 |

5 | 6 |

8 | 10 |

To find the start of a valid range, we look for numbers that have no previous value:

```
select *
from gap
left join gap s on s.counter = gap.counter - 1;
```

counter | counter_1 |
---|---|

1 | null |

2 | 1 |

5 | null |

6 | 5 |

8 | null |

9 | 8 |

10 | 9 |

Filtering out rows that do not have a match:

```
select gap.counter as start,
s.counter
from gap
left join gap s on s.counter = gap.counter - 1
where s.counter is null;
```

start | counter |
---|---|

1 | null |

5 | null |

8 | null |

To find the end of a valid range,

```
select gap.counter as stop, e.counter
from gap
left join gap e on e.counter = gap.counter + 1
where e.counter is null;
```

stop | counter |
---|---|

2 | null |

6 | null |

10 | null |

Combining the two queries, gives us the desired result:

```
select gap.counter as start,
(select a.counter as counter
from gap a
left join gap b on b.counter = a.counter + 1
where b.counter is null
and a.counter >= gap.counter
limit 1) as stop
from gap
left join gap s on s.counter = gap.counter - 1
where s.counter is null;
```

start | end |
---|---|

1 | 2 |

5 | 6 |

8 | 10 |

## 4. Gap analysis using window functions

The above queries can be rewritten using the SQL window functions `LEAD`

and `LAG`

. We'll revisit this in the future.

No spam, ever! Unsubscribe any time. See past emails here.