Redshift Vacuuming to maintain table performance

This is part 1 of a series on Amazon Redshift maintenance:

  1. Vacuuming for Table Performance
  2. Monitoring Disk Space
  3. Monitoring Query Performance
  4. Column Compression

Why does Redshift require vacuuming?

There are two cases where a Redshift cluster's performance can degrade:

First, if you're updating or deleting a lot of rows, it'll leave a lot of dead rows that will have to vacummed.

Second, even if you're working with Redshift by only appending new rows and not deleting anything, your cluster distributions can get imbalanced. A vacuuming will rebalance the data on the nodes of your cluster.

When not to vacuum

If you're rebuilding your Redshift cluster each day or not having much data churning, it's not necessary to vacuum your cluster. There would be nothing to vaccum!

You also have to be mindful of timing the vacuuming operation as it's very expensive on the cluster.

Load data in sort order

If you're working with a table with a natural sort order such as a time dimension, and you only append new data that is sorted on the time dimension – you might not have to vacuum your table at all. Amazon's documentation goes into more details on this optimization: Loading Your Data in Sort Key Order.

Customize the vacuum type

By default, Redshift's vacuum will run a full vacuum – reclaiming deleted rows, re-sorting rows and re-indexing your data.

If you want fine-grained control over the vacuuming operation, you can specify the type of vacuuming:

vacuum delete only table_name;
vacuum sort only table_name;
vacuum reindex table_name;

Next Generation SQL Data Analysis Write SQL, visualize data, and share your results. Onboarding forward-thinking teams now:

No spam, ever! Unsubscribe any time. Learn more about the product.