If you’re coming from a traditional SQL database background like Postgres or Oracle, you’d expect liberal use of database views. Views allow you present a consistent interface to the underlying schema and table. Moving over to Amazon Redshift brings subtle differences to views, which we talk about here…
1. How a View is created
A View creates a pseudo-table and from the perspective of a
SELECT statement, it appears exactly as a regular table. A view can be created from a subset of rows or columns of another table, or many tables via a
JOIN. In Postgres, views are created with the
CREATE VIEW statement:
CREATE VIEW <view_name> as SELECT <column1>, <column2>, ... FROM <table_name> WHERE <condition>;
The view is now available to be queried with a
SELECT statement. As this is not a real table, you cannot
UPDATE it. The underlying query is run every time you query the view.
If you want to store the result of the underlying query – you’d just have to use the
CREATE MATERIALIZED VIEW <view_name> as SELECT <column1>, <column2>, ... FROM <table_name> WHERE <condition>;
You should see performance improvements with a materialized view. You now control the upgrade schedule of the view and can be refreshed at your convenience:
REFRESH MATERIALIZED VIEW <view_name>;
2. What are the advantages of Views?
There are three main advantages to using views:
A materialized view is physically stored on disk and the underlying table is never touched when the view is queried. If your query takes a long time to run, a materialized view should act as a cache. This is pretty effective in the data warehousing case, where the underlying data is only updated periodically like every day. A few hours of stale data is OK.
The second advantage of views is that you can assign a different set of permissions to the view. A user might be able to query the view, but not the underlying table. Creating the view excluding the sensitive columns (or rows) should be useful in this scenario.
The third advantage of views is presenting a consistent interface to the data from an end-user perspective. You might have certain nuances of the underlying table which you could mask over when you create the views. The final reporting queries will be cleaner to read and write. Another side effect is you could denormalize high normalized schemas so that it’s easier to query.
3. Aside, why devs shy away from Views
For some reason beyond our comprehension, views have a bad reputation among our colleagues. We think it’s because:
- Most people are first exposed to databases through a PHP stack, usually paired with MySQL. MySQL has been slow adding standard SQL features and a whole generation of devs have not used anything else. Just a matter of familiarity.
- With web frameworks like Django and Rails, the standard way to access the database is through an ORM. ORMs have never had good support for maintaining views.
4. Views on Redshift
Views on Redshift mostly work as other databases with some specific caveats:
- you can’t create materialized views.
- views reference the internal names of tables and columns, and not what’s visible to the user. If you drop the underlying table, and recreate a new table with the same name, your view will still be broken.
- the Redshift query planner has trouble optimizing queries through a view.
Not only can you not gain the performance advantages of materialized views, it also ends up being slower that querying a regular table!
Instead, our recommendation is to create a real table instead:
CREATE TABLE <view_name> as SELECT <column1>, <column2>, ... FROM <table_name> WHERE <condition>;
Remember to drop and create the table every time your underlying data changes.