PostgreSQL surpasses other database products in ANSI SQL compliance. It cements its lead by adding constructs that range from convenient syntax shorthands to avant-garde features that break the bounds of traditional SQL. In this chapter, we’ll cover some SQL tidbits not often found in other databases. For this chapter, you should have a working knowledge of SQL; otherwise, you may not appreciate the labor-saving amuse-bouche that PostgreSQL brings to the table.
Well-designed relational databases store data in normalized form. To access this data across scattered tables, you write queries to join underlying tables. When you find yourself writing the same query over and over again, create a view. Simply put, a view is nothing more than a query permanently stored in the database.
Some purists have argued that one should always query a view, never tables. This means you must create a view for every table that you intend to query directly. The added layer of indirection eases management of permissions and facilitates abstraction of table data. We find this to be sound advice, but laziness gets the better of us.
Views in PostgreSQL have evolved over the years. Version 9.3 unveiled automatically updatable views. If your view draws from a single table and you include the primary key as an output column, you can issue an update command directly against your view. Data in the underlying table will follow suit.
Version 9.3 also introduced materialized views. When you mark a view as materialized, it
will requery the data only when you issue the REFRESH command. The upside is that
you’re not wasting resources running complex queries repeatedly; the
downside is that you might not have the most up-to-date data when you use
the view. Furthermore, under some
circumstances you are barred from access to the view during a
refresh.
Version 9.4 allows users to access materialized views during
refreshes. It also introduced the WITH CHECK OPTION modifier, which prevents
inserts and updates outside the scope of the view.
The simplest view draws from a single table. Always include the primary key if you intend to write data back to the table, as shown in Example 7-1.
CREATEORREPLACEVIEWcensus.vw_facts_2011ASSELECTfact_type_id,val,yr,tract_idFROMcensus.factsWHEREyr=2011;
As of version 9.3, you can alter the data in this view by using
INSERT, UPDATE, or DELETE
commands. Updates and deletes will abide by any WHERE
condition you have as part of your view. For example, the following
query will delete only records whose value is 0:
DELETEFROMcensus.vw_facts_2011WHEREval=0;
And the following will not update any records, because the view explicitly includes only records for 2011:
UPDATEcensus.vw_facts_2011SETval=1WHEREyr=2012;
Be aware that you can insert data that places it outside of the
view’s WHERE or update data so it is no
longer visible from the view as shown in Example 7-2.
UPDATEcensus.vw_facts_2011SETyr=2012WHEREyr=2011;
The update of Example 7-2
does not violate the WHERE condition. But, once executed,
you would have emptied your view. For the sake of sanity, you may find
it desirable to prevent updates or inserts that leave data invisible to
further queries. Version 9.4 introduced the WITH CHECK
OPTION to accomplish this. Include this modifier when creating
the view and PostgreSQL will forever balk at any attempts to add records
outside the view and to update records that will put them outside the
view. In our example view, our goal is to limit vw_facts_2011 to allow
inserts only of 2011 data and disallow updates of the yr to something
other than 2011. To add this restriction, we revise our view definition
as shown in Example 7-3.
CREATEORREPLACEVIEWcensus.vw_facts_2011ASSELECTfact_type_id,val,yr,tract_idFROMcensus.factsWHEREyr=2011WITHCHECKOPTION;
Now try to run an update such as:
UPDATEcensus.vw_facts_2011SETyr=2012WHEREval>2942;
You’ll get an error:
ERROR:NewrowviolatesWITHCHECKOPTIONforview"vw_facts_2011"DETAIL:Failingrowcontains(1,25001010500,2012,2985.000,100.00).
Views can encapsulate joins among tables. When a view draws from more than one table, updating the underlying data with a simple command is no longer possible. Drawing data from more than one table introduces inherent ambiguity when you’re trying to update the underlying data, and PostgreSQL is not about to make an arbitrary decision for you. For instance, if you have a view that joins a table of countries with a table of provinces, and then decide to delete one of the rows, PostgreSQL won’t know whether you intend to delete only a country, a province, or a particular country-province pairing. Nonetheless, you can still modify the underlying data through the view using triggers.
Let’s start by creating a view that pulls rows from the facts table and a lookup table, as shown in Example 7-4.
CREATEORREPLACEVIEWcensus.vw_factsASSELECTy.fact_type_id,y.category,y.fact_subcats,y.short_name,x.tract_id,x.yr,x.val,x.percFROMcensus.factsAsxINNERJOINcensus.lu_fact_typesAsyONx.fact_type_id=y.fact_type_id;
To make this view updatable with a trigger, you can define one or
more INSTEAD OF triggers. We first define the
trigger function to handle the trifecta: INSERT,
UPDATE, DELETE. In addition, PostgreSQL
supports triggers on the TRUNCATE event. You can use any language to write the function except
SQL, and you’re free to name it whatever you like. We chose
PL/pgSQL in Example 7-5.
CREATEORREPLACEFUNCTIONcensus.trig_vw_facts_ins_upd_del()RETURNStriggerAS$$BEGINIF(TG_OP='DELETE')THENDELETEFROMcensus.factsASfWHEREf.tract_id=OLD.tract_idANDf.yr=OLD.yrANDf.fact_type_id=OLD.fact_type_id;RETURNOLD;ENDIF;IF(TG_OP='INSERT')THENINSERTINTOcensus.facts(tract_id,yr,fact_type_id,val,perc)SELECTNEW.tract_id,NEW.yr,NEW.fact_type_id,NEW.val,NEW.perc;RETURNNEW;ENDIF;IF(TG_OP='UPDATE')THENIFROW(OLD.fact_type_id,OLD.tract_id,OLD.yr,OLD.val,OLD.perc)!=ROW(NEW.fact_type_id,NEW.tract_id,NEW.yr,NEW.val,NEW.perc)THENUPDATEcensus.factsASfSETtract_id=NEW.tract_id,yr=NEW.yr,fact_type_id=NEW.fact_type_id,val=NEW.val,perc=NEW.percWHEREf.tract_id=OLD.tract_idANDf.yr=OLD.yrANDf.fact_type_id=OLD.fact_type_id;RETURNNEW;ELSERETURNNULL;ENDIF;ENDIF;END;$$LANGUAGEplpgsqlVOLATILE;

Handles deletes. Delete only records with matching keys in the
OLD record.

Handles inserts.

Handles updates. Use the OLD record to determine
which records to update. NEW record has the new
data.

Update rows only if at least one of the columns from the facts table has changed.
Next, we bind the trigger function to the view, as shown in Example 7-6.
CREATETRIGGERtrig_01_vw_facts_ins_upd_delINSTEADOFINSERTORUPDATEORDELETEONcensus.vw_factsFOREACHROWEXECUTEPROCEDUREcensus.trig_vw_facts_ins_upd_del();
The binding syntax is uncharacteristically English-like.
Now when we update, delete, or insert into our view, we update the
underlying facts table instead:
UPDATEcensus.vw_factsSETyr=2012WHEREyr=2011ANDtract_id='25027761200';
Upon a successful update, PostgreSQL returns the following message:
Query returned successfully: 56 rows affected, 40 ms execution time.
If we try to update a field not in our update row comparison, the update will not take place:
UPDATEcensus.vw_factsSETshort_name='test';
With a message:
Query returned successfully: 0 rows affected, 931 ms execution time.
Although this example created a single trigger function to handle multiple events, we could have just as easily created a separate trigger and trigger function for each event.
PostgreSQL has another approach for updating views called rules, which predates the
introduction of INSTEAD OF triggers view support. You can
see an example using rules in Database Abstraction with Updatable
Views.
You can still use rules to update view data, but INSTEAD
OF triggers are preferred now. Internally PostgreSQL still uses
rules to define the view (a view is nothing but an INSTEAD OF
SELECT rule on a virtual table) and to implement single table
updatable views. The difference between using a trigger and a rule is
that a rule rewrites the underlying query and a trigger gets called for
each virtual row. As such, rules become overwhelmingly difficult to
write (and understand) when many tables are involved. Rules are also
limited because they can be written only in SQL, not in other procedural languages.
Materialized views cache the fetched data. This happens when you first create the view as well as
when you run the REFRESH MATERIALIZED VIEW command.
To use materialized views, you need at least version
9.3.
The most convincing cases for using materialized views are when the underlying query takes a long time and when having timely data is not critical. You often encounter these scenarios when building online analytical processing (OLAP) applications.
Unlike nonmaterialized views, you can add indexes to materialized views to speed up the read.
Example 7-7 demonstrates how to make a materialized version of the view in Example 7-1.
CREATEMATERIALIZEDVIEWcensus.vw_facts_2011_materializedASSELECTfact_type_id,val,yr,tract_idFROMcensus.factsWHEREyr=2011;
Create an index on a materialized view as you would do on a regular table, as shown in Example 7-8.
CREATEUNIQUEINDEXixONcensus.vw_facts_2011_materialized(tract_id,fact_type_id,yr);
For speedier access to a materialized view with a large number of
records, you may want to control the physical sort of the data. The
easiest way is to include an ORDER BY when you create the
view. Alternatively, you can add a cluster index to the view. First,
create an index in the physical sort order you want to have. Then
run the CLUSTER command,
passing it the index, as shown in Example 7-9.
CLUSTERcensus.vw_facts_2011_materializedUSINGix;CLUSTERcensus.vw_facts_2011_materialized;

Name the index to cluster on. Needed only during view creation.

Each time you refresh, you must recluster the data.
The advantage of using ORDER BY in the materialized view over
using the CLUSTER approach is that the sort is maintained with each
REFRESH MATERIALIZED VIEW call, alleviating the need to
recluster. The downside is that ORDER BY generally adds
more processing time to the REFRESH
step of the view. You should test the effect of ORDER BY on performance of REFRESH before using it. One way to test is
just to run the underlying query of the view with an ORDER BY clause.
To refresh the view in PostgreSQL 9.3, use:
REFRESHMATERIALIZEDVIEWcensus.vw_facts_2011_materialized;
The view cannot be queried while the REFRESH MATERIALIZED VIEW step is running.
In PostgreSQL 9.4, to allow the view to be queried while it’s refreshing, you can use:
REFRESHMATERIALIZEDVIEWCONCURRENTLYcensus.vw_facts_2011_materialized;
Current limitations of materialized views include:
You can’t use CREATE OR REPLACE to edit an
existing materialized view. You must drop and re-create the view
even for the most trivial of changes. Use DROP MATERIALIZED VIEW
. Annoyingly, you’ll
lose all your indexes.name_of_view
You need to run REFRESH MATERIALIZED VIEW to
rebuild the cache. PostgreSQL doesn’t perform automatic recaching of
any kind. You need to resort to mechanisms such as crontab, pgAgent jobs, or triggers to
automate any kind of refresh. We have an example using triggers in
Caching Data with Materialized
Views and Statement-Level Triggers.
Refreshing materialized views in version 9.3 is a blocking
operation, meaning that the view will not be accessible during the
refresh process. In version 9.4 you can lift this quarantine by
adding the CONCURRENTLY keyword to your REFRESH command, provided that you have
established a unique index on your view. The trade-off is concurrent
refreshes could take longer to complete.
In our many years of writing SQL, we have come to appreciate the little things that make better use of our typing. Only PostgreSQL offers some of the gems we present in this section. Often this means that the construction is not ANSI-compliant. If thy God demands strict observance to the ANSI SQL standards, abstain from the short-cuts that we’ll be showing.
One of our favorites is DISTINCT ON. It behaves
like DISTINCT, but with two enhancements: you
can specify which columns to consider as distinct and to sort the
remaining columns. One little word—ON—replaces numerous lines of additional code
to achieve the same result.
In Example 7-10, we demonstrate how to get the details of the first tract for each county.
SELECTDISTINCTON(left(tract_id,5))left(tract_id,5)Ascounty,tract_id,tract_nameFROMcensus.lu_tractsORDERBYcounty,tract_id;
county | tract_id | tract_name -------+-------------+--------------------------------------------------- 25001 | 25001010100 | Census Tract 101, Barnstable County, Massachusetts 25003 | 25003900100 | Census Tract 9001, Berkshire County, Massachusetts 25005 | 25005600100 | Census Tract 6001, Bristol County, Massachusetts 25007 | 25007200100 | Census Tract 2001, Dukes County, Massachusetts 25009 | 25009201100 | Census Tract 2011, Essex County, Massachusetts : (14 rows)
The ON modifier accepts multiple columns, considering
all of them to determine distinctness. The ORDER BY clause
has to start with the set of columns in the DISTINCT ON;
then you can follow with your preferred ordering.
LIMIT returns only the number of rows indicated; OFFSET indicates the number of
rows to skip. You can use them in tandem or separately. You almost
always use them in conjunction with an ORDER BY. In Example 7-11, we demonstrate use of a
positive offset. Leaving out the offset yields the same result as
setting the offset to zero.
Limits and offsets are not unique to PostgreSQL and are in fact copied from MySQL, although implementation differs widely among database products.
SELECTDISTINCTON(left(tract_id,5))left(tract_id,5)Ascounty,tract_id,tract_nameFROMcensus.lu_tractsORDERBYcounty,tract_idLIMIT3OFFSET2;
county | tract_id | tract_name -------+-------------+------------------------------------------------- 25005 | 25005600100 | Census Tract 6001, Bristol County, Massachusetts 25007 | 25007200100 | Census Tract 2001, Dukes County, Massachusetts 25009 | 25009201100 | Census Tract 2011, Essex County, Massachusetts (3 rows)
ANSI SQL defines a construct called CAST that allows you to morph one
data type to another. For example, CAST('2011-1-11'
AS date) casts the text 2011-1-1 to a date.
PostgreSQL has shorthand for doing this, using a pair of colons, as in
'2011-1-1'::date. This syntax is shorter and easier to
apply for cases in which you can’t directly cast from one type to
another and have to intercede with one or more intermediary types, such
as someXML::text::integer.
PostgreSQL supports the multirow constructor to insert more than one record at a time. Example 7-12 demonstrates how to use a multirow construction to insert data into the table we created in Example 6-3.
INSERTINTOlogs_2011(user_name,description,log_ts)VALUES('robe','logged in','2011-01-10 10:15 AM EST'),('lhsu','logged out','2011-01-11 10:20 AM EST');
The latter portion of the multirow constructor, starting with
the VALUES keyword, is often referred to as a
values list. A values list can stand alone and effectively creates a table on
the fly, as in Example 7-13.
SELECT*FROM(VALUES('robe','logged in','2011-01-10 10:15 AM EST'::timestamptz),('lhsu','logged out','2011-01-11 10:20 AM EST'::timestamptz))ASl(user_name,description,log_ts);
When you use VALUES as a stand-in for a virtual
table, you need to specify the names for the columns. You also need to
explicitly cast the values to the data types in the table if the parser
can’t infer the data type from the data. The multirow VALUES construct
also exists in MySQL and SQL Server.
PostgreSQL is case-sensitive. However, it does have mechanisms in place to ignore
casing. You can apply the upper function to both sides of the ANSI LIKE operator, or you can simply use
the ILIKE (~~*) operator:
SELECTtract_nameFROMcensus.lu_tractsWHEREtract_nameILIKE'%duke%';
tract_name ------------------------------------------------ Census Tract 2001, Dukes County, Massachusetts Census Tract 2002, Dukes County, Massachusetts Census Tract 2003, Dukes County, Massachusetts Census Tract 2004, Dukes County, Massachusetts Census Tract 9900, Dukes County, Massachusetts
PostgreSQL has a construct called ANY that can be used in conjunction with arrays, combined with a comparator operator or comparator keyword. If any element of the array matches a row, that row is returned.
Here is an example:
SELECTtract_nameFROMcensus.lu_tractsWHEREtract_nameILIKEANY(ARRAY['%99%duke%','%06%Barnstable%']::text[]);
tract_name ----------------------------------------------------- Census Tract 102.06, Barnstable County, Massachusetts Census Tract 103.06, Barnstable County, Massachusetts Census Tract 106, Barnstable County, Massachusetts Census Tract 9900, Dukes County, Massachusetts (4 rows)
The example just shown is a shorthand way of using multiple
ILIKE OR clauses. You can use ANY
with other comparators such as LIKE, =, and
~ (the regex like operator).
ANY can be used with any data types and comparison operators (operators that return
a Boolean), including ones you built yourself or installed via
extensions.
A set-returning function is a function that could return more than one row.
PostgreSQL allows set-returning functions to appear in the
SELECT clause of an SQL statement. This is not true of most
other databases, in which only scalar functions can appear in the
SELECT.
Interweaving some set-returning functions into an already
complicated query could produce results beyond what you expect, because
these functions usually result in the creation of new rows. You must
anticipate this if you’ll be using the results as a subquery. In Example 7-14, we demonstrate row
creation resulting from using a temporal version of generate_series. The example uses
a table that we construct with:
CREATETABLEinterval_periods(i_typeinterval);INSERTINTOinterval_periods(i_type)VALUES('5 months'),('132 days'),('4862 hours');
SELECTi_type,generate_series('2012-01-01'::date,'2012-12-31'::date,i_type)AsdtFROMinterval_periods;
i_type | dt -----------+----------------------- 5 months | 2012-01-01 00:00:00-05 5 months | 2012-06-01 00:00:00-04 5 months | 2012-11-01 00:00:00-04 132 days | 2012-01-01 00:00:00-05 132 days | 2012-05-12 00:00:00-04 132 days | 2012-09-21 00:00:00-04 4862 hours | 2012-01-01 00:00:00-05 4862 hours | 2012-07-21 15:00:00-04
When you query from a table that has child tables, the query automatically drills
down into the children, creating a union of all the child records
satisfying the query condition. DELETE and
UPDATE work the same way, drilling down the hierarchy for victims.
Sometimes this is not desirable because you want data to come only from
the table you specified, without the kids tagging along.
This is where the ONLY keyword comes in handy. We show an example of its use in Example 7-37, where we want to delete only those
records from the production table that haven’t migrated to the log
table. Without the ONLY modifier, we’d end up deleting
records from the child table that might have already been moved
previously.
Often, when you delete data from a table, you’ll want to delete the data based
on its presence in another set of data. Specify this additional set with
the USING predicate. Then, in the WHERE
clause, you can use both datasets in the USING and in the
FROM to define conditions for deletion. Multiple tables can
be included in USING, separated by commas. Example 7-15 deletes all records from census.facts
that correspond to a fact type of short_name =
's01'.
DELETEFROMcensus.factsUSINGcensus.lu_fact_typesAsftWHEREfacts.fact_type_id=ft.fact_type_idANDft.short_name='s01';
The standards-compliant way would be to use a clunkier
IN expression in the WHERE.
The RETURNING predicate is supported by ANSI SQL standards but not commonly found in
other relational databases. We show an example in Example 7-37, where we return the records
deleted. RETURNING can also be used
for inserts and updates. For inserts into tables with a serial key,
RETURNING is invaluable because it returns the key value of
the new rows—something you wouldn’t know prior to the query execution.
Although RETURNING is often
accompanied by * for all fields, you can limit the fields
as we do in Example 7-16.
UPDATEcensus.lu_fact_typesASfSETshort_name=replace(replace(lower(f.fact_subcats[4]),' ','_'),':','')WHEREf.fact_subcats[3]='Hispanic or Latino:'ANDf.fact_subcats[4]>''RETURNINGfact_type_id,short_name;
fact_type_id | short_name -------------+------------------------------------------------- 96 | white_alone 97 | black_or_african_american_alone 98 | american_indian_and_alaska_native_alone 99 | asian_alone 100 | native_hawaiian_and_other_pacific_islander_alone 101 | some_other_race_alone 102 | two_or_more_races
New in version 9.5 is the INSERT ON CONFLICT construct, which is often referred to as an UPSERT. This feature is useful if you don’t know a record already exists in a table and rather than having the insert fail, you want it to either update the existing record or do nothing.
This feature requires a unique key, primary key, unique index, or exclusion constraint in place, that when violated, you’d want different behavior like updating the existing record or not doing anything. To demonstrate, imagine we have a table of colors to create:
CREATE TABLE colors(color varchar(50) PRIMARY KEY, hex varchar(6));
INSERT INTO colors(color, hex)
VALUES('blue', '0000FF'), ('red', 'FF0000');We then get a new batch of colors to add to our table, but some
may be present already. If we do a regular insert, we’d get a primary
key violation when we tried to add colors already in the table. When we
run Example 7-17, we get only one
record inserted, the green that is
not already in our table, and each subsequent run would result in no records being inserted.
INSERT INTO colors(color, hex)
VALUES('blue', '0000FF'), ('red', 'FF0000'), ('green', '00FF00')
ON CONFLICT DO NOTHING ;Someone could come and put in a different case 'Blue'
in our system, and we’d then have two different cased blues. To remedy
this, we can put a unique index on our table:
CREATE UNIQUE INDEX uidx_colors_lcolor ON colors USING btree(lower(color));
As before, if we tried to insert a 'Blue', we’d be
prevented from doing so and the ON CONFLICT DO NOTHING would result in
nothing happening. If we really wanted to spell the colors as given to
us, we could use code like that given in Example 7-18.
INSERT INTO colors(color, hex)
VALUES('Blue', '0000FF'), ('Red', 'FF0000'), ('Green', '00FF00')
ON CONFLICT(lower(color))
DO UPDATE SET color = EXCLUDED.color, hex = EXCLUDED.hex;In Example 7-18 we specified the
conflict, which matches the expression of a constraint or unique index,
so using something like upper(color) would not work since
the colors table has no matching index for that expression.
In the case of INSERT ON CONFLICT DO UPDATE, you need to specify
the conflicting condition or CONSTRAINT name. If using a constraint,
you’d use ON CONFLICT ON CONSTRAINT as shown in Example 7-19.constraint_name_here
INSERT INTO colors(color, hex)
VALUES('Blue', '0000FF'), ('Red', 'FF0000'), ('Green', '00FF00')
ON CONFLICT ON CONSTRAINT colors_pkey
DO UPDATE SET color = EXCLUDED.color, hex = EXCLUDED.hex;;The DO part of the INSERT construct will only happen if there is a primary key, unique index, or unique key constraint error triggered. However, errors such as data type ones or check constraints will fail and never be processed by DO UPDATE.
PostgreSQL automatically creates data types of all tables. Because data types derived from tables contain other data types, they are often called composite data types, or just composites. The first time you see a query with composites, you might be surprised. In fact, you might come across their versatility by accident when making a typo in an SQL statement. Try the following query:
SELECTxFROMcensus.lu_fact_typesAsxLIMIT2;
At first glance, you might think that we left out a
.* by accident, but check out the result:
x
------------------------------------------------------------------
(86,Population,"{D001,Total:}",d001)
(87,Population,"{D002,Total:,""Not Hispanic or Latino:""}",d002)Instead of erroring out, the preceding example returns the
canonical representation of a lu_fact_type data type.
Composites can serve as input to several useful functions, among which
are array_agg and hstore (a function packaged
with the hstore extension that converts a row into a key-value
pair object).
If you are building web applications, you can take advantage of
the built-in JSON and JSONB support we covered in “JSON” and use a combination of array_agg and array_to_json
to output a query as a single JSON object as shown in Example 7-20. In PostgreSQL 9.4, you can
use json_agg. See Example 7-21.
SELECTarray_to_json(array_agg(f))AscatFROM(SELECTMAX(fact_type_id)Asmax_type,categoryFROMcensus.lu_fact_typesGROUPBYcategory)Asf;
This will give you an output of:
cats
----------------------------------------------------
[{"max_type":102,"category":"Population"},
{"max_type":153,"category":"Housing"}]
Defines a subquery with name f. f
can then be used to reference each row in the subquery.

Aggregate each row of subquerying using array_agg and then convert the array to json with array_to_json.
In version 9.3, the json_agg function replaces the
chain of array_to_json and array_agg, offering
both convenience and speed. In Example 7-21, we repeat Example 7-20 using json_agg, and
both examples will have the same output.
SELECTjson_agg(f)AscatsFROM(SELECTMAX(fact_type_id)Asmax_type,categoryFROMcensus.lu_fact_typesGROUPBYcategory)Asf;
In standard ANSI SQL, single quotes (') surround string literals. Should you have a single quote in the string itself, such as last names like O’Nan, possesives like mon’s place, or contractions like can’t, you need to escape it with another. The escape character is another single quote placed in front of the single quote you’re trying to escape. Say you’re writing an insert statement where you copied a large passage from a novel. Affixing yet another single quote to all existing single quotes is both tedious to add and challenging to read. After all, two single quotes look awfully like one double quote, which is another character entirely.
PostgreSQL lets you escape single quotes in strings of any length by surrounding them with two sequential dollar signs ($$), hence the name dollar quoting.
Dollar quoting is also useful in situations where you’re trying to
execute a piece of SQL dynamically, such as
exec(some sql). In
Example 7-5, we enclosed the
body of a trigger using dollar quoting.
If you are writing an SQL statement that glues two sentences with many single quotes, the ANSI standard way would be to escape as in the following:
SELECT'It''s O''Neil''s play. '||'It''ll start at two o''clock.'
With dollar quoting:
SELECT$$It's O'Neil's play. $$ || $$It'llstartattwoo'clock.$$
The pair of dollar signs replaces the single quote and escapes all single quotes within.
A variant of dollar quoting is named dollar quoting. We cover this in the following section.
The DO command allows you to inject a piece of procedural code into your SQL on the fly. You can
think of it as a one-time anonymous function. As an example, we’ll load
the data collected in Example 3-10 into
production tables from our staging table. We’ll use PL/pgSQL for our
procedural snippet, but you’re free to use other languages.
First, we’ll create the table:
setsearch_path=census;DROPTABLEIFEXISTSlu_fact_typesCASCADE;CREATETABLElu_fact_types(fact_type_idserial,categoryvarchar(100),fact_subcatsvarchar(255)[],short_namevarchar(50),CONSTRAINTpk_lu_fact_typesPRIMARYKEY(fact_type_id));
Then we’ll use DO to populate it as shown in Example 7-22. CASCADE will force the
drop of any related objects such as foreign key constraints and views,
so be cautious when using CASCADE.
Example 7-22 generates a
series of INSERT INTO SELECT statements. The SQL also
performs an unpivot operation to convert columnar data into rows.
Example 7-22 is only a
partial listing of the code needed to build
lu_fact_types. For the full code, refer to the
building_census_tables.sql file that is part of
the book code and data download.
DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
var_sql := string_agg(
$sql$
INSERT INTO lu_fact_types(category, fact_subcats, short_name)
SELECT
'Housing',
array_agg(s$sql$ || lpad(i::text,2,'0')
|| ') As fact_subcats,'
|| quote_literal('s' || lpad(i::text,2,'0')) || ' As short_name
FROM staging.factfinder_import
WHERE s' || lpad(I::text,2,'0') || $sql$ ~ '^[a-zA-Z]+' $sql$, ';'
)
FROM generate_series(1,51) As I;
EXECUTE var_sql;
END
$$;
Use of dollar quoting, so we don’t need to escape ' in
Housing. Since the DO command is also wrapped in dollars, we need
to use a named $ delimiter inside. We chose
$sql$.

Use string_agg to form a set of SQL statements as a single string of the form
INSERT INTO lu_fact_type(...) SELECT ... WHERE s01 ~
'[a-zA-Z]+';

Execute the SQL.
In Example 7-22, we are using the dollar-quoting syntax covered in “Dollar Quoting” for the body of the DO function and some fragments of the SQL statements inside the function. Since we use dollar quoting to define the whole body of the DO as well as internally, we need to use named dollar quoting for at least one part. The same dollar-quoting nested approach can be used for functon definitions as well.
New in version 9.4 is the FILTER clause for
aggregates, recently standardized in ANSI SQL. This
replaces the standard CASE WHEN clause for reducing the
number of rows included in an aggregation. For example, suppose you used
CASE WHEN to break out average test scores by student, as
shown in Example 7-23.
SELECTstudent,AVG(CASEWHENsubject='algebra'THENscoreELSENULLEND)Asalgebra,AVG(CASEWHENsubject='physics'THENscoreELSENULLEND)AsphysicsFROMtest_scoresGROUPBYstudent;
The FILTER clause equivalent
for Example 7-23 is shown in Example 7-24.
SELECTstudent,AVG(score)FILTER(WHEREsubject='algebra')Asalgebra,AVG(score)FILTER(WHEREsubject='physics')AsphysicsFROMtest_scoresGROUPBYstudent;
In the case of averages and sums and many other aggregates, the
CASE and FILTER are equivalent. The benefit is
that FILTER is a little clearer in purpose and for large
datasets is faster. However, there are some aggregates—such as
array_agg, which considers NULL fields—where the CASE statement gives you extra NULL values you don’t want. In Example 7-25 we try to get the list of
scores for each subject of interest for each student using the
CASE .. WHEN.. approach.
SELECTstudent,array_agg(CASEWHENsubject='algebra'THENscoreELSENULLEND)Asalgebra,array_agg(CASEWHENsubject='physics'THENscoreELSENULLEND)AsphysicsFROMtest_scoresGROUPBYstudent;
student | algebra | physics
--------+---------------------------+-------------------------------
jojo | {74,NULL,NULL,NULL,74,..} | {NULL,83,NULL,NULL,NULL,79,..}
jdoe | {75,NULL,NULL,NULL,78,..} | {NULL,72,NULL,NULL,NULL,72..}
robe | {68,NULL,NULL,NULL,77,..} | {NULL,83,NULL,NULL,NULL,85,..}
lhsu | {84,NULL,NULL,NULL,80,..} | {NULL,72,NULL,NULL,NULL,72,..}
(4 rows)Observe that in Example 7-25 we get a bunch of NULL fields in our arrays. We could work
around this issue with some clever use of subselects, but most of those
will be more verbose and slower than the FILTER alternative shown in Example 7-26.
SELECTstudent,array_agg(score)FILTER(WHEREsubject='algebra')Asalgebra,array_agg(score)FILTER(WHEREsubject='physics')AsphysicsFROMtest_scoresGROUPBYstudent;
student | algebra | physics
--------+---------+--------
jojo | {74,74} | {83,79}
jdoe | {75,78} | {72,72}
robe | {68,77} | {83,85}
lhsu | {84,80} | {72,72}FILTER works for all aggregate functions, not just
aggregate functions built into PostgreSQL.
New in PostgreSQL 9.4 are statistical functions for computing percentile, median (aka .5 percentile), and mode. These functions are percentile_disc (percentile discrete), percentile_cont (percentile continuous), and mode.
The two percentile functions differ in how they handle even counts. For the discrete function, the first value encountered is taken, so the ordering of the data matters. For the continuous case, values within the same percentile are averaged.
Median is merely the .5 percentile; therefore, it does not deserve a separate function of its own. The mode function finds the most common value. Should there be more than one mode, the first one encountered is returned; therefore, ordering matters, as shown in Example 7-27.
SELECTstudent,percentile_cont(0.5)WITHINGROUP(ORDERBYscore)Ascont_median,percentile_disc(0.5)WITHINGROUP(ORDERBYscore)ASdisc_median,mode()WITHINGROUP(ORDERBYscore)ASmode,COUNT(*)Asnum_scoresFROMtest_scoresGROUPBYstudentORDERBYstudent;
student | cont_median | disc_median | mode | num_scores --------+-------------+-------------+------+------------ alex | 78 | 77 | 74 | 8 leo | 72 | 72 | 72 | 8 regina | 76 | 76 | 68 | 9 sonia | 73.5 | 72 | 72 | 8 (4 rows)
Example 7-27 computes both the discrete and the continuous median score, which could differ when students have an even number of scores.
The inputs of these functions differ from other aggregate
functions. The column being aggregated is the column in the ORDER BY clauses of the WITHIN GROUP modifiers. The column is not
direct input to the function, as we’re used to seeing.
The percentile functions have another variant that accepts an array of percentiles, letting you retrieve multiple percentiles all in one call. Example 7-28 computes the median, the 60 percentile, and the highest score.
SELECTstudent,percentile_cont('{0.5,0.60,1}'::float[])WITHINGROUP(ORDERBYscore)AScont_median,percentile_disc('{0.5,0.60,1}'::float[])WITHINGROUP(ORDERBYscore)ASdisc_median,COUNT(*)Asnum_scoresFROMtest_scoresGROUPBYstudentORDERBYstudent;
student | cont_median | disc_median | num_scores
--------+----------------+-------------+------------
alex | {78,79.2,84} | {77,79,84} | 8
leo | {72,73.6,84} | {72,72,84} | 8
regina | {76,76.8,90} | {76,77,90} | 9
sonia | {73.5,75.6,86} | {72,75,86} | 8
(4 rows)As with all aggregates, you can combine these functions with
modifiers. Example 7-29
combines WITHIN GROUP with FILTER.
SELECTstudent,percentile_disc(0.5)WITHINGROUP(ORDERBYscore)FILTER(WHEREsubject='algebra')ASalgebra,percentile_disc(0.5)WITHINGROUP(ORDERBYscore)FILTER(WHEREsubject='physics')ASphysicsFROMtest_scoresGROUPBYstudentORDERBYstudent;
student | algebra | physics --------+---------+-------- alex | 74 | 79 leo | 80 | 72 regina | 68 | 83 sonia | 75 | 72 (4 rows)
Window functions are a common ANSI SQL feature. A window function has the
prescience to see and use data beyond the current row; hence the term
window. A window defines which other
rows need to be considered in addition to the current row. Windows let you
add aggregate information to each row of your output where the aggregation
involves other rows in the same window. Window functions such as row_number and rank are
useful for ordering your data in sophisticated ways that use rows outside
the selected results but within a window.
Without window functions, you’d have to resort to using joins and subqueries to poll neighboring rows. On the surface, window functions violate the set-based principle of SQL, but we mollify the purist by claiming that they are merely shorthand. You can find more details and examples in Window Functions.
Example 7-30 gives you a quick start.
Using a window function, we can obtain both the detail data and the
average value for all records with fact_type_id of 86 in one
single SELECT. Note that the WHERE clause is always evaluated
before the window function.
SELECTtract_id,val,AVG(val)OVER()asval_avgFROMcensus.factsWHEREfact_type_id=86;
tract_id | val | val_avg ------------+----------+---------------------- 25001010100 | 2942.000 | 4430.0602165087956698 25001010206 | 2750.000 | 4430.0602165087956698 25001010208 | 2003.000 | 4430.0602165087956698 25001010304 | 2421.000 | 4430.0602165087956698 :
The OVER sets the boundary of the window. In this
example, because the parentheses contain no constraint, the window covers
all the rows in our WHERE. So the
average is calculated across all rows with fact_type_id = 86.
The clause also morphed our conventional AVG aggregate
function into a window aggregate function. For each row, PostgreSQL
submits all the rows in the window to the AVG aggregation and
outputs the value as part of the row. Because our window has multiple
rows, the result of the aggregation is repeated. Notice that with window
functions, we were able to perform an aggregation without GROUP
BY. Furthermore, we were able to rejoin the aggregated result back
with the other variables without using a formal join.
You can use all SQL aggregate functions as window functions. In
addition, you’ll find ROW, RANK,
LEAD, and others listed in Window Functions.
You can run a window function over rows containing particular values instead of using
the whole table. This requires the addition of a PARTITION
BY clause, which instructs PostgreSQL to take the aggregate over
the indicated rows. In Example 7-31, we repeat what we did in
Example 7-30 but partition our window by
county code, which is always the first five characters of the
tract_id column. Thus, the rows in each county code are
averaged separately.
SELECTtract_id,val,AVG(val)OVER(PARTITIONBYleft(tract_id,5))Asval_avg_countyFROMcensus.factsWHEREfact_type_id=2ORDERBYtract_id;
tract_id | val | val_avg_county ------------+----------+---------------------- 25001010100 | 1765.000 | 1709.9107142857142857 25001010206 | 1366.000 | 1709.9107142857142857 25001010208 | 984.000 | 1709.9107142857142857 : 25003900100 | 1920.000 | 1438.2307692307692308 25003900200 | 1968.000 | 1438.2307692307692308 25003900300 | 1211.000 | 1438.2307692307692308 :
Window functions also allow an ORDER BY in the OVER clause. Without getting too abstruse, the
best way to think about this is that all the rows in the window will be
ordered as indicated by ORDER BY, and the window function
will consider only rows that range from the first row in the window up
to and including the current row in the window or partition. The classic
example uses the ROW_NUMBER function to sequentially
number rows. In Example 7-32, we
demonstrate how to number our census tracts in alphabetical order. To
arrive at the row number, ROW_NUMBER counts all rows up to
and including the current row based on the order dictated by the
ORDER BY.
SELECTROW_NUMBER()OVER(ORDERBYtract_name)Asrnum,tract_nameFROMcensus.lu_tractsORDERBYrnumLIMIT4;
rnum | tract_name -----+------------------------------------------------- 1 | Census Tract 1, Suffolk County, Massachusetts 2 | Census Tract 1001, Suffolk County, Massachusetts 3 | Census Tract 1002, Suffolk County, Massachusetts 4 | Census Tract 1003, Suffolk County, Massachusetts
In Example 7-32, we also have an
ORDER BY for the entire query. Don’t get confused between
this and the ORDER BY that’s specific to the window
function.
You can combine ORDER BY with PARTITION
BY, restarting the ordering for each partition. Example 7-33 returns to our example
of county codes.
SELECTtract_id,val,SUM(val)OVER(PARTITIONBYleft(tract_id,5)ORDERBYval)Assum_county_orderedFROMcensus.factsWHEREfact_type_id=2ORDERBYleft(tract_id,5),val;
tract_id | val | sum_county_ordered -------------+----------+----------------- 25001014100 | 226.000 | 226.000 25001011700 | 971.000 | 1197.000 25001010208 | 984.000 | 2181.000 : 25003933200 | 564.000 | 564.000 25003934200 | 593.000 | 1157.000 25003931300 | 606.000 | 1763.000 :
The key observation to make in the output is how the sum changes
from row to row. The ORDER BY clause means that the sum
will be taken only from the beginning of the partition to the current
row, giving you a running total, where the location of the current row
in the list is dictated by the ORDER BY clause. For
instance, if your row is in the fifth row in the third partition, the
sum will cover only the first five rows in the third partition. We put
an ORDER BY left(tract_id,5), val at the end of the query
so you can easily see the pattern, but keep in mind that the ORDER
BY of the query is independent of the ORDER BY in
each OVER clause.
You can explicitly control the rows under consideration by adding
a RANGE or ROWS clause: ROWS BETWEEN
CURRENT ROW AND 5 FOLLOWING.
PostgreSQL also supports window naming, which is useful if you
have the same window for each of your window columns. Example 7-34 demonstrates how to name windows
as well as how to use the LEAD and
LAG window functions to show a record value before and after for a
given partition.
SELECT*FROM(SELECTROW_NUMBER()OVER(wt)Asrnum,substring(tract_id,1,5)Ascounty_code,tract_id,LAG(tract_id,2)OVERwtAstract_2_before,LEAD(tract_id)OVERwtAstract_afterFROMcensus.lu_tractsWINDOWwtAS(PARTITIONBYsubstring(tract_id,1,5)ORDERBYtract_id))AsxWHERErnumBETWEEN2and3ANDcounty_codeIN('25007','25025')ORDERBYcounty_code,rnum;
rnum | county_code | tract_id | tract_2_before | tract_after -----+-------------+-------------+----------------+------------ 2 | 25007 | 25007200200 | | 25007200300 3 | 25007 | 25007200300 | 25007200100 | 25007200400 2 | 25025 | 25025000201 | | 25025000202 3 | 25025 | 25025000202 | 25025000100 | 25025000301
Both LEAD and LAG take an optional step
argument that defines how many rows to skip forward or backward; the
step can be positive or negative. LEAD and LAG
return NULL when trying to retrieve
rows outside the window partition. This is a possibility that you always
have to account for.
In PostgreSQL, any aggregate function you create can be used as a
window function. Other databases tend to limit window functions to using
built-in aggregates such as AVG, SUM,
MIN, and MAX.
Essentially, common table expressions (CTEs) allow you to define a query that can be reused in a larger query. CTEs act as temporary tables defined within the scope of the statement; they’re gone once the enclosing statement has finished executing.
There are three ways to use CTEs:
This is your plain-vanilla CTE, used to make your SQL more readable or to encourage the planner to materialize a costly intermediate result for better performance.
This is an extension of the basic CTE with
UPDATE, INSERT, and DELETE
commands. A common final step in the CTE is to return changed
rows.
This puts an entirely new whirl on standard CTE. The rows returned by a recursive CTE vary during the execution of the query.
PostgreSQL allows you to have a CTE that is both writable and recursive.
The basic CTE looks like Example 7-35. The
WITH keyword introduces the CTE.
WITHcteAS(SELECTtract_id,substring(tract_id,1,5)Ascounty_code,COUNT(*)OVER(PARTITIONBYsubstring(tract_id,1,5))Ascnt_tractsFROMcensus.lu_tracts)SELECTMAX(tract_id)Aslast_tract,county_code,cnt_tractsFROMcteWHEREcnt_tracts>100GROUPBYcounty_code,cnt_tracts;
cte is the name of the CTE in Example 7-35, defined using a SELECT
statement to contain three columns: tract_id,
county_code, and cnt_tracts. The main
SELECT refers to the CTE.
You can stuff as many CTEs as you like, separated by commas, into
the WITH clause, as shown in Example 7-36. The order of the CTEs matters in that CTEs
defined later can call CTEs defined earlier, but not vice versa.
WITHcte1AS(SELECTtract_id,substring(tract_id,1,5)Ascounty_code,COUNT(*)OVER(PARTITIONBYsubstring(tract_id,1,5))Ascnt_tractsFROMcensus.lu_tracts),cte2AS(SELECTMAX(tract_id)Aslast_tract,county_code,cnt_tractsFROMcte1WHEREcnt_tracts<8GROUPBYcounty_code,cnt_tracts)SELECTc.last_tract,f.fact_type_id,f.valFROMcensus.factsAsfINNERJOINcte2cONf.tract_id=c.last_tract;
The writable CTE extends the CTE to allow for update, delete, and insert statements. We’ll revisit our logs tables that we created in Example 6-3, adding another child table and populating it:
CREATETABLElogs_2011_01_02(PRIMARYKEY(log_id),CONSTRAINTchkCHECK(log_ts>='2011-01-01'ANDlog_ts<'2011-03-01'))INHERITS(logs_2011);
In Example 7-37, we move data from
our parent 2011 table to our new child Jan-Feb 2011 table. The
ONLY keyword is described in “Restricting DELETE, UPDATE, and SELECT from Inherited
Tables”
and the RETURNING keyword in “Returning Affected Records to the User”.
WITHtAS(DELETEFROMONLYlogs_2011WHERElog_ts<'2011-03-01'RETURNING*)INSERTINTOlogs_2011_01_02SELECT*FROMt;
The official
documentation for PostgreSQL describes it best: “The optional RECURSIVE modifier
changes CTE from a mere syntactic convenience into a feature that
accomplishes things not otherwise possible in standard SQL.” A more
interesting CTE is one that uses a recursively defining construct to
build an expression. PostgreSQL recursive CTEs utilize UNION
ALL to combine tables, a kind of combination that can be done
repeatedly as the query adds the tables over and over.
To turn a basic CTE to a recursive one, add the
RECURSIVE modifier after the WITH. WITH
RECURSIVE can contain a mix of recursive and nonrecursive table
expressions. In most other databases, the RECURSIVE keyword
is not necessary to denote recursion.
A common use of recursive CTEs is to represent message threads and other tree-like structures. We have an example of this in Recursive CTE to Display Tree Structures.
In Example 7-38, we query the system catalog to list the cascading table relationships we have in our database.
WITHRECURSIVEtblsAS(SELECTc.oidAstableoid,n.nspnameASschemaname,c.relnameAStablenameFROMpg_classcLEFTJOINpg_namespacenONn.oid=c.relnamespaceLEFTJOINpg_tablespacetONt.oid=c.reltablespaceLEFTJOINpg_inheritsAsthONth.inhrelid=c.oidWHEREth.inhrelidISNULLANDc.relkind='r'::"char"ANDc.relhassubclassUNIONALLSELECTc.oidAstableoid,n.nspnameASschemaname,tbls.tablename||'->'||c.relnameAStablenameFROMtblsINNERJOINpg_inheritsAsthONth.inhparent=tbls.tableoidINNERJOINpg_classcONth.inhrelid=c.oidLEFTJOINpg_namespacenONn.oid=c.relnamespaceLEFTJOINpg_tablespacetONt.oid=c.reltablespace)SELECT*FROMtblsORDERBYtablename;
tableoid | schemaname | tablename ---------+------------+--------------------------------------- 3152249 | public | logs 3152260 | public | logs->logs_2011 3152272 | public | logs->logs_2011->logs_2011_01_02

Get a list of all tables that have child tables but no parent table.

This is the recursive part; it gets all children of tables in
tbls.

The names of the child tables start with the parental name.

Return parents and all child tables. Because we sort by table name, which prepends the parent name, all child tables will follow their parents in their output.
LATERAL is a new ANSI SQL
construction in version 9.3. Here’s the motivation behind it: suppose you
perform joins on two tables or subqueries; normally, the pair
participating in the join are independent units and can’t read data from
each other. For example, the following interaction would generate an error
because l.yr = 2011 is not a column on
the righthand side of the join:
SELECT*FROMcensus.factsLINNERJOIN(SELECT*FROMcensus.lu_fact_typesWHEREcategory=CASEWHENL.yr=2011THEN'Housing'ELSEcategoryEND)RONL.fact_type_id=R.fact_type_id;
Now add the LATERAL keyword, and
the error is gone:
SELECT*FROMcensus.factsLINNERJOINLATERAL(SELECT*FROMcensus.lu_fact_typesWHEREcategory=CASEWHENL.yr=2011THEN'Housing'ELSEcategoryEND)RONL.fact_type_id=R.fact_type_id;
LATERAL lets you share data in columns across two
tables in a FROM clause. However, it
works only in one direction: the righthand side can draw from the lefthand
side, but not vice versa.
There are situations when you should avail yourself of LATERAL to avoid extremely convoluted syntax. In
Example 7-39, a column on the left
serves as a parameter in the generate_series function on the
right:
CREATETABLEinterval_periods(i_typeinterval);INSERTINTOinterval_periods(i_type)VALUES('5 months'),('132 days'),('4862 hours');
SELECTi_type,dtFROMinterval_periodsCROSSJOINLATERALgenerate_series('2012-01-01'::date,'2012-12-31'::date,i_type)ASdtWHERENOT(dt='2012-01-01'ANDi_type='132 days'::interval);
i_type | dt ------------+----------------------- 5 mons | 2012-01-01 00:00:00-05 5 mons | 2012-06-01 00:00:00-04 5 mons | 2012-11-01 00:00:00-04 132 days | 2012-05-12 00:00:00-04 132 days | 2012-09-21 00:00:00-04 4862:00:00 | 2012-01-01 00:00:00-05 4862:00:00 | 2012-07-21 15:00:00-04
Lateral is also helpful for using values from the lefthand side to
limit the number of rows returned from the righthand side. Example 7-40 uses LATERAL
to return, for each superuser who has used our site within the last 100
days, the last five logins and what they were up to. Tables used in this
example were created in “TYPE OF” and “Basic Table Creation”.
SELECTu.user_name,l.description,l.log_tsFROMsuper_usersASuCROSSJOINLATERAL(SELECTdescription,log_tsFROMlogsWHERElog_ts>CURRENT_TIMESTAMP-interval'100 days'ANDlogs.user_name=u.user_nameORDERBYlog_tsDESCLIMIT5)ASl;
Although you can achieve the same results by using window functions, lateral joins yield faster results with more succinct syntax.
You can use multiple lateral joins in your SQL and even chain them
in sequence as you would when joining more than two subqueries. You can
sometimes get away with omitting the LATERAL keyword; the
query parser is smart enough to figure out a lateral join if you have a
correlated expression. But we advise that you always include the keyword
for the sake of clarity. Also, you’ll get an error if you write your
statement assuming the use of a lateral join but run the statement on a
prelateral version PostgreSQL. Without the keyword, PostgreSQL might end
up performing a join with unintended results.
Other database products also offer lateral joins, although they
don’t abide by the ANSI moniker. In Oracle, you’d use a table pipeline
construct. In SQL Server, you’d use CROSS APPLY or OUTER
APPLY.
Introduced in version 9.4, the WITH ORDINALITY clause
is an SQL ANSI standard construct. WITH
ORDINALITY adds a sequential number column to a set-returning
function result.
Although you can’t use WITH ORDINALITY with tables
and subqueries, you can achieve the same result for those by using the
window function ROW_NUMBER.
You’ll find WITH ORDINALITY often used with functions
like generate_series, unnest, and other
functions that expand out composite types and arrays. It can be used with
any set-returning function, including ones you create
yourself.
Example 7-41 demonstrates WITH
ORDINALITY used in conjunction with the temporal variant of
the generate_series function.
SELECTdt.*FROMgenerate_series('2016-01-01'::date,'2016-12-31'::date,interval'1 month')WITHORDINALITYAsdt;
dt | ordinality -----------------------+----------- 2016-01-01 00:00:00-05 | 1 2016-02-01 00:00:00-05 | 2 2016-03-01 00:00:00-05 | 3 2016-04-01 00:00:00-04 | 4 2016-05-01 00:00:00-04 | 5 2016-06-01 00:00:00-04 | 6 2016-07-01 00:00:00-04 | 7 2016-08-01 00:00:00-04 | 8 2016-09-01 00:00:00-04 | 9 2016-10-01 00:00:00-04 | 10 2016-11-01 00:00:00-04 | 11 2016-12-01 00:00:00-05 | 12 (12 rows)
WITH ORDINALITY always adds an additional column at the
end of the result called ordinality, and WITH
ORDINALITY can only appear in the FROM clause of an SQL statement. You are
free to rename the ordinality column.
You’ll often find WITH ORDINALITY paired with the
LATERAL construct. In Example 7-42 we repeat the LATERAL in
Example 7-39, but add on a sequential
number to each set.
SELECTd.ord,i_type,d.dtFROMinterval_periodsCROSSJOINLATERALgenerate_series('2012-01-01'::date,'2012-12-31'::date,i_type)WITHORDINALITYASd(dt,ord)WHERENOT(dt='2012-01-01'ANDi_type='132 days'::interval);
ord | i_type | dt ----+------------+----------------------- 1 | 5 mons | 2012-01-01 00:00:00-05 2 | 5 mons | 2012-06-01 00:00:00-04 3 | 5 mons | 2012-11-01 00:00:00-04 2 | 132 days | 2012-05-12 00:00:00-04 3 | 132 days | 2012-09-21 00:00:00-04 1 | 4862:00:00 | 2012-01-01 00:00:00-05 2 | 4862:00:00 | 2012-07-21 15:00:00-04 (7 rows)
In Example 7-42, WITH
ORDINALITY gets applied to the result of the set-returning
function. It always gets applied before the WHERE condition.
As a result, there is a gap in numbering in the final result (the number 1
is lacking for the 132 day interval), because the number was filtered out
by our WHERE condition.
If we didn’t have the WHERE condition excluding the
2012-01-01, 132 day record, we would have 8 rows with the 4th row
being 1 | 132 days | 2012-01-01
00:00:00-04
If you’ve ever tried to create a summary report that includes both totals and subtotals, you’ll appreciate the capability to partition your data on the fly. Grouping sets let you do exactly that.
For our table of test scores, if we need to find both the overall average per student and the average per student by subject, we could write a query as shown in Example 7-43, taking advantage of grouping sets.
SELECTstudent,subject,AVG(score)::numeric(10,2)FROMtest_scoresWHEREstudentIN('leo','regina')GROUPBYGROUPINGSETS((student),(student,subject))ORDERBYstudent,subjectNULLSLAST;
student | subject | avg ---------+-----------+------- leo | algebra | 82.00 leo | calculus | 65.50 leo | chemistry | 75.50 leo | physics | 72.00 leo | NULL | 73.75 regina | algebra | 72.50 regina | calculus | 64.50 regina | chemistry | 73.50 regina | economics | 90.00 regina | physics | 84.00 regina | NULL | 75.44 (11 rows)
In a single query, Example 7-43 gives us both the average of each student across all subjects and his or her average in each subject.
We can even include a total for each subject across all students by having multiple grouping sets as shown in Example 7-44.
SELECTstudent,subject,AVG(score)::numeric(10,2)FROMtest_scoresWHEREstudentIN('leo','regina')GROUPBYGROUPINGSETS((student,subject),(student),(subject))ORDERBYstudentNULLSLAST,subjectNULLSLAST;
student | subject | avg ---------+-----------+------- leo | algebra | 82.00 leo | calculus | 65.50 leo | chemistry | 75.50 leo | physics | 72.00 leo | NULL | 73.75 regina | algebra | 72.50 regina | calculus | 64.50 regina | chemistry | 73.50 regina | economics | 90.00 regina | physics | 84.00 regina | NULL | 75.44 NULL | algebra | 77.25 NULL | calculus | 65.00 NULL | chemistry | 74.50 NULL | economics | 90.00 NULL | physics | 78.00 (16 rows)
What if we wanted to have total breakdowns for student, student plus
subject, and overall average? We could revise our query to add a universal
grouping set GROUPING SETS ((student),(student, subject),()).
This is equivalent to the shorthand ROLLUP (student, subject). See Example 7-45.
SELECTstudent,subject,AVG(score)::numeric(10,2)FROMtest_scoresWHEREstudentIN('leo','regina')GROUPBYROLLUP(student,subject)ORDERBYstudentNULLSLAST,subjectNULLSLAST;
student | subject | avg ---------+-----------+------- leo | algebra | 82.00 leo | calculus | 65.50 leo | chemistry | 75.50 leo | physics | 72.00 leo | NULL | 73.75 regina | algebra | 72.50 regina | calculus | 64.50 regina | chemistry | 73.50 regina | economics | 90.00 regina | physics | 84.00 regina | NULL | 75.44 NULL | NULL | 74.65 (12 rows)
If we reverse the order of columns in ROLLUP, we get the score for each student/subject pair, average for each subject, and overall average as shown in Example 7-46.
SELECTstudent,subject,AVG(score)::numeric(10,2)FROMtest_scoresWHEREstudentIN('leo','regina')GROUPBYROLLUP(subject,student)ORDERBYstudentNULLSLAST,subjectNULLSLAST;
student | subject | avg ---------+-----------+------- leo | algebra | 82.00 leo | calculus | 65.50 leo | chemistry | 75.50 leo | physics | 72.00 regina | algebra | 72.50 regina | calculus | 64.50 regina | chemistry | 73.50 regina | economics | 90.00 regina | physics | 84.00 NULL | algebra | 77.25 NULL | calculus | 65.00 NULL | chemistry | 74.50 NULL | economics | 90.00 NULL | physics | 78.00 NULL | NULL | 74.65 (15 rows)
If we also wanted to include subtotals for just the subject and just
the student, we’d use GROUPING SETS ( (student), (student, subject),
(subject), () ), or the shorthand CUBE (student, subject) in Example 7-47.
SELECTstudent,subject,AVG(score)::numeric(10,2)FROMtest_scoresWHEREstudentIN('leo','regina')GROUPBYCUBE(student,subject)ORDERBYstudentNULLSLAST,subjectNULLSLAST;
student | subject | avg ---------+-----------+------- leo | algebra | 82.00 leo | calculus | 65.50 leo | chemistry | 75.50 leo | physics | 72.00 leo | NULL | 73.75 regina | algebra | 72.50 regina | calculus | 64.50 regina | chemistry | 73.50 regina | economics | 90.00 regina | physics | 84.00 regina | NULL | 75.44 NULL | algebra | 77.25 NULL | calculus | 65.00 NULL | chemistry | 74.50 NULL | economics | 90.00 NULL | physics | 78.00 NULL | NULL | 74.65 (17 rows)