In PostgreSQL, as in most databases, you can string a series of SQL statements together and treat them as a unit, even customizing each run by passing arguments. Different databases ascribe different names for this unit: stored procedures, user-defined functions, and so on. PostgreSQL simply refers to them as functions.
Aside from marshalling SQL statements, functions often add the capability to control the execution of the SQL using PLs. PostgreSQL offers a rich choice of languages for writing functions. SQL, C, PL/pgSQL, PL/Perl, and PL/Python are often packaged with installers. You’ll also find PL/V8, which allows you to write procedural functions in JavaScript. PL/V8 is a favorite for web developers and a darling companion to the built-in JSON and JSONB data types covered in “JSON”.
You can also install additional languages such as PL/R, PL/Java, PL/sh, PL/TSQL, and even experimental ones geared for high-end data processing and artificial intelligence, such as PL/Scheme or PL/OpenCL. You can find a listing of available languages in Procedural Languages.
PostgreSQL functions fall into the categories of basic function, aggregate function, window function, and trigger function. We’ll start by detailing the basic anatomy of a function and then go into detail about how the various kinds of specialized function types extends from this.
Regardless of which languages you choose for writing functions, all functions share a similar structure, as shown in Example 8-1.
CREATE OR REPLACE FUNCTIONfunc_name
(arg1
arg1_datatype
DEFAULTarg1_default
) RETURNSsome type | set of some type | TABLE (..)
AS $$BODY of function
$$ LANGUAGElanguage_of_function
Arguments can have default values, which allow the caller of the function to omit them. Optional arguments must be positioned after nonoptional arguments in the function definition.
Argument names are optional but are useful because they let you refer to an argument by name inside the function body. For example, think of a function that is defined to take three input arguments (two being optional):
big_elephant
(
ear_size
numeric
,
skin_color
text
DEFAULT
'blue'
,
name
text
DEFAULT
'Dumbo'
)
You can refer to the arguments by
name (ear_size
, skin_color
, etc.) inside the
body of the function. If they are not named, you need to refer to the
arguments inside the function by their order in the argument list: $1,
$2, and $3.
If you name the arguments, you also have the option of using named notation when calling the function:
big_elephant
(
name
=>
'Wooly'
,
ear_size
=>
1
.
2
)
You can always use the positional notation big_elephant(1.2, 'blue',
'Wooly')
even if function arguments are named. Named notation is
useful if you have a function that takes several arguments and many of
the arguments are optional. By using named notation, you can override a
default value and keep other defaults regardless of the order in which
the arguments are defined. You also don’t need to state the arguments in
the order they appear in the function definition. In the
big_elephant
example we were able to accept the default
skin color of blue and override the default name, even though
name
appears last in the argument list. If we were to call
the function simply by the order of arguments, we couldn’t skip over
skin_color
if we wanted to override the name
argument.
In PostgreSQL 9.5 and above, the named notation convention is
name => 'Wooly'
. In 9.4 and below you would use
name := 'Wooly'
. For backward compatibility, the old
syntax of arg1_name := arg1_value
is still supported in
9.5 and above, but may be removed in the future.
Functional definitions often include additional qualifiers to optimize execution and to enforce security:
LANGUAGE
The language must be one installed in your database. Obtain a
list with the SELECT lanname FROM pg_language;
query.
VOLATILITY
This setting clues the query planner as to whether outputs can be cached and used across multiple calls. Your choices are:
IMMUTABLE
The function will always return the same output for the same input. Think of arithmetic functions. Only immutable functions can be used in the definition of indexes.
STABLE
The function will return the same value for the same inputs within the same query.
VOLATILE
The function can return different values with each call, even with the same inputs. Think of functions that change data or depend on environment settings like system time. This is the default.
Keep in mind that the volatility setting is merely a hint to
the planner. The default value of VOLATILE
ensures that the planner will always recompute the result. If you
use one of the other values, the planner can still choose to forgo
caching should it decide that recomputing is more
cost-effective.
STRICT
A function marked with this qualifier will always return NULL if any inputs are NULL. The planner skips evaluating the function altogether with any NULL inputs. When writing SQL functions, be cautious
when marking a function as STRICT, because it could prevent the planner from taking advantage of indexes. Read our article STRICT on SQL Functions for more details.
COST
This is a relative measure of computational intensiveness. SQL and
PL/pgSQL functions default to 100
and C functions to
1
. This affects the order that the planner will
follow when evaluating the function in a WHERE
clause, and the likelihood of caching. The higher you set the
cost, the more computation the planner will assume the function
needs.
ROWS
Applies only to functions returning sets of records. The value provides an estimate of how many rows will be returned. The planner will take this value into consideration when coming up with the best strategy.
SECURITY DEFINER
This causes execution to take place within the security context of the owner of the function. If omitted, the function executes under the context of the user calling the function. This qualifier is useful for giving people rights to update a table via a function when they do not have direct update privileges.
PARALLEL
New in PostgreSQL 9.6. This qualifier allows the planner to run in parallel mode. By default, a function is marked as PARALLEL UNSAFE, which prevents any queries containing the function from being distributed into separate work processes. Refer to Parallel Safety. Your choices are:
SAFE
This allows parallel use, and is generally a safe choice for IMMUTABLE functions or functions that don’t update data or change transaction state or other variables.
UNSAFE
Functions that change nontemp table data, access sequences, or state should be marked as UNSAFE. They prevent the query from being run in parallel mode and therefore risking the corruption of the tables or other system state.
RESTRICTED
You may want to use this value for functions that use temporary tables, prepared statements, or client connection state. This value does not prevent a query from running in parallel mode, but processing of these functions can happen only on the lead query.
In many of the examples in this chapter, we’ll be including PARALLEL mode options. If you are running lower than version 9.6, leave out the parallel clauses.
No worthy database should lack triggers, which automatically detect and handle changes in data. PostgreSQL allows you to attach triggers to tables, views, and even DDL events like creation of a new table.
Triggers can actuate at both the statement level and the row
level. Statement triggers run once per SQL statement, whereas row
triggers run for each row affected by the SQL. For example, if you
execute an UPDATE
statement that affects 1,500 rows, a
statement-level update trigger will fire only once, whereas the
row-level trigger can fire up to 1,500 times.
You can further refine the timing of the trigger by making a
distinction between BEFORE
, AFTER
, and
INSTEAD OF
triggers. A BEFORE
trigger
fires prior to the execution of the statement, giving you
a chance to cancel or back up data before the change. An
AFTER
trigger fires after statement execution, giving you a chance to
retrieve the new data values. AFTER
triggers are often used
for logging or replication purposes. INSTEAD OF
triggers
execute in lieu of the statement. You can attach
BEFORE
and AFTER
triggers only to tables and
events, and INSTEAD OF
triggers only to views.
Trigger functions that change values of a row should be called
only in the BEFORE
event, because in the
AFTER
event, all updates to the
NEW
record will be ignored.
You can also adorn a trigger with a WHEN
condition
to control which rows being updated will fire the trigger,
or an UPDATE OF
clause to have the
trigger fire only if certain columns are updated. To gain a more nuanced
understanding of the interplay between triggers and the underlying
statement, see the official documentation: Overview of Trigger Behavior. We
also demonstrated a view-based trigger in Example 7-5.columns_list
PostgreSQL offers specialized functions to handle triggers. These are called trigger functions and behave like any other function and have the same basic structure. Where they differ is in the input parameter and the output type. A trigger function never takes an argument, because internally the function already has access to the data and can modify it.
A trigger function always outputs a data type called a
trigger
. Because PostgreSQL trigger functions are no
different from any other function, you can reuse the same trigger
function across different triggers. This is usually not the case for
other databases, where each trigger is wedded to its own handler
code.
In PostgreSQL, each trigger must have exactly one associated triggering function to handle the firing. To apply multiple triggering functions, you must create multiple triggers against the same event. The alphabetical order of the trigger name determines the order of firing. Each trigger will have access to the revised data from the previous trigger. If any trigger issues a rollback, all data amended by earlier triggers fired by the same event will roll back.
You can use almost any language to create trigger functions, with SQL being the notable exception. PL/pgSQL is by far the most popular language. We demonstrate writing trigger functions using PL/pgSQL in “Writing Trigger Functions in PL/pgSQL”.
Most other databases limit you to ANSI SQL built-in aggregate functions such as MIN
,
MAX
, AVG
, SUM
, and
COUNT
. In PostgreSQL, you don’t have this limitation. If
you need a more esoteric aggregate function, you’re welcome to write
your own. Because you can use any aggregate function in PostgreSQL as a
window function (see “Window Functions”), you
get twice the use out of any aggregate function that you author.
You can write aggregates in almost any language, SQL included. An aggregate is generally comprised of one or more functions. It must have at least a state transition function to perform the computation; usually this function runs repeatedly to create one output row from two input rows. You can also specify optional functions to manage initial and final states. You can also use a different language for each of the subfunctions. We have various examples of building aggregates using PL/pgSQL, PL/Python, and SQL in the article PostgreSQL Aggregates.
Regardless of which language you use to code the functions, the
glue that brings them all together is the CREATE AGGREGATE
command:
CREATE AGGREGATEmy_agg
(input data type
) ( SFUNC=state function name
, STYPE=state type
, FINALFUNC=final function name
, INITCOND=initial state value
, SORTOP=sort_operator
);
The final function is optional, but if specified, it must take as input the result of the state function. The state function always takes a data type as the input along with the result of the last call to the state function. Sometimes this result is what you want as the result of the aggregate function, and sometimes you want to run a final function to massage the result. The initial condition is also optional. When the initial condition value is present, the command uses it to initialize the state value.
The optional sort operator can serve as the associated sort operator for a MIN
- or MAX
-like aggregate. It is used to take
advantage of indexes. It is just an operator name such as
>
and <
. It should be used only when the
two following statements are equivalent:
SELECTagg
(col
) FROMsometable
;
SELECTcol
FROMsometable
ORDER BYcol
USINGsortop
LIMIT 1;
The PostgreSQL 9.4 CREATE AGGREGATE
structure was
expanded to include support for creating moving aggregates, which are
useful with window functions that move the window. See PostgreSQL 9.4: CREATE AGGREGATE
for details.
In PostgreSQL 9.6, aggregates were expanded to include support
for parallelization. This was accomplished through the
parallel
property, which can take the values of
safe
, unsafe
, or restricted
. If
the parallel
property is left out, the aggregate is
marked as parallel unsafe. In addition to the parallel
setting, combinefunc
, serialfunc
, and
deserialfunc
properties were added to support parallel
aggregates. Refer to SQL
Create Aggregate for details.
Aggregates need not depend on a single column. If you need more than one column for your aggregate (an example is a built-in covariance function), see How to Create Multi-Column Aggregates for guidance.
SQL language functions are easy to write. You don’t have fancy control flow commands to worry about, and you probably have a good grasp of SQL to begin with. When it comes to writing aggregates, you can get pretty far with the SQL language alone. We demonstrate aggregates in “Writing SQL Aggregate Functions”.
Function languages can be either trusted or untrusted. Many—but not all—languages offer both a trusted and untrusted version. The term trusted connotes that the language can do no harm to the underlying operating system by denying it access to the key OS operations. In short:
A trusted language lacks access to the server’s filesystem beyond the data cluster. It therefore cannot execute OS commands. Users of any level can create functions in a trusted language. Languages such as SQL, PL/pgSQL, PL/Perl, and PL/V8 are trusted.
An untrusted language can interact with the OS. It can execute OS functions and call web services. Only superusers have the privilege of authoring functions in an untrusted language. However, a superuser can grant permission to another role to run an untrusted function. By convention, languages that are untrusted end in the letter U (PL/PerlU, PL/PythonU, etc.). But ending in U is not a requirement. For example, PL/R is such an exception.
Although SQL is mostly a language for issuing queries, it can also be used to write functions. In PostgreSQL, using an existing piece of SQL for the function is fast and easy: take your existing SQL statements, add a functional header and footer, and you’re done. But the ease comes at a price. You can’t use control features like conditional branches, looping, or defining variables. More restrictively, you can’t run dynamic SQL statements that you assemble on the fly using arguments passed into the function.
On the positive side, the query planner can peek into an SQL function and optimize execution—a process called inlining. Query planners treat other languages as black boxes. Only SQL functions can be inlined, which lets them take advantage of indexes and collapse repetitive computations.
Example 8-2 shows a primitive SQL function that inserts a row into a table and returns a scalar value.
CREATE
OR
REPLACE
FUNCTION
write_to_log
(
param_user_name
varchar
,
param_description
text
)
RETURNS
integer
AS
$$
INSERT
INTO
logs
(
user_name
,
description
)
VALUES
(
$
1
,
$
2
)
RETURNING
log_id
;
$$
LANGUAGE
'sql'
VOLATILE
;
To call the function, execute something like:
SELECT
write_to_log
(
'alex'
,
'Logged in at 11:59 AM.'
)
As
new_id
;
Similarly, you can update data with an SQL function and return a scalar or void, as shown in Example 8-3.
CREATE
OR
REPLACE
FUNCTION
update_logs
(
log_id
int
,
param_user_name
varchar
,
param_description
text
)
RETURNS
void
AS
$$
UPDATE
logs
SET
user_name
=
$
2
,
description
=
$
3
,
log_ts
=
CURRENT_TIMESTAMP
WHERE
log_id
=
$
1
;
$$
LANGUAGE
'sql'
VOLATILE
;
To execute:
SELECT
update_logs
(
12
,
'alex'
,
'Fell back asleep.'
);
Functions, in almost all languages, can return sets. SQL
functions are no exception. There are three common
approaches to doing this: the ANSI SQL standard RETURNS TABLE
syntax,
OUT
parameters, and composite data types. The
RETURNS TABLE
approach is closest to what you’ll find
in other database products. In Example 8-4, we demonstrate how to write the
same function three ways.
Using RETURNS TABLE
:
CREATE
OR
REPLACE
FUNCTION
select_logs_rt
(
param_user_name
varchar
)
RETURNS
TABLE
(
log_id
int
,
user_name
varchar
(
50
),
description
text
,
log_ts
timestamptz
)
AS
$$
SELECT
log_id
,
user_name
,
description
,
log_ts
FROM
logs
WHERE
user_name
=
$
1
;
$$
LANGUAGE
'sql'
STABLE
PARALLEL
SAFE
;
Using OUT
parameters:
CREATE
OR
REPLACE
FUNCTION
select_logs_out
(
param_user_name
varchar
,
OUT
log_id
int
,
OUT
user_name
varchar
,
OUT
description
text
,
OUT
log_ts
timestamptz
)
RETURNS
SETOF
record
AS
$$
SELECT
*
FROM
logs
WHERE
user_name
=
$
1
;
$$
LANGUAGE
'sql'
STABLE
PARALLEL
SAFE
;
CREATE
OR
REPLACE
FUNCTION
select_logs_so
(
param_user_name
varchar
)
RETURNS
SETOF
logs
AS
$$
SELECT
*
FROM
logs
WHERE
user_name
=
$
1
;
$$
LANGUAGE
'sql'
STABLE
PARALLEL
SAFE
;
Call all these functions using:
SELECT * FROM select_logs_xxx
('alex');
Yes! In PostgreSQL you are able to author your own aggregate functions to expand
beyond the usual aggregates MIN, MA, COUNT, AVG, etc. We demonstrate by
creating an aggregate function to compute the geometric mean. A geometric
mean is the nth root of a product of n positive numbers
((x1*x2*x3...xn)(1/n)
). It has
various uses in finance, economics, and statistics. A geometric mean
substitutes for the more common arithmetic mean when the numbers range
across vastly different scales. A more suitable computational formula
uses logarithms to transform a multiplicative process to an additive one
(EXP(SUM(LN(x))/n)
). We’ll be using this method in our
example.
To build our geometric mean aggregate, we need two subfunctions: a state transition function to sum the logs (see Example 8-5) and a final function to exponentiate the logs. We’ll also specify an initial condition of zero when we assemble everything together.
CREATE
OR
REPLACE
FUNCTION
geom_mean_state
(
prev
numeric
[
2
],
next
numeric
)
RETURNS
numeric
[
2
]
AS
$$
SELECT
CASE
WHEN
$
2
IS
NULL
OR
$
2
=
0
THEN
$
1
ELSE
ARRAY
[
COALESCE
(
$
1
[
1
],
0
)
+
ln
(
$
2
),
$
1
[
2
]
+
1
]
END
;
$$
LANGUAGE
sql
IMMUTABLE
PARALLEL
SAFE
;
Our state transition function takes two inputs: the previous state
passed in as an array with two elements, and the next
added
in the summation. If the next
argument evaluates to
NULL
or zero, the state function
returns the prior state. Otherwise, it returns a new array in which the
first element is the sum of the logs and the second element is the
running count.
We also need a final function, shown in Example 8-6, that divides the sum from the state transition by the count.
CREATE
OR
REPLACE
FUNCTION
geom_mean_final
(
numeric
[
2
])
RETURNS
numeric
AS
$$
SELECT
CASE
WHEN
$
1
[
2
]
>
0
THEN
exp
(
$
1
[
1
]
/
$
1
[
2
])
ELSE
0
END
;
$$
LANGUAGE
sql
IMMUTABLE
PARALLEL
SAFE
;
Now we stitch all the subfunctions together in our aggregate definition, as shown in Example 8-7. (Note that our aggregate has an initial condition that is the same data type as the one returned by our state function.)
CREATE
AGGREGATE
geom_mean
(
numeric
)
(
SFUNC
=
geom_mean_state
,
STYPE
=
numeric
[],
FINALFUNC
=
geom_mean_final
,
PARALLEL
=
safe
,
INITCOND
=
'{0,0}'
);
Let’s take our new function for a test drive. In Example 8-8, we compute a heuristic rating for racial diversity and list the top five most racially diverse counties in Massachusetts.
SELECT
left
(
tract_id
,
5
)
As
county
,
geom_mean
(
val
)
As
div_county
FROM
census
.
vw_facts
WHERE
category
=
'Population'
AND
short_name
!=
'white_alone'
GROUP
BY
county
ORDER
BY
div_county
DESC
LIMIT
5
;
county | div_county -------+--------------------- 25025 | 85.1549046212833364 25013 | 79.5972921427888918 25017 | 74.7697097102419689 25021 | 73.8824162064128504 25027 | 73.5955049035237656
Let’s go into overdrive and engage our new function as a window aggregate, as shown in Example 8-9.
WITH
X
AS
(
SELECT
tract_id
,
left
(
tract_id
,
5
)
As
county
,
geom_mean
(
val
)
OVER
(
PARTITION
BY
tract_id
)
As
div_tract
,
ROW_NUMBER
()
OVER
(
PARTITION
BY
tract_id
)
As
rn
,
geom_mean
(
val
)
OVER
(
PARTITION
BY
left
(
tract_id
,
5
))
As
div_county
FROM
census
.
vw_facts
WHERE
category
=
'Population'
AND
short_name
!=
'white_alone'
)
SELECT
tract_id
,
county
,
div_tract
,
div_county
FROM
X
WHERE
rn
=
1
ORDER
BY
div_tract
DESC
,
div_county
DESC
LIMIT
5
;
tract_id | county | div_tract | div_county ------------+--------+----------------------+--------------------- 25025160101 | 25025 | 302.6815688785928786 | 85.1549046212833364 25027731900 | 25027 | 265.6136902148147729 | 73.5955049035237656 25021416200 | 25021 | 261.9351057509603296 | 73.8824162064128504 25025130406 | 25025 | 260.3241378371627137 | 85.1549046212833364 25017342500 | 25017 | 257.4671462282508267 | 74.7697097102419689
When your functional needs outgrow SQL, turning to PL/pgSQL is a common practice.
PL/pgSQL surpasses SQL in that you can declare local variables using DECLARE
and you can
incorporate control flow.
To demonstrate syntax differences from SQL, in Example 8-10 we rewrite Example 8-4 as a PL/pgSQL function.
CREATE
FUNCTION
select_logs_rt
(
param_user_name
varchar
)
RETURNS
TABLE
(
log_id
int
,
user_name
varchar
(
50
),
description
text
,
log_ts
timestamptz
)
AS
$$
BEGIN
RETURN
QUERY
SELECT
log_id
,
user_name
,
description
,
log_ts
FROM
logs
WHERE
user_name
=
param_user_name
;
END
;
$$
LANGUAGE
'plpgsql'
STABLE
;
Because you can’t write trigger functions in SQL, PL/pgSQL is your next best bet. In this section, we’ll demonstrate how to write a basic trigger function in PL/pgSQL.
We proceed in two steps. First, we write the trigger function. Second, we explicitly attach the trigger function to the appropriate trigger. The second step is a powerful feature of PostgreSQL that decouples the function handling the trigger from the trigger itself. You can attach the same trigger function to multiple triggers, adding another level of reuse not found in other databases. Because each trigger function can stand on its own, you have your choice of languages, and mixing is completely OK. For a single triggering event, you can set up multiple triggers, each with functions written in a different language. For example, you can have a trigger email a client written in PL/PythonU or PL/PerlU and another trigger write to a log table with PL/pgSQL.
A basic trigger function and accompanying trigger is demonstrated in Example 8-11.
CREATE
OR
REPLACE
FUNCTION
trig_time_stamper
(
)
RETURNS
trigger
AS
$
$
BEGIN
NEW
.
upd_ts
:
=
CURRENT_TIMESTAMP
;
RETURN
NEW
;
END
;
$
$
LANGUAGE
plpgsql
VOLATILE
;
CREATE
TRIGGER
trig_1
BEFORE
INSERT
OR
UPDATE
OF
session_state
,
session_id
ON
web_sessions
FOR
EACH
ROW
EXECUTE
PROCEDURE
trig_time_stamper
(
)
;
Defines the trigger function. This function can be used on any
table that has a upd_ts
column. It updates the
upd_ts
field to the current time before returning the
changed record.
This is a new feature introduced in version 9.0 that allows us
to limit the firing of the trigger so it happens only if specified
columns have changed. Prior to version 9.0, the trigger would fire
on any update and you would need to perform a column-wise comparison
using OLD.some_column
and NEW.some_column
to determine what changed. (This feature is not supported for INSTEAD OF
triggers.)
Python is a slick language with a vast number of available libraries. PostgreSQL is the only database we know of that lets you compose functions using Python. PostgreSQL supports both Python 2 and Python 3.
Although you can install both plpython2u
and
plpython3u
in the same database, you can’t use both
during the same session. This means that you can’t write a query that
calls both plpython2u
and
plpython3u
functions. You may encounter a third
extension called plpythonu
; this is an alias for
plpython2u
and is left around for backward
compatibility.
In order to use PL/Python, you first need to install Python on your server. For Windows and Mac, Python installers are available. For Linux/Unix systems, Python binaries are usually available via the various distributions. For details, see PL/Python. After installing Python, install the PostgreSQL Python extension:
CREATE
EXTENSION
plpython2u
;
CREATE
EXTENSION
plpython3u
;
Make absolutely sure that you have Python properly running on your server before attempting to install the extension, or else you will run into errors that could be difficult to troubleshoot.
The extensions are compiled against a specific minor version of
Python. You should install the minor version of Python that matches what
your plpythonu
extensions were compiled against. For
example, if your plpython2u
was compiled against Python
2.7, you should install Python 2.7.
PostgreSQL automatically converts PostgreSQL data types to Python data types and back. PL/Python is capable of returning arrays and composite types. You can use PL/Python to write triggers and create aggregate functions. We’ve demonstrated some of these in the Postgres OnLine Journal, in PL/Python Examples.
Python allows you to perform feats that aren’t possible in PL/pgSQL. In Example 8-12, we demonstrate how to write a PL/Python function to do a text search of the online PostgreSQL document site.
CREATE
OR
REPLACE
FUNCTION
postgresql_help_search
(
param_search
text
)
RETURNS
text
AS
$
$
import
urllib
,
re
response
=
urllib
.
urlopen
(
'http://www.postgresql.org/search/?u=%2Fdocs%2Fcurrent%2F&q='
+
param_search
)
raw_html
=
response
.
read
(
)
result
=
raw_html
[
raw_html
.
find
(
"<!-- docbot goes here -->"
)
:
raw_html
.
find
(
"<!-- pgContentWrap -->"
)
-
1
]
result
=
re
.
sub
(
'<[^<]+?>'
,
''
,
result
)
.
strip
(
)
return
result
$
$
LANGUAGE
plpython2u
SECURITY
DEFINER
STABLE
;
Imports the libraries we’ll be using.
Performs a search after concatenating the search term.
Reads the response and saves the retrieved HTML to a variable
called raw_html
.
Saves the part of the raw_html
that starts with
<!-- docbot goes here -->
and ends just before the beginning of <!-- pgContentWrap -->
into a new
variable called result
.
Removes leading and trailing HTML symbols and whitespace.
Returns result
.
Calling Python functions is no different from calling functions written in other languages. In Example 8-13, we use the function we created in Example 8-12 to output the result with three search terms.
SELECT
search_term
,
left
(
postgresql_help_search
(
search_term
),
125
)
AS
result
FROM
(
VALUES
(
'regexp_match'
),(
'pg_trgm'
),(
'tsvector'
))
As
x
(
search_term
);
Recall that PL/Python is an untrusted language, without a trusted counterpart. This
means only superusers can write functions using PL/Python, and the
function can interact with the filesystem of the OS. Example 8-14 takes advantage of the untrusted
nature of PL/Python to retrieve file listings from a directory. Keep in
mind that from the perspective of the OS, a PL/Python function runs
under the context of the postgres
user account created during installation, so you need to be
sure that this account has adequate access to the relevant
directories.
CREATE
OR
REPLACE
FUNCTION
list_incoming_files
()
RETURNS
SETOF
text
AS
$$
import
os
return
os
.
listdir
(
'/incoming'
)
$$
LANGUAGE
'plpython2u'
VOLATILE
SECURITY
DEFINER
;
Run the function in Example 8-14 with the following query:
SELECT
filename
FROM
list_incoming_files
()
As
filename
WHERE
filename
ILIKE
'%.csv'
PL/V8 (aka PL/JavaScript) is a trusted language built atop the Google V8 engine. It allows you to write functions in JavaScript and interface with the JSON data type. It is not part of the core PostgreSQL offering, so you won’t find it in all popular PostgreSQL distributions. You can always compile it from source. For Windows, we’ve built PL/V8 extension windows binaries. You can download them from our Postgres OnLine Journal site for PostgreSQL 9.6 (both 32-bit and 64-bit).
When you add PL/V8 binaries to your PostgreSQL setup, you get not one, but three JavaScript-related languages:
This is the basic language that serves as the basis for the other two JavaScript languages.
This language lets you write functions in CoffeeScript. CoffeeScript is JavaScript with a more succinct syntax structure that resembles Python. Like Python, it relies on indentation to impart context but does away with annoying curly braces.
PL/LiveScript allows you to write functions in LiveScript, a fork of CoffeeScript. LiveScript is similar to CoffeeScript but with some added syntactic condiments. This article promotes LiveScript as a superior alternative to CoffeeScript: 10 Reasons to Switch from CoffeeScript to LiveScript. If anything, LiveScript does have more Python, F#, and Haskell features than CoffeeScript. If you’re looking for a language that has a lighter footprint than PL/Python and is trusted, you might want to give LiveScript a try.
PL/CoffeeScript and PL/LiveScript are compiled using the same PL/V8 library. Their functionality is therefore identical to that of PL/V8. In fact, you can easily convert back to PL/V8 if they don’t suit your taste buds. All three languages are trusted. This means they can’t access OS filesystems, but they can be used by nonsuperusers to create functions.
Example 8-15 has the commands to install the three languages using extensions. For each database where you’d like to install the support, you must run these lines. You need not install all three if you choose not to.
CREATE
EXTENSION
plv8
;
CREATE
EXTENSION
plcoffee
;
CREATE
EXTENSION
plls
;
The PL/V8 family of languages has many key qualities that make them stand apart from PL/pgSQL, some of which you’ll find only in other high-end procedural languages like PL/R:
Generally faster numeric processing than SQL and PL/pgSQL.
The ability to create window functions. You can’t do this using SQL, PL/pgSQL, or PL/Python. (You can in PL/R and C, though.)
The ability to create triggers and aggregate functions.
Support for prepared statements, subtransactions, inner functions, classes, and try-catch error handling.
The ability to dynamically generate executable code using an
eval
function.
JSON support, allowing for looping over and filtering of JSON objects.
Access to functions from DO
commands.
Compatibility with Node.js. Node.js users, and other users who want to use Javascript for
building network applications, will appreciate that PL/V8 and Node.js
are built on the same Google V8 engine and that many of the libraries
available for Node.js will work largely unchanged when used in PL/V8.
There is an extension called plv8x
that makes using Node.js
modules and modules you build easier to reuse in PL/V8.
You can find several examples on our site of PL/V8 use. Some involved copying fairly large bodies of JavaScript code that we pulled from the web and wrapped in a PL/V8 wrapper, as detailed in Using PLV8 to Build JSON Selectors. The PL/V8 family mates perfectly with web applications because much of the same client-side JavaScript logic can be reused. More important, it makes a great all-purpose language for developing numeric functions, updating data, and so on.
One of the great benefits of PL/V8 is that you can use any JavaScript function in your PL/V8 functions with minimal change. For example, you’ll find many JavaScript examples on the web to validate email addresses. We arbitrarily picked one and made a PL/V8 out of it in Example 8-16.
CREATE
OR
REPLACE
FUNCTION
validate_email
(
text
)
returns
boolean
as
$$
var
re
=
/
\
S
+@
\
S
+
\
.
\
S
+/
;
return
re
.
test
(
);
$$
LANGUAGE
plv8
IMMUTABLE
STRICT
PARALLEL
SAFE
;
Our code uses a JavaScript regex object to check the email address. To use the function, see Example 8-17.
SELECT
,
validate_email
(
)
AS
is_valid
FROM
(
VALUES
(
'alexgomezq@gmail.com'
)
,(
'alexgomezqgmail.com'
),(
'alexgomezq@gmailcom'
))
AS
x
(
);
which outputs:
email | is_valid ----------------------+---------- alexgomezq@gmail.com | t alexgomezqgmail.com | f alexgomezq@gmailcom | f
Although you can code the same function using PL/pgSQL and PostgreSQL’s own regular expression support, we guiltlessly poached someone else’s time-tested code and wasted no time of our own. If you’re a web developer and find yourself having to validate data on both the client side and the database side, using PL/V8 could halve your development efforts, pretty much by cutting and pasting.
You can store a whole set of these validation functions in a
modules table. You can then inject results onto the page but also use
the validation functions directly in the database, as described in
Andrew Dunstan’s “Loading Useful Modules in PLV8”.
This is possible because the eval
function is part of the PL/V8
JavaScript language. The built-in function allows you to compile
functions at startup for later use.
We fed Example 8-17
through an online converter
and added a return
statement to generate its CoffeeScript
counterpart in Example 8-18.
CREATE
OR
REPLACE
FUNCTION
validate_email
(
text
)
returns
boolean
as
$$
re
=
/
\
S
+@
\
S
+
\
.
\
S
+/
return
re
.
test
$$
LANGUAGE
plcoffee
IMMUTABLE
STRICT
PARALLEL
SAFE
;
CoffeeScript doesn’t look all that different from JavaScript,
except for the lack of parentheses, curly braces, and semicolons. The
LiveScript version looks exactly like the CoffeeScript except with a
LANGUAGE plls
specifier.
In Examples 8-19 and 8-20, we reformulate the state transition and final function of the geometric mean aggregate function (see “Writing SQL Aggregate Functions”) using PL/V8.
CREATE
OR
REPLACE
FUNCTION
geom_mean_state
(
prev
numeric
[
2
],
next
numeric
)
RETURNS
numeric
[
2
]
AS
$$
return
(
next
==
null
||
next
==
0
)
?
prev
:
[(
prev
[
0
]
==
null
)
?
0
:
prev
[
0
]
+
Math
.
log
(
next
),
prev
[
1
]
+
1
];
$$
LANGUAGE
plv8
IMMUTABLE
PARALLEL
SAFE
;
CREATE
OR
REPLACE
FUNCTION
geom_mean_final
(
in_num
numeric
[
2
])
RETURNS
numeric
AS
$$
return
in_num
[
1
]
>
0
?
Math
.
exp
(
in_num
[
0
]
/
in_num
[
1
])
:
0
;
$$
LANGUAGE
plv8
IMMUTABLE
PARALLEL
SAFE
;
The final CREATE AGGREGATE
puts all the pieces together and looks more or less the same in all
languages. Our PL/V8 variant is shown in Example 8-21.
CREATE
AGGREGATE
geom_mean
(
numeric
)
(
SFUNC
=
geom_mean_state
,
STYPE
=
numeric
[],
FINALFUNC
=
geom_mean_final
,
PARALLEL
=
safe
,
INITCOND
=
'{0,0}'
);
When you run Example 8-9, calling our new PL/V8 function, you get the same answers as the version written in SQL, but the PL/V8 version is two to three times faster. Generally, for mathematical operations, you’ll find that PL/V8 functions are 10 to 20 times faster than their SQL counterparts.
PostgreSQL has many built-in window functions, as discussed in “Window Functions”. Any aggregate function, including the ones you create, can be used as window aggregate functions. These two points alone make PostgreSQL stand out from most other relational databases. Even more impressive is that PostgreSQL allows you to create your own window functions.
The only caveat is that most PLs you can install in PostgreSQL will not allow you to create window functions. If you need to write a window function in PostgreSQL, you cannot do it with built-in PL/PGSQL or SQL languages. Nor can you do it in other popular PLs like PL/Python or PL/Perl. You can do it in C, but that requires compilation. You can also to some extent do it in a language like PL/R. PL/V8, on the other hand, fully supports writing window functions and is fairly efficient (in many cases just as fast as a window function written in C), but unlike C, doesn’t require compilation of your function code.
What makes writing window functions in PL/V8 possible is that
PL/V8 comes packaged with a plv8.window_object()
helper
function that returns a handle to the current window object. This
object includes methods for inspecting and accessing elements within the
window.
In Example 8-22, we’ll create a
window function that, for each row, returns true if it’s the beginning
of a run, and false otherwise. Runs, or streaks, are sequences of
identical outcome. The function lets the caller decide how many rows
constitute a “run” through the ofs
argument.
CREATE
FUNCTION
run_begin
(
arg
anyelement
,
ofs
int
)
RETURNS
boolean
AS
$$
var
winobj
=
plv8
.
get_window_object
();
var
result
=
true
;
/** Get current value **/
var
cval
=
winobj
.
get_func_arg_in_partition
(
0
,
0
,
winobj
.
SEEK_CURRENT
,
false
);
for
(
i
=
1
;
i
<
ofs
;
i
++
)
{
/** get next value **/
nval
=
winobj
.
get_func_arg_in_partition
(
0
,
i
,
winobj
.
SEEK_CURRENT
,
false
);
result
=
(
cval
==
nval
)
?
true
:
false
;
if
(
!
result
)
{
break
;
}
/** next current value is our last value **/
cval
=
nval
;
}
return
result
;
$$
LANGUAGE
plv8
WINDOW
;
To declare a function as a window function, it must have a
WINDOW
designator in the function envelope as in the last
line of Example 8-22.
The body of the function must inspect elements of the window set
of data and use them. PL/V8 has a handle to this window and helper
methods outlined in the PL/V8 documentation PL/V8
Window function API. Our function needs to look forward in the
window for values from the current position in the window through
ofs
values. If these values are all the same, it will
return true, otherwise false. The function method that PL/V8 provides
for scanning values of a window is
get_func_arg_in_partition
. We use that to look forward and
exit with false, as soon as the pattern of equality fails or we’ve
reached the last value.
We’ll use this function to find the winner in a simple game of coin toss. Each player gets four tosses, and the winner must have a run of three heads, as shown in Example 8-23.
SELECT
id
,
player
,
toss
,
run_begin
(
toss
,
3
)
OVER
(
PARTITION
BY
player
ORDER
BY
id
)
AS
rb
FROM
coin_tosses
ORDER
BY
player
,
id
;
id | player | toss | rb ----+--------+------+---- 4 | alex | H | t 8 | alex | H | t 12 | alex | H | f 16 | alex | H | f 2 | leo | T | f 6 | leo | H | f 10 | leo | H | f 14 | leo | T | f 1 | regina | H | f 5 | regina | H | f 9 | regina | T | f 13 | regina | T | f 3 | sonia | T | t 7 | sonia | T | t 11 | sonia | T | f 15 | sonia | T | f (16 rows)
For other examples of writing PL/V8 functions in PL/V8, check out the PL/V8 window regression script, which demonstrates how to create many of the built-in PostgreSQL window functions (lead, lag, row_number, cume_dist, and first_value, last_value) in PL/V8.