Chapter 8. Writing Functions

In PostgreSQL, as in most databases, you can string a series of SQL statements together and treat them as a unit, even customizing each run by passing arguments. Different databases ascribe different names for this unit: stored procedures, user-defined functions, and so on. PostgreSQL simply refers to them as functions.

Aside from marshalling SQL statements, functions often add the capability to control the execution of the SQL using PLs. PostgreSQL offers a rich choice of languages for writing functions. SQL, C, PL/pgSQL, PL/Perl, and PL/Python are often packaged with installers. You’ll also find PL/V8, which allows you to write procedural functions in JavaScript. PL/V8 is a favorite for web developers and a darling companion to the built-in JSON and JSONB data types covered in “JSON”.

You can also install additional languages such as PL/R, PL/Java, PL/sh, PL/TSQL, and even experimental ones geared for high-end data processing and artificial intelligence, such as PL/Scheme or PL/OpenCL. You can find a listing of available languages in Procedural Languages.

Anatomy of PostgreSQL Functions

PostgreSQL functions fall into the categories of basic function, aggregate function, window function, and trigger function. We’ll start by detailing the basic anatomy of a function and then go into detail about how the various kinds of specialized function types extends from this.

Function Basics

Regardless of which languages you choose for writing functions, all functions share a similar structure, as shown in Example 8-1.

Example 8-1. Basic function structure

CREATE OR REPLACE FUNCTION func_name(arg1 arg1_datatype DEFAULT arg1_default)
RETURNS some type | set of some type | TABLE (..) AS
$$
BODY of function
$$
LANGUAGE language_of_function

Arguments can have default values, which allow the caller of the function to omit them. Optional arguments must be positioned after nonoptional arguments in the function definition.

Argument names are optional but are useful because they let you refer to an argument by name inside the function body. For example, think of a function that is defined to take three input arguments (two being optional):

big_elephant(ear_size numeric, skin_color text DEFAULT 'blue', 
name text DEFAULT 'Dumbo')

You can refer to the arguments by name (ear_size, skin_color, etc.) inside the body of the function. If they are not named, you need to refer to the arguments inside the function by their order in the argument list: $1, $2, and $3.

If you name the arguments, you also have the option of using named notation when calling the function:

big_elephant(name => 'Wooly', ear_size => 1.2)

You can always use the positional notation big_elephant(1.2, 'blue', 'Wooly') even if function arguments are named. Named notation is useful if you have a function that takes several arguments and many of the arguments are optional. By using named notation, you can override a default value and keep other defaults regardless of the order in which the arguments are defined. You also don’t need to state the arguments in the order they appear in the function definition. In the big_elephant example we were able to accept the default skin color of blue and override the default name, even though name appears last in the argument list. If we were to call the function simply by the order of arguments, we couldn’t skip over skin_color if we wanted to override the name argument.

Tip

In PostgreSQL 9.5 and above, the named notation convention is name => 'Wooly'. In 9.4 and below you would use name := 'Wooly'. For backward compatibility, the old syntax of arg1_name := arg1_value is still supported in 9.5 and above, but may be removed in the future.

Functional definitions often include additional qualifiers to optimize execution and to enforce security:

LANGUAGE

The language must be one installed in your database. Obtain a list with the SELECT lanname FROM pg_language; query.

VOLATILITY

This setting clues the query planner as to whether outputs can be cached and used across multiple calls. Your choices are:

IMMUTABLE: The function will always return the same output for the same input. Think of arithmetic functions. Only immutable functions can be used in the definition of indexes.
STABLE: The function will return the same value for the same inputs within the same query.
VOLATILE: The function can return different values with each call, even with the same inputs. Think of functions that change data or depend on environment settings like system time. This is the default.

Keep in mind that the volatility setting is merely a hint to the planner. The default value of VOLATILE ensures that the planner will always recompute the result. If you use one of the other values, the planner can still choose to forgo caching should it decide that recomputing is more cost-effective.

STRICT

A function marked with this qualifier will always return NULL if any inputs are NULL. The planner skips evaluating the function altogether with any NULL inputs. When writing SQL functions, be cautious

when marking a function as STRICT, because it could prevent the planner from taking advantage of indexes. Read our article STRICT on SQL Functions for more details.

COST

This is a relative measure of computational intensiveness. SQL and PL/pgSQL functions default to 100 and C functions to 1. This affects the order that the planner will follow when evaluating the function in a WHERE clause, and the likelihood of caching. The higher you set the cost, the more computation the planner will assume the function needs.

ROWS

Applies only to functions returning sets of records. The value provides an estimate of how many rows will be returned. The planner will take this value into consideration when coming up with the best strategy.

SECURITY DEFINER

This causes execution to take place within the security context of the owner of the function. If omitted, the function executes under the context of the user calling the function. This qualifier is useful for giving people rights to update a table via a function when they do not have direct update privileges.

PARALLEL

New in PostgreSQL 9.6. This qualifier allows the planner to run in parallel mode. By default, a function is marked as PARALLEL UNSAFE, which prevents any queries containing the function from being distributed into separate work processes. Refer to Parallel Safety. Your choices are:

SAFE: This allows parallel use, and is generally a safe choice for IMMUTABLE functions or functions that don’t update data or change transaction state or other variables.
UNSAFE: Functions that change nontemp table data, access sequences, or state should be marked as UNSAFE. They prevent the query from being run in parallel mode and therefore risking the corruption of the tables or other system state.
RESTRICTED: You may want to use this value for functions that use temporary tables, prepared statements, or client connection state. This value does not prevent a query from running in parallel mode, but processing of these functions can happen only on the lead query.

In many of the examples in this chapter, we’ll be including PARALLEL mode options. If you are running lower than version 9.6, leave out the parallel clauses.

Triggers and Trigger Functions

No worthy database should lack triggers, which automatically detect and handle changes in data. PostgreSQL allows you to attach triggers to tables, views, and even DDL events like creation of a new table.

Triggers can actuate at both the statement level and the row level. Statement triggers run once per SQL statement, whereas row triggers run for each row affected by the SQL. For example, if you execute an UPDATE statement that affects 1,500 rows, a statement-level update trigger will fire only once, whereas the row-level trigger can fire up to 1,500 times.

You can further refine the timing of the trigger by making a distinction between BEFORE, AFTER, and INSTEAD OF triggers. A BEFORE trigger fires prior to the execution of the statement, giving you a chance to cancel or back up data before the change. An AFTER trigger fires after statement execution, giving you a chance to retrieve the new data values. AFTER triggers are often used for logging or replication purposes. INSTEAD OF triggers execute in lieu of the statement. You can attach BEFORE and AFTER triggers only to tables and events, and INSTEAD OF triggers only to views.

Trigger functions that change values of a row should be called only in the BEFORE event, because in the AFTER event, all updates to the NEW record will be ignored.

You can also adorn a trigger with a WHEN condition to control which rows being updated will fire the trigger, or an UPDATE OF columns_list clause to have the trigger fire only if certain columns are updated. To gain a more nuanced understanding of the interplay between triggers and the underlying statement, see the official documentation: Overview of Trigger Behavior. We also demonstrated a view-based trigger in Example 7-5.

PostgreSQL offers specialized functions to handle triggers. These are called trigger functions and behave like any other function and have the same basic structure. Where they differ is in the input parameter and the output type. A trigger function never takes an argument, because internally the function already has access to the data and can modify it.

A trigger function always outputs a data type called a trigger. Because PostgreSQL trigger functions are no different from any other function, you can reuse the same trigger function across different triggers. This is usually not the case for other databases, where each trigger is wedded to its own handler code.

In PostgreSQL, each trigger must have exactly one associated triggering function to handle the firing. To apply multiple triggering functions, you must create multiple triggers against the same event. The alphabetical order of the trigger name determines the order of firing. Each trigger will have access to the revised data from the previous trigger. If any trigger issues a rollback, all data amended by earlier triggers fired by the same event will roll back.

You can use almost any language to create trigger functions, with SQL being the notable exception. PL/pgSQL is by far the most popular language. We demonstrate writing trigger functions using PL/pgSQL in “Writing Trigger Functions in PL/pgSQL”.

Aggregates

Most other databases limit you to ANSI SQL built-in aggregate functions such as MIN, MAX, AVG, SUM, and COUNT. In PostgreSQL, you don’t have this limitation. If you need a more esoteric aggregate function, you’re welcome to write your own. Because you can use any aggregate function in PostgreSQL as a window function (see “Window Functions”), you get twice the use out of any aggregate function that you author.

You can write aggregates in almost any language, SQL included. An aggregate is generally comprised of one or more functions. It must have at least a state transition function to perform the computation; usually this function runs repeatedly to create one output row from two input rows. You can also specify optional functions to manage initial and final states. You can also use a different language for each of the subfunctions. We have various examples of building aggregates using PL/pgSQL, PL/Python, and SQL in the article PostgreSQL Aggregates.

Regardless of which language you use to code the functions, the glue that brings them all together is the CREATE AGGREGATE command:

CREATE AGGREGATE my_agg (input data type) (
SFUNC=state function name,
STYPE=state type,
FINALFUNC=final function name,
INITCOND=initial state value, SORTOP=sort_operator
);

The final function is optional, but if specified, it must take as input the result of the state function. The state function always takes a data type as the input along with the result of the last call to the state function. Sometimes this result is what you want as the result of the aggregate function, and sometimes you want to run a final function to massage the result. The initial condition is also optional. When the initial condition value is present, the command uses it to initialize the state value.

The optional sort operator can serve as the associated sort operator for a MIN- or MAX-like aggregate. It is used to take advantage of indexes. It is just an operator name such as > and <. It should be used only when the two following statements are equivalent:

SELECT agg(col) FROM sometable;

SELECT col FROM sometable ORDER BY col USING sortop LIMIT 1;

Tip

The PostgreSQL 9.4 CREATE AGGREGATE structure was expanded to include support for creating moving aggregates, which are useful with window functions that move the window. See PostgreSQL 9.4: CREATE AGGREGATE for details.

Tip

In PostgreSQL 9.6, aggregates were expanded to include support for parallelization. This was accomplished through the parallel property, which can take the values of safe, unsafe, or restricted. If the parallel property is left out, the aggregate is marked as parallel unsafe. In addition to the parallel setting, combinefunc, serialfunc, and deserialfunc properties were added to support parallel aggregates. Refer to SQL Create Aggregate for details.

Aggregates need not depend on a single column. If you need more than one column for your aggregate (an example is a built-in covariance function), see How to Create Multi-Column Aggregates for guidance.

SQL language functions are easy to write. You don’t have fancy control flow commands to worry about, and you probably have a good grasp of SQL to begin with. When it comes to writing aggregates, you can get pretty far with the SQL language alone. We demonstrate aggregates in “Writing SQL Aggregate Functions”.

Trusted and Untrusted Languages

Function languages can be either trusted or untrusted. Many—but not all—languages offer both a trusted and untrusted version. The term trusted connotes that the language can do no harm to the underlying operating system by denying it access to the key OS operations. In short:

Trusted: A trusted language lacks access to the server’s filesystem beyond the data cluster. It therefore cannot execute OS commands. Users of any level can create functions in a trusted language. Languages such as SQL, PL/pgSQL, PL/Perl, and PL/V8 are trusted.
Untrusted: An untrusted language can interact with the OS. It can execute OS functions and call web services. Only superusers have the privilege of authoring functions in an untrusted language. However, a superuser can grant permission to another role to run an untrusted function. By convention, languages that are untrusted end in the letter U (PL/PerlU, PL/PythonU, etc.). But ending in U is not a requirement. For example, PL/R is such an exception.

Writing Functions with SQL

Although SQL is mostly a language for issuing queries, it can also be used to write functions. In PostgreSQL, using an existing piece of SQL for the function is fast and easy: take your existing SQL statements, add a functional header and footer, and you’re done. But the ease comes at a price. You can’t use control features like conditional branches, looping, or defining variables. More restrictively, you can’t run dynamic SQL statements that you assemble on the fly using arguments passed into the function.

On the positive side, the query planner can peek into an SQL function and optimize execution—a process called inlining. Query planners treat other languages as black boxes. Only SQL functions can be inlined, which lets them take advantage of indexes and collapse repetitive computations.

Basic SQL Function

Example 8-2 shows a primitive SQL function that inserts a row into a table and returns a scalar value.

Example 8-2. SQL function that returns the identifier of an inserted record

CREATE OR REPLACE FUNCTION write_to_log(param_user_name varchar, 
param_description text)
RETURNS integer AS
$$
INSERT INTO logs(user_name, description) VALUES($1, $2)
RETURNING log_id;
$$
LANGUAGE 'sql' VOLATILE;

To call the function, execute something like:

SELECT write_to_log('alex', 'Logged in at 11:59 AM.') As new_id;

Example 8-3. SQL function to update a record

CREATE OR REPLACE FUNCTION
update_logs(log_id int, param_user_name varchar, param_description text)
RETURNS void AS
$$
UPDATE logs SET user_name = $2, description = $3
 , log_ts = CURRENT_TIMESTAMP WHERE log_id = $1;
$$
LANGUAGE 'sql' VOLATILE;

To execute:

SELECT update_logs(12, 'alex', 'Fell back asleep.');

Functions, in almost all languages, can return sets. SQL functions are no exception. There are three common approaches to doing this: the ANSI SQL standard RETURNS TABLE syntax, OUT parameters, and composite data types. The RETURNS TABLE approach is closest to what you’ll find in other database products. In Example 8-4, we demonstrate how to write the same function three ways.

Example 8-4. Examples of function returning sets

Using RETURNS TABLE:

CREATE OR REPLACE FUNCTION select_logs_rt(param_user_name varchar)
RETURNS TABLE (log_id int, user_name varchar(50), 
description text, log_ts timestamptz) AS
$$
SELECT log_id, user_name, description, log_ts FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE PARALLEL SAFE;

Using OUT parameters:

CREATE OR REPLACE FUNCTION select_logs_out(param_user_name varchar, OUT log_id int
 , OUT user_name varchar, OUT description text, OUT log_ts timestamptz)
RETURNS SETOF record AS
$$
SELECT * FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE PARALLEL SAFE;

Using a composite type:

CREATE OR REPLACE FUNCTION select_logs_so(param_user_name varchar)
RETURNS SETOF logs AS
$$
SELECT * FROM logs WHERE user_name = $1;
$$
LANGUAGE 'sql' STABLE PARALLEL SAFE;

Call all these functions using:

SELECT * FROM select_logs_xxx('alex');

Writing SQL Aggregate Functions

Yes! In PostgreSQL you are able to author your own aggregate functions to expand beyond the usual aggregates MIN, MA, COUNT, AVG, etc. We demonstrate by creating an aggregate function to compute the geometric mean. A geometric mean is the nth root of a product of n positive numbers ((x1*x2*x3...xn)^(1/n)). It has various uses in finance, economics, and statistics. A geometric mean substitutes for the more common arithmetic mean when the numbers range across vastly different scales. A more suitable computational formula uses logarithms to transform a multiplicative process to an additive one (EXP(SUM(LN(x))/n)). We’ll be using this method in our example.

To build our geometric mean aggregate, we need two subfunctions: a state transition function to sum the logs (see Example 8-5) and a final function to exponentiate the logs. We’ll also specify an initial condition of zero when we assemble everything together.

Example 8-5. Geometric mean aggregate: state function

CREATE OR REPLACE FUNCTION geom_mean_state(prev numeric[2], next numeric)
RETURNS numeric[2] AS
$$
SELECT
  CASE
    WHEN $2 IS NULL OR $2 = 0 THEN $1
    ELSE ARRAY[COALESCE($1[1],0) + ln($2), $1[2] + 1]
  END;
$$
LANGUAGE sql IMMUTABLE PARALLEL SAFE;

Our state transition function takes two inputs: the previous state passed in as an array with two elements, and the next added in the summation. If the next argument evaluates to NULL or zero, the state function returns the prior state. Otherwise, it returns a new array in which the first element is the sum of the logs and the second element is the running count.

We also need a final function, shown in Example 8-6, that divides the sum from the state transition by the count.

Example 8-6. Geometric mean aggregate: final function

CREATE OR REPLACE FUNCTION geom_mean_final(numeric[2])
RETURNS numeric AS
$$
SELECT CASE WHEN $1[2] > 0 THEN exp($1[1]/$1[2]) ELSE 0 END;
$$
LANGUAGE sql IMMUTABLE PARALLEL SAFE;

Now we stitch all the subfunctions together in our aggregate definition, as shown in Example 8-7. (Note that our aggregate has an initial condition that is the same data type as the one returned by our state function.)

Example 8-7. Geometric mean aggregate: assembling the pieces

CREATE AGGREGATE geom_mean(numeric) (
SFUNC=geom_mean_state,
STYPE=numeric[],
FINALFUNC=geom_mean_final,
PARALLEL = safe,
INITCOND='{0,0}'
);

Let’s take our new function for a test drive. In Example 8-8, we compute a heuristic rating for racial diversity and list the top five most racially diverse counties in Massachusetts.

Example 8-8. Top five most racially diverse counties using geometric mean

SELECT left(tract_id,5) As county, geom_mean(val) As div_county
FROM census.vw_facts
WHERE category = 'Population' AND short_name != 'white_alone'
GROUP BY county
ORDER BY div_county DESC LIMIT 5;

county |     div_county
-------+---------------------
25025  | 85.1549046212833364
25013  | 79.5972921427888918
25017  | 74.7697097102419689
25021  | 73.8824162064128504
25027  | 73.5955049035237656

Let’s go into overdrive and engage our new function as a window aggregate, as shown in Example 8-9.

Example 8-9. Top five most racially diverse census tracts with averages

WITH X AS (SELECT
  tract_id,
  left(tract_id,5) As county,
  geom_mean(val) OVER (PARTITION BY tract_id) As div_tract,
  ROW_NUMBER() OVER (PARTITION BY tract_id) As rn,
  geom_mean(val) OVER(PARTITION BY left(tract_id,5)) As div_county
FROM census.vw_facts WHERE category = 'Population' AND short_name != 'white_alone'
)
SELECT tract_id, county, div_tract, div_county
FROM X
WHERE rn = 1
ORDER BY div_tract DESC, div_county DESC LIMIT 5;

tract_id    | county |      div_tract       |     div_county
------------+--------+----------------------+---------------------
25025160101 | 25025  | 302.6815688785928786 | 85.1549046212833364
25027731900 | 25027  | 265.6136902148147729 | 73.5955049035237656
25021416200 | 25021  | 261.9351057509603296 | 73.8824162064128504
25025130406 | 25025  | 260.3241378371627137 | 85.1549046212833364
25017342500 | 25017  | 257.4671462282508267 | 74.7697097102419689

Writing PL/pgSQL Functions

When your functional needs outgrow SQL, turning to PL/pgSQL is a common practice. PL/pgSQL surpasses SQL in that you can declare local variables using DECLARE and you can incorporate control flow.

Basic PL/pgSQL Function

To demonstrate syntax differences from SQL, in Example 8-10 we rewrite Example 8-4 as a PL/pgSQL function.

Example 8-10. Function to return a table using PL/pgSQL

CREATE FUNCTION select_logs_rt(param_user_name varchar)
RETURNS TABLE (log_id int, user_name varchar(50), 
description text, log_ts timestamptz) AS
$$
BEGIN
	RETURN QUERY
    SELECT log_id, user_name, description, log_ts FROM logs
     WHERE user_name = param_user_name;
END;
$$
LANGUAGE 'plpgsql' STABLE;

Writing Trigger Functions in PL/pgSQL

Because you can’t write trigger functions in SQL, PL/pgSQL is your next best bet. In this section, we’ll demonstrate how to write a basic trigger function in PL/pgSQL.

We proceed in two steps. First, we write the trigger function. Second, we explicitly attach the trigger function to the appropriate trigger. The second step is a powerful feature of PostgreSQL that decouples the function handling the trigger from the trigger itself. You can attach the same trigger function to multiple triggers, adding another level of reuse not found in other databases. Because each trigger function can stand on its own, you have your choice of languages, and mixing is completely OK. For a single triggering event, you can set up multiple triggers, each with functions written in a different language. For example, you can have a trigger email a client written in PL/PythonU or PL/PerlU and another trigger write to a log table with PL/pgSQL.

A basic trigger function and accompanying trigger is demonstrated in Example 8-11.

Example 8-11. Trigger function to timestamp new and changed records

CREATE OR REPLACE FUNCTION trig_time_stamper() RETURNS trigger AS 
$$
BEGIN
    NEW.upd_ts := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER trig_1
BEFORE INSERT OR UPDATE OF session_state, session_id 
ON web_sessions
FOR EACH ROW EXECUTE PROCEDURE trig_time_stamper();

: Defines the trigger function. This function can be used on any table that has a upd_ts column. It updates the upd_ts field to the current time before returning the changed record.
: This is a new feature introduced in version 9.0 that allows us to limit the firing of the trigger so it happens only if specified columns have changed. Prior to version 9.0, the trigger would fire on any update and you would need to perform a column-wise comparison using OLD.some_column and NEW.some_column to determine what changed. (This feature is not supported for INSTEAD OF triggers.)

Writing PL/Python Functions

Python is a slick language with a vast number of available libraries. PostgreSQL is the only database we know of that lets you compose functions using Python. PostgreSQL supports both Python 2 and Python 3.

Caution

Although you can install both plpython2u and plpython3u in the same database, you can’t use both during the same session. This means that you can’t write a query that calls both plpython2u and plpython3u functions. You may encounter a third extension called plpythonu; this is an alias for plpython2u and is left around for backward compatibility.

In order to use PL/Python, you first need to install Python on your server. For Windows and Mac, Python installers are available. For Linux/Unix systems, Python binaries are usually available via the various distributions. For details, see PL/Python. After installing Python, install the PostgreSQL Python extension:

CREATE EXTENSION plpython2u;
CREATE EXTENSION plpython3u;

Make absolutely sure that you have Python properly running on your server before attempting to install the extension, or else you will run into errors that could be difficult to troubleshoot.

The extensions are compiled against a specific minor version of Python. You should install the minor version of Python that matches what your plpythonu extensions were compiled against. For example, if your plpython2u was compiled against Python 2.7, you should install Python 2.7.

Basic Python Function

PostgreSQL automatically converts PostgreSQL data types to Python data types and back. PL/Python is capable of returning arrays and composite types. You can use PL/Python to write triggers and create aggregate functions. We’ve demonstrated some of these in the Postgres OnLine Journal, in PL/Python Examples.

Python allows you to perform feats that aren’t possible in PL/pgSQL. In Example 8-12, we demonstrate how to write a PL/Python function to do a text search of the online PostgreSQL document site.

Example 8-12. Searching PostgreSQL documents using PL/Python

CREATE OR REPLACE FUNCTION postgresql_help_search(param_search text)
RETURNS text AS
$$
import urllib, re 
response = urllib.urlopen(
    'http://www.postgresql.org/search/?u=%2Fdocs%2Fcurrent%2F&q=' + param_search
) 
raw_html = response.read() 
result =
	raw_html[raw_html.find("<!-- docbot goes here -->") :
	raw_html.find("<!-- pgContentWrap -->") - 1] 
result = re.sub('<[^<]+?>', '', result).strip() 
return result 
$$
LANGUAGE plpython2u SECURITY DEFINER STABLE;

: Imports the libraries we’ll be using.
: Performs a search after concatenating the search term.
: Reads the response and saves the retrieved HTML to a variable called raw_html.
: Saves the part of the raw_html that starts with  and ends just before the beginning of  into a new variable called result.
: Removes leading and trailing HTML symbols and whitespace.
: Returns result.

Calling Python functions is no different from calling functions written in other languages. In Example 8-13, we use the function we created in Example 8-12 to output the result with three search terms.

Example 8-13. Using Python functions in a query

SELECT search_term, left(postgresql_help_search(search_term),125) AS result
FROM (VALUES ('regexp_match'),('pg_trgm'),('tsvector')) As x(search_term);

Recall that PL/Python is an untrusted language, without a trusted counterpart. This means only superusers can write functions using PL/Python, and the function can interact with the filesystem of the OS. Example 8-14 takes advantage of the untrusted nature of PL/Python to retrieve file listings from a directory. Keep in mind that from the perspective of the OS, a PL/Python function runs under the context of the postgres user account created during installation, so you need to be sure that this account has adequate access to the relevant directories.

Example 8-14. Listing files in directories

CREATE OR REPLACE FUNCTION list_incoming_files()
RETURNS SETOF text AS
$$
import os
return os.listdir('/incoming')
$$
LANGUAGE 'plpython2u' VOLATILE SECURITY DEFINER;

Run the function in Example 8-14 with the following query:

SELECT filename
FROM list_incoming_files() As filename
WHERE filename ILIKE '%.csv'

Writing PL/V8, PL/CoffeeScript, and PL/LiveScript Functions

PL/V8 (aka PL/JavaScript) is a trusted language built atop the Google V8 engine. It allows you to write functions in JavaScript and interface with the JSON data type. It is not part of the core PostgreSQL offering, so you won’t find it in all popular PostgreSQL distributions. You can always compile it from source. For Windows, we’ve built PL/V8 extension windows binaries. You can download them from our Postgres OnLine Journal site for PostgreSQL 9.6 (both 32-bit and 64-bit).

When you add PL/V8 binaries to your PostgreSQL setup, you get not one, but three JavaScript-related languages:

PL/V8 (plv8): This is the basic language that serves as the basis for the other two JavaScript languages.
PL/CoffeeScript (plcoffee): This language lets you write functions in CoffeeScript. CoffeeScript is JavaScript with a more succinct syntax structure that resembles Python. Like Python, it relies on indentation to impart context but does away with annoying curly braces.
PL/LiveScript (plls): PL/LiveScript allows you to write functions in LiveScript, a fork of CoffeeScript. LiveScript is similar to CoffeeScript but with some added syntactic condiments. This article promotes LiveScript as a superior alternative to CoffeeScript: 10 Reasons to Switch from CoffeeScript to LiveScript. If anything, LiveScript does have more Python, F#, and Haskell features than CoffeeScript. If you’re looking for a language that has a lighter footprint than PL/Python and is trusted, you might want to give LiveScript a try.

PL/CoffeeScript and PL/LiveScript are compiled using the same PL/V8 library. Their functionality is therefore identical to that of PL/V8. In fact, you can easily convert back to PL/V8 if they don’t suit your taste buds. All three languages are trusted. This means they can’t access OS filesystems, but they can be used by nonsuperusers to create functions.

Example 8-15 has the commands to install the three languages using extensions. For each database where you’d like to install the support, you must run these lines. You need not install all three if you choose not to.

Example 8-15. Installing PL/V8 family of languages

CREATE EXTENSION plv8;
CREATE EXTENSION plcoffee;
CREATE EXTENSION plls;

The PL/V8 family of languages has many key qualities that make them stand apart from PL/pgSQL, some of which you’ll find only in other high-end procedural languages like PL/R:

Generally faster numeric processing than SQL and PL/pgSQL.
The ability to create window functions. You can’t do this using SQL, PL/pgSQL, or PL/Python. (You can in PL/R and C, though.)
The ability to create triggers and aggregate functions.
Support for prepared statements, subtransactions, inner functions, classes, and try-catch error handling.
The ability to dynamically generate executable code using an eval function.
JSON support, allowing for looping over and filtering of JSON objects.
Access to functions from DO commands.
Compatibility with Node.js. Node.js users, and other users who want to use Javascript for building network applications, will appreciate that PL/V8 and Node.js are built on the same Google V8 engine and that many of the libraries available for Node.js will work largely unchanged when used in PL/V8. There is an extension called plv8x that makes using Node.js modules and modules you build easier to reuse in PL/V8.

You can find several examples on our site of PL/V8 use. Some involved copying fairly large bodies of JavaScript code that we pulled from the web and wrapped in a PL/V8 wrapper, as detailed in Using PLV8 to Build JSON Selectors. The PL/V8 family mates perfectly with web applications because much of the same client-side JavaScript logic can be reused. More important, it makes a great all-purpose language for developing numeric functions, updating data, and so on.

Basic Functions

One of the great benefits of PL/V8 is that you can use any JavaScript function in your PL/V8 functions with minimal change. For example, you’ll find many JavaScript examples on the web to validate email addresses. We arbitrarily picked one and made a PL/V8 out of it in Example 8-16.

Example 8-16. Using PL/V8 to validate an email address

CREATE OR REPLACE FUNCTION
validate_email(email text) returns boolean as
$$
 var re = /\S+@\S+\.\S+/;
 return re.test(email);
$$ LANGUAGE plv8 IMMUTABLE STRICT PARALLEL SAFE;

Our code uses a JavaScript regex object to check the email address. To use the function, see Example 8-17.

Example 8-17. Calling the PL/V8 email validator

SELECT email, validate_email(email) AS is_valid
 FROM (VALUES ('alexgomezq@gmail.com')
 ,('alexgomezqgmail.com'),('alexgomezq@gmailcom')) AS x (email);

which outputs:

        email         | is_valid
----------------------+----------
 alexgomezq@gmail.com | t
 alexgomezqgmail.com  | f
 alexgomezq@gmailcom  | f

Although you can code the same function using PL/pgSQL and PostgreSQL’s own regular expression support, we guiltlessly poached someone else’s time-tested code and wasted no time of our own. If you’re a web developer and find yourself having to validate data on both the client side and the database side, using PL/V8 could halve your development efforts, pretty much by cutting and pasting.

You can store a whole set of these validation functions in a modules table. You can then inject results onto the page but also use the validation functions directly in the database, as described in Andrew Dunstan’s “Loading Useful Modules in PLV8”. This is possible because the eval function is part of the PL/V8 JavaScript language. The built-in function allows you to compile functions at startup for later use.

We fed Example 8-17 through an online converter and added a return statement to generate its CoffeeScript counterpart in Example 8-18.

Example 8-18. PL/Coffee validation of email function

CREATE OR REPLACE FUNCTION
validate_email(email text) returns boolean as
$$
    re = /\S+@\S+\.\S+/
    return re.test email
$$
LANGUAGE plcoffee IMMUTABLE STRICT PARALLEL SAFE;

CoffeeScript doesn’t look all that different from JavaScript, except for the lack of parentheses, curly braces, and semicolons. The LiveScript version looks exactly like the CoffeeScript except with a LANGUAGE plls specifier.

Writing Aggregate Functions with PL/V8

In Examples 8-19 and 8-20, we reformulate the state transition and final function of the geometric mean aggregate function (see “Writing SQL Aggregate Functions”) using PL/V8.

Example 8-19. PL/V8 geometric mean aggregate: state transition function

CREATE OR REPLACE FUNCTION geom_mean_state(prev numeric[2], next numeric)
RETURNS numeric[2] AS
$$
    return (next == null || next == 0) ? prev :
    [(prev[0] == null)? 0: prev[0] + Math.log(next), prev[1] + 1];
$$
LANGUAGE plv8 IMMUTABLE PARALLEL SAFE;

Example 8-20. PL/V8 geometric mean aggregate: final function

CREATE OR REPLACE FUNCTION geom_mean_final(in_num numeric[2])
RETURNS numeric AS
$$
  return in_num[1] > 0 ? Math.exp(in_num[0]/in_num[1]) : 0;
$$
LANGUAGE plv8 IMMUTABLE PARALLEL SAFE;

The final CREATE AGGREGATE puts all the pieces together and looks more or less the same in all languages. Our PL/V8 variant is shown in Example 8-21.

Example 8-21. PL/V8 geometric mean aggregate: putting all the pieces together

CREATE AGGREGATE geom_mean(numeric) (
  SFUNC=geom_mean_state,
  STYPE=numeric[],
  FINALFUNC=geom_mean_final,
  PARALLEL = safe,
  INITCOND='{0,0}'
);

When you run Example 8-9, calling our new PL/V8 function, you get the same answers as the version written in SQL, but the PL/V8 version is two to three times faster. Generally, for mathematical operations, you’ll find that PL/V8 functions are 10 to 20 times faster than their SQL counterparts.

Writing Window Functions in PL/V8

PostgreSQL has many built-in window functions, as discussed in “Window Functions”. Any aggregate function, including the ones you create, can be used as window aggregate functions. These two points alone make PostgreSQL stand out from most other relational databases. Even more impressive is that PostgreSQL allows you to create your own window functions.

The only caveat is that most PLs you can install in PostgreSQL will not allow you to create window functions. If you need to write a window function in PostgreSQL, you cannot do it with built-in PL/PGSQL or SQL languages. Nor can you do it in other popular PLs like PL/Python or PL/Perl. You can do it in C, but that requires compilation. You can also to some extent do it in a language like PL/R. PL/V8, on the other hand, fully supports writing window functions and is fairly efficient (in many cases just as fast as a window function written in C), but unlike C, doesn’t require compilation of your function code.

What makes writing window functions in PL/V8 possible is that PL/V8 comes packaged with a plv8.window_object() helper function that returns a handle to the current window object. This object includes methods for inspecting and accessing elements within the window.

In Example 8-22, we’ll create a window function that, for each row, returns true if it’s the beginning of a run, and false otherwise. Runs, or streaks, are sequences of identical outcome. The function lets the caller decide how many rows constitute a “run” through the ofs argument.

Example 8-22. PL/V8 window function to flag repeating data values

CREATE FUNCTION run_begin(arg anyelement, ofs int) RETURNS boolean AS $$
  var winobj = plv8.get_window_object();
  var result = true;
  /** Get current value **/
  var cval = winobj.get_func_arg_in_partition(0,
                                              0,
                                              winobj.SEEK_CURRENT,
                                              false);
  for (i = 1; i < ofs; i++){
       /** get next value **/
       nval =  winobj.get_func_arg_in_partition(0,
                                                i,
                                                winobj.SEEK_CURRENT,
                                                false);
      result = (cval == nval) ? true : false;
      if (!result){
            break;
      }
      /** next current value is our last value **/
      cval = nval;

   }
  return result;
$$ LANGUAGE plv8 WINDOW;

To declare a function as a window function, it must have a WINDOW designator in the function envelope as in the last line of Example 8-22.

The body of the function must inspect elements of the window set of data and use them. PL/V8 has a handle to this window and helper methods outlined in the PL/V8 documentation PL/V8 Window function API. Our function needs to look forward in the window for values from the current position in the window through ofs values. If these values are all the same, it will return true, otherwise false. The function method that PL/V8 provides for scanning values of a window is get_func_arg_in_partition. We use that to look forward and exit with false, as soon as the pattern of equality fails or we’ve reached the last value.

We’ll use this function to find the winner in a simple game of coin toss. Each player gets four tosses, and the winner must have a run of three heads, as shown in Example 8-23.

Example 8-23. PL/V8 window function example usage

SELECT id, player, toss,
 run_begin(toss,3) OVER (PARTITION BY player ORDER BY id) AS rb
    FROM coin_tosses
ORDER BY player, id;

 id | player | toss | rb
----+--------+------+----
  4 | alex   | H    | t
  8 | alex   | H    | t
 12 | alex   | H    | f
 16 | alex   | H    | f
  2 | leo    | T    | f
  6 | leo    | H    | f
 10 | leo    | H    | f
 14 | leo    | T    | f
  1 | regina | H    | f
  5 | regina | H    | f
  9 | regina | T    | f
 13 | regina | T    | f
  3 | sonia  | T    | t
  7 | sonia  | T    | t
 11 | sonia  | T    | f
 15 | sonia  | T    | f
(16 rows)

For other examples of writing PL/V8 functions in PL/V8, check out the PL/V8 window regression script, which demonstrates how to create many of the built-in PostgreSQL window functions (lead, lag, row_number, cume_dist, and first_value, last_value) in PL/V8.