Chapter 2. Database Administration

This chapter covers what we consider basic administration of a PostgreSQL server: managing roles and permissions, creating databases, installing extensions, and backing up and restoring data. Before continuing, you should have already installed PostgreSQL and have administration tools at your disposal.

Configuration Files

Three main configuration files control operations of a PostgreSQL server:

postgresql.conf: Controls general settings, such as memory allocation, default storage location for new databases, the IP addresses that PostgreSQL listens on, location of logs, and plenty more.
pg_hba.conf: Controls access to the server, dictating which users can log in to which databases, which IP addresses can connect, and which authentication scheme to accept.
pg_ident.conf: If present, this file maps an authenticated OS login to a PostgreSQL user. People sometimes map the OS root account to the PostgresSQL superuser account, postgres.

Note

PostgreSQL officially refers to users as roles. Not all roles need to have login privileges. For example, group roles often do not. We use the term user to refer to a role with login privileges.

If you accepted default installation options, you will find these configuration files in the main PostgreSQL data folder. You can edit them using any text editor or the Admin Pack in pgAdmin. Instructions for editing with pgAdmin are in “Editing postgresql.conf and pg_hba.conf from pgAdmin3”. If you are unable to find the physical location of these files, run the Example 2-1 query as a superuser while connected to any database.

Example 2-1. Location of configuration files

SELECT name, setting FROM pg_settings WHERE category = 'File Locations';

       name        |                 setting
-------------------+------------------------------------------
 config_file       | /etc/postgresql/9.6/main/postgresql.conf
 data_directory    | /var/lib/postgresql/9.6/main
 external_pid_file | /var/run/postgresql/9.6-main.pid
 hba_file          | /etc/postgresql/9.6/main/pg_hba.conf
 ident_file        | /etc/postgresql/9.6/main/pg_ident.conf
(5 rows)

Making Configurations Take Effect

Some configuration changes require a PostgreSQL service restart, which closes any active connections from clients. Other changes require just a reload. New users connecting after a reload will receive the new setting. Extant users with active connections will not be affected during a reload. If you’re not sure whether a configuration change requires a reload or restart, look under the context setting associated with a configuration. If the context is postmaster, you’ll need a restart. If the context is user, a reload will suffice.

Reloading

A reload can be done in several ways. One way is to open a console window and run this command:

pg_ctl reload -D your_data_directory_here

If you have PostgreSQL installed as a service in RedHat Enterprise Linux, CentOS, or Ubuntu, enter instead:

service postgresql-9.5 reload

postgresql-9.5 is the name of your service. (For older versions of PostgreSQL, the service is sometimes called postgresql sans version number.)

You can also log in as a superuser to any database and execute the following SQL:

SELECT pg_reload_conf();

Finally, you can reload from pgAdmin; see “Editing postgresql.conf and pg_hba.conf from pgAdmin3”.

Restarting

More fundamental configuration changes require a restart. You can perform a restart by stopping and restarting the postgres service (daemon). Yes, power cycling will do the trick as well.

You can’t restart with a PostgreSQL command, but you can trigger a restart from the operating system shell. On Linux/Unix with a service, enter:

service postgresql-9.6 restart

For any PostgreSQL instance not installed as a service:

pg_ctl restart -D your_data_directory_here

On Windows you can also just click Restart on the PostgreSQL service in the Services Manager.

The postgresql.conf File

postgresql.conf controls the life-sustaining settings of the PostgreSQL server. You can override many settings at the database, role, session, and even function levels. You’ll find many details on how to finetune your server by tweaking settings in the article Tuning Your PostgreSQL Server.

Version 9.4 introduced an important change: instead of editing postgresql.conf directly, you should override settings using an additional file called postgresql.auto.conf. We further recommend that you don’t touch the postgresql.conf and place any custom settings in postgresql.auto.conf.

Checking postgresql.conf settings

An easy way to read the current settings without opening the configuration files is to query the view named pg_settings. We demonstrate in Example 2-2.

Example 2-2. Key settings

SELECT
    name,
    context ,
    unit ,
    setting, boot_val, reset_val 
FROM pg_settings
WHERE name IN ('listen_addresses','deadlock_timeout','shared_buffers',
    'effective_cache_size','work_mem','maintenance_work_mem')
ORDER BY context, name;

name                 | context    | unit | setting | boot_val  | reset_val
---------------------+------------+------+-------- +-----------+----------
listen_addresses     | postmaster |      | *       | localhost | *
shared_buffers       | postmaster | 8kB  | 131584  | 1024      | 131584
deadlock_timeout     | superuser  | ms   | 1000    | 1000      | 1000
effective_cache_size | user       | 8kB  | 16384   | 16384     | 16384
maintenance_work_mem | user       | kB   | 16384   | 16384     | 16384
work_mem             | user       | kB   | 5120    | 1024      | 5120

The context is the scope of the setting. Some settings have a wider effect than others, depending on their context.

User settings can be changed by each user to affect just that user’s sessions. If set by the superuser, the setting becomes a default for all users who connect after a reload.

Superuser settings can be changed only by a superuser, and will apply to all users who connect after a reload. Users cannot individually override the setting.

Postmaster settings affect the entire server (postmaster represents the PostgreSQL service) and take effect only after a restart.

Settings with user or superuser context can be set for a specific database, user, session, and function level. For example, you might want to set work_mem higher for an SQL guru-level user who writes mind-boggling queries. Similarly, if you have one function that is sort-intensive, you could raise work_mem just for it. Settings set at database, user, session, and function levels do not require a reload. Settings set at the database level take effect on the next connect to the database. Settings set for the session or function take effect right away.

Be careful checking the units of measurement used for memory. As you can see in Example 2-2, some are reported in 8-KB blocks and some just in kilobytes. Regardless of how a setting displays, you can use any unit of choice when setting; 128 MB is a versatile choice for most memory settings.

Showing units as 8 KB is annoying at best and is destabilizing at worst. The SHOW command in SQL offers display settings in labeled and more intuitive units. For example, running:

SHOW shared_buffers;

returns 1028MB. Similarly, running:

SHOW deadlock_timeout;

returns 1s. If you want to see the units for all settings, enter SHOW ALL.

setting is the current setting; boot_val is the default setting; reset_val is the new setting if you were to restart or reload the server. Make sure that setting and reset_val match after you make a change. If not, the server needs a restart or reload.

New in version 9.5 is a system view called pg_file_settings, which you can use to query settings. Its output lists the source file where the settings can be found. The applied tells you whether the setting is in effect; if the setting has an f in that column you need to reload or restart to make it take effect. In cases where a particular setting is present in both postgresql.conf and postgresql.auto.conf, the postgresql.auto.conf one will take precedent and you’ll see the other files with applied set to false (f). The applied is shown in Example 2-3.

Example 2-3. Querying pg_file_settings

SELECT name, sourcefile, sourceline, setting, applied
FROM pg_file_settings
WHERE name IN ('listen_addresses','deadlock_timeout','shared_buffers',
    'effective_cache_size','work_mem','maintenance_work_mem')
ORDER BY name;

name                 | sourcefile                    | sourceline | setting | applied
---------------------+-------------------------------+------------+---------+--------
effective_cache_size | E:/data96/postgresql.auto.conf|         11 | 8GB     | t
listen_addresses     | E:/data96/postgresql.conf     |         59 | *       | t
maintenance_work_mem | E:/data96/postgresql.auto.conf|          3 | 16MB    | t
shared_buffers       | E:/data96/postgresql.conf     |        115 | 128MB   | f
shared_buffers       | E:/data96/postgresql.auto.conf|          5 | 131584  | t

Pay special attention to the following network settings in postgresql.conf or postgresql.auto.conf, because an incorrect entry here will prevent clients from connecting. Changing their values requires a service restart:

listen_addresses: Informs PostgreSQL which IP addresses to listen on. This usually defaults to local (meaning a socket on the local system), or localhost, meaning the IPv6 or IPv4 localhost IP address. But many people change the setting to *, meaning all available IP addresses.
port: Defaults to 5432. You may wish to change this well-known port to something else for security or if you are running multiple PostgreSQL services on the same server.
max_connections: The maximum number of concurrent connections allowed.
log_destination: This setting is somewhat a misnomer. It specifies the format of the logfiles rather than their physical location. The default is stderr. If you intend to perform extensive analysis on your logs, we suggest changing it to csvlog, which is easier to export to third-party analytic tools. Make sure you have the logging_collection set to on if you want logging.

The following settings affect performance. Defaults are rarely the optimal value for your installation. As soon as you gain enough confidence to tweak configuration settings, you should tune these values:

shared_buffers

Allocated amount of memory shared among all connections to store recently accessed pages. This setting profoundly affects the speed of your queries. You want this setting to be fairly high, probably as much as 25% of your RAM. However, you’ll generally see diminishing returns after more than 8 GB. Changes require a restart.

effective_cache_size

An estimate of how much memory PostgreSQL expects the operating system to devote to it. This setting has no effect on actual allocation, but the query planner figures in this setting to guess whether intermediate steps and query output would fit in RAM. If you set this much lower than available RAM, the planner may forgo using indexes. With a dedicated server, setting the value to half of your RAM is a good starting point. Changes require a reload.

work_mem

Controls the maximum amount of memory allocated for each operation such as sorting, hash join, and table scans. The optimal setting depends on how you’re using the database, how much memory you have to spare, and whether your server is dedicated to PostgreSQL. If you have many users running simple queries, you want this setting to be relatively low to be democratic; otherwise, the first user may hog all the memory. How high you set this also depends on how much RAM you have to begin with. A good article to read for guidance is Understanding work_mem. Changes require a reload.

maintenance_work_mem

The total memory allocated for housekeeping activities such as vacuuming (pruning records marked for deletion). You shouldn’t set it higher than about 1 GB. Reload after changes.

max_parallel_workers_per_gather

This is a new setting introduced in 9.6 for parallelism. The setting determines the maximum parallel worker threads that can be spawned for each gather operation. The default setting is 0, which means parallelism is completely turned off. If you have more than one CPU core, you will want to elevate this. Parallel processing is new in version 9.6, so you may have to experiment with this setting to find what works best for your server. Also note that the number you have here should be less than max_worker_processes, which defaults to 8 because the parallel background worker processes are a subset of the maximum allowed processes.

In version 10, there is an additional setting called max_parallel_workers, which controls the subset of max_worker_processes allocated for parallelization.

Changing the postgresql.conf settings

PostgreSQL 9.4 introduced the ability to change settings using the ALTER SYSTEM SQL command. For example, to set the work_mem globally, enter the following:

ALTER SYSTEM SET work_mem = '500MB';

This command is wise enough to not directly edit postgres.conf but will make the change in postgres.auto.conf.

Depending on the particular setting changed, you may need to restart the service. If you just need to reload it, here’s a convenient command:

SELECT pg_reload_conf();

If you have to track many settings, consider organizing them into multiple configuration files and then linking them back using the include or include_if_exists directive within the postgresql.conf. The exact syntax is as follows:

include 'filename'

The filename argument can be an absolute path or a relative path from the postgresql.conf file.

“I edited my postgresql.conf and now my server won’t start.”

The easiest way to figure out what you screwed up is to look at the logfile, located at the root of the data folder, or in the pg_log subfolder. Open the latest file and read what the last line says. The error raised is usually self-explanatory.

A common culprit is setting shared_buffers too high. Another suspect is an old postmaster.pid left over from a failed shutdown. You can safely delete this file, located in the data cluster folder, and try restarting again.

The pg_hba.conf File

The pg_hba.conf file controls which IP addresses and users can connect to the database. Furthermore, it dictates the authentication protocol that the client must follow. Changes to the file require at least a reload to take effect. A typical pg_hba.conf looks like Example 2-4.

Example 2-4. Sample pg_hba.conf

# TYPE  DATABASE USER ADDRESS         METHOD
host    all      all  127.0.0.1/32    ident 
host    all      all  ::1/128         trust 
host    all      all  192.168.54.0/24 md5 
hostssl all      all  0.0.0.0/0       md5 

# TYPE DATABASE    USER     ADDRESS      METHOD
# Allow replication connections from localhost, 
# by a user with replication privilege. 
#host  replication postgres 127.0.0.1/32 trust
#host  replication postgres ::1/128      trust

Authentication method. The usual choices are ident, trust, md5, peer, and password.

IPv6 syntax for defining network range. This applies only to servers with IPv6 support and may prevent pg_hba.conf from loading if you add this section without actually having IPv6 networking enabled on the server.

IPv4 syntax for defining network range. The first part is the network address followed by the bit mask; for instance: 192.168.54.0/24. PostgreSQL will accept connection requests from any IP address within the range.

SSL connection rule. In our example, we allow anyone to connect to our server outside of the allowed IP range as long as they can connect using SSL.

SSL configuration settings can be found in postgres.conf or postgres.auto.conf: ssl, ssl_cert_file, ssl_key_file. Once the server confirms that the client is able to support SSL, it will honor the connection request and all transmissions will be encrypted using the key information.

Range of IP addresses allowed to replicate with this server.

For each connection request, pg_hba.conf is checked from the top down. As soon as a rule granting access is encountered, a connection is allowed and the server reads no further in the file. As soon as a rule rejecting access is encountered, the connection is denied and the server reads no further in the file. If the end of the file is reached without any matching rules, the connection is denied. A common mistake people make is to put the rules in the wrong order. For example, if you added 0.0.0.0/0 reject before 127.0.0.1/32 trust, local users won’t be able to connect, even though a rule is in place allowing them to.

New in version 10 is the pg_hba_file_rules system view that lists all the contents of the pg_hba.conf file.

“I edited my pg_hba.conf and now my server is broken.”

Don’t worry. This happens quite often, but is easy to recover from. This error is generally caused by typos or by adding an unavailable authentication scheme. When the postgres service can’t parse pg_hba.conf, it blocks all access just to be safe. Sometimes, it won’t even start up. The easiest way to figure out what you did wrong is to read the logfile located in the root of the data folder or in the pg_log subfolder. Open the latest file and read the last line. The error message is usually self-explanatory. If you’re prone to slippery fingers, back up the file prior to editing.

Authentication methods

PostgreSQL gives you many choices for authenticating users—probably more than any other database product. Most people are content with the popular ones: trust, peer, ident, md5, and password. And don’t forget about reject, which immediately denies access. Also keep in mind that pg_hba.conf offers settings at many other levels as the gatekeeper to the entire PostgreSQL server. Users or devices must still satisfy role and database access restrictions after being admitted by pg_hba.conf.

We describe the common authentication methods here:

trust: This is the least secure authentication, essentially no password is needed. As long as the user and database exist in the system and the request comes from an IP within the allowed range, the user can connect. You should implement trust only for local connections or private network connections. Even then it’s possible for someone to spoof IP addresses, so the more security-minded among us discourage its use entirely. Nevertheless, it’s the most common for PostgreSQL installed on a desktop for single-user local access where security is not a concern.
md5: Very common, requires an md5-encrypted password to connect.
password: Uses clear-text password authentication.
ident: Uses pg_ident.conf to check whether the OS account of the user trying to connect has a mapping to a PostgreSQL account. The password is not checked. ident is not available on Windows.
peer: Uses the OS name of the user from the kernel. It is available only for Linux, BSD, macOS, and Solaris, and only for local connections on these systems.
cert: Stipulates that connections use SSL. The client must have a registered certificate. cert uses an ident file such as pg_ident to map the certificate to a PostgreSQL user and is available on all platforms where SSL connection is enabled.

More esoteric options abound, such as gss, radius, ldap, and pam. Some may not always be installed by default.

You can elect more than one authentication method, even for the same database. Keep in mind that pg_hba.conf is processed from top to bottom.

Managing Connections

More often than not, someone else (never you, of course) will execute an inefficient query that ends up hogging resources. They could also run a query that’s taking much longer than what they have patience for. Cancelling the query, terminating the connection, or both will put an end to the offending query.

Cancelling and terminating are far from graceful and should be used sparingly. Your client application should prevent queries from going haywire in the first place. Out of politeness, you probably should alert the connected role that you’re about to terminate its connection, or wait until after hours to do the dirty deed.

There are few scenarios where you should cancel all active update queries: before backing up the database and before restoring the database.

To cancel running queries and terminate connections, follow these steps:

Retrieve a listing of recent connections and process IDs (PIDs):
```
SELECT * FROM pg_stat_activity;
```
pg_stat_activity is a view that lists the last query running on each connection, the connected user (usename), the database (datname) in use, and the start times of the queries. Review the list to identify the PIDs of connections you wish to terminate.
Cancel active queries on a connection with PID 1234:
```
SELECT pg_cancel_backend(1234);
```
This does not terminate the connection itself, though.
Terminate the connection:
```
SELECT pg_terminate_backend(1234);
```
You may need to take the additional step of terminating the client connection. This is especially important prior to a database restore. If you don’t terminate the connection, the client may immediately reconnect after restore and run the offending query anew. If you did not already cancel the queries on the connection, terminating the connection will cancel all of its queries.

PostgreSQL lets you embed functions within a regular SELECT statement. Even though pg_terminate_backend and pg_cancel_backend act on only one connection at a time, you can kill multiple connections by wrapping them in a SELECT. For example, let’s suppose you want to kill all connections belonging to a role with a single blow. Run this SQL command:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity 
WHERE usename = 'some_role';

You can set certain operational parameters at the server, database, user, session, or function level. Any queries that exceed the parameter will automatically be cancelled by the server. Setting a parameter to 0 disables the parameter:

deadlock_timeout

This is the amount of time a deadlocked query should wait before giving up. This defaults to 1000 ms. If your application performs a lot of updates, you may want to increase this value to minimize contention.

Instead of relying on this setting, you can include a NOWAIT clause in your update SQL: SELECT FOR UPDATE NOWAIT ....

The query will be automatically cancelled upon encountering a deadlock.

In PostgreSQL 9.5, you have another choice: SELECT FOR UPDATE SKIP LOCKED will skip over locked rows.

statement_timeout

This is the amount of time a query can run before it is forced to cancel. This defaults to 0, meaning no time limit. If you have long-running functions that you want cancelled if they exceed a certain time, set this value in the definition of the function rather than globally. Cancelling a function cancels the query and the transaction that’s calling it.

lock_timeout

This is the amount of time a query should wait for a lock before giving up, and is most applicable to update queries. Before data updates, the query must obtain an exclusive lock on affected records. The default is 0, meaning that the query will wait infinitely. This setting is generally used at the function or session level. lock_timeout should be lower than statement_timeout, otherwise statement_timeout will always occur first, making lock_timeout irrelevant.

idle_in_transaction_session_timeout

This is the amount of time a transaction can stay in an idle state before it is terminated. This defaults to 0, meaning it can stay alive infinitely. This setting is new in PostgreSQL 9.6. It’s useful for preventing queries from holding on to locks on data indefinitely or eating up a connection.

Check for Queries Being Blocked

The pg_stat_activity view has changed considerably since version 9.1 with the renaming, dropping, and addition of new columns. Starting from version 9.2, procpid was renamed to pid.

pg_stat_activity changed in PostgreSQL 9.6 to provide more detail about waiting queries. In prior versions of PostgreSQL, there was a field called waiting that could take the value true or false. true denoted a query that was being blocked waiting some resource, but the resource being waited for was never stated. In PostgreSQL 9.6, waiting was removed and replaced with wait_event_type and wait_event to provide more information about what resource a query was waiting for. Therefore, prior to PostgreSQL 9.6, use waiting = true to determine what queries are being blocked. In PostgreSQL 9.6 or higher, use wait_event IS NOT NULL.

In addition to the change in structure, PostgreSQL 9.6 will now track additional wait locks that did not get set to waiting=true in prior versions. As a result, you may find lighter lock waits being listed for queries than you saw in prior versions. For a list of different wait_event types, refer to PostgreSQL Manual: wait_event names and types.

Roles

PostgreSQL handles credentialing using roles. Roles that can log in are called login roles. Roles can also be members of other roles; the roles that contain other roles are called group roles. (And yes, group roles can be members of other group roles and so on, but don’t go there unless you have a knack for hierarchical thinking.) Group roles that can log in are called group login roles. However, for security, group roles generally cannot log in. A role can be designated as a superuser. These roles have unfettered access to the PostgreSQL service and should be assigned with discretion.

Warning

Recent versions of PostgreSQL no longer use the terms users and groups. You will still run into these terms; just know that they mean login roles and group roles, respectively. For backward compatibility, CREATE USER and CREATE GROUP still work in current versions, but shun them and use CREATE ROLE instead.

Creating Login Roles

When you initialize the data cluster during setup, PostgreSQL creates a single login role with the name postgres. (PostgreSQL also creates a namesake database called postgres.) You can bypass the password setting by mapping an OS root user to the new role and using ident, peer, or trust for authentication. After you’ve installed PostgreSQL, before you do anything else, you should log in as postgres and create other roles. pgAdmin has a graphical section for creating user roles, but if you want to create one using SQL, execute an SQL command like the one shown in Example 2-5.

Example 2-5. Creating login roles

CREATE ROLE leo LOGIN PASSWORD 'king' VALID UNTIL 'infinity' CREATEDB;

Specifying VALID UNTIL is optional. If omitted, the role remains active indefinitely. CREATEDB grants database creation privilege to the new role.

To create a user with superuser privileges, follow Example 2-6. Naturally, you must be a superuser to create other superusers.

Example 2-6. Creating superuser roles

CREATE ROLE regina LOGIN PASSWORD 'queen' VALID UNTIL '2020-1-1 00:00' SUPERUSER;

Both of the previous examples create roles that can log in. To create roles that cannot log in, omit the LOGIN PASSWORD clause.

Creating Group Roles

Group roles generally cannot log in. Rather, they serve as containers for other roles. This is merely a best-practice suggestion. Nothing stops you from creating a role that can log in as well as contain other roles.

Create a group role using the following SQL:

CREATE ROLE royalty INHERIT;

Note the use of the modifier INHERIT. This means that any member of royalty will automatically inherit privileges of the royalty role, except for the superuser privilege. For security, PostgreSQL never passes down the superuser privilege. INHERIT is the default, but we recommend that you always include the modifier for clarity.

To refrain from passing privileges from the group to its members, create the role with the NOINHERIT modifier.

To add members to a group role, you would do:

GRANT royalty TO leo;
GRANT royalty TO regina;

Some privileges can’t be inherited. For example, although you can create a group role that you mark as superuser, this doesn’t make its member roles superusers. However, those users can impersonate their group role by using the SET ROLE command, thereby gaining superuser privileges for the duration of the session. For example:

Let’s give the royalty role superuser rights with the command:

ALTER ROLE royalty SUPERUSER;

Although leo is a member of the royalty group and he inherits most rights of royalty, when he logs in, he still will not have superuser rights. He can gain superuser rights by doing:

SET ROLE royalty;

His superuser rights will last only for his current session.

This feature, though peculiar, is useful if you want to prevent yourself from unintentionally doing superuser things while you are logged in.

SET ROLE is a command available to all users, but a more powerful command called SET SESSION AUTHORIZATION is available to people who log in as superusers. In order to understand the differences, we’ll first introduce two global variables that PostgreSQL has called: current_user and session_user. You can see these values when you log in by running the SQL statement:

SELECT session_user, current_user;

When you first log in, the values of these two variables are the same. SET ROLE changes the current_user, while SET SESSION AUTHORIZATION changes both the current_user and session_user variables.

Here are the salient properties of SET ROLE:

SET ROLE does not require superuser rights.
SET ROLE changes the current_user variable, but not the session_user variable.
A session_user that has superuser rights can SET ROLE to any other role.
Nonsuperusers can SET ROLE only to the role the session_user is or the roles the session_user belongs to.
When you do SET ROLE you gain all privileges of the impersonated user except for SET SESSION_AUTHORIZATION and SET ROLE.

A more powerful command, SET SESSION AUTHORIZATION, is available as well. Key features of SET SESSION AUTHORIZATION are as follows:

Only a user that logs in as a superuser has permission to do SET SESSION AUTHORIZATION to another role.
The SET SESSION AUTHORIZATION privilege is in effect for the life of the session, meaning that even if you SET SESSION AUTHORIZATION to a user that is not a superuser, you still have the SET SESSION AUTHORIZATION privilege for the life of your session.
SET SESSION AUTHORIZATION changes the values of the current_user and session_user variables to those of the user being impersonated.
A session_user that has superuser rights can SET ROLE to any other role.

We’ll do a set of exercises that illustrate the differences between SET ROLE and SET SESSION AUTHORIZATION by first logging in as leo and then running the code in Example 2-7.

Example 2-7. SET ROLE and SET AUTHORIZATION

SELECT session_user, current_user;

 session_user | current_user
--------------+--------------
 leo          | leo
(1 row)

SET SESSION AUTHORIZATION regina;

ERROR:  permission denied to set session authorization

SET ROLE regina;

ERROR:  permission denied to set role "regina"

ALTER ROLE leo SUPERUSER;

ERROR:  must be superuser to alter superusers

SET ROLE royalty;
SELECT session_user, current_user;

 session_user | current_user
--------------+--------------
 leo          | royalty
(1 row)

SET ROLE regina;

ERROR:  permission denied to set role "regina"

ALTER ROLE leo SUPERUSER;
SET ROLE regina;
SELECT session_user, current_user;

 session_user | current_user
--------------+--------------
 leo          | regina
(1 row)

SET SESSION AUTHORIZATION regina;

ERROR:  permission denied to set session authorization

-- After ending session and logging back in as leo
SELECT session_user, current_user;
SET SESSION AUTHORIZATION regina;
SELECT session_user, current_user;

session_user | current_user
--------------+--------------
leo | leo
(1 row)
SET SESSION AUTHORIZATION
session_user | current_user
--------------+--------------
regina | regina
(1 row)

In Example 2-7 leo was unable to use SET SESSION AUTHORIZATION because he’s not a superuser. He was also unable to SET ROLE to regina because he is not in the regina group. However, he was able to SET ROLE royalty since he is a member of the royalty group (he’s a king consort). Even though royalty has superuser rights, he still wasn’t able to impersonate the queen, regina, because his SET ROLE abilities are still based on being the powerless leo. Since royalty is a group that has superuser rights, he was able to promote his own account leo to be a superuser. Once leo is promoted to power, he can then impersonate regina. He is now able to completely take over her session_user and current_user persona with SET SESSION AUTHORIZATION.

Database Creation

The minimum SQL command to create a database is:

CREATE DATABASE mydb;

This creates a copy of the template1 database. Any role with CREATEDB privilege can create new databases.

Template Databases

A template database is, as the name suggests, a database that serves as a skeleton for new databases. When you create a new database, PostgreSQL copies all the database settings and data from the template database to the new database.

The default PostgreSQL installation comes with two template databases: template0 and template1. If you don’t specify a template database to follow when you create a database, template1 is used.

Warning

You should never alter template0 because it is the immaculate model that you’ll need to copy from if you screw up your templates. Make your customizations to template1 or a new template database you create. You can’t change the encoding and collation of a database you create from template1 or any other template database you create. So if you need a different encoding or collation from those in template1, create the database from template0.

The basic syntax to create a database modeled after a specific template is:

CREATE DATABASE my_db TEMPLATE my_template_db;

You can pick any database to serve as the template. This could come in quite handy when making replicas. You can also mark any database as a template database. Once you do, the database is no longer editable and deletable. Any role with the CREATEDB privilege can use a template database. To make any database a template, run the following SQL as a superuser:

UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'mydb';

If ever you need to edit or drop a template database, first set the datistemplate attribute to FALSE. Don’t forget to change the value back after you’re done with edits.

Using Schemas

Schemas organize your database into logical groups. If you have more than two dozen tables in your database, consider cubbyholing them into schemas. Objects must have unique names within a schema but need not be unique across the database. If you cram all your tables into the default public schema, you’ll run into name clashes sooner or later. It’s up to you how to organize your schemas. For example, if you are an airline, you can place all tables of planes you own and their maintenance records into a planes schema. Place all your crew and staff into an employees schema and place all passenger-related information into a passengers schema.

Another common way to organize schemas is by roles. We found this to be particularly handy with applications that serve multiple clients whose data must be kept separate.

Suppose that you started a dog beauty management business (doggie spa). You start with a table in public called dogs to track all the dogs you hope to groom. You convince your two best friends to become customers. Whimsical government privacy regulation passes, and now you have to put in iron-clad assurances that one customer cannot see dog information from another. To comply, you set up one schema per customer and create the same dogs table in each as follows:

CREATE SCHEMA customer1;
CREATE SCHEMA customer2;

You then move the dog records into the schema that corresponds with the client. The final touch is to create different login roles for each schema with the same name as the schema. Dogs are now completely isolated in their respective schemas. When customers log in to your database to make appointments, they will be able to access only information pertaining to their own dogs.

Wait, it gets better. Because we named our roles to match their respective schemas, we’re blessed with another useful technique. But we must first introduce the search_path database variable.

As we mentioned earlier, object names must be unique within a schema, but you can have same-named objects in different schemas. For example, you have the same table called dogs in all 12 of your schemas. When you execute something like SELECT * FROM dogs, how does PostgreSQL know which schema you’re referring to? The simple answer is to always prepend the schema name onto the table name with a dot, such as in SELECT * FROM customer1.dogs. Another method is to set the search_path variable to be something like customer1, public. When the query executes, the planner searches for the dogs table first in the customer1 schema. If not found, it continues to the public schema and stops there.

PostgreSQL has a little-known variable called user that retrieves the role currently logged in. SELECT user returns this name. user is just an alias for current_user, so you can use either.

Recall how we named our customers’ schemas to be the same as their login roles. We did this so that we can take advantage of the default search path set in postgresql.conf:

search_path = "$user", public;

Now, if role customer1 logs in, all queries will first look in the customer1 schema for the tables before moving to public. Most importantly, the SQL remains the same for all customers. Even if the business grows to have thousands or hundreds of thousands of dog owners, none of the SQL scripts need to change. Commonly shared tables such as common lookup tables can be put in the public schema.

Another practice that we strongly encourage is to create schemas to house extensions (“Step 2: Installing into a database”). When you install an extension, new tables, functions, data types, and plenty of other relics join your server. If they all swarm into the public schema, it gets cluttered. For example, the entire PostGIS suite of extensions will together add thousands of functions. If you’ve already created a few tables and functions of your own in the public schema, imagine how maddening it would be to scan a list of tables and functions trying to find your own among the thousands.

Before you install any extensions, create a new schema:

CREATE SCHEMA my_extensions;

Then add your new schema to the search path:

ALTER DATABASE mydb SET search_path='$user', public, my_extensions;

When you install extensions, be sure to indicate your new schema as their new home.

Warning

ALTER DATABASE .. SET search_path will not take effect for existing connections. You’ll need to reconnect.

Privileges

Privileges (often called permissions) can be tricky to administer in PostgreSQL because of the granular control at your disposal. Security can bore down to the column and row level. Yes! You can assign different privileges to each data point of your table, if that ever becomes necessary.

Note

Row-level security (RLS) first appeared in PostgreSQL 9.5. Although RLS is available on all PostgreSQL installations, when used in SELinux, certain advanced features are enabled.

Teaching you all there is to know about privileges could take a few chapters. What we’ll aim for in this section instead is to give you enough information to get up and running and to guide you around some of the more nonintuitive land mines that could either lock you out completely or expose your server inappropriately.

Privilege management in PostgreSQL is no cakewalk. The pgAdmin graphical administration tool can ease some of the tasks or, at the very least, paint you a picture of your privilege settings. You can accomplish most, if not all, of your privilege assignment tasks in pgAdmin. If you’re saddled with the task of administering privileges and are new to PostgreSQL, start with pgAdmin. Jump to “Creating Database Assets and Setting Privileges” if you can’t wait.

Types of Privileges

PostgreSQL has a few dozen privileges, some of which you may never need to worry about. The more mundane privileges are SELECT, INSERT, UPDATE, ALTER, EXECUTE, DELETE, and TRUNCATE.

Most privileges must have a context. For example, a role having an ALTER privilege is meaningless unless qualified with a database object such as ALTER privilege on tables1, SELECT privilege on table2, EXECUTE privilege on function1, and so on. Not all privileges apply to all objects: an EXECUTE privilege for a table is nonsense.

Some privileges make sense without a context. CREATEDB and CREATE ROLE are two privileges where context is irrelevant.

Note

Privileges in other database products might be called rights or permissions.

Getting Started

So you successfully installed PostgreSQL; you should have one superuser, whose password you know by heart. Now you should take the following steps to set up additional roles and assign privileges:

PostgreSQL creates one superuser and one database for you at installation, both named postgres. Log in to your server as postgres.
Before creating your first database, create a role that will own the database and can log in, such as:
```
CREATE ROLE mydb_admin LOGIN PASSWORD 'something';
```

Create the database and set the owner:

CREATE DATABASE mydb WITH owner = mydb_admin;

Now log in as the mydb_admin user and start setting up additional schemas and tables.

GRANT

The GRANT command is the primary means to assign privileges. Basic usage is:

GRANT some_privilege TO some_role;

A few things to keep in mind when it comes to GRANT:

Obviously, you need to have the privilege you’re granting. And, you must have the GRANT privilege yourself. You can’t give away what you don’t have.
Some privileges always remain with the owner of an object and can never be granted away. These include DROP and ALTER.
The owner of an object retains all privileges. Granting an owner privilege in what it already owns is unnecessary. Keep in mind, though, that ownership does not drill down to child objects. For instance, if you own a database, you may not necessarily own all the schemas within it.
When granting privileges, you can add WITH GRANT OPTION. This means that the grantee can grant her own privileges to others, passing them on:
```
GRANT ALL ON ALL TABLES IN SCHEMA public TO mydb_admin WITH GRANT OPTION;
```
To grant specific privileges on ALL objects of a specific type use ALL instead of the specific object name, as in:
```
GRANT SELECT, REFERENCES, TRIGGER ON 
ALL TABLES IN SCHEMA my_schema TO 
PUBLIC;
```
Note that ALL TABLES includes regular tables, foreign tables, and views.
To grant privileges to all roles, you can use the PUBLIC alias, as in:
```
GRANT USAGE ON SCHEMA my_schema TO PUBLIC;
```

The GRANT command is covered in detail in GRANT. We strongly recommend that you take the time to study this document before you inadvertently knock a big hole in your security wall.

Some privileges are, by default, granted to PUBLIC. These are CONNECT and CREATE TEMP TABLE for databases and EXECUTE for functions. In many cases you might consider revoking some of the defaults with the REVOKE command, as in:

REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA my_schema FROM PUBLIC;

Default Privileges

Default privileges ease privilege management by letting you set privileges before their creation.

Warning

Adding or changing default privileges won’t affect privilege settings on existing objects.

Let’s suppose we want all users of our database to have EXECUTE and SELECT privileges access to any future tables and functions in a particular schema. We can define privileges as shown in Example 2-8. All roles of a PostgreSQL server are members of the group PUBLIC.

Example 2-8. Defining default privileges on a schema

GRANT USAGE ON SCHEMA my_schema TO PUBLIC; 

ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema
GRANT SELECT, REFERENCES ON TABLES TO PUBLIC; 

ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema
GRANT ALL ON TABLES TO mydb_admin WITH GRANT OPTION;  

ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema  
GRANT SELECT, UPDATE ON SEQUENCES TO public;

ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema 
GRANT ALL ON FUNCTIONS TO mydb_admin WITH GRANT OPTION;

ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema 
GRANT USAGE ON TYPES TO PUBLIC;

: Allows all users that can connect to the database to also be able to use and create objects in a schema if they have rights to those objects in the schema. GRANT USAGE on a schema is the first step to granting access to objects in the schema. If a user has rights to select from a table in a schema but no USAGE on the schema, then he will not be able to query the table.
: Grant read and reference rights (the ability to create foreign key constraints against columns in a table) for all future tables created in a schema to all users that have USAGE of the schema.
: GRANT ALL permissions on future tables to role mydb_admin. In addition, allow members in mydb_admin to be able to grant a subset or all privileges to other users to future tables in this schema. GRANT ALL gives permission to add/update/delete/truncate rows, add triggers, and create constraints on the tables.
: GRANT permissions on future sequences, functions, and types.

To read more about default privileges, see ALTER DEFAULT PRIVILEGES.

Privilege Idiosyncrasies

Before we unleash you to explore privileges on your own, we do want to point out a few quirks that may not be apparent.

Unlike in other database products, being the owner of a PostgreSQL database does not give you access to all objects in the database. Another role could conceivably create a table in your database and deny you access to it! However, the privilege to drop the entire database could never be wrestled away from you.

After granting privileges to tables and functions with a schema, don’t forget to grant usage on the schema itself.

Extensions

Extensions, formerly called contribs, are add-ons that you can install in a PostgreSQL database to extend functionality beyond the base offerings. They exemplify the best of open source software: people collaborating, building, and freely sharing new features. Since version 9.1, the new extension model has made adding extensions a cinch.

Tip

Older add-ons outside the extension model are still called contribs, but with an eye toward the future, we’ll call them all extensions.

Not all extensions need to be in all databases. You should install extensions to your individual database on an as-needed basis. If you want all your databases to have a certain set of extensions, you can develop a template database, as discussed in “Template Databases”, with all the extensions installed, and then beget future databases from that template.

Occasionally prune extensions that you no longer need to avoid bloat. Leaving old extensions you don’t need may cause problems during an in-place upgrade since all extensions you have installed must be also installed in the new PostgreSQL version you are upgrading to.

To see which extensions you have already installed in a database, connect to the database and run the query in Example 2-9. Your list could vary significantly from ours.

Example 2-9. Extensions installed in a database

SELECT name, default_version, installed_version, left(comment,30) As comment
FROM pg_available_extensions
WHERE installed_version IS NOT NULL
ORDER BY name;

     name      | default_version | installed_version |            comment
---------------+-----------------+-------------------+--------------------------------
 btree_gist    | 1.5             | 1.5               | support for indexing common da
 fuzzystrmatch | 1.1             | 1.1               | determine similarities and dis
 hstore        | 1.4             | 1.4               | data type for storing sets of
 ogr_fdw       | 1.0             | 1.0               | foreign-data wrapper for GIS d
 pgrouting     | 2.4.1           | 2.4.1             | pgRouting Extension
 plpgsql       | 1.0             | 1.0               | PL/pgSQL procedural language
 plv8          | 1.4.10          | 1.4.10            | PL/JavaScript (v8) trusted pro
 postgis       | 2.4.0dev        | 2.4.0dev          | PostGIS geometry, geography, a
(8 rows)

If you want to see all the extensions installed on the server, regardless of if they are installed in your current database, leave out the WHERE installed_version IS NOT NULL.

To get more details about a particular extension already installed in your database, enter the following command from psql:

\dx+ fuzzystrmatch

Alternatively, execute the following query:

SELECT pg_describe_object(D.classid,D.objid,0) AS description
FROM pg_catalog.pg_depend AS D INNER JOIN pg_catalog.pg_extension AS E
ON D.refobjid = E.oid
WHERE
D.refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass AND
deptype = 'e' AND
E.extname = 'fuzzystrmatch';

This shows what’s packaged in the extension:

description
---------------------------------------------------------------------------
function dmetaphone_alt(text)
function dmetaphone(text)
function difference(text,text)
function text_soundex(text)
function soundex(text)
function metaphone(text,integer)
function levenshtein_less_equal(text,text,integer,integer,integer,integer)
function levenshtein_less_equal(text,text,integer)
function levenshtein(text,text,integer,integer,integer)
function levenshtein(text,text)

Extensions can include database assets of all types: functions, tables, data types, casts, languages, operators, etc., but functions usually constitute the bulk of the payload.

Installing Extensions

Getting an extension into your database takes two installation steps. First, download the extension and install it onto your server. Second, install the extension into your database.

Tip

We’ll be using the same term—install—to refer to both procedures but distinguish between the installation on the server and the installation into the database when the context is unclear.

We cover both steps in this section as well as how to install on PostgreSQL versions prior to extension support.

Step 1: Installing on the server

The installation of extensions on your server varies by OS. The overall idea is to download binary files and requisite libraries, then copy the respective binaries to the bin and lib folders and the script files to share/extension (versions 9.1 and above) or share/contrib (prior to version 9.1). This makes the extension available for the second step.

SELECT * FROM pg_available_extensions;

Step 2: Installing into a database

The extension support makes installation of added features simple. Use the CREATE EXTENSION command to install extensions into each database. The three big benefits are that you don’t have to figure out where the extension files are kept (share/extension), you can uninstall them at will using DROP EXTENSION, and you will have a readily available listing of what is installed and what is available.

PostgreSQL installation packages already include the most popular extensions. To retrieve additional extensions, visit the PostgreSQL Extension Network. You’ll also find many PostgreSQL extensions on GitHub by searching for postgresql extension.

Here is how we would install the fuzzystrmatch extension using SQL:

CREATE EXTENSION fuzzystrmatch;

You can still install an extension noninteractively using psql. Make sure you’re connected to the database where you need the extension, then run:

psql -p 5432 -d mydb -c "CREATE EXTENSION fuzzystrmatch;"

Warning

C-based extensions must be installed by a superuser. Most extensions fall into this category.

We strongly suggest you create one or more schemas to house extensions to keep them separate from production data. After you create the schema, install extensions into it through a command like the following:

CREATE EXTENSION fuzzystrmatch SCHEMA my_extensions;

Upgrading to the new extension model

If you’ve been using a version of PostgreSQL older than 9.1 and restored your old database into version 9.1 or later during a version upgrade, all extensions should continue to function without intervention. For maintainability, you should upgrade your old extensions in the contrib folder to use the new approach to extensions. You can upgrade extensions, especially the ones that come packaged with PostgreSQL, from the old contrib model to the new one. Remember that we’re referring only to the upgrade in the installation model, not to the extension itself.

For example, suppose you had installed the tablefunc extension (for cross-tab queries) to your PostgreSQL 9.0 in a schema called contrib, and you have just restored your database to a 9.1 server. Run the following command to upgrade:

CREATE EXTENSION tablefunc SCHEMA contrib FROM unpackaged;

This command searches through contrib schema (assuming this is where you placed all the extensions), retrieves all components of the extension, and repackages them into a new extension object so it appears in the pg_available_extensions list as being installed.

This command leaves the old functions in the contrib schema intact but removes them from being a part of a database backup.

Common Extensions

Many extensions come packaged with PostgreSQL but are not installed by default. Some past extensions have gained enough traction to become part of the PostgreSQL core. If you’re upgrading from an ancient version, you may gain functionality without needing any extensions.

Popular extensions

Since version 9.1, PostgreSQL prefers the extension model to deliver all add-ons. These include basic extensions consisting only of functions and types, as well as PLs, index types, and FDWs. In this section we list the most popular extensions (some say, “must-have” extensions) that PostgreSQL doesn’t install into your database by default. Depending on your PostgreSQL distribution, you’ll find many of these already available on your server:

btree_gist: Provides GiST index operator classes that implement B-Tree equivalent behavior for common B-Tree services data types. See “PostgreSQL Stock Indexes” for more details.
btree_gin: Provides GIN index operator classes that implement B-Tree equivalent behavior for common B-Tree serviced data types. See “PostgreSQL Stock Indexes” for more details.
postgis: Elevates PostgreSQL to a state-of-the-art spatial database outrivaling all commercial options. If you deal with standard OGC GIS data, demographic statistics data, or geocoding, 3d data, or even raster data, you don’t want to be without this one. You can learn more about PostGIS in our book PostGIS in Action. PostGIS is a whopper of an extension, weighing in at more than 800 functions, types, and spatial indexes. PostGIS is so big it has extensions that extend it. There exist extensions on top of PostGIS such as those included with PostGIS itself. In addition, there is pgpointcloud for managing point clouds and pgRouting for network routing, which are packaged separately.
fuzzystrmatch: A lightweight extension with functions such as soundex, levenshtein, and metaphone algorithms for fuzzy string matching. We discuss its use in Where is Soundex and Other Fuzzy Things.
hstore: An extension that adds key-value pair storage and index support, well-suited for storing pseudonormalized data. If you are looking for a comfortable medium between a relational database and NoSQL, check out hstore. Usage of hstore in many cases has been replaced with the built-in jsonb type. So this extension isn’t as popular as it used to be.
pg_trgm (trigram): Another fuzzy string search library, used in conjunction with fuzzystrmatch. It includes an operator class, making searches using the ILIKE operator indexable. trigram can also allow wildcard searches in the form of LIKE %something%' or regular expression searches such as somefield ~ '(foo|bar)' to utilize an index. See Teaching ILIKE and LIKE New Tricks for further discussion.
dblink: Allows you to query a PostgreSQL database on another server. Prior to the introduction of FDWs in version 9.3, this was the only supported mechanism for cross-database interactions. It remains useful for one-time connections or ad hoc queries, especially where you need to call functions on the foreign server. Prior to PostgreSQL 9.6, postgres_fdw doesn’t allow a statement to call functions on the foreign server, only local ones. In PostgreSQL 9.6 you can call functions defined in an extension if you denote in the foreign server that the server has that extension installed.
pgcrypto: Provides encryption tools, including the popular PGP. It’s handy for encrypting top-secret information stored in the database. See our quick primer on it at Encrypting Data with pgcrypto.

Classic extensions

Here are a few venerable ex-extensions that have gained enough of a following to make it into official PostgreSQL releases. We call them out here because you could still run into them as separate extensions on older servers:

tsearch: A suite of indexes, operators, custom dictionaries, and functions that enhance FTSs. It is now part of PostgreSQL proper. If you’re still relying on behavior from the old extension, you can install tsearch2. A better tactic would be just to update servers where you’re using the old functions, because compatibility could end at any time.
xml: An extension that added an XML data type, related functions, and operators. The XML data type is now an integral part of PostgreSQL, in part to meet the ANSI SQL XML standard. The old extension, now dubbed xml2, can still be installed and contains functions that didn’t make it into the core. In particular, you need this extension if you relied on the xlst_process function for processing XSL templates. There are also a couple of old XPath functions only found in xml2.

Backup and Restore

PostgreSQL ships with three utilities for backup: pg_dump, pg_dumpall, and pg_basebackup. You’ll find all of them in the PostgreSQL bin folder.

Use pg_dump to back up specific databases. To back up all databases in plain text along with server globals, use pg_dumpall, which needs to run under a superuser account so that it back up all databases. Use pg_basebackup to do system-level disk backup of all databases.

For the rest of this section, we’ll focus our discussion on using pg_dump and pg_dumpall. pg_basebackup is the most efficient way of doing a full postgresql server cluster backup. If you have a reasonably sized database, as in 500 GB or more, you should be using pg_basebackup as part of your backup strategy. pg_basebackup, however, requires enabling of features that are often turned off, but that are also needed for replication, so we’ll save discussion of pg_basebackup for “Setting Up Full Server Replication”.

Most of the command-line options for these tools exist both in GNU style (two hyphens plus a word) and the traditional single-letter style (one hyphen plus an alphabetic character). You can use both styles interchangeably, even in the same command. We’ll be covering just the basics here; for a more in-depth discussion, see the PostgreSQL documentation Backup and Restore.

In this section we will not discuss third-party tools that are often used for PostgreSQL backup and restore. Two popular open source ones you might want to consider are pgBackRest and Barman. These offer additional features like backup scheduling, multiserver support, and restore shortcuts.

As you wade through this section, you’ll find that we often specify the port and host in our examples. This is because we often run backups for a different server as scheduled jobs using pgAgent, as discussed in “Job Scheduling with pgAgent”. We often have multiple instances of PostgreSQL running on the same machine, on different ports as well. Sometimes specifying the host can cause problems if your service is set to listen only on localhost. You can safely leave out the host if you are running the examples directly on the server.

You may also want to create a ~/.pgpass file to store all passwords. pg_dump and pg_dumpall don’t have password options. Alternatively, you can set a password in the PGPASSWORD environment variable.

Selective Backup Using pg_dump

For day-to-day backup, pg_dump is more expeditious than pg_dumpall because pg_dump can selectively back up tables, schemas, and databases. pg_dump can back up to plain SQL, as well as compressed, TAR, and directory formats. Compressed, TAR, and directory format backups can take advantage of the parallel restore feature of pg_restore. Directory backups allow parallel pg_dump of a large database. Because we believe you’ll be using pg_dump as part of your daily regimen, we have included a full dump of the help in “Database Backup Using pg_dump” so you can see the myriad switches in a single glance.

The next examples demonstrate a few common backup scenarios and corresponding pg_dump options. They should work for any version of PostgreSQL.

To create a compressed, single database backup:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v -f mydb.backup mydb

To create a plain-text single database backup, including a -C option, which stands for CREATE DATABASE:

pg_dump -h localhost -p 5432 -U someuser -C -F p -b -v -f mydb.backup mydb

To create a compressed backup of tables whose names start with pay in any schema:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v -t *.pay* -f pay.backup mydb

To create a compressed backup of all objects in the hr and payroll schemas:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v \
-n hr -n payroll -f hr.backup mydb

To create a compressed backup of all objects in all schemas, excluding the public schema:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v -N public \
-f all_sch_except_pub.backup mydb

To create a plain-text SQL backup of select tables, useful for porting structure and data to lower versions of PostgreSQL or non-PostgreSQL databases (plain text generates an SQL script that you can run on any system that speaks SQL):

pg_dump -h localhost -p 5432 -U someuser -F p --column-inserts \
-f select_tables.backup mydb

Tip

If your file paths contain spaces or other characters that could throw off the command-line interpreter, wrap the file path in double quotes: "/path with spaces/mydb.backup". As a general rule, you can always use double quotes if you aren’t sure.

The directory format option was introduced in version PostgreSQL 9.1. This option backs up each table as a separate file in a folder and gets around file size limitations. This option is the only pg_dump backup format option that results in multiple files, as shown in Example 2-10. It creates a new directory and populates it with a gzipped file for each table; also included is a file listing the hierarchy. This backup command exits with an error if the directory already exists.

Example 2-10. Directory format backup

pg_dump -h localhost -p 5432 -U someuser -F d -f /somepath/a_directory mydb

A parallel backup option was introduced in version 9.3 using the --jobs or -j option and specifying the number of jobs. For example: --jobs=3 (-j 3) runs three backups in parallel. Parallel backup makes sense only with the directory format option, because it’s the only backup where multiple files are created. Example 2-11 demonstrates its use.

Example 2-11. Directory format parallel backup

pg_dump -h localhost -p 5432 -U someuser -j 3 -Fd -f /somepath/a_directory mydb

Systemwide Backup Using pg_dumpall

Use the pg_dumpall utility to back up all databases on a server into a single plain-text file. This comprehensive backup automatically includes server globals such as tablespace definitions and roles. See “Server Backup: pg_dumpall” for a listing of available pg_dumpall command options.

It’s a good idea to back up globals on a daily basis. Although you can use pg_dumpall to back up databases as well, we prefer backing up databases individually using pg_dump or using pg_basebackup to do a PostgreSQL service-level backup. Restoring from a huge plain-text backup tries our patience. Using pg_basebackup in conjunction with streaming replication is the fastest way to recover from major server failure.

To back up all globals and tablespace definitions only, use the following:

pg_dumpall -h localhost -U postgres --port=5432 -f myglobals.sql --globals-only

To back up specific global settings, use the following:

pg_dumpall -h localhost -U postgres --port=5432 -f myroles.sql --roles-only

Restoring Data

There are two ways to restore data in PostgreSQL from backups created with pg_dump or pg_dumpall:

Use psql to restore plain-text backups generated with pg_dumpall or pg_dump.
Use pg_restore to restore compressed, TAR, and directory backups created with pg_dump.

Using psql to restore plain-text SQL backups

A plain SQL backup is nothing more than a text file containing a hefty SQL script. It’s the least convenient backup to have, but it’s the most versatile. With SQL backup, you must execute the entire script. You can’t cherry-pick objects unless you’re willing to manually edit the file. Run all of the following examples from the OS console or psql.

To restore a backup and ignore errors:

psql -U postgres -f myglobals.sql

To restore, stopping if any error is found:

psql -U postgres --set ON_ERROR_STOP=on -f myglobals.sql

To restore to a specific database:

psql -U postgres -d mydb -f select_objects.sql

Using pg_restore

If you backed up using pg_dump and chose a format such as TAR, custom, or directory, you have to use the pg_restore utility to restore. pg_restore provides a dizzying array of options, far surpassing the restore utility found in other database products we’ve used. Some of its outstanding features include:

You can perform parallel restores using the -j (equivalent to --jobs=) option to indicate the number of threads to use. This allows each thread to restore a separate table simultaneously, significantly picking up the pace of what could otherwise be a lengthy process.
You can use pg_restore to generate a table of contents file from your backup file to check what has been backed up. You can also edit this table of contents and use the revised file to control what gets restored.
pg_restore allows you to selectively restore, even from within a backup of a full database. If you just need one table restored, you can do that.
pg_restore is backward-compatible, for the most part. You can back up a database on an older version of PostgreSQL and restore to a newer version.

See “Database Restore: pg_restore” for a listing of pg_restore options.

To perform a restore using pg_restore, first create the database anew using SQL:

CREATE DATABASE mydb;

Then restore:

pg_restore --dbname=mydb --jobs=4 --verbose mydb.backup

If the name of the database is the same as the one you backed up, you can create and restore the database in one step:

pg_restore --dbname=postgres --create --jobs=4 --verbose mydb.backup

When you use the --create option, the database name is always the name of the one you backed up. You can’t rename it. If you’re also using the --dbname option, that database name must be different from the name of the database being restored. We usually just specify the postgres database.

Normally, a restore will not re-create objects already present in a database. If you have data in the database, and you want to replace it with what’s in the backup, you need to add the --clean switch to the pg_restore command. This will cause objects to be dropped from the current database so that restore can re-create them.

Warning

If you restore over an existing database, the content of the backup may replace things in your current database. Be careful during a restore: don’t accidentally pick the wrong backup file or the wrong database to restore to!

With PostgreSQL 9.2 or later, you can take advantage of the --section option to restore just the structure without the data. This is useful if you want to use an existing database as a template for a new one. To do so, first create the target database:

CREATE DATABASE mydb2;

Then use pg_restore:

pg_restore --dbname=mydb2 --section=pre-data --jobs=4 mydb.backup

Managing Disk Storage with Tablespaces

PostgreSQL uses tablespaces to ascribe logical names to physical locations on disk. Initializing a PostgreSQL cluster automatically begets two tablespaces: pg_default, which stores all user data, and pg_global, which stores all system data. These are located in the same folder as your default data cluster. You’re free to create tablespaces at will and house them on any serverdisks. You can explicitly assign default tablespaces for new objects by database. You can also move existing database objects to new ones.

Creating Tablespaces

To create a new tablespace, specify a logical name and a physical folder and make sure that the postgres service account has full access to the physical folder. If you are on a Windows server, use the following command (note the use of Unix-style forwardslashes):

CREATE TABLESPACE secondary LOCATION 'C:/pgdata94_secondary';

For Unix-based systems, you first must create the folder or define an fstab location, then use this command:

CREATE TABLESPACE secondary LOCATION '/usr/data/pgdata94_secondary';

Moving Objects Among Tablespaces

You can shuffle database objects among different tablespaces. To move all objects in the database to your secondary tablespace, issue the following SQL command:

ALTER DATABASE mydb SET TABLESPACE secondary;

To move just one table:

ALTER TABLE mytable SET TABLESPACE secondary;

New in PostgreSQL 9.4 is the ability move a group of objects from one tablespace to another. If the role running the command is a superuser, all objects will be moved. If not, only the owned objects will be moved.

To move all objects from default tablespace to secondary use:

ALTER TABLESPACE pg_default MOVE ALL TO secondary;

During the move, your database or table will be locked.

Verboten Practices

We have acted as first responders to many PostgreSQL accidents, so we thought it best to end this chapter by itemizing the most common mistakes.

For starters, if you don’t know what you did wrong, the logfile could provide clues. Look for the pg_log folder in your PostgreSQL data folder or the root of the PostgreSQL data folder. It’s also possible that your server shut down before a log entry could be written, in which case the log won’t help you. If your server fails to restart, try the following from the OS command line:

path/to/your/bin/pg_ctl -D your_postgresql_data_folder

Don’t Delete PostgreSQL Core System Files and Binaries

Perhaps this is stating the obvious, but when people run out of disk space, the first thing they do is start deleting files from the PostgreSQL data cluster folder because it’s so darn big. Part of the reason this mistake happens so frequently is that some folders sport innocuous names such as pg_log, pg_xlog, and pg_clog. Yes, there are some files you can safely delete, but unless you know precisely which ones, you could end up destroying your data.

The pg_log folder, often found in your data folder, is a folder that builds up quickly, especially if you have logging enabled. You can always purge files from this folder without harm. In fact, many people schedule jobs to remove logfiles on a regular basis.

Files in the other folders, except for pg_xlog, should never be deleted, even if they have log-sounding names. Don’t even think of touching pg_clog, the active commit log, unless you want to invite disaster.

pg_xlog stores transaction logs. Some systems we’ve seen are configured to move processed transaction logs into a subfolder called archive. You’ll often have an archive folder somewhere (not necessarily as a subfolder of pg_xlog) if you are running synchronous replication, doing continuous archiving, or just keeping logs around in case you need to revert to a different point in time. Deleting files in the root of pg_xlog will mess up the process. Deleting files in the archived folder will just prevent you from performing point-in-time recovery, or if a slave server hasn’t played back the logs, will prevent the slave from fetching them. If these scenarios don’t apply to you, it’s safe to remove files in the archive folder.

Be leery of overzealous antivirus programs, especially on Windows. We’ve seen cases in which antivirus software removed important binaries in the PostgreSQL bin folder. If PostgreSQL fails to start on a Windows system, the event viewer is the first place to look for clues as to why.

Note

In version 10, the pg_xlog folder was renamed to pg_wal and pg_clog was renamed to pg_xact to prevent people from thinking these are log folders where contents can be deleted without destructive consequences.

Don’t Grant Full OS Administrative Privileges to the Postgres System Account (postgres)

Many people are under the misconception that the postgres account needs to have full administrative privileges to the server. In fact, depending on your PostgreSQL version, if you give the postgres account full administrative privileges to the server, your database server might not even start.

The postgres account should always be created as a regular system user in the OS with privileges just to the data cluster and additional tablespace folders. Most installers will set up the correct permissions without you needing to worry. Don’t try to do postgres any favors by giving it more access than it needs. Granting unnecessary access leaves your system vulnerable if you fall victim to an SQL injection attack.

There are cases where you’ll need to give the postgres account write/delete/read rights to folders or executables outside of the data cluster. With scheduled jobs that execute batch files and FDWs that have foreign tables in files, this need often arises. Practice restraint and bestow only the minimum access necessary to get the job done.

Don’t Set shared_buffers Too High

Loading up your server with RAM doesn’t mean you can set the shared_buffers as high as your physical RAM. Try it and your server may crash or refuse to start. If you are running PostgreSQL on 32-bit Windows, setting it higher than 512 MB often results in instability. With 64-bit Windows, you can push the envelope higher, and can even exceed 8 GB without any issues. On some Linux systems, shared_buffers can’t be higher than the SHMMAX variable, which is usually quite low.

PostgreSQL 9.3 changed how kernel memory is used, so that many of the issues people ran into with limitations in prior versions are no longer issues. You can find more details in Kernel Resources.

Don’t Try to Start PostgreSQL on a Port Already in Use

If you try to start PostgreSQL on a port that’s already in use, you’ll see errors in your pg_log files of the form: make sure PostgreSQL is not already running. Here are the common reasons why this happens:

You’ve already started the postgres service.
You are trying to run PostgreSQL on a port already in use by another service.
Your postgres service had a sudden shutdown and you have an orphan postgresql.pid file in the data folder. Delete the file and try again.
You have an orphaned PostgreSQL process. When all else fails, kill all running PostgreSQL processes and then try starting again.