PostgreSQL provides an activity-tracking module called the statistics collector, which tracks table access and other internal events. If your database is experiencing long wait times, you may be able to use this tool and some simple SQL to find and fix the problem.
Tracking Postgres Activity With the Statistics Collector Tool
The statistics collector is designed to keep records about internal activity in a Postgres database manager. It can:
- count table and index access in both disk-block and individual-row terms.
- track the total number of rows in each table.
- store information about admin actions (like vacuum and analyze) for each table.
- count calls to user-defined functions and the total time spent in each one.
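For example, the per-table counters can be inspected directly. Here is a quick look (a sketch using the collector's pg_stat_user_tables view; the column names are as documented by PostgreSQL):

```sql
-- Per-table activity counters from the statistics collector.
-- seq_scan / idx_scan count sequential and index scans,
-- n_live_tup estimates the current row count, and
-- last_vacuum / last_analyze record maintenance actions.
SELECT relname,
       seq_scan,
       idx_scan,
       n_live_tup,
       last_vacuum,
       last_analyze
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
```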
This tool has many views; we’ll focus on pg_stat_activity and pg_stat_database. Both are shown in the following screenshot, along with the pg_locks view. The pg_locks view is not part of the collector data model, but it can be joined with collector views.
Setting the Table With SQL
In order to have something to monitor, we’ll begin by creating a table. Our table will have 20,000 records for people participating in a (fictional) medical drug study. The following SQL code creates this table, which we’ll name people:
CREATE TABLE people (
    identification_no integer,
    full_name varchar(50),
    weight float,
    height float,
    test_result int  /* 0 to 10 */
);

INSERT INTO people(identification_no, full_name)
SELECT a, 'John Smith' || a
FROM generate_series(1, 20000, 1) a;

UPDATE people
SET weight = normal_rand(1, 90, 40),
    height = normal_rand(1, 70, 20),
    test_result = random() * 10 + 1;
As a side note, it is interesting to see how we populated the table with fairly realistic data using only two SQL statements. First we used the generate_series() table function (which returns multiple rows) to obtain 20,000 IDs and names. Then we populated the weight and height columns with values drawn from a Gaussian (normal) distribution, using mean=90 and standard deviation=40 for weight and mean=70 and standard deviation=20 for height. For the test_result column, we used a random integer in [1,10].
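Note that generate_series() is built into PostgreSQL, but normal_rand() is not; it is provided by the tablefunc extension, which must be installed in the database before running the UPDATE above:

```sql
-- normal_rand() is provided by the contrib extension "tablefunc".
CREATE EXTENSION IF NOT EXISTS tablefunc;
```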
Creating a Very Slow Query
Once we have the people table fully populated, we will run a multi-way self-Cartesian product like the following:
SELECT count(*) FROM people p1, people p2, people p3, people p4;
This query represents a four-way Cartesian product of the table “people” with itself. The number of rows to be counted is 20,000^4. Applying some math, this is equal to:
20,000^4 rows = (2 * 10^4)^4 rows = 16 * 10^16 rows = 160,000,000,000,000,000 (160 quadrillion) rows.
We can be sure that this query will need more than a few minutes to finish!
Detecting a Slow Query
Now that our slow query is running, we’ll start to explore the Postgres statistics collector. We can activate it by setting a configuration parameter. (The configuration process won’t be covered here; there is plenty of information online explaining how to activate the PostgreSQL statistics collector.) Once this is set, some catalog views will be populated automatically by the Postgres engine.
One of the most important views is pg_stat_activity. It has one record for each running query in the database. Its most important columns are:
- query: the SQL text of the query.
- datname: the database name.
- query_start: the query’s start time.
What can the statistics collector tell us about our running queries? Check out the query and its results below:
SELECT now() - query_start AS elapsed,
       query AS sql_text,
       datname AS database,
       usename AS username
FROM pg_stat_activity
WHERE now() - query_start > '00:01:00'  /* we only want queries lasting more than one minute */
  AND state = 'active'
ORDER BY 1 DESC;
The results are:
|"00:24:00"||"SELECT count(*) FROM people p1, people p2, people p3, people p4"||"postgres"||"postgres"|
Note that we calculated the elapsed query time by finding the difference between the current time, now(), and the query’s start time.
Next, we will run a less aggressive Cartesian product like...
SELECT count(*) FROM people p1, people p2, people p3;
… which gives these results:
|"00:30:20"||"SELECT count(*) FROM people p1, people p2, people p3, people p4"||"postgres"||"postgres"|
|"00:01:11"||"SELECT count(*) FROM people p1, people p2, people p3"||"postgres"||"postgres"|
Using SQL to Fine-Tune Shared Buffers
All database engines use an area of RAM to cache blocks read from disk. This lets them avoid accessing the same disk block the next time it is required. In Postgres, this cache is known as shared buffers.
The ratio between the number of blocks found in the shared buffers (hits) and the total number of block requests (hits plus disk reads, where a disk read means a miss in the shared buffers) is one of the main ways to judge database performance. The pg_stat_database view also has other metrics, like the quantity of fetched/inserted/deleted/updated tuples, disk block reads, and deadlocks. The following screenshot shows the pg_stat_database schema.
The following query returns these performance metrics:
SELECT datname,
       (blks_hit::numeric / (blks_hit + blks_read)::numeric) * 100 AS read_hit_percentage,
       *
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;
- The blks_hit column shows the number of disk blocks found in the shared buffer area (cache hits).
- The blks_read column shows the number of disk blocks read from disk (because the block was not found in the shared buffers).
The metric returned by the query is a percentage. Values close to 100% mean a well-tuned database engine.
When the hit percentage is low (about 75% or less), you should increase the shared_buffers configuration parameter. Afterwards, restart the Postgres engine, wait until some database activity occurs, run the query again, and compare hit percentages.
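One way to raise the parameter is with ALTER SYSTEM, followed by a server restart. (The value below is only illustrative; a common starting point is around 25% of the machine’s RAM.)

```sql
-- Raise shared_buffers; the new value only takes effect after
-- the PostgreSQL server is restarted.
ALTER SYSTEM SET shared_buffers = '2GB';

-- After the restart, confirm the value in effect:
SHOW shared_buffers;
```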
Discovering Locks in PostgreSQL
PostgreSQL has a view called pg_locks, where every lock is represented by a record. In this short exercise, we are going to use three sessions to illustrate locking issues. You can initiate the PostgreSQL sessions using your preferred client (pgAdmin, psql, or any other PostgreSQL client); make sure all of them are connected to the same database.
In session 1, we execute the following code:
START TRANSACTION;
UPDATE people SET weight = weight + 1 WHERE identification_no = 12345;
In session 2, we execute this bit of SQL:
START TRANSACTION;
UPDATE people SET height = height + 1 WHERE identification_no = 12345;
We can see that the second transaction does not finish. The reason? It is waiting for the first session to finish and release its lock on the modified record. How can we tell when a session is waiting for another session? Moreover, how can we break this never-ending situation and avoid a program being ‘put on hold’ forever?
Every lock held has a record in the pg_locks view. The most important columns in this view are tuple (which, together with page, identifies which tuple is being locked) and pid (which identifies the process holding the lock).
We’ll use our third session to execute the query below, which will show us if we have lock issues. (Don’t be afraid of the complexity – just copy this query and keep it in your DBA toolbox.)
SELECT blocked_locks.pid          AS blocked_pid,
       blocked_activity.usename   AS blocked_user,
       blocking_locks.pid         AS blocking_pid,
       blocking_activity.usename  AS blocking_user,
       blocked_activity.query     AS blocked_statement,
       blocking_activity.query    AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
     ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
     ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_catalog.pg_stat_activity blocking_activity
     ON blocking_activity.pid = blocking_locks.pid
WHERE blocking_locks.pid != blocked_locks.pid
  AND NOT blocked_locks.granted;
To avoid showing the results in wide column form, the query was run in psql with expanded display enabled:
We can see the blocked statement and the blocking statement. In this situation, the blocked session is waiting until the blocking session is finished. When this occurs, the blocked session will continue and execute its update.
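To confirm this behavior, it is enough to end the transaction in session 1; its row lock is released and session 2’s UPDATE completes immediately:

```sql
-- Run in session 1: COMMIT (or ROLLBACK) releases the row lock
-- acquired by the UPDATE, unblocking session 2.
COMMIT;
```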
Killing PostgreSQL Sessions
At this point, you probably have an idea for fixing our “waiting on lock” problem. If we finish the transaction started in session 1, as mentioned, then session 2 can continue. While this is a valid solution, it can’t always be applied; perhaps session 1 is part of a program and we cannot change the program’s behavior or actions.
Fortunately, Postgres provides a function to end or close a database session programmatically. When a session is closed, all its locks are freed. Then any other session waiting for these locks can move on as well.
The following query can be a little bit dangerous, but it is a good example of the power of SQL. We are going to kill an obstructive session by integrating the lock detector from the previous query along with the pg_terminate_backend() function.
SELECT pg_terminate_backend(blocking_locks.pid)
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
     ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
     ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_catalog.pg_stat_activity blocking_activity
     ON blocking_activity.pid = blocking_locks.pid
WHERE blocking_locks.pid != blocked_locks.pid
  AND NOT blocked_locks.granted;
We can see the result in the following screenshot; however, the most important result is shown in the second screenshot, where we can see the update has finished after 30 minutes of wait time!
Try It Yourself: Write a Query to Cancel a Non-Ending Query
Another function, pg_cancel_backend(), is a little less aggressive than pg_terminate_backend(): it cancels only the session’s currently running query, without closing the session. Try writing your own SQL to cancel any queries that have been running for more than one hour. What do you come up with?
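Here is one possible solution to compare with your own (a sketch; it reuses the query_start column of pg_stat_activity shown earlier):

```sql
-- Cancel the current query of any session that has been
-- running it for more than one hour.
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '1 hour'
  AND pid <> pg_backend_pid();  -- skip our own session
```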