PostgreSQL Monitoring with SQL

by Maria Alcaraz, Database Developer

Posted: July 21, 2016

PostgreSQL provides an activity-tracking module called the statistics collector, which tracks table access and other internal events. If your database is experiencing long wait times, you may be able to use this tool and some simple SQL to find and fix the problem.

Tracking Postgres Activity With the Statistics Collector Tool

The statistics collector is designed to keep records about internal activity in a Postgres database manager. It can:

  • count table and index access in both disk-block and individual-row terms.
  • track the total number of rows in each table.
  • store information about admin actions (like vacuum and analyze) for each table.
  • count calls to user-defined functions and the total time spent in each one.

The collector exposes its data through many views; we'll focus on pg_stat_activity and pg_stat_database, along with the pg_locks table. pg_locks is not part of the collector's data model, but it can be joined with the collector views.
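It's worth seeing a few of these counters first-hand before diving in. As a minimal example, the pg_stat_user_tables view exposes, per table, the access counts, estimated row count, and last vacuum/analyze times mentioned in the list above:

select relname,                      /* table name */
       seq_scan, idx_scan,           /* sequential and index scans against the table */
       n_live_tup,                   /* estimated number of live rows */
       last_vacuum, last_analyze     /* most recent manual VACUUM / ANALYZE */
from pg_stat_user_tables
order by n_live_tup desc;

(Function-call statistics live in a separate view, pg_stat_user_functions, and are gathered only when the track_functions parameter is enabled.)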




Setting the Table With SQL

To have something to monitor, we'll begin by creating a table. Our table will have 20,000 records for people participating in a (fictional) medical drug study. The following SQL code creates this table, which we'll name people:

CREATE TABLE people (
  identification_no integer,
  full_name varchar(50),
  weight float,
  height float,
  test_result int    /* 1 to 10 */
);

insert into people(identification_no, full_name)
select a, 'John Smith' || a
from generate_series(1, 20000, 1) a;

/* normal_rand() is provided by the tablefunc extension */
CREATE EXTENSION IF NOT EXISTS tablefunc;

/* note: set-returning functions in SET were accepted by the 9.x releases
   current when this article was written; PostgreSQL 10+ rejects them here */
update people
set weight = normal_rand(1, 90, 40),
    height = normal_rand(1, 70, 20),
    test_result = floor(random() * 10 + 1);

As a side note, it is interesting to see how we populated the table with fairly realistic data using only two SQL statements. First we used the generate_series() table function (which returns multiple rows) to obtain 20,000 IDs and names. Then we populated the weight and height columns with values drawn from a Gaussian (normal) distribution, using mean = 90 and standard deviation = 40 for weight, and mean = 70 and standard deviation = 20 for height. For the test_result column, we used a random integer in [1, 10].
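If you want to double-check the generated data, a quick sanity query (purely a verification step, not part of the workflow) confirms the distributions:

select count(*)              AS rows_generated,
       round(avg(weight))    AS avg_weight,    /* should be close to 90 */
       round(stddev(weight)) AS sd_weight,     /* should be close to 40 */
       round(avg(height))    AS avg_height,    /* should be close to 70 */
       round(stddev(height)) AS sd_height,     /* should be close to 20 */
       min(test_result), max(test_result)      /* should stay within 1..10 */
from people;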




Creating a Very Slow Query

Once we have the people table fully populated, we will run a query that joins the table with itself several times over (a self Cartesian product):

SELECT count(*) FROM people p1, people p2, people p3, people p4;

This query computes a four-way Cartesian product of the people table with itself, so the number of rows to be counted is 20,000^4. Applying some math:

20,000^4 rows = (2 × 10^4)^4 rows = 16 × 10^16 rows = 160,000,000,000,000,000 (160 quadrillion) rows.
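You can even let Postgres check the arithmetic itself:

select 20000::numeric ^ 4 as rows_to_count;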

We can be sure that this query will need more than a few minutes to finish!

Detecting a Slow Query

Now that our slow query is running, we'll start to explore the Postgres statistics collector. It is controlled by a couple of configuration parameters, track_activities and track_counts, both enabled by default in recent releases. (The configuration process won't be covered here; there is plenty of information online explaining how to configure the PostgreSQL statistics collector.) Once tracking is enabled, some catalog views are populated automatically by the Postgres engine.
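You can verify these settings from any session:

show track_activities;   /* must be on for pg_stat_activity to show running queries */
show track_counts;       /* must be on for the table- and database-level counters */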

One of the most important views is pg_stat_activity. It has one row per server process, so every running query appears in it. Its most important columns are:

  • query: the query's SQL text,
  • datname: database name,
  • usename: user name,
  • query_start: query start time,
  • state: the current state of the backend (e.g., active).

What can the statistics collector tell us about our running queries? The following query finds out:

select now() - query_start AS elapsed,
       query AS sql_text,
       datname AS database,
       usename AS username
from pg_stat_activity
where now() - query_start > '00:01:00'   /* only queries running for more than one minute */
  and state = 'active'
order by 1 desc;

The results are:


elapsed      sql_text                                                            database    username
"00:24:00"   "SELECT count(*) FROM people p1, people p2, people p3, people p4"   "postgres"  "postgres"


Note that we calculated the elapsed query time by finding the difference between the current time, now(), and the query’s start time.

Next, we will run a less aggressive Cartesian product like...

SELECT count(*) FROM people p1, people p2, people p3;

… which gives these results:


elapsed      sql_text                                                            database    username
"00:30:20"   "SELECT count(*) FROM people p1, people p2, people p3, people p4"   "postgres"  "postgres"
"00:01:11"   "SELECT count(*) FROM people p1, people p2, people p3"              "postgres"  "postgres"



Using SQL to Fine-Tune Shared Buffers

All database engines use an area of RAM to cache blocks read from disk, so that the next time the same block is needed it can be served from memory instead of disk. In Postgres, this cache is known as shared buffers.

The ratio between the number of block requests satisfied from the shared buffers (hits) and the total number of block requests (hits plus reads from disk) is one of the main ways to judge database performance. The pg_stat_database view also carries other metrics, like the number of fetched/inserted/deleted/updated tuples, disk block reads, and deadlocks. The following screenshot shows the pg_stat_database schema.

[Screenshot: the pg_stat_database schema]

The following query returns these performance metrics:

select datname,
       (blks_hit::numeric / (blks_hit + blks_read)::numeric) * 100 as read_hit_percentage
from pg_stat_database
where blks_hit + blks_read > 0;

  • The blks_hit column shows the number of block requests satisfied from the shared buffer area (cache hits).
  • The blks_read column shows the number of blocks actually read from disk (because the block was not found in the shared buffers).

The metric returned by the query is a percentage. Values close to 100% mean a well-tuned database engine.


datname           read_hit_percentage
"postgres"        99.81
"ultima"          99.97
"panacim_jun20"   99.65
"panacim_jun23"   99.62


When the hit percentage is low (about 75% or less), you should consider increasing the shared_buffers configuration parameter. Afterwards, restart the Postgres server, wait until some database activity occurs, run the query again, and compare hit percentages.
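The current value is easy to check, and on PostgreSQL 9.4 and later it can be changed without editing postgresql.conf by hand. The '2GB' below is only an illustrative value; a common rule of thumb is around 25% of the machine's RAM:

show shared_buffers;                       /* current setting */

alter system set shared_buffers = '2GB';   /* requires superuser; takes effect after a restart */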

Discovering Locks in PostgreSQL

PostgreSQL has a view called pg_locks, where every lock is represented by a record. In this short exercise, we are going to use three sessions to illustrate locking issues. You can open the sessions with your preferred PostgreSQL client (pgAdmin, psql, or any other); make sure all of them are connected to the same database.

In session 1, we execute the following code:

start transaction;
update people set weight = weight + 1 where identification_no = 12345;

In session 2, we execute this bit of SQL:

start transaction;
update people set height = height + 1 where identification_no = 12345;

We can see that the second transaction does not finish. The reason? It is waiting for the first session to end its transaction and release the lock on the modified record. How can we tell when a session is waiting for another session? And how can we break this never-ending situation and avoid a program being ‘put on hold’ forever?

[Screenshot: the pg_locks table]

Every lock held has a record in the pg_locks table. The most important columns in this table are database, relation, page, tuple (which, together with page, identifies the tuple being locked), and pid (which identifies the process holding the lock).
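Before running the full detection query, you can get a first impression by simply listing the locks that have not been granted, i.e., the ones some session is still waiting for:

select locktype,
       relation::regclass as locked_relation,   /* readable table name, when applicable */
       page, tuple,                             /* identify the locked tuple for row locks */
       pid,
       granted
from pg_locks
where not granted;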

We’ll use our third session to execute the query below, which will show us if we have lock issues. (Don’t be afraid of the complexity; just copy this query and keep it in your DBA toolbox.)

/* pairs each waiting (not granted) lock with the conflicting lock that blocks it */
SELECT blocked_locks.pid          AS blocked_pid,
       blocked_activity.usename   AS blocked_user,
       blocking_locks.pid         AS blocking_pid,
       blocking_activity.usename  AS blocking_user,
       blocked_activity.query     AS blocked_statement,
       blocking_activity.query    AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
     ON  blocking_locks.locktype = blocked_locks.locktype
     AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
     AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
     AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
     AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
     AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
     AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

Results:

The result rows are very wide, so instead of showing them in column form we ran the query in psql:

[Screenshot: the lock-detection query run in psql]

We can see the blocked statement and the blocking statement. In this situation, the blocked session waits until the blocking session finishes; when that happens, the blocked session continues and executes its update.

Killing PostgreSQL Sessions

At this point, you probably have an idea for fixing our “waiting on lock” problem. If we finish the transaction started in session 1, as mentioned, then session 2 can continue. While this is a valid solution, it can’t always be applied; perhaps session 1 is part of a program whose behavior we cannot change.
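When you do control the blocking session, the fix is a one-liner; ending the transaction releases all of its locks:

commit;   /* run in session 1; rollback would release the locks just as well */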

Fortunately, Postgres provides a function to end or close a database session programmatically. When a session is closed, all its locks are freed. Then any other session waiting for these locks can move on as well.

The following query can be a little bit dangerous, but it is a good example of the power of SQL. We are going to kill the obstructive session by combining the lock-matching logic from the previous query with the pg_terminate_backend() function. (The pg_stat_activity joins are not needed here, since all we want is the blocking pid.)

SELECT pg_terminate_backend(blocking_locks.pid)
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_locks blocking_locks
     ON  blocking_locks.locktype = blocked_locks.locktype
     AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
     AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
     AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
     AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
     AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
     AND blocking_locks.pid != blocked_locks.pid
WHERE NOT blocked_locks.granted;

We can see the result in the following screenshot; however, the most important result is shown in the second screenshot, where we can see the update has finished after 30 minutes of wait time!

[Screenshot: the Data Output tab; the update finished after 30 minutes of wait time]

[Screenshot: the Messages tab]

Try It Yourself: Write a Query to Cancel a Non-Ending Query

Another function, pg_cancel_backend(), is a little less aggressive than pg_terminate_backend(). It only cancels the session's currently running query, leaving the session connected. Try writing your own SQL to cancel any queries that have been running for more than one hour. What do you come up with?
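If you get stuck, here is one possible answer (a sketch; peek only after trying it yourself). It skips our own backend so we don't cancel the monitoring session itself:

select pg_cancel_backend(pid)
from pg_stat_activity
where state = 'active'
  and now() - query_start > interval '1 hour'
  and pid <> pg_backend_pid();   /* don't cancel ourselves */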

 
 
 
