Measuring WAL Throughput in PostgreSQL: Step-by-Step Guide

Learn how to approximate change event throughput in PostgreSQL by measuring the write-ahead log (WAL).

Write-ahead logging (WAL) is essential to PostgreSQL's crash recovery and Change Data Capture (CDC) replication. Tracking the amount of WAL generated over time can help developers understand the volume of changes happening in their database, which is particularly useful for evaluating replication systems or monitoring replication performance.

In this article, we’ll walk through how to measure WAL throughput using SQL queries in PostgreSQL. We will create a table and views that allow you to track and calculate WAL volume over specific periods.

Step 1: Set Up a Table to Track WAL LSN Positions

To start, you’ll need a table to store the Log Sequence Numbers (LSNs) along with timestamps. This allows you to record the current LSN over time, which we can later use to calculate the WAL volume.

sql
CREATE TABLE wal_lsn_history (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    lsn_position pg_lsn
);

This table captures two key pieces of information:

  • timestamp: The exact time the LSN was recorded.
  • lsn_position: The current LSN when the row is inserted.

The Log Sequence Number (LSN) in PostgreSQL is a unique identifier for a specific location within the WAL. An LSN might look like this: 0/16B3740. This represents a specific point in the WAL, which the system can use to determine where to read or write next.
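
You can read the current LSN and compute the byte distance between two LSNs directly in SQL; pg_wal_lsn_diff() is the same function we will use later to turn LSNs into byte counts. A minimal sketch (the second LSN literal below is just an illustrative value):

sql
SELECT pg_current_wal_lsn();  -- current write position in the WAL

SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/16B3740'::pg_lsn) AS bytes_since_that_lsn;  -- distance in bytes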

[Diagram: the PostgreSQL write-ahead log and replication stream. Source: https://materialize.com/blog/connecting-materialize-directly-to-postgresql-via-the-replication-stream/]

Step 2: Create a Function to Record the Current WAL LSN

Next, you will create a function that inserts the current LSN into the wal_lsn_history table. This function can be called periodically to track changes in the WAL.

sql
CREATE OR REPLACE FUNCTION record_current_wal_lsn() RETURNS void AS $$
BEGIN
    INSERT INTO wal_lsn_history (lsn_position)
    VALUES (pg_current_wal_lsn());
END;
$$ LANGUAGE plpgsql;

You can execute this function manually or set up a cron job to run it at regular intervals, depending on how often you want to sample the WAL LSN.
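
For example, to take a sample by hand and confirm it was stored:

sql
SELECT record_current_wal_lsn();

SELECT timestamp, lsn_position
FROM wal_lsn_history
ORDER BY timestamp DESC
LIMIT 5;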

Step 3: Create a View for WAL Volume Analytics

This view calculates the WAL bytes generated between each recorded LSN and the time difference between samples. It allows you to analyze WAL throughput over time.

sql
CREATE OR REPLACE VIEW wal_volume_analytics AS
WITH time_periods AS (
    SELECT
        timestamp,
        lsn_position,
        LEAD(timestamp) OVER (ORDER BY timestamp DESC) as prev_timestamp,
        LEAD(lsn_position) OVER (ORDER BY timestamp DESC) as prev_lsn
    FROM wal_lsn_history
), calculations AS (
    SELECT
        timestamp,
        lsn_position,
        EXTRACT(EPOCH FROM (timestamp - prev_timestamp)) as seconds_diff,
        pg_wal_lsn_diff(lsn_position, prev_lsn) as bytes_written
    FROM time_periods
    WHERE prev_lsn IS NOT NULL
)
SELECT
    timestamp,
    lsn_position,
    bytes_written as wal_bytes_since_previous,
    pg_size_pretty(bytes_written) as wal_size_since_previous,
    ROUND(bytes_written::numeric / NULLIF(seconds_diff, 0), 2) as bytes_per_second,
    pg_size_pretty((bytes_written::numeric / NULLIF(seconds_diff, 0))::bigint) || '/s' as rate_pretty,
    seconds_diff as seconds_since_previous
FROM calculations
WHERE seconds_diff > 0
ORDER BY timestamp DESC;

Explanation of Columns in the View:

  • wal_bytes_since_previous: The number of bytes written to the WAL since the previous LSN.
  • wal_size_since_previous: Human-readable size of the WAL written (e.g., KB, MB).
  • bytes_per_second: The rate of WAL generation, in bytes per second.
  • rate_pretty: A human-readable version of the rate, in KB/s or MB/s.
  • seconds_since_previous: The time difference between the current and previous LSN sample.
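
Once a few samples have been recorded, you can query the view for the most recent throughput figures:

sql
SELECT timestamp, wal_size_since_previous, rate_pretty, seconds_since_previous
FROM wal_volume_analytics
LIMIT 10;  -- the view is already ordered by timestamp DESC, so this returns the latest samples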

Example Output

The following output shows the calculated WAL volume for each recorded LSN:

timestamp                  | wal_size_since_previous | bytes_per_second | rate_pretty | seconds_since_previous
---------------------------|-------------------------|------------------|-------------|-----------------------
2024-10-01 14:22:10 +00:00 | 2 MB                    | 250000.00        | 244 KB/s    | 8
2024-10-01 14:22:02 +00:00 | 1.5 MB                  | 187500.00        | 183 KB/s    | 8

wal_volume_analytics table sample

Step 4: Create Summary Views for Time Windows

To provide a broader overview of WAL volume over various time windows (e.g., last 5 minutes, last hour), we can create another view that summarizes the WAL volume over configurable periods.

sql
CREATE OR REPLACE VIEW wal_volume_summary AS
WITH time_windows AS (
    SELECT
        'Last 5 minutes' as window,
        5 as minutes,
        NOW() - INTERVAL '5 minutes' as start_time
    UNION ALL
    SELECT 'Last 15 minutes', 15, NOW() - INTERVAL '15 minutes'
    UNION ALL
    SELECT 'Last hour', 60, NOW() - INTERVAL '1 hour'
    UNION ALL
    SELECT 'Last day', 1440, NOW() - INTERVAL '1 day'
), wal_diffs AS (
    SELECT
        h.timestamp,
        h.lsn_position,
        LAG(h.lsn_position) OVER (ORDER BY h.timestamp) as prev_lsn
    FROM wal_lsn_history h
), calculated_wal AS (
    SELECT
        wd.timestamp,
        wd.lsn_position,
        wd.prev_lsn,
        pg_wal_lsn_diff(wd.lsn_position, wd.prev_lsn) as wal_diff
    FROM wal_diffs wd
    WHERE wd.prev_lsn IS NOT NULL
)
SELECT
    w.window,
    COUNT(*) as samples,
    pg_size_pretty(SUM(c.wal_diff)) as total_wal_size,
    pg_size_pretty(AVG(c.wal_diff)::bigint) as avg_wal_per_minute,
    pg_size_pretty((SUM(c.wal_diff)::numeric / (w.minutes * 60))::bigint) || '/s' as avg_rate
FROM
    time_windows w
    JOIN calculated_wal c ON c.timestamp > w.start_time
GROUP BY w.window, w.minutes
ORDER BY w.minutes;

Quick Explanation of Columns:

  • window: The measured time window.
  • samples: Number of WAL size samples.
  • total_wal_size: Total WAL size written in the time window.
  • avg_wal_per_minute: Average WAL generated per sample, which corresponds to per minute if you record one LSN per minute.
  • avg_rate: Average rate of WAL generation (e.g., KB/s).
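
You can check these summaries at any time by selecting from the view:

sql
SELECT * FROM wal_volume_summary;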

Example Output

window          | samples | total_wal_size | avg_wal_per_minute | avg_rate
----------------|---------|----------------|--------------------|----------
Last 5 minutes  | 3       | 10 MB          | 2 MB               | 341 KB/s
Last 15 minutes | 34      | 30 MB          | 2 MB               | 227 KB/s
Last hour       | 532     | 120 MB         | 2 MB               | 341 KB/s
Last day        | 3243    | 2 GB           | 1.5 MB             | 170 KB/s

wal_volume_summary table sample

And there we have it! Detailed statistics of WAL throughput over time. Keep in mind that the WAL contains more than just your row changes, so this is still only an approximation of change event throughput, but it is about as close as we can get. Here's why:

  1. WAL Logging: The WAL records more than just data changes; it also includes metadata and internal bookkeeping that never shows up in CDC output.
  2. WAL Compression and Overhead: PostgreSQL can compress some WAL records, and each record carries overhead (e.g., transaction IDs) that CDC output typically excludes.
  3. Row-Level vs Full Writes: Some CDC tools capture only the new values, while the WAL may log old and new row data along with metadata, which skews size comparisons.

But of course we don’t want to manually run the size collection function every time, so as a last step, let’s take a look at how we can automate this.

Scheduling WAL Data Collection with Cron

To continuously monitor WAL throughput over time, you need to collect the WAL LSN positions at regular intervals. This is easy to do with cron, the standard Unix/Linux task scheduler that runs scripts or commands at specified intervals.
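
For example, a crontab entry like the following would record an LSN sample every minute. This is just a sketch: it assumes a database named mydb and that the postgres user can connect locally without a password.

plaintext
* * * * * psql -U postgres -d mydb -c "SELECT record_current_wal_lsn();"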

If you prefer to avoid external tools like cron, PostgreSQL offers an internal job scheduling option using the pg_cron extension. pg_cron allows you to run SQL-based jobs directly within the database without needing to manage external scripts. This is particularly useful if you want to keep everything self-contained within PostgreSQL.

Step 1: Install pg_cron

First, you need to install and enable pg_cron:

  1. Install pg_cron via your package manager (e.g., for Ubuntu/Debian):

plaintext
sudo apt install postgresql-16-cron

  2. Add pg_cron to shared_preload_libraries in your postgresql.conf file so its background worker is loaded at startup:

plaintext
shared_preload_libraries = 'pg_cron'

  3. Restart your PostgreSQL instance to apply the change.

  4. Load the extension into the database where you want to schedule jobs:

sql
CREATE EXTENSION pg_cron;
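
To confirm that pg_cron was loaded after the restart, you can check the server setting directly:

sql
SHOW shared_preload_libraries;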

Step 2: Schedule WAL Data Collection

With pg_cron installed and loaded, you can now schedule jobs directly in SQL. To collect WAL LSN data periodically, use the following command to schedule the record_current_wal_lsn() function to run every minute:

sql
SELECT cron.schedule(
    'Record WAL LSN every minute',
    '*/1 * * * *',
    'SELECT record_current_wal_lsn();'
);

This command schedules the record_current_wal_lsn() function to run every minute using the cron syntax. The job will be handled entirely within PostgreSQL, eliminating the need for external scripts or cron setups.

Step 3: Monitor Scheduled Jobs

You can monitor and manage your scheduled jobs within PostgreSQL using pg_cron. To see the list of jobs scheduled with pg_cron, you can query the cron.job table:

sql
SELECT jobid, schedule, command, nodename, nodeport, active
FROM cron.job;
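
If your pg_cron version keeps run history (newer releases expose a cron.job_run_details table), you can also inspect recent executions; treat the exact column set as version-dependent:

sql
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;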

If you want to stop a scheduled job, you can remove it with the cron.unschedule function:

sql
SELECT cron.unschedule(jobid);  -- use the jobid value reported by cron.job

Step 4: Adjusting the Collection Frequency

As with standard cron, you can adjust the frequency by modifying the cron expression in the schedule string (see the example after this list):

  • */5 * * * *: Every 5 minutes.
  • 0 * * * *: Every hour.
  • 0 0 * * *: Every day at midnight.
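
For example, to switch the job above to a five-minute interval, you can remove it and schedule it again with the new expression (newer pg_cron releases also provide a cron.alter_job function for this):

sql
SELECT cron.unschedule(jobid);  -- jobid of the existing job, from cron.job
SELECT cron.schedule('Record WAL LSN every 5 minutes', '*/5 * * * *', 'SELECT record_current_wal_lsn();');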

Conclusion

Measuring WAL throughput is an important part of monitoring and optimizing PostgreSQL, especially for systems that rely on replication. By using the table, function, and views outlined in this article, you can track how much data is being written to the WAL in real time and over specific periods. This can help you make informed decisions about replication system evaluations, performance tuning, and database scaling.
