pg_repack

1. Overview

pg_repack is an extension for PostgreSQL used to rebuild tables and indexes online and eliminate storage bloat. It can achieve effects similar to VACUUM FULL (reclaiming space), while barely blocking business read/write operations.

Both the PG mode and Oracle compatibility mode of IvorySQL have been adapted for pg_repack.

License: BSD-3-Clause License

2. Principle

VACUUM FULL lacks a mechanism for capturing concurrent changes during the rewrite. It holds an ACCESS EXCLUSIVE lock on the entire table for the whole duration, during which the table cannot be read or written, and the lock duration grows with the table size. pg_repack compresses the strong lock into two brief moments at the beginning and end by means of "shadow table + triggers + file node swap", thereby enabling online rebuild.

2.1. Change Capture and Data Copy

  • First, install row-level triggers on the original table to record subsequent INSERT / UPDATE / DELETE operations into a log table (repack.log_<oid>) — this step requires a brief ACCESS EXCLUSIVE lock to atomically install the triggers, which is released immediately after installation;

  • Create a shadow table with the same structure as the original table (repack.table_<oid>), and bulk-copy the original table’s data under a consistent MVCC snapshot; this phase holds only SHARE UPDATE EXCLUSIVE, so the original table continues to allow normal reads and writes;

  • Rebuild indexes on the shadow table, then replay the log so that the shadow table catches up with the latest state of the original table.

2.2. Swap and Cleanup

  • Acquire a brief ACCESS EXCLUSIVE lock, replay any remaining log entries, then swap the physical file nodes (relfilenode) of the original table and the shadow table at the system catalog level — the table’s OID remains unchanged, and the swap is instantaneous;

  • Drop the triggers and log tables, then drop the shadow table (which now points to the old heap, so dropping it reclaims the old physical files); finally, run ANALYZE on the table.

Prerequisites for use: the target table must have a primary key or a non-null unique key; the operation requires approximately twice the table size of temporary disk space during the process.

3. Installation and Enabling

IvorySQL 5 or above is already installed in the environment. pg_repack consists of two parts: the server-side extension (pg_repack.so and SQL scripts) and the client-side command-line tool pg_repack.

It is assumed that IvorySQL 5 is currently installed and the pg_config tool is available.

3.1. Source Code Installation

Pull the pg_repack source code:

git clone --branch ver_1.5.3 https://github.com/reorg/pg_repack.git

Perform compilation and installation:

cd pg_repack
make
make install

3.2. Create the Extension

[ivorysql@localhost ivorysql]$ psql
psql (18.0)
Type "help" for help.

ivorysql=# CREATE EXTENSION pg_repack;
CREATE EXTENSION

CREATE EXTENSION pg_repack must be executed by a superuser; the client tool pg_repack also requires connecting and running as a superuser by default.

4. Usage

pg_repack is driven through the command-line client. The connection parameters are the same as those of psql (-h host, -p port, -U user, -d database, or use the PGHOST / PGPORT / PGUSER environment variables).

4.1. Rebuild All Eligible Tables in the Entire Database

pg_repack -d ivorysql

4.2. Rebuild a Specified Table

The schema.table form is recommended:

pg_repack -d ivorysql -t public.big_table

4.3. Rebuild Indexes Only

pg_repack -d ivorysql -t big_table --only-indexes    # rebuild all indexes of this table
pg_repack -d ivorysql -i public.big_table_idx        # rebuild a single index

4.4. Dry-Run Mode

Only shows the actions that would be performed, without actually rebuilding:

pg_repack -d ivorysql -t big_table --dry-run