Now That We Know Where PostgreSQL Stores Data, We Can Look At How

 Last time, we started looking at how PostgreSQL stores data by discovering where the data is stored and had a quick peek at that data. There is only one record at this point.


demo=# SELECT * FROM ex01;
 a | b 
---+---
 1 | 1
(1 row)


And the contents of that file are not precisely illuminating at first glance. For now, just trust me that our row is in there, and you will see where in just a little bit.

root@ThinkPad:/var/lib/postgresql/17/main/base/16389# od -c 16390
0000000  \0  \0  \0  \0   (   z 227 001  \0  \0  \0  \0 034  \0 340 037
0000020  \0     004      \0  \0  \0  \0 340 237   @  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0017740 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017760 001  \0 002  \0  \0  \t 030  \0 001  \0  \0  \0 001  \0  \0  \0
0020000

A quick peek at the above does not make it obvious where the two columns of one are located. But they are there. It helps to add more data so you can see where it goes.

Example 2


A new table is created with only one column, but we have added eleven records now. One column will make it easier to see how the data is stored.

demo=# create table ex02 (a int);
CREATE TABLE
demo=# insert into ex02 (a) values (1),(3),(5),(7),(9),(11);
INSERT 0 6
demo=# select oid, relname, relnamespace 
       from pg_class where relname ='ex02';
  oid  | relname | relnamespace 
-------+---------+--------------
 16390 | ex02    |         2200
(1 row)

demo=# select * from ex02;
 a  
----
  1
  3
  5
  7
  9
 11
(6 rows)

demo=# 



And what does the file content look like?

root@x:/var/lib/postgresql/17/main/base/16389# od -c 16390
0000000  \0  \0  \0  \0   8   m 227 001  \0  \0  \0  \0   0  \0   @ 037
0000020  \0     004      \0  \0  \0  \0 340 237   8  \0 300 237   8  \0
0000040 240 237   8  \0 200 237   8  \0   ` 237   8  \0   @ 237   8  \0
0000060  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0017500 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017520 006  \0 001  \0  \0  \t 030  \0  \v  \0  \0  \0  \0  \0  \0  \0
0017540 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017560 005  \0 001  \0  \0  \t 030  \0  \t  \0  \0  \0  \0  \0  \0  \0
0017600 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017620 004  \0 001  \0  \0  \t 030  \0  \a  \0  \0  \0  \0  \0  \0  \0
0017640 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017660 003  \0 001  \0  \0  \t 030  \0 005  \0  \0  \0  \0  \0  \0  \0
0017700 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017720 002  \0 001  \0  \0  \t 030  \0 003  \0  \0  \0  \0  \0  \0  \0
0017740 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017760 001  \0 001  \0  \0  \t 030  \0 001  \0  \0  \0  \0  \0  \0  \0
0020000



The PostgreSQL page layout can be thought of as an empty box. The first row goes on the bottom, the second row on top of the first, and so on until we run out of the 8K space and need to add another page. Look at the last long line above, and we can see a highlighted 001. Two lines above that is the second row's 003. This works for our third row. But the fourth row, a seven, is a backslash 'a'.  It is appropriate at this time to mutter a 'What the heck!'. 

We will come back to this. You will have to wait until later for the explanation.

This might seem like a logically messy version of the Jenga game, but there is an elegance to it that you will soon see. 

At the top of our box is a pointer to each of the rows, and we fill from the boom and top, placing new rows and their pointers in the middle.

CTIDs


Each row gets an identifier for its position in the table called the CTID. CTIDs will make a little more sense later when we get into multiple versions of a row.

demo=# select a, ctid from ex02;
 a  | ctid  
----+-------
  1 | (0,1)
  3 | (0,2)
  5 | (0,3)
  7 | (0,4)
  9 | (0,5)
 11 | (0,6)
(6 rows)


To dig a little deeper, we can use the pageinspect extension.

demo=# CREATE EXTENSION pageinspect;
CREATE EXTENSION

demo=# select lp, t_ctid, t_data from heap_page_items(get_raw_page('ex02',0));
 lp | t_ctid |   t_data   
----+--------+------------
  1 | (0,1)  | \x01000000
  2 | (0,2)  | \x03000000
  3 | (0,3)  | \x05000000
  4 | (0,4)  | \x07000000
  5 | (0,5)  | \x09000000
  6 | (0,6)  | \x0b000000
(6 rows)

The heap_page_items shows all line pointers on a heap page. So line-pointer (lp) number 1 points to ctid (0,1) with the data \x01000000. 

First Row / Last Row


Peek at the last two rows of the octal dump (od) above.

0017740 362 002  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0017760 001  \0 001  \0  \0  \t 030  \0 001  \0  \0  \0  \0  \0  \0  \0

Details on the table row layout can be found here, and we will dig into the parts later. Right now, we can see the size of each row on the page.

demo=# SELECT lp, lp_off, lp_len t_attrs FROM heap_page_item_attrs(get_raw_page('ex02', 0), 'pg_class'::regclass) limit 3; 
 lp | lp_off | t_attrs 
----+--------+---------
  1 |   8160 |      28
  2 |   8128 |      28
  3 |   8096 |      28
(3 rows)

Each row is 28 bytes (8160 - 28 is 8128).  So, each time we add a row, we need 28 bytes for the row itself and a little more for the pointer to that row. 

The Header


Each page has a header that describes the data. The first 24 bytes is the page header.

root@x:/var/lib/postgresql/17/main/base/16389# od -N 24 -c 16390
0000000  \0  \0  \0  \0   8   m 227 001  \0  \0  \0  \0   0  \0   @ 037
0000020  \0     004      \0  \0  \0  \0
0000030

This tells us things like where free space starts and ends. 

demo=# SELECT * FROM page_header(get_raw_page('ex02', 0));
    lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/1976D38 |        0 |     0 |    48 |  8000 |    8192 |     8192 |       4 |         0
(1 row)

demo=# 

 Lower and upper refer to the start and end of free space, where the rows are stored. These are tracked so that news rows can be added until we do not have enough free space to add another row. By the way, that 8,000 for the upper is the lp_off of our sixth and last row.

Next Time


We will dig deeper now that we know where and roughly how our data is stored. We are heading into how PostgreSQL tracks changes in rows and the basics of transactions. 



Comments

Popular posts from this blog

How Does PostgreSQL Store Your Data?