Constraint Checks And Dirty Data

 It is much easier to keep bad data out of your database than it is to have to correct it later. Those costs are measured in time, money, and sanity. One of the easier ways of protecting your data is with a constraint check

I was recently approached by someone who urgently needed a constraint check and could afford to 'fix' the bad data after the constraint was in place. They had to quickly keep new 'bad' data out and could afford to correct the old 'bad' data at leisure. 

The Overly Simplified Example

To illustrate how to do just that, here is an example of a table and some simple data.

demo=# create table x1 (a int);
CREATE TABLE
demo=# insert into x1 (a) values (10),(20),(30),(-5),(1);
INSERT 0 5
demo=#

Sometime later, it is discovered that any value of a needs to be greater than five. If we add the constraint, the system will balk and refuse the request.

demo=# alter table x1 add constraint x_id_gt_5 check (a > 5);
ERROR:  check constraint "x_id_gt_5" of relation "x1" is violated by some row

The trick is to tell the ALTER TABLE command to not validate the constraint check. The constraint is added, but the system does not ensure the data conforms to that check. This is done by adding NOT VALID to the ALTER TABLE command.

demo=# alter table x1 add constraint a_gt_5 check (a > 5) not valid;
ALTER TABLE

The data is in the same state as before the constraint was added. The old 'bad' data is still there and can be corrected or deleted later.

demo=# select a from x1;
 a  
----
 10
 20
 30
 -5
  1
(5 rows)

demo=# 

Any new attempts to add new 'bad' data will fail.

demo=# insert into x1 (a) values (3);
ERROR:  new row for relation "x1" violates check constraint "a_gt_5"
DETAIL:  Failing row contains (3).
demo=# 

Now, it is time to fix the rows that do not conform to the check. When that is done, the check can be validated.

demo=# alter table x1 validate constraint a_gt_5;
ALTER TABLE
demo=# 



Constraint Hints


1. Name your constraints. The name 'a_gt_5' describes the constraint and explains why the data was rejected. If you do not name the constraint, the system will assign a name that will list the table name, the column name, the word 'check,' and no clue what went wrong. 

Look at the following definitions. Seeing a_gt_t error informs you what is wrong, But x1_a_check is nebulous, at best. Remember that you may be the poor person trying to figure out why the inserts are failing at three in the morning.

demo=# \d x1
                 Table "public.x1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
Check constraints:
    "a_gt_5" CHECK (a > 5) 
    "x1_a_check" CHECK (a > 6)

demo=# 

2. Double-check the constraints.  In the above, having a check on a being greater than five and another greater than six makes no sense.  This type of problem happens as the tables age and multiple people make changes without making sure there no conflict or similar action already established. 


Comments

Popular posts from this blog

How Does PostgreSQL Store Your Data?

Now That We Know Where PostgreSQL Stores Data, We Can Look At How

Beginning PostgreSQL Transactions