Artificial Intelligence is one of those conundrums where the ability to have some tasks handled for you contrasts with a mix of ego and pride that it may be better than you at those tasks. I recently wrote a blog using another database about an AI-generated SQL that was quite sophisticated. But what about asking an AI to create a table with a specification like a DBA/SRE/Analyst might receive?
I used Grok and entered the following prompt:
Write the SQL to create a table on a PostgreSQL version 17 server to record data on customers. The data will include a first name, a last name, an address, a birthdate, and a unique UUID primary key. Partition the data by the year of birth. And produce some test data of at least ten records.
I am sure many of us have started large projects given less instruction. Notice: I did not denote the format for the address (US Post Office's format, or UK, or other). Nor did I set a name length. I wanted to see what assumptions were made by the AI.
The Answer
The response came very quickly, much more quickly that I could interactively input the CREATE TABLE statement.
-- First, create the parent table with partitioning
CREATE TABLE customers (
customer_id UUID PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
address TEXT NOT NULL,
birthdate DATE NOT NULL
) PARTITION BY RANGE (EXTRACT(YEAR FROM birthdate));
-- Create partitions for specific year ranges
CREATE TABLE customers_y1960_1980 PARTITION OF customers
FOR VALUES FROM (1960) TO (1980);
CREATE TABLE customers_y1980_2000 PARTITION OF customers
FOR VALUES FROM (1980) TO (2000);
CREATE TABLE customers_y2000_2020 PARTITION OF customers
FOR VALUES FROM (2000) TO (2020);
CREATE TABLE customers_y2020_2040 PARTITION OF customers
FOR VALUES FROM (2020) TO (2040);
-- Insert test data (10 records)
INSERT INTO customers (customer_id, first_name, last_name, address, birthdate) VALUES
-- 1960-1980 partition
(gen_random_uuid(), 'John', 'Smith', '123 Maple St, Boston, MA', '1975-03-15'),
(gen_random_uuid(), 'Mary', 'Johnson', '456 Oak Ave, Chicago, IL', '1968-07-22'),
-- 1980-2000 partition
(gen_random_uuid(), 'Robert', 'Williams', '789 Pine Rd, Seattle, WA', '1985-11-30'),
(gen_random_uuid(), 'Sarah', 'Brown', '321 Elm St, Austin, TX', '1992-04-17'),
(gen_random_uuid(), 'Michael', 'Davis', '654 Birch Ln, Denver, CO', '1998-09-05'),
-- 2000-2020 partition
(gen_random_uuid(), 'Emily', 'Taylor', '987 Cedar Dr, Miami, FL', '2001-12-25'),
(gen_random_uuid(), 'James', 'Wilson', '147 Spruce Way, Portland, OR', '2005-06-10'),
(gen_random_uuid(), 'Lisa', 'Anderson', '258 Willow Ct, Phoenix, AZ', '2010-02-14'),
-- 2020-2040 partition
(gen_random_uuid(), 'Ethan', 'Martinez', '369 Aspen Pl, San Diego, CA', '2021-08-19'),
(gen_random_uuid(), 'Olivia', 'Garcia', '741 Poplar St, New York, NY', '2023-01-03');
-- Verify the data distribution across partitions
SELECT tableoid::regclass, COUNT(*)
FROM customers
GROUP BY tableoid;
Note the last part about selecting the tableoid, which I was not expecting. Maybe a COUNT * on the Primary Key, but not that. Hmmm.
Wow
The response was very quick. A quick scan of the table definition shows some good but probably most efficient choices for the column data types. I specified unique PKs and was unsure if the AI would pop in a UNIQUE constraint or stick with the UUID types ratre conflict strategy. The test data looks good, too, at a cursory glance.
But Does It Work?
The real test is if the supplied SQL works. I copied the CREATE TABLE section and tried.
demo=# CREATE TABLE customers (
customer_id UUID PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
address TEXT NOT NULL,
birthdate DATE NOT NULL
) PARTITION BY RANGE (EXTRACT(YEAR FROM birthdate));
ERROR: unsupported PRIMARY KEY constraint with partition key definition
DETAIL: PRIMARY KEY constraints cannot be used when partition keys include expressions.
demo=#
Dang! Well, everyone runs into syntax and function issues from time to time.
Second Pass
So, maybe it is time to be a little more specific about the birthday data. I modified the prompt to break out the birthday fields into unique columns.
Write the SQL to create a table on a PostgreSQL version 17 server to record data on customers. The data will include a first name, a last name, an address, a birthdate, and a unique UUID primary key. Have separate columns for the birthday month, birthday day, and birthday year. Partition the data by the year of birth. And produce some test data from at least ten records.
Once again, the response came quickly.
-- Create the parent table with partitioning
CREATE TABLE customers (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
address TEXT NOT NULL,
birth_date DATE NOT NULL,
birth_month SMALLINT GENERATED ALWAYS AS (EXTRACT(MONTH FROM birth_date)) STORED,
birth_day SMALLINT GENERATED ALWAYS AS (EXTRACT(DAY FROM birth_date)) STORED,
birth_year SMALLINT GENERATED ALWAYS AS (EXTRACT(YEAR FROM birth_date)) STORED
) PARTITION BY RANGE (birth_year);
-- Create partitions for specific year ranges
CREATE TABLE customers_y1970s PARTITION OF customers
FOR VALUES FROM (1970) TO (1980);
CREATE TABLE customers_y1980s PARTITION OF customers
FOR VALUES FROM (1980) TO (1990);
CREATE TABLE customers_y1990s PARTITION OF customers
FOR VALUES FROM (1990) TO (2000);
CREATE TABLE customers_y2000s PARTITION OF customers
FOR VALUES FROM (2000) TO (2010);
CREATE TABLE customers_y2010s PARTITION OF customers
FOR VALUES FROM (2010) TO (2020);
-- Insert test data
INSERT INTO customers (first_name, last_name, address, birth_date) VALUES
('John', 'Smith', '123 Main St, Boston MA', '1985-03-15'),
('Sarah', 'Johnson', '456 Oak Ave, Chicago IL', '1992-07-22'),
('Michael', 'Brown', '789 Pine Rd, Seattle WA', '1978-11-30'),
('Emily', 'Davis', '321 Elm St, Austin TX', '2001-02-14'),
('David', 'Wilson', '654 Cedar Ln, Miami FL', '1988-09-05'),
('Lisa', 'Anderson', '987 Birch Dr, Denver CO', '1995-12-25'),
('James', 'Taylor', '147 Maple St, Portland OR', '1975-06-18'),
('Rachel', 'Martinez', '258 Willow Way, Phoenix AZ', '2004-04-03'),
('Thomas', 'Lee', '369 Spruce Ct, Atlanta GA', '1982-08-27'),
('Kelly', 'White', '741 Ash Pl, San Diego CA', '1998-10-11'),
('Robert', 'Garcia', '852 Poplar Ave, Houston TX', '1972-01-09');
-- Verify the data
SELECT * FROM customers ORDER BY birth_year, birth_month, birth_day;
The choice of generated columns for breaking out the birthdate information looked like a good choice. I was surprised to see the last query changed from counting the rows to selecting all the data. What triggered that?
Now, does that new CREATE TABLE work?
Close, Lacking Cigar
I had high hopes that the second revision would work. The date was split up into individual columns.
demo=# CREATE TABLE customers (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
address TEXT NOT NULL,
birth_date DATE NOT NULL,
birth_month SMALLINT GENERATED ALWAYS AS (EXTRACT(MONTH FROM birth_date)) STORED,
birth_day SMALLINT GENERATED ALWAYS AS (EXTRACT(DAY FROM birth_date)) STORED,
birth_year SMALLINT GENERATED ALWAYS AS (EXTRACT(YEAR FROM birth_date)) STORED
) PARTITION BY RANGE (birth_year);
ERROR: cannot use generated column in partition key
LINE 10: ) PARTITION BY RANGE (birth_year);
^
DETAIL: Column "birth_year" is a generated column.
demo=#
Dang! I did not know that using a GENERATED COLUMN as an argument for partition would be an issue. (Time for me and the AI to RTFM, right?)
One More Into The Breech
So this time, I added 'do not use generated columns'.
Write the SQL to create a table on a PostgreSQL version 17 server to record data on customers. The data will include a first name, a last name, an address, a birthdate, and a unique UUID primary key. Have separate columns, do not use generated columns, for the birthday month, birthday day, and birthday year. Partition the data by the year of birth. And produce some test data from at least ten records.
-- Create the parent table with partitioning
CREATE TABLE customers (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
address TEXT NOT NULL,
birth_date DATE NOT NULL,
birth_month SMALLINT NOT NULL CHECK (birth_month >= 1 AND birth_month <= 12),
birth_day SMALLINT NOT NULL CHECK (birth_day >= 1 AND birth_day <= 31),
birth_year SMALLINT NOT NULL
) PARTITION BY RANGE (birth_year);
-- Create partitions for specific year ranges
CREATE TABLE customers_y1970s PARTITION OF customers
FOR VALUES FROM (1970) TO (1980);
CREATE TABLE customers_y1980s PARTITION OF customers
FOR VALUES FROM (1980) TO (1990);
CREATE TABLE customers_y1990s PARTITION OF customers
FOR VALUES FROM (1990) TO (2000);
CREATE TABLE customers_y2000s PARTITION OF customers
FOR VALUES FROM (2000) TO (2010);
CREATE TABLE customers_y2010s PARTITION OF customers
FOR VALUES FROM (2010) TO (2020);
-- Insert test data
INSERT INTO customers (first_name, last_name, address, birth_date, birth_month, birth_day, birth_year) VALUES
('John', 'Smith', '123 Main St, Boston MA', '1985-03-15', 3, 15, 1985),
('Sarah', 'Johnson', '456 Oak Ave, Chicago IL', '1992-07-22', 7, 22, 1992),
('Michael', 'Brown', '789 Pine Rd, Seattle WA', '1978-11-30', 11, 30, 1978),
('Emily', 'Davis', '321 Elm St, Austin TX', '2001-02-14', 2, 14, 2001),
('David', 'Wilson', '654 Cedar Ln, Miami FL', '1988-09-05', 9, 5, 1988),
('Lisa', 'Anderson', '987 Birch Dr, Denver CO', '1995-12-25', 12, 25, 1995),
('James', 'Taylor', '147 Maple St, Portland OR', '1975-06-18', 6, 18, 1975),
('Rachel', 'Martinez', '258 Willow Way, Phoenix AZ', '2004-04-03', 4, 3, 2004),
('Thomas', 'Lee', '369 Spruce Ct, Atlanta GA', '1982-08-27', 8, 27, 1982),
('Kelly', 'White', '741 Ash Pl, San Diego CA', '1998-10-11', 10, 11, 1998),
('Robert', 'Garcia', '852 Poplar Ave, Houston TX', '1972-01-09', 1, 9, 1972);
-- Verify the data
SELECT * FROM customers ORDER BY birth_year, birth_month, birth_day;
No generated columns and some constrict checks added gratis. Now, the final query has been augmented with an ORDER BY.
But, once again, does IT work?
You Make Take A Prize From The Losest Shelf
demo=# CREATE TABLE customers (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
address TEXT NOT NULL,
birth_date DATE NOT NULL,
birth_month SMALLINT NOT NULL CHECK (birth_month >= 1 AND birth_month <= 12),
birth_day SMALLINT NOT NULL CHECK (birth_day >= 1 AND birth_day <= 31),
birth_year SMALLINT NOT NULL
) PARTITION BY RANGE (birth_year);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "customers" lacks column "birth_year" which is part of the partition key.
demo=#
Okay, I need to read up on the partitioning by range manual page.
Conclusions
The DDL definition looks good at first glance. And it is almost right, which may catch the attention of many of us who write SQL for a living. Replace the AI with a Junior DBA in your mind, and you would consider the results pretty good. Now, to get the AI to RTFM about partitioning and generated columns, the results will be stupifying. I am sure there are some clever folks working to include the source and manuals for PostgreSQL into a model so the issues I had above will disappear.
IDEs have become essential tools for many developers, with features like syntax completion. Could the future include a similar collection of features where we supply the DDL and/or DML spec and the AI returns what was requested?
One of my first programming classes featured a section on unexpressed latent desires in specifications. Earlier, I mentioned not specifying a format for addresses. I have been on a project where the multinationalization of addresses became a headache, and maybe the AI could anticipate that issue before the developers notice it with the proper design to accommodate the situation.
I am impressed how well Grok did with my requirements. I am sure it will only get better as time goes on. Do not automatically reject the idea that an AI can produce good SQL. Just like we trust compilers to take our thoughts and turn them into machine code, AI will soon turn our database ideas and create the required DML and DDL.
Update
After finishing this blog's first pass, I was drinking a coffee when the brain asked, 'What if I drop the partitioning requirement to the original query and try again?' The results:
-- Create the customers table
CREATE TABLE customers (
customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
address TEXT NOT NULL,
birth_date DATE NOT NULL
);
-- Insert test data
INSERT INTO customers (first_name, last_name, address, birth_date) VALUES
('John', 'Smith', '123 Main St, Boston, MA 02108', '1985-06-15'),
('Sarah', 'Johnson', '456 Oak Ave, Chicago, IL 60601', '1990-03-22'),
('Michael', 'Brown', '789 Pine Rd, Austin, TX 73301', '1978-11-30'),
('Emily', 'Davis', '321 Elm St, Seattle, WA 98101', '1995-09-12'),
('David', 'Wilson', '654 Cedar Ln, Miami, FL 33101', '1982-04-18'),
('Lisa', 'Anderson', '987 Birch Dr, Denver, CO 80201', '1988-12-05'),
('James', 'Taylor', '147 Maple Way, Portland, OR 97201', '1975-07-25'),
('Rachel', 'Martinez', '258 Spruce Ct, Phoenix, AZ 85001', '1993-01-09'),
('Thomas', 'Lee', '369 Willow Pl, Atlanta, GA 30301', '1980-10-14'),
('Kelly', 'White', '741 Aspen St, San Diego, CA 92101', '1987-08-03');
I ran a quick query:
demo=# select first_name, last_name, birth_date
from customers order by last_name, first_name;
first_name | last_name | birth_date
------------+-----------+------------
Lisa | Anderson | 1988-12-05
Michael | Brown | 1978-11-30
Emily | Davis | 1995-09-12
Sarah | Johnson | 1990-03-22
Thomas | Lee | 1980-10-14
Rachel | Martinez | 1993-01-09
John | Smith | 1985-06-15
James | Taylor | 1975-07-25
Kelly | White | 1987-08-03
David | Wilson | 1982-04-18
(10 rows)
Grok added a note at the end of the code that I had not concentrated on before this iteration,
This SQL code:
- Creates a table named "customers" with:
- customer_id: A UUID primary key that auto-generates using PostgreSQL 17's gen_random_uuid() function
- first_name: A required variable-length string (up to 50 characters)
- last_name: A required variable-length string (up to 50 characters)
- address: A required text field for full addresses
- birth_date: A required date field
- Inserts 10 test records with realistic sample data:
- Varied names
- Addresses in different US cities with plausible ZIP codes
- Birth dates spanning different years from 1975 to 1995
The UUID values will be automatically generated when you run the INSERT statement. Each record will get a unique UUID in the format like '550e8400-e29b-41d4-a716-446655440000'.
You can run this SQL directly in a PostgreSQL 17 database to create the table and populate it with the test data.
So, is this a good thing or a bad thing? Grok was very close to making my socks fly off. I submit that the partitioning issue is a minor issue. Maybe some extra specification on my part could guide it past the rocky shoals of the partitioning definition waters.
I have to admit that I am not very impressed. It seems that - once I understand enough about the ins and outs of partitioning to "guide it past the [...]" I may as well just write the SQL statement myself.
ReplyDelete