Posts

Showing posts from October, 2025

Loading The Titanic Passenger Data Into PostgreSQL With DBeaver Part 2

In the last edition of this blog, the passenger list data from the HMS Titanic was loaded into a preliminary database. Now it is time to refine. I am using DBeaver Enterprise 25.2.0. PostgreSQL 18, and Github Copilot with Gpt-4.  Prompt: Any recommendations on improving this table for storage efficiency? This prompt was entered into the DBeaver AI Assistant. To improve the storage efficiency of the "titanic" table, we can consider the following recommendations: ### 1. **Optimize Data Types** - **"PassengerId"**: If the number of passengers is not expected to exceed 2,147,483,647, the `int4` type is fine. However, if the range is smaller (e.g., less than 32,767), you could use `int2` (smallint) to save space. - **"Survived"**: Since this is likely a binary value (0 or 1), you can use `boolean` instead of `int4`. - **"Pclass"**: If the class is limited to 1, 2, or 3, you can use `int2` (smallint) instead of `int4`. - **"Sex"*...

Loading The Titanic Passenger Data Into PostgreSQL With DBeaver Part 1

Image
The Sinking of the RMS Titanic in 1912 has shocked and entertained for over a century.  I will admit to being shocked to find a GitHub Repo with a CSV formatted file with the passenger list. This file is an interesting glimpse into the lives of the folks who sailed on the ill fated ship so long ago. And interesting datasets are always good for examples. I had been looking for a project to show off what I call 'Lifting and Shifting' but others call ETL.  At the PG NYC 2025 Conference, Ryan Booz had a session title Transforming Data with the Power of PostgreSQL and SQL. For many years, ETL was considered as the way to move data from one source to another  Ryan argued that it should be ELT, and rather convincingly.  ETL? ETL is a three-step data integration process (Extract, Transform, Load) used to collect raw data from various sources, process it into a consistent, usable format, and then load it into a central data warehouse or database for analytics and busin...