CSV Loading Error Fix: Data Too Long or Value Too Long

You are trying to load a CSV file into a database, and almost inevitably you run into ERROR: value too long for type character varying(64) if you are using PostgreSQL, or SQL Error [1406] [22001]: Data truncation: Data too long for column 'Name' at row 1 if you are using MariaDB or MySQL.

It doesn't matter whether you are using the DBeaver application or the native database import tools. Newbies will open the CSV file to examine the very first line and scratch their heads, as there is no error there.

The database ingests the data one line at a time, so as far as the error message is concerned, every line is 'row 1'. The offending line is hiding deep in the file, and there may be many like it.

The error message you receive is helpful, but it DOES NOT tell you which line in the CSV file is causing the grief. You could eyeball the data, but that quickly becomes tedious, and if the CSV file has thousands of lines, the odds of spotting the offending line by eye grow long.

Splitting up the file into smaller chunks is labor-intensive.
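One workaround keeps the hunt inside the database itself: stage the file into a table where every column is TEXT, so nothing can be too long, and then let SQL report the wide rows. Here is a minimal sketch, assuming PostgreSQL, the column layout of the Titanic data set from the example below, and a server-readable file at /tmp/titanic.csv; the titanic_staging name and the path are placeholders.

-- Staging table: every column declared TEXT so no value can overflow.
-- Column names match the Titanic example below; adjust for your own file.
CREATE TABLE titanic_staging (
    "PassengerId" text, "Survived" text, "Pclass" text, "Name" text,
    "Sex" text, "Age" text, "SibSp" text, "Parch" text,
    "Ticket" text, "Fare" text, "Cabin" text, "Embarked" text
);

-- COPY needs a path the server can read; from psql, \copy with the same
-- options reads the file client-side instead.
COPY titanic_staging FROM '/tmp/titanic.csv' WITH (FORMAT csv, HEADER true);

-- Report every row that would overflow a VARCHAR(64) Name column.
SELECT "PassengerId", length("Name") AS name_length
FROM titanic_staging
WHERE length("Name") > 64;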

Example

I was loading data from the RMS Titanic data set when I received the following error:

Error occurred during batch insert
(you can disable batch insert to skip particular rows).

Reason:

SQL Error [22001]: Batch entry 0 INSERT INTO public.titanic ("PassengerId","Survived","Pclass","Name","Sex","Age","SibSp","Parch","Ticket","Fare","Cabin","Embarked")

VALUES (('308'::int8),('1'::int8),('1'::int8),('Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)'),('female'),('17'::int8),('1'::int8),('0'::int8),('PC 17758'),('108.9'::real),('C65'),('C')) was aborted: ERROR: value too long for type character varying(64)  Call getNextException to see other errors in the batch.

A Difference Between PostgreSQL and MariaDB/MySQL

PostgreSQL will return the offending line in the error message, as shown above. MariaDB and MySQL do not. They provide only SQL Error [1406] [22001]: Data truncation: Data too long for column 'Name' at row 1.

DBeaver's import tool selected a VARCHAR(64) for the Name field after examining some of the records. The unfortunate Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo) needs more space, and there may be other records with names as long or longer.

Solution

You can use Google Sheets to examine the CSV file before attempting to load it into the database. 

We can use =max(arrayformula(len(D1:D891))) on column D, which holds the names, to determine the maximum length of a name. The longest name in this data set is eighty-two characters long.
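If you would rather not leave the database, the staging table sketched earlier gives the same answer in one query. This assumes the titanic_staging table from above:

-- The SQL equivalent of the spreadsheet formula.
SELECT max(length("Name")) AS max_name_length
FROM titanic_staging;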

If you read the linked blog post, you will see that I defined the Name column as a TEXT field to solve the issue. But what if you are trying to save space? Knowing the width of that column is valuable.
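Once you know the real width, you can size the column to fit instead of abandoning the limit. A sketch, assuming the public.titanic table from the error message above; the 100-character cap is an arbitrary cushion over the measured 82:

-- Widen the column just enough, keeping a length constraint.
ALTER TABLE public.titanic
    ALTER COLUMN "Name" TYPE varchar(100);

-- Or drop the limit entirely, as the linked blog post does.
ALTER TABLE public.titanic
    ALTER COLUMN "Name" TYPE text;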

So now you have a tool to determine the width of a column when you get the dreaded 'value too long' error.



Comments

  1. Not sure what you are getting at with "But what if you are trying to save space? Knowing the width of that column is valuable." ? From here https://www.postgresql.org/docs/current/datatype-character.html: "The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less."
