Skip to main content
Help us improve Python packaging – donate today!

Generates a CREATE TABLE statement from a CSV file by guessing at column types

Project Description

When importing CSV files to databases, I sometimes find it tedious to create the table in a database first. This is a very basic app (and a work in progress) that grabs the rows of a CSV file (sampling, if specified), uses the sample to guess at the data types of the columns, and uses these guesses to output a SQL `CREATE TABLE` statement. Think of it as a version of `CREATE TABLE some_table AS SELECT * FROM <some CSV file>`.

For the moment, this is aimed towards PostgreSQL (although that will change) and for now, a very limited number of data types will be supported, namely `text`, `boolean`, `smallint`, `int`, `bigint`, `numeric`, `date`, and `timestamp` (which are the types that I find myself using the most often).

Here's the flowchart of the type guessing (it's also available in the `images` folder of this repo):

![flowchart](https://github.com/jackmaney/csv-to-table/blob/master/images/type_guessing_flowchart.png?raw=true)

Here's a quick example:

Input:
```
$cat test.csv
a,b,c,d,e
32,"2013-12-28 22:16:57",0,"True","Here's some text, and some more"
"17.1","2012-06-29 05:11:00", -3,"False","Yep, more text"
```

Code:

```
from typeguesser import TypeGuesser

tg = TypeGuesser("test.csv", header=True)

tg.guessTypes()

print tg.getCreateStatement()
```

Output:

```
$ python test.py
CREATE TABLE test (
a numeric,
b timestamp,
c smallint,
d boolean,
e text
);
```

Or, equivalently, you can use the included script `csv-to-table.py`:

```
$ ./csv-to-table.py -h
usage: csv-to-table.py [-h] [--header] [--sample SAMPLE]
[--quotechar QUOTECHAR] [--table_name TABLE_NAME]
file

positional arguments:
file The name (and path) of the target CSV file

optional arguments:
-h, --help show this help message and exit
--header Indicate whether or not the file has a header
--sample SAMPLE Sampling probability (between 0 and 1). If set, this
gives the sampling probability for rows of the given
CSV file
--quotechar QUOTECHAR
The quote character to use for the CSV file (default
'"')
--table_name TABLE_NAME
The name of the table desired in the output
```

So, in particular:

```
$ ./csv-to-table.py --header test.csv
CREATE TABLE test (
a numeric,
b timestamp,
c smallint,
d boolean,
e text
);
```

Release history Release notifications

History Node

0.0.3

History Node

0.0.2

This version
History Node

0.0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
csv_to_table-0.0.1.tar.gz (5.3 kB) Copy SHA256 hash SHA256 Source None Feb 8, 2014

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging CloudAMQP CloudAMQP RabbitMQ AWS AWS Cloud computing Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page