Convert your CSVs fast

CSVS Convert

Converts CSV files into XLSX, SQLite, PostgreSQL, or Parquet, fast.

Install

pip install csvs_convert

Docs

Full Documentation

Aims

  • Thorough type guessing of CSV columns, so there is no need to configure the type of each field. The whole file is scanned first to make sure all values in a column have a consistent type. Over 30 date/time formats can be detected, as well as JSON data.
  • Quick conversions and type guessing (uses Rust underneath), with fast methods specific to each output format:
    • COPY for PostgreSQL
    • Prepared statements for SQLite, using the C API
    • Arrow reader for Parquet
    • Write-only mode for libxlsxwriter
  • Tries to limit errors when inserting data into a database by falling back to "text" if type guessing can't determine a more specific type.
  • When inserting into an existing database, automatically migrates the schema of the target to allow for new data (evolve option).
  • Memory efficient. All CSVs and outputs are streamed, so conversions should take up very little memory.
  • Gathers stats and information about CSV files into a datapackage.json file, which can then be used to customize the conversion.
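
The type-guessing aim can be sketched in plain Python. This is a minimal illustration, not the library's Rust implementation: the `guess_type` helper, the type names, and the three date formats are assumptions chosen for the example. It scans every value in a column, picks the first type that fits all of them, and falls back to "text" otherwise.

```python
import json
import re
from datetime import datetime
from functools import partial

# Illustrative subset only; the library reports detecting over 30 formats.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%dT%H:%M:%S"]


def _is_integer(value):
    return re.fullmatch(r"[+-]?[0-9]+", value) is not None


def _is_number(value):
    pattern = r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?"
    return re.fullmatch(pattern, value) is not None


def _parses_as(fmt, value):
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def _is_json(value):
    # Only objects and arrays count, so bare numbers stay numeric.
    if not value.lstrip().startswith(("{", "[")):
        return False
    try:
        json.loads(value)
        return True
    except ValueError:
        return False


# Ordered from most to least specific; one date check per format so a
# column only counts as a date when a single format fits every value.
CHECKS = (
    [("integer", _is_integer), ("number", _is_number)]
    + [("datetime", partial(_parses_as, fmt)) for fmt in DATE_FORMATS]
    + [("json", _is_json)]
)


def guess_type(values):
    """Return the most specific type that fits every non-empty value."""
    present = [v for v in values if v != ""]
    if not present:
        return "text"
    for name, check in CHECKS:
        if all(check(v) for v in present):
            return name
    return "text"  # fallback when nothing more specific is consistent
```

Because the whole column is checked, a single stray value such as "n/a" in an otherwise numeric column makes the sketch return "text" rather than fail later at insert time.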

Drawbacks

  • CSV files currently need header rows.
  • The whole file needs to be on disk: the whole CSV is analyzed before conversion, so each file is read twice.
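
Both drawbacks follow from a two-pass design. A rough sketch of that pattern with the standard `csv` module is shown here, assuming a hypothetical `convert_two_pass` helper that only distinguishes integer columns from text. It is not the library's code, only an illustration of why the file must be readable twice and why a header row is needed to name the columns.

```python
import csv
import re


def _is_integer(value):
    return value is not None and re.fullmatch(r"[+-]?[0-9]+", value) is not None


def convert_two_pass(path):
    """Yield rows as dicts, converting columns that are entirely integers."""
    # Pass 1: stream the file once to learn which columns are all-integer.
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None:
            raise ValueError("CSV file is empty; a header row is required")
        all_int = {name: True for name in reader.fieldnames}
        for row in reader:
            for name in reader.fieldnames:
                if not _is_integer(row[name]):
                    all_int[name] = False

    # Pass 2: stream the file again, converting each row as it is emitted.
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            yield {
                key: int(value) if all_int.get(key, False) else value
                for key, value in row.items()
            }
```

Only one row is held in memory at a time in either pass, which is the trade-off: low memory use in exchange for reading the file twice.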

Conversion Docs

This is the Python library, which provides bindings to the Rust library.

Contribute on GitHub

Usage from CSV files

import csvs_convert

#sqlite
csvs_convert.csvs_to_sqlite("sqlite.db", ["file.csv"])
#postgres
csvs_convert.csvs_to_postgres("postgresql://user:postgres@localhost/db", ["file.csv"])
#parquet
csvs_convert.csvs_to_parquet("output", ["file.csv"])
#xlsx
csvs_convert.csvs_to_xlsx("output.xlsx", ["file.csv"])

Usage from datapackage

A datapackage is a file that contains metadata about the tables. Its specification is described here.

To generate a datapackage.json file you can use:

csvs_convert.csvs_to_datapackage('path/to/datapackage.json', ["fixtures/large/csv/data.csv"])

Other tools can also generate these files.

You can use this file as is or alter it as needed. It is most useful when you want to apply the same schema across multiple files, because the type guessing does not have to be repeated for every file.
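
As an example of altering the file, the sketch below overrides the declared type of one field. It assumes the generated file follows the usual Data Package layout, with `resources`, each holding a `schema` whose `fields` carry `name` and `type`. The `set_field_type` helper and the sample descriptor are hypothetical and written only for illustration.

```python
import json


def set_field_type(datapackage, resource_name, field_name, new_type):
    """Set the declared type of one field in a loaded datapackage dict.

    Returns True if the field was found and updated, False otherwise.
    """
    for resource in datapackage.get("resources", []):
        if resource.get("name") != resource_name:
            continue
        for field in resource.get("schema", {}).get("fields", []):
            if field.get("name") == field_name:
                field["type"] = new_type
                return True
    return False


# Hypothetical, hand-written descriptor used only to demonstrate the helper.
example = {
    "resources": [
        {
            "name": "data",
            "path": "data.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "postcode", "type": "integer"},
                ]
            },
        }
    ]
}

# Postcodes look numeric but are better kept as strings.
set_field_type(example, "data", "postcode", "string")
print(json.dumps(example["resources"][0]["schema"]["fields"], indent=2))
```

For a real file, load it with `json.load`, apply the helper, and write it back with `json.dump` before passing the path to one of the datapackage conversion functions.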

When referring to a datapackage, you can reference any of the following:

  • A datapackage.json file.
  • A datapackage directory containing a datapackage.json file, e.g. /a/datapackage/dir
  • A zip file containing a datapackage.json file, e.g. my_datapackage.zip
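
To inspect a datapackage yourself before converting it, the three reference forms above can be handled with the standard library. This is a sketch under assumptions, not how csvs_convert resolves references internally: `load_descriptor` is a hypothetical helper, and it assumes datapackage.json sits at the top level of the directory or archive.

```python
import json
import zipfile
from pathlib import Path


def load_descriptor(reference):
    """Load datapackage.json from a file, a directory, or a zip archive."""
    path = Path(reference)
    if path.is_dir():
        text = (path / "datapackage.json").read_text(encoding="utf-8")
        return json.loads(text)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            # Assumes datapackage.json is at the top level of the archive.
            return json.loads(archive.read("datapackage.json").decode("utf-8"))
    # Otherwise treat the reference as the datapackage.json file itself.
    return json.loads(path.read_text(encoding="utf-8"))
```

The directory check comes first because a directory can never be a zip file, and a plain JSON file is the fallback when neither of the other forms matches.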

Examples:

import csvs_convert

#sqlite
csvs_convert.datapackage_to_sqlite("sqlite.db", "path/to/datapackage.json")
#postgres
csvs_convert.datapackage_to_postgres("postgresql://user:postgres@localhost/db", "path/to/datapackage.json")
#parquet
csvs_convert.datapackage_to_parquet("path/to/directory", "path/to/datapackage.json")
#xlsx
csvs_convert.datapackage_to_xlsx("output.xlsx", "path/to/datapackage.json")

Download files

Download the file for your platform.

Source Distribution

  • csvs_convert-0.7.3.tar.gz (99.2 kB): source

Built Distributions

  • csvs_convert-0.7.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): PyPy, manylinux (glibc 2.17+), x86-64
  • csvs_convert-0.7.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • csvs_convert-0.7.3-cp311-cp311-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (10.0 MB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
  • csvs_convert-0.7.3-cp311-cp311-macosx_10_7_x86_64.whl (5.2 MB): CPython 3.11, macOS 10.7+ x86-64
  • csvs_convert-0.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • csvs_convert-0.7.3-cp310-cp310-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (10.0 MB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
  • csvs_convert-0.7.3-cp310-cp310-macosx_10_7_x86_64.whl (5.2 MB): CPython 3.10, macOS 10.7+ x86-64
  • csvs_convert-0.7.3-cp39-none-win_amd64.whl (4.4 MB): CPython 3.9, Windows x86-64
  • csvs_convert-0.7.3-cp39-none-win32.whl (4.1 MB): CPython 3.9, Windows x86
  • csvs_convert-0.7.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • csvs_convert-0.7.3-cp39-cp39-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (10.0 MB): CPython 3.9, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
  • csvs_convert-0.7.3-cp39-cp39-macosx_10_7_x86_64.whl (5.2 MB): CPython 3.9, macOS 10.7+ x86-64
  • csvs_convert-0.7.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
  • csvs_convert-0.7.3-cp38-cp38-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (10.0 MB): CPython 3.8, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
  • csvs_convert-0.7.3-cp38-cp38-macosx_10_7_x86_64.whl (5.2 MB): CPython 3.8, macOS 10.7+ x86-64
  • csvs_convert-0.7.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): CPython 3.7m, manylinux (glibc 2.17+), x86-64
  • csvs_convert-0.7.3-cp37-cp37m-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (10.0 MB): CPython 3.7m, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
  • csvs_convert-0.7.3-cp37-cp37m-macosx_10_7_x86_64.whl (5.2 MB): CPython 3.7m, macOS 10.7+ x86-64
