csv-ingestor

Load data from CSV files into PostgreSQL tables.

Installation

pip install csv-ingestor

Examples

In the simplest case, load data from a CSV file into a table:

from csv_ingestor import Ingestor, ingest_file

class MyIngestor(Ingestor):
    filename_pattern = r'simple\.\d{8}_\d{4}\.csv(\.gz)?'
    tables = [
        {
            'table': 'my_table',
            'csv_columns': ('id', 'value'),
        }
    ]

ingest_file('simple.20240910_1430.csv.gz')

But maybe you have multiple tables to load from different CSV files, or from different fields in each file. Perhaps the column names don't match what's in the CSV files, the data isn't quite the right shape, you'd like to skip some CSV records, and you'd like to update existing DB records:

from csv_ingestor import CSVPicker, Ingestor, SkipRecord, ingest_file

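class MyPicker(CSVPicker):

    def check_skip(self, record):
        # Raise SkipRecord to leave this CSV record out of the load.
        if record['value'].startswith('SKIP!'):
            raise SkipRecord

    def modify_record(self, record):
        # Reshape the record in place before it is loaded.
        record['value'] = record['value'].replace('bad words', '@!#$*%&')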


class OneIngestor(Ingestor):
    filename_pattern = r'data\.\d{8}_\d{4}\.csv(\.gz)?'
    tables = [
        {
            'table': 'my_first_table',
            'csv_columns': ('their_id', 'their_value'),
            'column_map': {'their_id': 'id', 'their_value': 'value'},
            'on_conflict': '(id) DO UPDATE SET value = excluded.value',
        }
    ]

class AnotherIngestor(Ingestor):
    filename_pattern = r'other_data\.\d{8}\.csv(\.gz)?'
    csv_picker = MyPicker
    tables = [
        {
            'table': 'my_other_table',
            'csv_columns': ('id', 'value'),
            'on_conflict': '(id) DO UPDATE SET value = excluded.value',
        },
        {
            'table': 'a_third_table',
            'csv_columns': ('id', 'metadata'),
            'on_conflict': '(id) DO NOTHING',
        }
    ]

ingest_file('data.20240910_1430.csv.gz')
ingest_file('other_data.20240910.csv')
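
To make the 'column_map' and 'on_conflict' entries more concrete, here is a minimal sketch of the kind of statement a table definition could translate to. The helper build_insert_sql is hypothetical and not part of csv-ingestor; the library may build its SQL differently (for example, by loading through a staging table), so treat this only as an illustration of how the pieces relate to PostgreSQL's INSERT ... ON CONFLICT syntax.

```python
def build_insert_sql(spec):
    """Hypothetical helper: compose a parameterized INSERT for one
    entry of an Ingestor's `tables` list. Not csv-ingestor's own code."""
    column_map = spec.get('column_map', {})
    # Translate CSV column names to DB column names; unmapped names pass through.
    db_columns = [column_map.get(col, col) for col in spec['csv_columns']]
    placeholders = ', '.join(['%s'] * len(db_columns))
    sql = (
        f"INSERT INTO {spec['table']} ({', '.join(db_columns)}) "
        f"VALUES ({placeholders})"
    )
    # Assumption: the 'on_conflict' string is appended after ON CONFLICT as-is.
    if 'on_conflict' in spec:
        sql += f" ON CONFLICT {spec['on_conflict']}"
    return sql


first_table = {
    'table': 'my_first_table',
    'csv_columns': ('their_id', 'their_value'),
    'column_map': {'their_id': 'id', 'their_value': 'value'},
    'on_conflict': '(id) DO UPDATE SET value = excluded.value',
}
third_table = {
    'table': 'a_third_table',
    'csv_columns': ('id', 'metadata'),
    'on_conflict': '(id) DO NOTHING',
}

first_sql = build_insert_sql(first_table)
third_sql = build_insert_sql(third_table)
print(first_sql)
print(third_sql)
```

Under these assumptions, 'DO UPDATE SET value = excluded.value' overwrites the stored value with the incoming one when the id already exists, while 'DO NOTHING' leaves the existing row untouched.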

Each Ingestor subclass is tried in turn until one's filename_pattern matches the filename. That subclass is then used to parse the file and load the data into its DB tables.
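
The filename matching described above can be sketched roughly as follows. This is not csv-ingestor's actual implementation: SketchIngestor, its registry, and find_ingestor are hypothetical names, and the use of re.fullmatch is an assumption (the library might use re.match or re.search instead).

```python
import re


class SketchIngestor:
    """Hypothetical stand-in for Ingestor that records its subclasses."""
    filename_pattern = None
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Subclasses are registered in definition order.
        SketchIngestor.registry.append(cls)


class DataIngestor(SketchIngestor):
    filename_pattern = r'data\.\d{8}_\d{4}\.csv(\.gz)?'


class OtherDataIngestor(SketchIngestor):
    filename_pattern = r'other_data\.\d{8}\.csv(\.gz)?'


def find_ingestor(filename):
    """Return the first registered subclass whose pattern matches, else None."""
    for cls in SketchIngestor.registry:
        # Assumption: the whole filename must match the pattern.
        if re.fullmatch(cls.filename_pattern, filename):
            return cls
    return None


print(find_ingestor('data.20240910_1430.csv.gz').__name__)
print(find_ingestor('other_data.20240910.csv').__name__)
print(find_ingestor('unrelated.csv'))
```

If matching were done with re.search rather than a full match, a pattern such as the one for 'data.' files could also match inside a longer name like 'other_data.20240910_1430.csv', so it is worth keeping patterns specific enough that only one subclass can claim a given file.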
