Specialized & performant CSV readers, writers and enrichers for Python.

Casanova

If you often read CSV files with Python, you will quickly notice that csv.DictReader, while more comfortable to use, remains much slower than csv.reader:

# To read a 1.5G CSV file:
csv.reader: 24s
csv.DictReader: 84s
casanova.reader: 25s
csvmonkey: 3s
casanova_monkey.reader: 3s

Casanova is therefore an attempt to match csv.reader performance while keeping a comfortable interface that remains aware of headers.
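The gap between csv.reader and csv.DictReader can be reproduced on a small scale with the standard library alone. The following is a minimal sketch, not the benchmark used for the figures above: the generated data, the row count and the helper names read_with_reader and read_with_dictreader are all assumptions made for illustration, and the timings will vary by machine.

```python
import csv
import io
import timeit

# Build a small in-memory CSV (made-up data, far smaller than a 1.5G file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['name', 'surname'])
for i in range(50_000):
    writer.writerow(['name%d' % i, 'surname%d' % i])
data = buffer.getvalue()


def read_with_reader():
    reader = csv.reader(io.StringIO(data))
    headers = next(reader)
    name_pos = headers.index('name')  # resolve the column position once
    return sum(1 for row in reader if row[name_pos])


def read_with_dictreader():
    # A dict is built for every row, which is where the extra time goes.
    reader = csv.DictReader(io.StringIO(data))
    return sum(1 for row in reader if row['name'])


if __name__ == '__main__':
    print('csv.reader:     %.3fs' % timeit.timeit(read_with_reader, number=5))
    print('csv.DictReader: %.3fs' % timeit.timeit(read_with_dictreader, number=5))
```

Resolving the column position once and indexing plain lists is the same idea casanova builds on.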

Casanova is thus a good fit for you if you need to:

  • Stream large CSV files without running out of memory
  • Enrich the same CSV files by outputting a similar file, all while adding, filtering and editing cells
  • Resume said enrichment if your process exits
  • Do so in a threadsafe fashion, and be able to resume even if your output is not in the same order as the input
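To make the streaming enrichment pattern concrete, here is a minimal sketch written with the standard csv module only. It does not use casanova's own enricher API; the enrich function, its parameters and the sample data are hypothetical, and resuming and thread safety are left out.

```python
import csv
import io


def enrich(input_file, output_file, add_column, compute):
    """Stream rows from input_file to output_file, appending one computed column."""
    reader = csv.reader(input_file)
    writer = csv.writer(output_file, lineterminator='\n')

    headers = next(reader)
    writer.writerow(headers + [add_column])

    for row in reader:  # only one row is held in memory at a time
        writer.writerow(row + [compute(headers, row)])


# Made-up sample data standing in for a large file on disk.
source = io.StringIO('name,surname\nJohn,Moran\nMary,Kelly\n')
target = io.StringIO()

enrich(source, target, 'initials', lambda headers, row: row[0][0] + row[1][0])
print(target.getvalue())
```

Because rows are written as soon as they are read, memory use stays flat regardless of the input size.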

Installation

You can install casanova with pip using the following command:

pip install casanova

If you want to use the faster casanova_monkey namespace, which relies on the fantastic csvmonkey library, you will also need to install csvmonkey:

pip install csvmonkey

Alternatively, you can install both at once:

pip install casanova[monkey]

Usage

reader

# For the raw python version
import casanova
# Or if you want to rely on faster csvmonkey
import casanova_monkey as casanova

with open('./people.csv') as f:

  # Creating a reader
  reader = casanova.reader(f)

  # Getting header information
  reader.fieldnames
  # ['name', 'surname']

  reader.pos
  # HeadersPositions(name=0, surname=1)

  # Three equivalent ways to get the position of the "name" column
  name_pos = reader.pos.name
  name_pos = reader.pos['name']
  name_pos = reader.pos[0]

  'name' in reader.pos
  # True

  # Iterating over the rows
  for row in reader:
    name = row[name_pos] # it's better to cache your pos outside the loop
    name = row[reader.pos.name] # this works, but is slower

  # Interested in a single column?
  for name in reader.cells('name'):
    print(name)

  # Interested in several columns (handy but has a slight perf cost!)
  for name, surname in reader.cells(['name', 'surname']):
    print(name, surname)

  # No headers? No problem.
  reader = casanova.reader(f, no_headers=True)

Arguments

  • file file: file object to read.
  • no_headers ?bool [False]: whether your CSV file is headless.

Attributes

  • fieldnames list: field names in order.
  • pos int|namedtuple: header positions object.
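To illustrate what a header positions object buys you, here is a minimal sketch built on csv.reader and collections.namedtuple. It is not casanova's implementation: make_positions and the sample data are hypothetical, the sketch only covers attribute access (not the pos['name'] or 'name' in pos forms shown above), and it assumes every header is a valid Python identifier.

```python
import csv
import io
from collections import namedtuple


def make_positions(fieldnames):
    """Build a namedtuple mapping each header to its column index."""
    HeadersPositions = namedtuple('HeadersPositions', fieldnames)
    return HeadersPositions(*range(len(fieldnames)))


# Made-up sample data standing in for people.csv.
f = io.StringIO('name,surname\nJohn,Moran\nMary,Kelly\n')
reader = csv.reader(f)
fieldnames = next(reader)
pos = make_positions(fieldnames)

surname_pos = pos.surname  # cache the index outside the loop
surnames = [row[surname_pos] for row in reader]

print(pos)       # HeadersPositions(name=0, surname=1)
print(surnames)  # ['Moran', 'Kelly']
```

Rows stay plain lists, so each lookup is an integer index rather than a dictionary access keyed by header name.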
