
Read a CSV file using arbitrary character encodings.


Could that CSV file be encoded in something other than ASCII? No problem.

This package provides support for reading CSV files that use arbitrary text encodings. It is built on top of Python’s standard csv and codecs modules, and it uses Daniel Blanchard’s `chardet universal encoding detector <https://pypi.python.org/pypi/chardet>`_ to guess the encoding for a file, if necessary.

Note that utf-8-sig (UTF-8 with a leading Byte Order Mark) is supported. This format is used by recent versions of Microsoft Excel when the user selects “Save As …” and chooses the “CSV UTF-8” format.
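To see why the utf-8-sig codec matters, here is a minimal sketch using only the standard library. It does not call this package; the sample bytes and variable names are made up for illustration. It shows that decoding a BOM-prefixed file as plain utf-8 leaves the BOM glued to the first column name, while utf-8-sig strips it.

```python
import codecs

# Bytes as a BOM-prefixed "CSV UTF-8" export might look (illustrative data).
raw = codecs.BOM_UTF8 + "name,city\nAda,London\n".encode("utf-8")

plain = raw.decode("utf-8")         # BOM survives as U+FEFF
stripped = raw.decode("utf-8-sig")  # BOM is removed by the codec

first_header_plain = plain.splitlines()[0].split(",")[0]
first_header_sig = stripped.splitlines()[0].split(",")[0]

print(ascii(first_header_plain))  # '\ufeffname'
print(ascii(first_header_sig))    # 'name'
```

With plain utf-8, a lookup such as row["name"] would fail because the key is actually "\ufeffname".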

installation

pip install encoded_csv

using it

There’s just one function: get_csv(), as follows:

encoded_csv.get_csv(csv_file, skip_lines=0, encoding='', dialect='', fieldnames=[], sample_lines=100)

Code in the tests/ directory provides usage examples. The function returns a tuple, in which the first item is a list of the field names. The second item is a list of ordered dictionaries, each containing the data read from a given line of the CSV file.

The first row (after discarding any header lines) is assumed to contain column names.
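The shape of the return value described above can be sketched with the standard library alone. The helper read_csv_sketch below is a hypothetical stand-in written for this illustration, not the package's implementation, and the Latin-1 sample file is invented. It returns a (fieldnames, rows) tuple in the same spirit as get_csv().

```python
import csv
import os
import tempfile
from collections import OrderedDict


def read_csv_sketch(path, encoding):
    """Hypothetical stand-in (not the package's code): return (fieldnames, rows)."""
    with open(path, newline="", encoding=encoding) as handle:
        # The first row is taken as the column names.
        reader = csv.DictReader(handle)
        rows = [OrderedDict(row) for row in reader]
        return list(reader.fieldnames), rows


# Write a small Latin-1 encoded file to read back (illustrative data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="latin-1") as tmp:
    tmp.write("name,city\nJosé,Málaga\nZoë,Köln\n")
    sample_path = tmp.name

fieldnames, rows = read_csv_sketch(sample_path, "latin-1")
os.remove(sample_path)

print(fieldnames)  # ['name', 'city']
print(len(rows))   # 2
```

Each item in rows maps column names to the values from one line, so rows[0]["name"] gives the first data row's name.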

Keyword arguments:

  • csv_file – path to CSV file to open

  • skip_header_lines – (optional) number of lines to discard in the assumption that they constitute a file header of some sort (default is to skip no lines)

  • encoding – (optional) the encoding to use for the file; the standard Python `codecs module <https://docs.python.org/3.6/library/codecs.html>`_ is used, so any of the standard encodings may be specified (default behavior is to attempt a best guess using chardet)

  • dialect – (optional) a set of parameters specific to a particular CSV dialect; the standard Python `csv module <https://docs.python.org/3/library/csv.html>`_ is used, so one of the standard, predefined `dialect values or formatting parameters <https://docs.python.org/3/library/csv.html#csv-fmt-params>`_ must be used; default behavior is to attempt a best guess using csv.Sniffer.

  • fieldnames – (optional) forces csv.DictReader to use a particular set of field names.

  • sample_lines – (optional) integer used to prepare the sample given to csv.Sniffer() when attempting to detect the CSV dialect in use; default is 100 lines or the entire file, whichever is fewer.
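The dialect guessing that sample_lines controls can be sketched as follows. This is an illustration built on csv.Sniffer from the standard library, not the package's own code: the helper sniff_dialect and the sample data are hypothetical, and restricting the candidate delimiters is an assumption made here only to keep the example deterministic.

```python
import csv
import os
import tempfile
from itertools import islice


def sniff_dialect(path, encoding, sample_lines=100):
    """Hypothetical helper: guess the dialect from at most `sample_lines` lines."""
    with open(path, newline="", encoding=encoding) as handle:
        # islice stops early if the file has fewer lines than requested.
        sample = "".join(islice(handle, sample_lines))
    # Assumption for this sketch: only consider a few common delimiters.
    return csv.Sniffer().sniff(sample, delimiters=",;\t|")


# A semicolon-delimited sample file (illustrative data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write("name;city;age\nAda;London;36\nBob;Paris;41\nCy;Rome;29\n")
    sample_path = tmp.name

dialect = sniff_dialect(sample_path, "utf-8")
with open(sample_path, newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle, dialect=dialect))
os.remove(sample_path)

print(dialect.delimiter)  # ;
print(rows[0]["city"])    # London
```

A larger sample gives the sniffer more evidence on files with irregular leading lines, at the cost of reading more of the file up front.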

etc.

Bug reports and feature requests are welcome, but really I’d prefer pull requests.
