Read fixed width data files

Read Fixed Width Files

Python 3 module for reading fixed width data files and converting the field contents to appropriate Python types.

Running the program

The module can be run from the command line as follows:

python -m fixwidth data.layout data1.txt data2.txt

where data1.txt and data2.txt contain records and data.layout contains a description of the records and how to parse each field. By default, the records will be written as tab-separated values to stdout.
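The default output format can be pictured with the standard csv module. This is only a minimal sketch, not the module's own code: write_tsv and the sample record are hypothetical, and whether the real command also writes a header row is not stated here, so the sketch omits one.

```python
import csv
import io


def write_tsv(rows, out):
    """Write an iterable of dicts to `out` as tab-separated values."""
    writer = None
    for row in rows:
        if writer is None:
            # Use the keys of the first record as the column order.
            writer = csv.DictWriter(
                out, fieldnames=list(row), delimiter='\t', lineterminator='\n'
            )
        writer.writerow(row)


# Hypothetical record, written to an in-memory buffer instead of stdout.
buffer = io.StringIO()
write_tsv([{'employee_id': 100, 'salary': 52000.0}], buffer)
```

Passing sys.stdout instead of the buffer would print the same line to the terminal.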

Specifying fixed width layout

The data.layout file should be tab-delimited and might look like this:

employees
# records on workers and their salaries
 6	int	employee_id
15	str	job_title
 8	float	salary
# negative values denote fields to skip when reading data
-3	str	blank
10	date	hire_date

The file starts with a title, which can be used to map records to a database table or file name when the module is called from other code. Comments begin with # and must be on their own line. Every other line describes one data field: the first value is the field width, the second value names the converter used to turn the text into a Python object, and the third value is the field name.
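To make the format concrete, here is a rough sketch of how such a file could be read by hand. It is an illustration of the rules described here, not the module's implementation; parse_layout and Field are hypothetical names.

```python
from collections import namedtuple

# Hypothetical record type for one layout line.
Field = namedtuple('Field', ['width', 'datatype', 'name'])


def parse_layout(text):
    """Return (title, fields) from the text of a tab-delimited layout."""
    title = None
    fields = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith('#'):
            continue  # skip blank lines and whole-line comments
        if title is None:
            title = line.strip()  # first meaningful line is the title
            continue
        width, datatype, name = line.split('\t')
        # int() tolerates the leading space used to align widths.
        fields.append(Field(int(width), datatype.strip(), name.strip()))
    return title, fields


sample = "employees\n# comment\n 6\tint\temployee_id\n-3\tstr\tblank\n"
title, fields = parse_layout(sample)
```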

Note that negative field widths are used to specify text that should be ignored/discarded when reading the data.
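The effect of a negative width can be sketched in a few lines. This assumes that a skipped field still occupies its width in the record; split_record and the sample record are hypothetical and are not taken from the module.

```python
def split_record(line, widths):
    """Slice `line` into pieces by width, dropping negative-width fields."""
    pieces = []
    pos = 0
    for width in widths:
        end = pos + abs(width)
        if width > 0:
            pieces.append(line[pos:end])
        pos = end  # skipped fields still consume their width
    return pieces


# Build a made-up record that matches the employees layout:
# 6 + 15 + 8 characters, 3 ignored characters, then 10 characters.
record = '100'.rjust(6) + 'Manager'.ljust(15) + '52000.00' + '   ' + '2015-03-01'
pieces = split_record(record, [6, 15, 8, -3, 10])
```

The three blank characters between the salary and the hire date are consumed but never returned, so pieces holds four values rather than five.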

Data types

The second value on each layout line (the data type) can be one of:

  • str: textual data
  • int: integers
  • float: floating point numbers
  • bool: boolean (True/False) values
  • yesno: parses values like Y, N, Yes, No to a True/False value
  • date: dates like 1995-08-23, 19950823, or 23aug1995
  • datetime: dates with time like 1995-08-23 14:30:00.000
  • julian: Julian dates in YYYYDDD format where DDD is day-of-year
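As an illustration of what two of these conversions involve, the sketch below uses only the standard library. The function names are hypothetical, and the exact set of inputs the module accepts may differ.

```python
from datetime import datetime


def parse_yesno(value):
    """Map Y/Yes to True and N/No to False (case-insensitive)."""
    text = value.strip().lower()
    if text in ('y', 'yes'):
        return True
    if text in ('n', 'no'):
        return False
    raise ValueError('not a yes/no value: {!r}'.format(value))


def parse_julian(value):
    """Parse YYYYDDD, where DDD is the day of the year."""
    # %j is the strptime directive for day-of-year (001-366).
    return datetime.strptime(value.strip(), '%Y%j').date()


example = parse_julian('1995235')  # day 235 of 1995
```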

Date types

Currently, the date and datetime types will guess the format of the date using regular expressions. This could be improved by adding more robust methods or adding some way to specify a date format in the layout file.
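Regex-based guessing of this kind might look roughly like the following. This is a sketch under assumptions, not the module's code: the patterns cover only the three date examples listed under Data types, and guess_date and DATE_FORMATS are hypothetical names.

```python
import re
from datetime import datetime

# (pattern, strptime format) pairs, tried in order.
DATE_FORMATS = [
    (re.compile(r'^\d{4}-\d{2}-\d{2}$'), '%Y-%m-%d'),      # 1995-08-23
    (re.compile(r'^\d{8}$'), '%Y%m%d'),                     # 19950823
    (re.compile(r'^\d{1,2}[A-Za-z]{3}\d{4}$'), '%d%b%Y'),   # 23aug1995
]


def guess_date(value):
    """Return a date for the first pattern that matches `value`."""
    text = value.strip()
    for pattern, fmt in DATE_FORMATS:
        if pattern.match(text):
            # %b depends on the locale; English month names are assumed.
            return datetime.strptime(text, fmt).date()
    raise ValueError('unrecognized date: {!r}'.format(value))
```

A layout-level format option would replace the guessing loop with a single strptime call using the format given for that field.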

Adding more data types

Types are defined in converters.py, and adding more is straightforward. To add a type, apply the fixwidth.converters.register_type decorator to a function that takes a string and returns a single object:

from fixwidth.converters import register_type

@register_type('foo')
def convert_foo(value):
    """Convert any input to the string 'foo!'"""
    return 'foo!'

The type foo can then be used in layouts just like the built-in types listed above.
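For readers unfamiliar with the pattern, a registering decorator of this kind can be written as below. This is a stand-in to show the idea and is not the code in converters.py; SKETCH_CONVERTERS and register_sketch_type are hypothetical names.

```python
# Hypothetical registry mapping a type name to its converter function.
SKETCH_CONVERTERS = {}


def register_sketch_type(name):
    """Return a decorator that stores a converter under `name`."""
    def decorator(func):
        SKETCH_CONVERTERS[name] = func
        return func  # leave the function usable under its own name
    return decorator


@register_sketch_type('upper')
def convert_upper(value):
    """Strip padding and upper-case the field text."""
    return value.strip().upper()


# A reader would look the converter up by the name used in the layout.
result = SKETCH_CONVERTERS['upper']('  manager ')
```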

Usage as a module

The fixwidth.DictReader class resembles csv.DictReader in usage, but requires that files be opened in binary mode:

import fixwidth

with open('example/data1.txt', 'rb') as fh:
    rdr = fixwidth.DictReader(
        fh,
        fieldinfo='example/data.layout',
        skip_blank_lines=True
    )
    next(rdr)

The fieldinfo parameter can be a path to a layout file (described above) or a sequence of tuples describing the columns:

layout = [
    (6, 'int', 'employee_id'),
    (15, 'str', 'job_title'),
    (8, 'float', 'salary'),
    (-3, 'str', 'blank'),
    (10, 'date', 'hire_date')
]

with open('example/data1.txt', 'rb') as fh:
    rdr = fixwidth.DictReader(fh, layout)

Alternatively, you can use the functions read_file_format and parse_file:

from fixwidth import read_file_format, parse_file

# read a layout file describing how records are formatted
title, layout = read_file_format('example/data.layout')

# title is 'employees' for the above layout example
# layout is a list of namedtuple objects with (width, datatype, name)

# parse a data file
rows = parse_file('example/data1.txt', spec=layout, type_errors='ignore')

# type_errors determines what should happen when field content does not
# match the given datatype (e.g. an int column containing 'abc'). Use
# 'ignore' to replace fields with None and 'raise' to raise ValueError.

for r in rows:
    print('Salary for {} is {}'.format(r['employee_id'], r['salary']))

# rows is a generator that yields OrderedDict objects.
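The two type_errors modes described in the comments above can be pictured as a thin wrapper around each conversion. This is a sketch of the described behaviour only; convert_field is a hypothetical helper, not part of the module.

```python
def convert_field(text, converter, type_errors='raise'):
    """Apply `converter` to `text`, handling failures per `type_errors`."""
    try:
        return converter(text)
    except ValueError:
        if type_errors == 'ignore':
            return None  # bad content becomes None
        raise  # 'raise' lets the ValueError propagate


good = convert_field(' 42', int)
ignored = convert_field('abc', int, type_errors='ignore')
```

With 'ignore', a single malformed field does not stop the whole file from being read, at the cost of None values that later code has to handle.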
