Read fixed-width data files

pyfixwidth

pyfixwidth reads fixed-width text files and converts each record into Python values. It can be used as a command-line tool that writes delimited output or as a small parsing library inside your own code.

The package has no runtime dependencies and is designed to stay lightweight.

Install

pip install pyfixwidth

Quickstart

The repository includes a small example layout and sample data files:

python -m fixwidth example/data.layout example/data1.txt example/data2.txt

This writes tab-separated output to standard output:

employee_id	job_title	salary	hire_date
100001	CEO	15000.0	1995-08-23
100002	Programmer	8500.0	2002-11-10
100003	Data Scientist	10000.0	2005-07-01
100004	Sales Rep	5000.0	1999-06-01
100005	Customer Servic	4800.0	2001-12-17

If you install the package, the same command is also available as:

pyfixwidth example/data.layout example/data1.txt example/data2.txt

Layout File Format

A layout file is tab-delimited and describes how each source field should be read. The first line is a title; each later non-comment line holds three tab-separated columns:

  1. field width
  2. converter name
  3. field name

Example:

employees
# records on workers and their salaries
  6	int	employee_id
 15	str	job_title
  8	float	salary
# negative values denote fields to skip when reading data
 -3	str	blank
 10	date	hire_date
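
As a rough illustration of the format shown above, the sketch below turns layout text into a title and a list of (width, datatype, name) tuples using only the standard library. parse_layout is a hypothetical helper written for this example; the packaged loader is read_file_format(), and its internals may differ. The sketch assumes the title is the first non-blank, non-comment line.

```python
def parse_layout(text):
    """Return (title, spec) from tab-delimited layout text.

    Hypothetical helper for illustration only; not the library's code.
    """
    title = None
    spec = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # comments occupy their own line
        if title is None:
            title = line  # assumption: first remaining line is the title
            continue
        width, datatype, name = (part.strip() for part in line.split('\t'))
        spec.append((int(width), datatype, name))
    return title, spec


LAYOUT = (
    "employees\n"
    "# records on workers and their salaries\n"
    "  6\tint\temployee_id\n"
    " 15\tstr\tjob_title\n"
    "  8\tfloat\tsalary\n"
    "# negative values denote fields to skip when reading data\n"
    " -3\tstr\tblank\n"
    " 10\tdate\thire_date\n"
)

title, spec = parse_layout(LAYOUT)
print(title)
for field in spec:
    print(field)
```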

Rules:

  • Comments begin with # and must occupy their own line.
  • Negative widths skip bytes in the input and do not appear in parsed rows.
  • Blank field content becomes None before type conversion.
  • A layout can be loaded from disk with read_file_format() or supplied directly as a sequence of (width, datatype, name) tuples.
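
The skip and blank rules can be pictured with a minimal slicing sketch. split_record is a hypothetical helper, not part of pyfixwidth; it performs no type conversion, and it assumes that surrounding spaces are stripped before the blank check, which the library may handle differently.

```python
def split_record(line, spec, encoding='ascii'):
    """Slice one bytes record by field width (illustrative only)."""
    row = {}
    pos = 0
    for width, _datatype, name in spec:
        if width < 0:
            pos += -width  # negative width: skip these bytes, emit nothing
            continue
        text = line[pos:pos + width].decode(encoding).strip()
        row[name] = text or None  # blank content becomes None
        pos += width
    return row


spec = [
    (6, 'int', 'employee_id'),
    (15, 'str', 'job_title'),
    (8, 'float', 'salary'),
    (-3, 'str', 'blank'),
    (10, 'date', 'hire_date'),
]

# Build sample records piece by piece so the widths are easy to check.
full = b"100001" + b"CEO".ljust(15) + b"15000.00" + b"   " + b"1995-08-23"
no_title = b"100002" + b" " * 15 + b" 8500.00" + b"   " + b"2002-11-10"

print(split_record(full, spec))
print(split_record(no_title, spec))
```

The skipped field named blank never appears in the resulting rows, and the all-space job_title in the second record comes back as None.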

Supported Converters

Type      Meaning                 Accepted values
str       text                    any decoded string
int       integer                 values accepted by int()
float     floating point number   values accepted by float()
bool      boolean                 Python truthiness via bool()
yesno     yes/no boolean          Y, N, Yes, No and lowercase variants
date      date                    1995-08-23, 19950823, 23aug1995, 1995-8-23, 122599
datetime  date with time          1995-08-23 14:30:00.000 and similar ISO-like values
julian    Julian date             YYYYDDD, with optional separators removed before parsing
time      time                    14:30:00, 14.30.00, 143000, 09:00, 0900
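
The difference between bool and yesno is easy to miss. If the bool converter applies bool() to the field text, as the table suggests, any non-empty string is true, including N and 0. The sketch below shows that plain Python behaviour next to a hypothetical yes/no mapping; convert_yesno here is an illustration, not the library's converter, and it is slightly more permissive because it lowercases every input.

```python
# Plain Python truthiness: only the empty string is falsy.
print(bool('N'))
print(bool('0'))
print(bool(''))

# Hypothetical stand-in for a yes/no converter.
YES_NO = {'y': True, 'yes': True, 'n': False, 'no': False}


def convert_yesno(value):
    """Map Y/N/Yes/No (any case) to a boolean; raise KeyError otherwise."""
    return YES_NO[value.strip().lower()]


print(convert_yesno('N'))
print(convert_yesno('Yes'))
```

For flag columns that hold Y or N, yesno is therefore the safer choice.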

date and datetime formats are inferred with regular expressions, so if you have unusual source formats you may want to register a custom converter.
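
To make the idea of regex-based inference concrete, here is a minimal sketch that pairs each pattern with a strptime format. DATE_PATTERNS and infer_date are hypothetical names, the patterns cover only three of the listed forms (the two-digit-year form 122599 is left out), and the package's actual expressions may differ. Month abbreviations via %b depend on the active locale; an English or C locale is assumed.

```python
import re
from datetime import datetime

# (pattern, strptime format) pairs; illustrative, not the library's own list.
DATE_PATTERNS = [
    (re.compile(r'^\d{4}-\d{1,2}-\d{1,2}$'), '%Y-%m-%d'),    # 1995-08-23, 1995-8-23
    (re.compile(r'^\d{8}$'), '%Y%m%d'),                      # 19950823
    (re.compile(r'^\d{1,2}[A-Za-z]{3}\d{4}$'), '%d%b%Y'),    # 23aug1995
]


def infer_date(value):
    """Return a date for the first pattern that matches, else raise ValueError."""
    value = value.strip()
    for pattern, fmt in DATE_PATTERNS:
        if pattern.match(value):
            return datetime.strptime(value, fmt).date()
    raise ValueError('unrecognized date format: {!r}'.format(value))


for sample in ('1995-08-23', '19950823', '23aug1995', '1995-8-23'):
    print(sample, '->', infer_date(sample))
```

A source format that matches none of the patterns raises ValueError in this sketch, which is the kind of case where a custom converter is the more predictable route.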

Python API

For most code, these are the main entry points:

  • read_file_format(path) loads a layout file and returns (title, spec).
  • parse_file(path, spec=...) yields OrderedDict rows from a file on disk.
  • parse_lines(lines, spec=...) parses an iterable of binary lines.
  • DictReader(fileobj, fieldinfo=...) provides a csv.DictReader-like iterator for binary file objects.
  • register_type(name) lets you add custom converters.

Parse a Layout and a Data File

from fixwidth import read_file_format, parse_file

title, layout = read_file_format('example/data.layout')

print(title)

rows = parse_file('example/data1.txt', spec=layout, type_errors='ignore')
for row in rows:
    print('Salary for {} is {}'.format(row['employee_id'], row['salary']))

Use DictReader

DictReader expects a binary file object:

import fixwidth

with open('example/data1.txt', 'rb') as fh:
    reader = fixwidth.DictReader(
        fh,
        fieldinfo='example/data.layout',
        skip_blank_lines=True,
    )
    first_row = next(reader)
    print(first_row['job_title'])

You can also pass the layout directly:

layout = [
    (6, 'int', 'employee_id'),
    (15, 'str', 'job_title'),
    (8, 'float', 'salary'),
    (-3, 'str', 'blank'),
    (10, 'date', 'hire_date'),
]

with open('example/data1.txt', 'rb') as fh:
    reader = fixwidth.DictReader(fh, layout)
    print(next(reader))

Custom Converters

Converters live in fixwidth.converters. To register a new one, decorate a function that accepts a decoded string and returns the converted value.

from fixwidth.converters import register_type

@register_type('uppercase')
def convert_uppercase(value):
    return value.strip().upper()

After registration, the new type name can be used in layouts just like the built-in types.
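
A decorator-based registry of this kind can be sketched in a few lines. The names CONVERTERS, register, and convert below are hypothetical stand-ins written for this example; they show the general pattern and are not the code in fixwidth.converters.

```python
CONVERTERS = {}  # hypothetical registry: type name -> converter function


def register(name):
    """Return a decorator that stores the function under the given type name."""
    def decorator(func):
        CONVERTERS[name] = func
        return func  # leave the function usable under its own name
    return decorator


@register('uppercase')
def convert_uppercase(value):
    return value.strip().upper()


def convert(datatype, value):
    """Look up a converter by the name used in a layout and apply it."""
    return CONVERTERS[datatype](value)


print(convert('uppercase', '  sales rep '))
```

Because the lookup is by name, a layout only needs the string uppercase in its converter column to reach the registered function.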

Troubleshooting

  • Open files in binary mode when using DictReader.
  • parse_file() defaults to encoding='ascii'.
  • parse_lines() defaults to encoding='utf-8'.
  • Use type_errors='ignore' to replace invalid values with None and keep parsing.
  • skip_blank_lines=True ignores lines that are empty after removing trailing newlines. Lines that contain only spaces still produce a row of None values.
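
The encoding and blank-line notes above can be checked with plain bytes, without the library. is_blank is a hypothetical helper that mirrors the rule as described; it is not the package's own code.

```python
# A UTF-8 encoded value with one non-ASCII character (e with acute accent).
line = b'caf\xc3\xa9'

try:
    line.decode('ascii')
except UnicodeDecodeError as exc:
    # This is the failure to expect when non-ASCII data meets encoding='ascii'.
    print('ascii decode failed:', exc.reason)

print(len(line.decode('utf-8')))  # decodes cleanly as UTF-8


def is_blank(raw):
    """True when nothing is left after removing trailing newline characters."""
    return raw.rstrip(b'\r\n') == b''


print(is_blank(b'\n'))       # empty line: skipped when skip_blank_lines=True
print(is_blank(b'   \n'))    # spaces only: still parsed, fields become None
```

If a file contains non-ASCII text, passing an explicit encoding to parse_file() avoids relying on the ascii default.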

More Documentation

Additional documentation lives in docs/index.md.
