Read fixed width data files
pyfixwidth
pyfixwidth reads fixed-width text files and converts each record into Python
values. It can be used as a command-line tool that writes delimited output or
as a small parsing library inside your own code.
The package has no runtime dependencies and is designed to stay lightweight.
Install
pip install pyfixwidth
Quickstart
The repository includes a small example layout and sample data files:
python -m fixwidth example/data.layout example/data1.txt example/data2.txt
This writes tab-separated output to standard output:
employee_id job_title salary hire_date
100001 CEO 15000.0 1995-08-23
100002 Programmer 8500.0 2002-11-10
100003 Data Scientist 10000.0 2005-07-01
100004 Sales Rep 5000.0 1999-06-01
100005 Customer Servic 4800.0 2001-12-17
If you install the package, the same command is also available as:
pyfixwidth example/data.layout example/data1.txt example/data2.txt
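To illustrate the kind of tab-separated output shown above, here is a minimal sketch using only the standard library. It is not the command-line tool's actual implementation; `write_tsv` is a hypothetical helper, and the rows are typed in by hand from the sample output rather than parsed from the example files.

```python
import csv
import io


def write_tsv(rows, out):
    """Write dict-like rows to `out` as tab-separated text with a header line."""
    rows = list(rows)
    if not rows:
        return
    writer = csv.DictWriter(
        out,
        fieldnames=list(rows[0]),
        delimiter='\t',
        lineterminator='\n',
    )
    writer.writeheader()
    writer.writerows(rows)


# Hand-written rows standing in for parsed records.
rows = [
    {'employee_id': 100001, 'job_title': 'CEO', 'salary': 15000.0},
    {'employee_id': 100002, 'job_title': 'Programmer', 'salary': 8500.0},
]

buffer = io.StringIO()
write_tsv(rows, buffer)
print(buffer.getvalue(), end='')
```

Passing `sys.stdout` instead of the `StringIO` buffer would send the same text to standard output.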
Layout File Format
A layout file is tab-delimited and describes how each source field should be read. The first line is a title, then each later line contains:
- field width
- converter name
- field name
Example:
employees
# records on workers and their salaries
6 int employee_id
15 str job_title
8 float salary
# negative values denote fields to skip when reading data
-3 str blank
10 date hire_date
Rules:
- Comments begin with # and must occupy their own line.
- Negative widths skip bytes in the input and do not appear in parsed rows.
- Blank field content becomes None before type conversion.
- A layout can be loaded from disk with read_file_format() or supplied directly as a sequence of (width, datatype, name) tuples.
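The layout rules above can be sketched in a few lines of plain Python. This is an illustration of the described file format, not the library's `read_file_format()`; the function name `parse_layout_text` is hypothetical, and the sketch assumes the title is the first line that is neither empty nor a comment.

```python
def parse_layout_text(text):
    """Turn layout text into (title, [(width, datatype, name), ...])."""
    title = None
    spec = []
    for raw in text.splitlines():
        line = raw.strip()
        # Comment lines and empty lines carry no field information.
        if not line or line.startswith('#'):
            continue
        # Assumption: the first remaining line is the title.
        if title is None:
            title = line
            continue
        width, datatype, name = line.split('\t')
        spec.append((int(width), datatype, name))
    return title, spec


layout_text = (
    "employees\n"
    "# records on workers and their salaries\n"
    "6\tint\temployee_id\n"
    "15\tstr\tjob_title\n"
    "# negative values denote fields to skip when reading data\n"
    "-3\tstr\tblank\n"
)

title, spec = parse_layout_text(layout_text)
print(title)
print(spec)
```

The resulting list has the same `(width, datatype, name)` shape that the README says can be supplied directly in place of a layout file.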
Supported Converters
| Type | Meaning | Accepted values |
|---|---|---|
| str | text | any decoded string |
| int | integer | values accepted by int() |
| float | floating point number | values accepted by float() |
| bool | boolean | Python truthiness via bool() |
| yesno | yes/no boolean | Y, N, Yes, No and lowercase variants |
| date | date | 1995-08-23, 19950823, 23aug1995, 1995-8-23, 122599 |
| datetime | date with time | 1995-08-23 14:30:00.000 and similar ISO-like values |
| julian | Julian date | YYYYDDD, with optional separators removed before parsing |
| time | time | 14:30:00, 14.30.00, 143000, 09:00, 0900 |
date and datetime formats are inferred with regular expressions, so if you
have unusual source formats you may want to register a custom converter.
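To make the idea of regex-based format inference concrete, here is a small sketch that covers four of the date shapes listed in the table. The pattern table and the `infer_date` name are assumptions made for illustration; the library's own patterns and precedence may differ.

```python
import re
from datetime import datetime

# Hypothetical pattern table: each regex selects one strptime format.
DATE_PATTERNS = [
    (re.compile(r'^\d{4}-\d{2}-\d{2}$'), '%Y-%m-%d'),   # 1995-08-23
    (re.compile(r'^\d{8}$'), '%Y%m%d'),                 # 19950823
    (re.compile(r'^\d{2}[A-Za-z]{3}\d{4}$'), '%d%b%Y'), # 23aug1995
    (re.compile(r'^\d{6}$'), '%m%d%y'),                 # 122599
]


def infer_date(value):
    """Return a date for a recognized format, or None for blank input."""
    value = value.strip()
    if not value:
        return None
    for pattern, fmt in DATE_PATTERNS:
        if pattern.match(value):
            return datetime.strptime(value, fmt).date()
    raise ValueError('unrecognized date format: {!r}'.format(value))


for sample in ['1995-08-23', '19950823', '23aug1995', '122599']:
    print(sample, '->', infer_date(sample))
```

A value that matches none of the patterns raises `ValueError` in this sketch, which is the kind of case where a custom converter is the better fit.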
Python API
For most code, these are the main entry points:
- read_file_format(path) loads a layout file and returns (title, spec).
- parse_file(path, spec=...) yields OrderedDict rows from a file on disk.
- parse_lines(lines, spec=...) parses an iterable of binary lines.
- DictReader(fileobj, fieldinfo=...) provides a csv.DictReader-like iterator for binary file objects.
- register_type(name) lets you add custom converters.
Parse a Layout and a Data File
from fixwidth import read_file_format, parse_file
title, layout = read_file_format('example/data.layout')
print(title)
rows = parse_file('example/data1.txt', spec=layout, type_errors='ignore')
for row in rows:
    print('Salary for {} is {}'.format(row['employee_id'], row['salary']))
Use DictReader
DictReader expects a binary file object:
import fixwidth
with open('example/data1.txt', 'rb') as fh:
    reader = fixwidth.DictReader(
        fh,
        fieldinfo='example/data.layout',
        skip_blank_lines=True,
    )
    first_row = next(reader)
    print(first_row['job_title'])
You can also pass the layout directly:
layout = [
    (6, 'int', 'employee_id'),
    (15, 'str', 'job_title'),
    (8, 'float', 'salary'),
    (-3, 'str', 'blank'),
    (10, 'date', 'hire_date'),
]
with open('example/data1.txt', 'rb') as fh:
    reader = fixwidth.DictReader(fh, layout)
    print(next(reader))
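For readers curious how a tuple layout maps onto a line of bytes, the following sketch slices one record by hand. It does not use or reproduce the library's parser: `parse_line` and `CONVERTERS` are hypothetical names, the sample line is constructed here to fit the layout rather than copied from example/data1.txt, and stripping surrounding spaces before conversion is an assumption. The date column is read as plain text to keep the sketch short.

```python
from collections import OrderedDict

# Hypothetical converter table for this sketch only.
CONVERTERS = {'int': int, 'float': float, 'str': str}


def parse_line(line, spec, encoding='ascii'):
    """Slice one binary line according to (width, datatype, name) tuples."""
    row = OrderedDict()
    pos = 0
    for width, datatype, name in spec:
        if width < 0:
            # Negative width: skip these bytes and emit no field.
            pos += -width
            continue
        text = line[pos:pos + width].decode(encoding).strip()
        pos += width
        # Blank content becomes None before any type conversion.
        row[name] = CONVERTERS[datatype](text) if text else None
    return row


spec = [
    (6, 'int', 'employee_id'),
    (15, 'str', 'job_title'),
    (8, 'float', 'salary'),
    (-3, 'str', 'blank'),
    (10, 'str', 'hire_date'),
]

# A made-up record padded to the widths above (6 + 15 + 8 + 3 + 10 bytes).
line = (
    '100001' + 'CEO'.ljust(15) + '15000.0'.rjust(8) + ' ' * 3 + '1995-08-23'
).encode('ascii')

print(parse_line(line, spec))
```

Note that the skipped `blank` field never appears in the resulting row, matching the rule for negative widths.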
Custom Converters
Converters live in fixwidth.converters. To register a new one, decorate a
function that accepts a decoded string and returns the converted value.
from fixwidth.converters import register_type
@register_type('uppercase')
def convert_uppercase(value):
    return value.strip().upper()
After registration, the new type name can be used in layouts just like the built-in types.
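A name-based registry like this is commonly built as a decorator factory that stores functions in a dictionary. The sketch below shows that general pattern only; it is an assumption about the approach, not the code in fixwidth.converters, and `register_converter` and `CONVERTERS` are hypothetical names chosen to avoid confusion with the real API.

```python
# Hypothetical registry mapping type names to converter functions.
CONVERTERS = {}


def register_converter(name):
    """Return a decorator that stores the function under `name`."""
    def decorator(func):
        CONVERTERS[name] = func
        # Return the function unchanged so it stays directly callable.
        return func
    return decorator


@register_converter('uppercase')
def convert_uppercase(value):
    return value.strip().upper()


# Look the converter up by the name a layout would use.
print(CONVERTERS['uppercase'](' sales rep '))
```

Because the decorator returns the original function, `convert_uppercase` can still be called and tested on its own.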
Troubleshooting
- Open files in binary mode when using DictReader.
- parse_file() defaults to encoding='ascii'.
- parse_lines() defaults to encoding='utf-8'.
- Use type_errors='ignore' to replace invalid values with None and keep parsing.
- skip_blank_lines=True ignores lines that are empty after removing trailing newlines. Lines that contain only spaces still produce a row of None values.
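The last two points are easy to misread, so here is a small sketch of the behavior they describe. It is a stand-alone illustration, not the library's code: `convert_field` and `is_empty_line` are hypothetical helpers, and the `'raise'` mode is a placeholder name for "do not ignore errors" that this sketch invents.

```python
def convert_field(text, func, type_errors='raise'):
    """Convert one field, mapping blanks to None and optionally ignoring errors."""
    if not text.strip():
        # Blank content becomes None before conversion is attempted.
        return None
    try:
        return func(text)
    except ValueError:
        if type_errors == 'ignore':
            # Invalid value: substitute None and let parsing continue.
            return None
        raise


def is_empty_line(line):
    """True only if nothing is left after removing trailing newlines."""
    return line.rstrip(b'\r\n') == b''


print(convert_field('abc', int, type_errors='ignore'))
print(is_empty_line(b'\n'))
print(is_empty_line(b'      \n'))
```

A line of spaces is not empty under this test, which is why such lines would still be parsed and yield a row whose fields are all None.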
More Documentation
Additional documentation lives in docs/index.md.