Skip to main content

A pure Python library supporting Uniform eXchange Format, a plain text human readable optionally typed storage format.

Project description

Python UXF Library

UXF is a plain text human readable optionally typed storage format. UXF may serve as a convenient alternative to csv, ini, json, sqlite, toml, xml, or yaml.

For details of the Uniform eXchange Format (UXF) supported by this library, see the UXF Overview. (PyPI link to UXF Overview.)

Introduction

The Python uxf library works out of the box with the standard library and one dependency, editabletuple. It requires Python 3.8 or later.

  • Install: python3 -m pip install uxf (or download the wheel .whl files and do python -m pip install uxf....whl editabletuple...whl where ... varies)
  • Run: python3 -m uxf -h # this shows the command line help
  • Use: import uxf # see the uxf.py module docs and API for the API

Using uxf as an executable (e.g., python3 -m uxf ...) provides a means of doing .uxf to .uxf conversions (e.g., compress or uncompress or to use the standard pretty print format). The executable can also be used for linting, for deleting unused ttypes, and for replacing imports to ensure that UXF files are stand-alone.

Installed alongside uxf.py are: uxfconvert.py, uxfcompare.py, and uxflint.py. For example, uxfconvert.py can losslessly convert .uxf to .json or .xml and back. It can also do some simple conversions to and from .csv, to .ini, and to and from .sqlite, but these are really to illustrate use of the uxf.py APIs. uxfcompare.py can compare two UXF files, either for equality, i.e., they both contain the same data, with lists and tables compared value by value in order, and maps compared by item—regardless of order (since UXF Maps are unordered), or for equivalence. A UXF file can be linted using python3 -m uxf -l file.uxf; but to lint any number of files, use uxflint.py.

Also see the UXF test files in the ../testdata folder, the Python examples in the py/eg folder, and the Python tests in the py/t folder.

If you just want to create a small standalone .pyz, you could simply copy uxf.py and editabletuple.py into your project folder and include them in your .pyz file.

Python UXF Types

Most Python types convert losslessly to and from UXF types. In particular:

Python Type UXF type Notes
None null
bool bool
bytes bytes
bytearray bytes Lossless conversion but type changes to Python bytes
datetime.date date
datetime.datetime datetime Loses any timezone and only preserves to the nearest 1-second
int int
float real
str str
uxf.List list
uxf.Map map
uxf.Table table

A List is a Python collections.UserList subclass with .data (the list), .comment and .vtype attributes. A .vtype holds a str representing the name of the corresponding UXF type (e.g., int, real, etc.) or a ttype (the name of a custom table type; see TClass). If a .vtype is None it means that any UXF type may be stored.

Similarly a Map is a Python collections.UserDict subclass with .data (the dict), .comment, .ktype, and .vtype attributes. The .ktype is a str like the .vtype except that it may only be None (accept any valid ktype) or one of bytes, date, datetime, int, or str. The .vtype works just the same as for Lists.

The Table class has .records, .comment, and .tclass attributes; with .tclass holding a TClass which in turn holds the table's ttype (i.e., its table type name), and the table's field names and (optional) types. In all cases a type of None signifies that any type valid for the context may be used.

Custom types can be represented using ttypes. For example, for points you could define a ttype such as: =point x:real y:real. Then you could include single points like (point 1.5 7.2), or many of them such as (point 1.5 7.2 8.3 -9.4 14.8 0.6).

Collection types such as set, frozenset, tuple, or collections.deque are automatically converted to a List when they are encountered. Of course, these are one way conversions.

API

The simplest part of the API loads and saves (dumps) UXF data from/to strings or files.

Reading and Writing UXF Data

load(filename_or_filelike): -> uxo
loads(uxt): -> uxo

The load() function reads UXF data from a file or file-like object, and the loads() function reads UXF data from a string. These functions take several optional arguments; see load() and loads(). The returned uxo (UXF object) is of type Uxf.

dump(filename_or_filelike, data)
dumps(data) -> uxt

The dump() function writes the data in UXF format to a file or file-like object, and the dumps() function writes the data into a string that's then returned (here called uxt to indicate UXF text). These functions take several optional arguments; see dump() and dumps(). The data can be a Uxf object or a single list, List, dict, Map, or Table.

If the data contains values of types that aren't supported by UXF, they could be transformed in advance (e.g., to a custom table type, a ttype).

See also the examples in the py/eg folder and the tests in the py/t folder.

API Notes

A UXF object (called a uxo in these docs) has a .value attribute that is always a List or Map or Table. The first two have essentially the same APIs as list and dict respectively. The Table API is a little similar to a list. Individual records can be accessed (read and updated) using table[row], and individual files using table[row][column] or table[row].fieldname. For small tables, records can be accessed with the first, second, third, or fourth properties. The best way to append new records is to use the Table's append() method.

The uxf module distinguishes between a ttype (the name of a user-defined table) and a TClass (the Python class which represents a user-defined table). A TClass has .ttype and .fields attributes.

For reading UXF data it is easiest to iterate, and to do so recursively, if the data has nested collections. Note that the visit.py example provides a way of doing this using the visitor design pattern. The uxfconvert.py example shows alternative approaches to manual iteration.

Note that for imports, if the filename is relative it is searched for in the same folder as the importing UXF file, and if not found there, in the current folder, and if not there either, in each of the paths in UXF_PATH (if any).

Note that the __version__ is the module version (i.e., the version of this implementation), while the VERSION is the maximum UXF version that this module can read (and the UXF version that it writes).

Classes

The classes are documented in convenience order. Here are alphabetically ordered links: Error, Field, Format, List, Map, Table, TClass, Uxf.

Uxf

A Uxf object represents a UXF file in memory.

The easiest way to obtain a Uxf object is to use one of the module-level functions, load() or loads(). Of course, you can also create Uxfs programatically (as shown in many of the examples and tests).

Constructor

Uxf(value=None, *, custom='', tclasses=None, comment=None)

The Uxf constructor takes an optional value (a list, List, tuple, dict, Map, or Table) and optional (by keyword arguments) custom string, tclasses (a dict whose names are ttypes and whose values are the corresponding TClasses), and comment string.

To create a Uxf programmatically (rather than by using, say, load(), you can either create it empty, or with some data and optionally, some ttypes.

value = ... # a list, uxf.List, dict, uxf.Map, or uxf.Table
point_ttype = uxf.TClass('point', (uxf.Field('x', 'real'), uxf.Field('y', 'real')))
uxo = uxf.Uxf(value, tclasses={point_ttype.ttype: point_ttype})

An alternative is to do this:

uxo = uxf.loads('uxf 1.0\n=point x:real y:real\n[]')
uxo.value = value
Properties

.value

A List, Map, or Table. Since these classes can nest, a Uxf can represent any arbitrary data structure.

.custom

An opional str used for customizing the file format, i.e., the header text that follows the initial uxf 1.0 on the first line.

.tclasses

A possibly empty dict where each key is a ttype (i.e., the .tclass.ttype which is a TClass's name—a str) and each value is a TClass object.

.comment

An optional file-level str.

.imports

A possibly empty dict where each key is a ttype and where each value is a str holding that type's import text.

.import_filenames

A utility property useful for some UXF processors. It yields all the unique import filenames.

Methods

.load(filename_or_filelike, *, on_error=on_error, drop_unused=False, replace_imports=False)

Convenience method that wraps the module-level load() function.

.loads(uxt, filename='-', *, on_error=on_error, drop_unused=False, replace_imports=False)

Convenience method that wraps the module-level loads() function.

.dump(filename_or_filelike, *, on_error=on_error, format=Format())

Convenience method that wraps the module-level dump() function.

.dumps(*, on_error=on_error, format=Format())

Convenience method that wraps the module-level dumps() function.

List

A class used to represent a UXF list. It is a collections.UserList subclass which holds its list in the .data attribute and that also has .comment and .vtype attributes.

Constructor

List(seq=None, *, vtype=None, comment=None)

The seq can be a list, tuple, or any iterable acceptable to the built-in list() type. The vtype is a str signifying the type of value this list may hold; None means values may be of any UXF-compatible type (see Python UXF Types). A str may be passed as the comment.

See Python UXF Types for more about vtypes.

Map

A class used to represent a UXF map. It is a collections.UserDict subclass that holds its dict in the .data attribute and that also has .comment, .ktype, and .vtype attributes.

Note that although Python dicts are insertion-ordered, UXF Maps are unordered. This means that while this Python UXF library will always load and dump Map items in the same order, other UXF libraries may not. (The py/uxfcompare.py utility and the py/eg/compare.py example both correctly compare two UXF files regardless of Map key order; useful for comparing two UXFs with different texts and useful for regression testing.)

Constructor

Map(d=None, *, ktype=None, vtype=None, comment=None)

The d can be a dict or any single value acceptable to the built-in dict() type. The ktype is a str signifying the type of key that must be used; None means that any valid ktype, i.e., 'bytes', 'date', 'datetime', 'int', or 'str'. The vtype is a str signifying the type of value this Map may hold; None means values may be of any UXF-compatible type (see Python UXF Types). A str may be passed as the comment.

See Python UXF Types for more about ktypes and vtypes.

Table

A class used to store UXF Tables.

Constructor

Table(tclass=None, *, records=None, comment=None)

A Table can be created using the constructor, passing a TClass, and optionally, records (a list of lists, where each sublist has len(tclass.fields) values), and a comment (a str). Alternativvely, use the table() convenience function which takes a ttype (a str), and fields.

See Python UXF Types for more about and ttypes.

Properties

.tclass

A TClass which holds the table's ttype and field names (and optional field types).

.ttype

A convenience for .tclass.ttype.

.records

The table's data: a list of values where each value is either a list with len(table.fields) values, or an editabletuple of type .RecordClass.

.RecordClass

The table's record class, a dynamically created editabletuple, which can be used to create a single record by calling it with each of the record's fields' values (or with *sequence where len(sequence) equals the number of fields). When a table record is accessed (e.g., when one row of the table's list is returned), it is returned as an instance of this class.

(This is actually a convenience method that returns table.tclass.RecordClass.)

.fields

A convenience for .tclass.fields.

.first

The table's first record as a RecordClass editabletuple, or None if the table is empty.

.second

The table's second record or None.

.third

The table's third record or None.

.fourth

The table's fourth record or None.

.last

The table's last record or None.

.is_scalar

This is True if all the table's values are (or ought to be given its fields' types) scalars; otherwise it is False. Its main use is as a helper for the dump() and dumps() functions.

Operators

table[row]

The table's row-th record as a RecordClass editabletuple. The returned record is editable, so to get or set a field use table[row][column] or table[row].fieldname.

To replace an entire record, use table[row] = record where record is a sequence (e.g., a tuple of length len(table.fields)), or a table.RecordClass instance. (See also append() below.)

To delete an entire record, use del table[row].

iter(table)

For example, for record in table:. Yields every record in the table as a RecordClass instance.

len(table)

The number of records (rows) in the table.

Methods

.append(record)

Appends a new row to the table's list of lists as an editabletuple of type .RecordClass. The row can be provided as a RecordClass editabletuple or as a sequence of field values (which will then be converted to the table's RecordClass).

.insert(row, record)

Inserts the given record (or sequence of field values which is converted to a RecordClass instance) into the table at the given row position.

.field(column)

This function is a convenience for .tclass.fields[column].

.get(row)

Returns the table's row-th record as a RecordClass editabletuple or None if row is out of range. If a record is returned it is editable.

TClass

This class is used to store a Table's ttype (i.e., its name) and fields.

Constructor

TClass(ttype, fields=None, *, comment=None)

The ttype must be a str that holds the TClass's name (equivalent to a vtype or ktype name); it may not be the same as a built-in type name or constant. Leave fields as None for a fieldless table; otherwise pass a sequence of field names (str's) or Field's (which have field names and optionally types). The .comment may be passed a str.

The .ttype attribute holds the TClass's name. The .fields attribute holds a list of fields of type Field. The .comment holds an optional str.

The .isfieldless property returns True if there are no fields (which is valid for a fieldless table); otherwise returns False.

The .RecordClass property returns a dynamically created editabletuple, which can be used to create a single record by calling it with each of the record's fields' values (or with *sequence where len(sequence) equals the number of fields).

Field

This class is used to store a TClass's fields, i.e., the fields for a Table.

Constructor

Field(name, vtype=None)

The name must start with a letter and be followed by 0-uxf.MAX_IDENTIFIER_LEN-1 letters, digits, or underscores; it may not be the same as a built-in type name or constant. A vtype of None means that the field may hold any valid UXF type (see Python UXF Types); otherwise it must be one of these strs: 'bool', 'bytes', 'date', 'datetime', 'int', 'real', 'list', 'map', 'str', or 'table'; or a ttype name.

Format

This class is used to specify aspects of the output format for the dump() and dumps() functions (and the equivalent methods).

Constructor

Format(indent=' ', wrap_width=96, realdp=None, max_short_len=32)

The dump() and dumps() functions use the default Format() which uses the defaults shown (indent is two spaces). However, by creating and passing your own Format object, you can change these to suit your needs. For realdp, None signifies use however many digits after the decimal point are needed for UXF real's (i.e., for Python float's); otherwise specify a value 0-15.

For example, if you had a Uxf object with a table and wanted the output to be one record per line, with reals output with 3 decimal places, you could pass a format of Format(wrap_width=None, realdp=3).

Error

This class is used to propagate errors.

(If this module produces an exception that isn't a uxf.Error or OSError subclass then it is probably a bug that should be reported.)

Functions

The functions are documented in convenience order. Here are alphabetically ordered links: append_to_parent(), canonicalize(), dump(), dumps(), isoformat(), isasciidigit(), is_scalar(), load(), loads(), naturalize(), on_error(), table().

load(filename_or_filelike, *, on_error=on_error, drop_unused=False, replace_imports=False)

Returns a Uxf object.

filename_or_filelike is sys.stdin or a filename (str or pathlib.Path) or an open readable file (text mode UTF-8 encoded, optionally gzipped).

on_error is a custom error handling function that defaults to on_error.

If drop_unused is True, then any ttype definitions in the file that aren't actually used in the data are dropped. So if the Uxf is later dumped, the dropped ttype definitions will not be present.

If replace_imports is True, then any imports are replaced by the ttype definitions that they import. So if the Uxf is later dumped, no imports will be present; instead any imported ttypes will be dumped as ttype definitions. Normally, if you use replace_imports=True, then you would also use drop_unused=True.

loads(uxt, filename='-', *, on_error=on_error, drop_unused=False, replace_imports=False)

Returns a Uxf object.

uxt must be a str of UXF data, e.g., uxf 1.0\n[1 2 3].

If given, the filename is used for error messages.

For more on the other argument see load().

dump(filename_or_filelike, data, *, on_error=on_error, format=Format())

filename_or_filelike is sys.stdout or a filename or an open writable file (text mode UTF-8 encoded). If filename_or_filelike is a filename with a .gz suffix then the output will be gzip-compressed.

data is a Uxf object, or a list, List, dict, Map, or Table, that this function will write to the filename_or_filelike in UXF format.

format specifies aspects of the output (e.g., indent); see the Format class for details.

dumps(data, *, on_error=on_error, format=Format())

data is a Uxf object, or a list, List, dict, Map, or Table that this function will write to a str in UXF format which will then be returned.

For more on the other arguments see dump().

table(ttype, fields, *, comment=None)

Convenience function for creating empty tables with a new TClass.

See also the Table constructor.

on_error(lino, code, message, *, filename, fail=False, verbose=True)

This is the default error handler which you can replace by passing a custom one to load(), loads(), dump(), or dumps().

This is called with the line number (lino), error code, error message, and filename. The filename may be '-' or empty if the UXF is created in memory rather than loaded from a file. If fail is True it means the error is unrecoverable, so the normal action would be to raise. If verbose is True the normal action is to print a textual version of the error data to stderr.

To make on_error() quieter:

on_error = functools.partial(uxf.on_error, verbose=False)

To make all errors fatal:

on_error = functools.partial(uxf.on_error, fail=True, verbose=False)

For further examples of custom on_error() functions, see t/test_errors.py, t/test_imports.py t/test_include.py, t/test_merge.py, and t/test_sqlite.py.

isasciidigit(s)

Returns True if s matches /^[0-9]+$/. (Python's str.isdigit() and str.isdecimal() both match additional Unicode digit characters which is why we use isasciidigit().)

isoformat(dt)

If dt is a date, returns the corresponding str with format 'YYYY-MM-DD'; if dt is a datetime, returns the corresponding str with format 'YYYY-MM-DDTHH:MM:SS'.

naturalize(s)

Given str s returns True if the str is 't', 'true', 'y', 'yes', or False if the str is 'f', 'false', 'n', 'no' (case-insensitive); or returns an int if s holds a parsable int, or a float if s holds a parsable float, or a datetime.datetime if s holds a parsable ISO8601 datetime, or a datetime.date if s holds a parsable ISO8601 date, or failing these returns the original str, s, unchanged.

This is provided as a helper function (e.g., it is used by uxfconvert.py).

is_scalar(x)

Returns True if x is None or a bool, bytes, bytearray, datetime.date, datetime.datetime, int, float, or str; otherwise returns False. Its main use is as a helper for the dump() and dumps() functions.

canonicalize(name, is_table_name)

Utility for UXF processors. Given a name, returns a name that is a valid table or field name. See uxfconvert.py for uses.

append_to_parent(parent, value)

Utility for UXF processors; see uxf.py and uxfconvert.py for examples of use.

Constants

Constant Description
VERSION The UXF file format version.
RESERVED_WORDS A set of names that cannot be used for table or field names.
UTF8 The string 'utf-8'.

Command Line Usage

The uxf module can be used as an executable. To see the command line help run:

python3 -m uxf -h

or

path/to/uxf.py -h

Changes

  • 1.1.2 Additional tests.
  • 1.1.1 Internal improvements plus moved .RecordClass to TClass (without changing Table's API).
  • 1.1.0 Dropped Format's max_fields_in_line and max_list_in_line settings since neither worked: use wrap_width instead.
  • 1.0.2 Bug fix: I missed a corner case in the previous fix; now fixed.
  • 1.0.1 Bug fix: when dump() and dumps() output ttype definitions, they now respect the format's wrap_width and indent settings.
  • 1.0.0 First stable release.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

uxf-1.2.2.tar.gz (61.5 kB view hashes)

Uploaded Source

Built Distribution

uxf-1.2.2-py3-none-any.whl (51.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page