Extract a list from JSON or XML, save to excel, csv, etc.

These details have not been verified by PyPI

Project links

Source code

Project description

extract-list

Background

extract-list is a command line tool for turning selected data from JSON or XML files into a list of columns. It is useful when the source data is structured, but the result is needed in a spreadsheet, CSV file, text file, or another table-like format.

What it does

extract-list:

reads data from an XML file or a JSON file
extracts the configured parts of that input data
writes the extracted data as rows and columns

Examples of output formats include:

Excel
CSV
ODS
HTML
Markdown
plain text
JSON
XML

The exact output choices available in an installed environment can be listed with the cfg-example --help command. Some output formats are provided through TableIO, and some are handled directly by extract-list.

Installing it

extract-list requires Python 3.13 or newer. Python can be downloaded from https://www.python.org/downloads/.

Installing on macOS and Linux

pip3 install --upgrade extract-list

Installing on Microsoft Windows

pip install --upgrade extract-list

Version history

Version	Date	Python version	Description
0.2	06 Jan 2025	3.12.6 or newer	First released version
0.2.2	23 Mar 2025	3.13.2 or newer	Adapted to Python 3.13.2
0.2.3	23 Mar 2025	3.13.2 or newer	Fix in README only
0.2.5	09 Apr 2025	3.13.3 or newer	Using newer dependencies
0.2.7	09 Jun 2025	3.12.x	add version sub-command
0.2.8	09 Jun 2025	3.13 or newer	add version sub-command
0.2.10	23 Jul 2025	3.12.x	updated dependencies
0.2.11	23 Jul 2025	3.13 or newer	updated dependencies
0.2.13	23 Nov 2025	3.12.x	updated dependencies
0.2.14	23 Nov 2025	3.13 or newer	updated dependencies
0.3	01 Jun 2026	3.13 or newer	New output config; more formats

Running the application

Running on macOS and Linux

python3 -m extract_list --help
python3 -m extract_list cfg-example --help
python3 -m extract_list extract --help
python3 -m extract_list cfg-example -k example_json -t Excel -o example.cfg
python3 -m extract_list extract -c example.cfg -i input.json -o output.xlsx

Running on Microsoft Windows

python -m extract_list --help
python -m extract_list cfg-example --help
python -m extract_list extract --help
python -m extract_list cfg-example -k example_json -t Excel -o example.cfg
python -m extract_list extract -c example.cfg -i input.json -o output.xlsx

Suggested way to get started

Use the cfg-example sub-command to generate a few example configuration files.
Read each generated .cfg file together with its generated .txt description file.
Find the example that is closest to what you want to extract.
Modify that configuration file to match your input data and desired output.
Use the extract sub-command to read your data and write the output file.
Check the produced output. If needed, adjust the configuration and run the extraction again.

Example configuration files

When cfg-example creates a configuration file, it also creates a text file that describes the example and the configuration syntax. If the configuration file is named example.cfg, the description file is named example.txt.

The generated .txt file is the best place to look for detailed configuration reference text, because it is generated by the same installed version of extract-list that will read the configuration.

Migrating old configuration files

Version 0.3 uses the nested output configuration object. Older 0.2-style configuration files that use settings such as outfile_type, outfile_encoding, out_csv_dialect, or outfile_excel_library can be migrated with the migrate-cfg sub-command.

Migrating on macOS and Linux

python3 -m extract_list migrate-cfg -i old.cfg -o new.cfg

Migrating on Microsoft Windows

python -m extract_list migrate-cfg -i old.cfg -o new.cfg

extract-list can still read the old configuration format with backward compatibility handling, but migration is recommended before editing the configuration further.

Configuration file overview

The configuration file is written in JSON syntax: https://en.wikipedia.org/wiki/JSON. The keywords and nesting are important. The order of keywords is not significant. Indentation and line breaks are not significant.

The encoding for the configuration file must be UTF-8. US-ASCII is a subset of UTF-8.

It is recommended to generate a configuration file with cfg-example and then edit that file. Writing a configuration file from scratch is possible, but the generated examples are a much safer starting point.

Type of input file

The type of input file to read is determined by infile_type. infile_type can have these values:

"JSON"
"XML"

Both JSON and XML are text file formats, so the input file has a character encoding. This is specified with infile_encoding. Unless you know that you need another encoding, leave this as it is in the generated example configuration.

Type of output file

Output files are configured in the nested output object. The output format is selected with output.format_name. The value is case-insensitive and is normalized when the configuration is read.

Examples of commonly useful output.format_name values include:

"Excel"
"CSV"
"ODS"
"HTML"
"md"
"txt"
"JSON"
"XML"
"Plaintxt"

Run python3 -m extract_list cfg-example --help on macOS or Linux, or python -m extract_list cfg-example --help on Windows, to see the full list of output formats available in your installed environment.

JSON, XML, and Plaintxt output are handled inside extract-list. Other table-style output is handled through TableIO. The same output.format_name configuration member is used for both groups.

Text-based output formats can have a character encoding. This is configured with output.character_encoding. Unless you know that you need another encoding, leave this as it is in the generated example configuration.

Comma separated values files may differ slightly depending on the programs used to read and write them and the locale used. CSV settings are configured in the optional nested output.csv object.

TableIO may provide several implementations for a file format. output.implementation can force a specific implementation. If it is omitted, TableIO chooses the best available implementation.

The outfile_border and outfile_filtered_area values can be "NO", "IF_AVAILABLE", or "NEEDED". "NO" disables the feature request. "IF_AVAILABLE" requests the feature when the selected output implementation can provide it. "NEEDED" requires support for the feature.

Data to extract

The input file is likely to contain more data than should appear in the output. The data to extract is specified with main_line and linked_lines.

The main_line setting describes the main input records that become output rows. Linked lines describe additional records that are tied to the main line. A linked line is tied to a main line when a configured value in the linked line has the same value as a configured value in the main line.

The line member of main_line and linked_lines is a list of strings. This list is the path of keys to the input records. Directly below that path there is either a list or a dictionary of records.

The records of the main_line are indexed in the input, either by integer position in a list or by key in a dictionary. If you want this index or key in the output, set include_key to true. If not, set it to false. The output column name for this key is configured with column_name_for_key.

The columns member of main_line and linked_lines maps output column names to paths inside each input record. Each path is written as a list of strings.

Sometimes one input record contains several nested records. For example, a purchase order may contain several purchased items. Because the output is a list of columns, such an input record must be expanded into several output rows. The shared values from the input record are duplicated on those output rows. The expand_at member contains the relative paths where this expansion should happen.

There can be only one main_line. There may be any number of linked lines. The linked lines are configured as an array in linked_lines.

For each linked line, linked_column and linked_main_column define how it is tied to the main line. Both values are relative paths in the input records. The value at linked_column in the linked-line record must equal the value at linked_main_column in the main-line record.

The values used to link records may also be extracted as output columns, but they do not have to be.

Several linked lines can be tied to the same main line. one_output_line_per_main_line controls how this is handled. If it is false, the main-line data is duplicated so the output has one row for each tied combination of main-line and linked-line records. If it is true, multiple linked lines for one main line are reported as an error.

Input records may be missing optional values. missing_input_for_column controls how missing input data is handled. Its values are "EMPTY" and "ERROR". If it is "EMPTY", the output column resulting from missing input data is left empty.

XML attributes

XML member values may be written either as nested objects or as attributes. If the input has XML attributes, the attribute key has an @ prefix. To handle this, in_xml_strip_at can be set to true or false. If it is true, a leading @ character is stripped from each key.

The out_xml_attributes setting specifies column names that should be written as XML attributes in XML output instead of as nested objects.

Output column order

The order of the columns in the output is specified with column_order. The value is a list of strings.

Specifying a column in column_order that has not been extracted is an error. Extracting a column without specifying it in column_order is also an error.

Output line order

The order_rows_by setting specifies which columns should be used for sorting the output rows. The most significant column should be first in the list.

The default row order is based on the list of columns in column_order. Leave order_rows_by as an empty list unless you need another specific order.

Source code

Source code and tests are available at https://bitbucket.org/tom-bjorkholm/extract-list.

Test summary

Test result: 1247 passed in 36s
No flake8 warnings.
No mypy errors found.
No python layout warnings.
Built version(s): 0.3
Build and test using Python 3.14.5

Project details

These details have not been verified by PyPI

Project links

Source code

Release history Release notifications | RSS feed

0.5

Jun 8, 2026

0.4

Jun 3, 2026

This version

0.3

Jun 1, 2026

0.2.14

Nov 23, 2025

0.2.13

Nov 23, 2025

0.2.11

Jul 23, 2025

0.2.10

Jul 23, 2025

0.2.8

Jun 9, 2025

0.2.7

Jun 9, 2025

0.2.5

Apr 9, 2025

0.2.3

Mar 23, 2025

0.2.2

Mar 23, 2025

0.2

Jan 6, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

extract_list-0.3.tar.gz (32.0 kB view details)

Uploaded Jun 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

extract_list-0.3-py3-none-any.whl (40.5 kB view details)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file extract_list-0.3.tar.gz.

File metadata

Download URL: extract_list-0.3.tar.gz
Upload date: Jun 1, 2026
Size: 32.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.14.5

File hashes

Hashes for extract_list-0.3.tar.gz
Algorithm	Hash digest
SHA256	`733f70ed4d516c2a746757d94b69aab4e00dab95fbf65ffea5b77f76a034b10b`
MD5	`598cd827fd776a9b61fb24eb1ac8bf99`
BLAKE2b-256	`0e318d855b1e4399b7bf9299c8ed03995c0e0bb33c5c3580fc711c6f0fa08170`

See more details on using hashes here.

File details

Details for the file extract_list-0.3-py3-none-any.whl.

File metadata

Download URL: extract_list-0.3-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 40.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.14.5

File hashes

Hashes for extract_list-0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c71ac0dc2d96c250afb5b4a0d8a16d119b80cb50152a83e27ae77cc288c0521c`
MD5	`04e553b6f2dcbff87d42c27f7c9446ef`
BLAKE2b-256	`f1a2ac4b03519c864c1532fa700d01cb7e51f027fa2faf34a922bf23abb1850b`

See more details on using hashes here.

extract-list 0.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

extract-list

Background

What it does

Installing it

Installing on macOS and Linux

Installing on Microsoft Windows

Version history

Running the application

Running on macOS and Linux

Running on Microsoft Windows

Suggested way to get started

Example configuration files

Migrating old configuration files

Migrating on macOS and Linux

Migrating on Microsoft Windows

Configuration file overview

Type of input file

Type of output file

Data to extract

XML attributes

Output column order

Output line order

Source code

Test summary

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes