Contains all the python boilerplate you need to create a Brightway ecosystem package.

trailpack

Read the documentation at https://trailpaack.readthedocs.io/


Trailpack proposes a standard way to link data and specialized metadata in a single file. It provides a simple interface for linking metadata to fixed ontologies, which makes datasets from different sources easier to access and compare.

What is Trailpack?

Trailpack combines metadata and data into a single Parquet file, making open data more accessible and sustainable. It validates the metadata against a defined standard that covers:

  • General metadata for the data package (name, license, contributors)
  • Specialized metadata for each data column - linking both column names and units to fixed descriptions in ontologies provided by PyST

The standard extends the Frictionless Data Package specification and remains compatible with it. The metadata is stored under the datapackage.json key in the Parquet file.
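To make the two layers of metadata concrete, here is a rough sketch of what such a descriptor might look like, built as a plain Python dict. The package-level keys (`name`, `title`, `licenses`, `contributors`, `resources`) and the per-field `name`, `type`, and `rdfType` follow the Frictionless specifications. The `unit` entry, the example values, and every `example.org` URI are assumptions for illustration and may not match the layout Trailpack actually writes.

```python
import json

# Illustrative descriptor only: the "unit" block and every URI below are
# assumptions, not Trailpack's confirmed schema.
descriptor = {
    "name": "my-dataset",
    "title": "My Dataset",
    "licenses": [{"name": "CC-BY-4.0"}],
    "contributors": [{"title": "Your Name", "role": "author"}],
    "resources": [
        {
            "name": "data",
            "path": "data.parquet",
            "schema": {
                "fields": [
                    {
                        "name": "mass",
                        "type": "number",
                        # Hypothetical ontology links for the column and its unit
                        "rdfType": "https://example.org/concept/mass",
                        "unit": {"name": "kg", "path": "https://example.org/unit/kg"},
                    }
                ]
            },
        }
    ],
}

# Serialized form, as it could be embedded alongside the data
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

The general metadata sits at the top level, while the column-specific links live inside each field entry of the resource schema.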

The output file can be read with PyArrow and other Parquet-capable tools, and it will be consumable by Sentier data tools.

Origin: Trailpack was initially built during the hackathon of Brightcon 2025 in Grenoble, as part of developing the standard data format for Départ de Sentier.

Installation

You can install trailpack via pip from PyPI:

$ pip install trailpack

Usage

Web Application

The easiest way to use Trailpack is through the web application.

The web app provides a step-by-step workflow:

  1. Upload File & Select Language: Upload an Excel file and select the language for PyST mapping
  2. Select Sheet: Choose which sheet to process with data preview
  3. Map Columns: Map each column to PyST concepts with automatic suggestions
  4. General Details: Provide package metadata (name, title, license, contributors)
  5. Download: Get your standardized Parquet file with embedded metadata

For walkthrough videos demonstrating the workflow, see the documentation.

Local Web UI

You can also run the Streamlit UI locally:

# Run the UI
trailpack ui

For more details, see trailpack/ui/README.md.

Deploying to Streamlit Cloud? See STREAMLIT_DEPLOYMENT.md for complete deployment instructions.

Python API

For advanced users and programmatic workflows, you can use Trailpack directly in Python. See the example notebook for a complete walkthrough.

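import asyncio

from trailpack.excel import ExcelReader
from trailpack.pyst.api.client import get_suggest_client

# Read Excel structure
reader = ExcelReader("data.xlsx")
sheets = reader.sheets()
columns = reader.columns("Sheet1")


async def main():
    # Get PyST suggestions (suggest() is awaited, so it needs an async context)
    client = get_suggest_client()
    return await client.suggest("carbon footprint", "en")


# In a script, run the coroutine with asyncio.run();
# in a notebook, use `suggestions = await main()` instead.
suggestions = asyncio.run(main())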

📦 DataPackage Schema Classes

Trailpack includes comprehensive schema classes for building Frictionless Data Package metadata:

Key Features

  • DataPackageSchema: Defines field types, validation rules, and UI configuration
  • MetaDataBuilder: Fluent interface for creating metadata programmatically
  • Field validation: Built-in validation for package names, versions, URLs
  • UI integration ready: Field definitions include labels, placeholders, patterns
  • Standards compliant: Follows Frictionless Data Package specification
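As an illustration of the kind of field validation listed above, the sketch below checks a package name and version with regular expressions. It is not Trailpack's implementation: the function name `validate_basic_info` is hypothetical, the name pattern follows the Frictionless naming guidance (lowercase alphanumerics plus ".", "_", and "-"), and the plain MAJOR.MINOR.PATCH version pattern is an assumption.

```python
import re

# Assumed patterns: Trailpack's own rules may differ.
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def validate_basic_info(name, version):
    """Return a list of human-readable problems (empty if both values pass)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid package name: {name!r}")
    if not VERSION_PATTERN.fullmatch(version):
        problems.append(f"invalid version: {version!r}")
    return problems


ok = validate_basic_info("my-dataset", "1.0.0")
bad = validate_basic_info("My Dataset", "1.0")
print(ok)
print(bad)
```

Returning a list of messages instead of raising on the first failure lets a UI show every problem next to its input field at once.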

Quick Example

from trailpack.packing import Packing
from trailpack.packing.datapackage_schema import MetaDataBuilder, Resource

# Create metadata with fluent interface
metadata = (
    MetaDataBuilder()
    .set_basic_info(name="my-dataset", title="My Dataset")
    .add_license("CC-BY-4.0")
    .add_contributor("Your Name", "author")
    .add_resource(Resource(name="data", path="data.parquet"))
    .build()
)

# Use with Packing class (df is the DataFrame holding your data, defined elsewhere)
packer = Packing(df, metadata)
packer.write_parquet("output.parquet")

UI Integration

The schema classes provide everything needed for UI frameworks:

  • Field definitions with types, labels, validation patterns
  • Enumerated options for dropdowns (licenses, profiles, etc.)
  • Built-in validation methods
  • Error messages for invalid input

🔍 Validation System

Trailpack includes a comprehensive validation system to ensure data quality and standards compliance:

Features

  • Metadata validation: Required fields, naming conventions, license checking
  • Data quality metrics: Missing values and duplicates (logged as info, not errors)
  • Type consistency: Mixed types and schema matching (raises errors)
  • Unit requirements: All numeric fields must have units (including dimensionless)
  • Compliance levels: STRICT, STANDARD, BASIC, or NON-COMPLIANT
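The split between informational findings and hard errors can be sketched with the standard library alone. The code below is a simplified stand-in, not Trailpack's validator: the function names are hypothetical, the table is a plain list of dicts, and `None` is assumed to mark a missing value.

```python
from collections import Counter


def column_report(rows, column):
    """Summarize missing values and observed types for one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    types = {type(v).__name__ for v in present}
    return {
        "missing": len(values) - len(present),
        "mixed_types": len(types) > 1,
        "types": sorted(types),
    }


def duplicate_rows(rows):
    """Count rows that repeat another row exactly."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


def check_table(rows, columns):
    """Split findings into informational notes and hard errors."""
    infos, errors = [], []
    dupes = duplicate_rows(rows)
    if dupes:
        infos.append(f"{dupes} duplicate row(s)")
    for column in columns:
        report = column_report(rows, column)
        if report["missing"]:
            infos.append(f"{column}: {report['missing']} missing value(s)")
        if report["mixed_types"]:
            errors.append(f"{column}: mixed types {report['types']}")
    return infos, errors


rows = [
    {"id": 1, "mass": 2.5},
    {"id": 2, "mass": None},
    {"id": 3, "mass": "n/a"},
    {"id": 1, "mass": 2.5},
]
infos, errors = check_table(rows, ["id", "mass"])
print(infos)
print(errors)
```

Here the duplicate row and the missing value end up in `infos`, while the column mixing floats and strings produces an entry in `errors`.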

Quick Example

from trailpack.validation import StandardValidator

# Create validator
validator = StandardValidator("1.0.0")

# Validate everything
# (metadata_dict, dataframe, and schema_dict are assumed to be defined already)
result = validator.validate_all(
    metadata=metadata_dict,
    df=dataframe,
    schema=schema_dict
)

# Check results
if result.is_valid:
    print(result.level)  # e.g., "✅ STRICT COMPLIANCE"
else:
    print(result)  # Shows all errors and warnings

Unit Requirements

All numeric fields must specify units, even for dimensionless quantities:

  • Measurements: Use SI or domain units (kg, m, °C)
  • IDs/Counts: Use dimensionless unit (http://qudt.org/vocab/unit/NUM)
  • Percentages: Use percent or dimensionless
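A minimal sketch of the unit rule, assuming each field is described by a dict with `name`, `type`, and an optional `unit` entry (a hypothetical layout, not necessarily the one Trailpack uses internally). The dimensionless URI is the one quoted in the list above; the helper name `fields_missing_units` is made up for this example.

```python
NUMERIC_TYPES = {"number", "integer"}
DIMENSIONLESS = "http://qudt.org/vocab/unit/NUM"  # URI quoted in the list above


def fields_missing_units(fields):
    """Return the names of numeric fields that declare no unit."""
    return [
        field["name"]
        for field in fields
        if field.get("type") in NUMERIC_TYPES and not field.get("unit")
    ]


fields = [
    {"name": "mass", "type": "number", "unit": "kg"},
    {"name": "record_id", "type": "integer", "unit": DIMENSIONLESS},
    {"name": "share", "type": "number"},    # numeric, unit forgotten
    {"name": "country", "type": "string"},  # non-numeric, no unit needed
]

missing = fields_missing_units(fields)
print(missing)
```

The ID column passes because it carries the explicit dimensionless unit, whereas the numeric column without any unit is reported.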

See trailpack/validation/README.md for complete documentation.

Contributing

Contributions are very welcome! To learn more, see the Contributor Guide.

Development Setup

Install the package with development requirements:

$ pip install -e ".[dev]"

Run tests:

$ pytest

For more information, see CONTRIBUTING.md.

License

Distributed under the terms of the MIT license, trailpack is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Building the Documentation

You can build the documentation locally by installing the documentation Conda environment:

conda env create -f docs/environment.yml

activating the environment

conda activate sphinx_trailpaack

and running the build command:

sphinx-build docs _build/html --builder=html --jobs=auto --write-all; open _build/html/index.html

Source Distribution

trailpack-0.2.0.tar.gz (26.8 kB view details)

Uploaded Source

Built Distribution

trailpack-0.2.0-py3-none-any.whl (12.9 kB view details)

Uploaded Python 3

File details

Details for the file trailpack-0.2.0.tar.gz.

File metadata

  • Download URL: trailpack-0.2.0.tar.gz
  • Size: 26.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for trailpack-0.2.0.tar.gz
Algorithm Hash digest
SHA256 015d35a3585cf0b0dab329682386592fed29f85b240bdfaa12b3992fc8413b8d
MD5 cca244a522b4bb6a45d6cf35c2f31c99
BLAKE2b-256 a9d19817846c47eb33be51d59c649072db75f497c5d42ac7abaf67309aae831a

Provenance

The following attestation bundles were made for trailpack-0.2.0.tar.gz:

Publisher: python-package-deploy.yml on TimoDiepers/trailpack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file trailpack-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: trailpack-0.2.0-py3-none-any.whl
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for trailpack-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4ad41901f1b8df32da52810e744838d0fcbf97b2a728f9219c7d3749864e1864
MD5 1b89b99a19cb8f9637c3d78027ecc6bc
BLAKE2b-256 febab11fd36a5ece3c5269b6aa2e0d23ffd1537572a07a23690aa15002edb646

Provenance

The following attestation bundles were made for trailpack-0.2.0-py3-none-any.whl:

Publisher: python-package-deploy.yml on TimoDiepers/trailpack
