fao-analytics

Configuration-driven statistical calculations and aggregations for FAO (Food and Agriculture Organization of the United Nations) data, built on PySpark and validated with Pydantic.

Data sources

The package processes data from FAOSTAT -- the FAO corporate statistical database. Data can be loaded from:

  • Local files -- CSV, Parquet, or Delta format (see the sketch after this list)
  • SDMX API -- Connects to the FAO SDMX registry to retrieve dataflows with authoritative dimension ordering and attribute mappings (requires pysdmx)
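
By analogy with the CSV example under Quick start, the local format is selected in the data_source block of the data mapping. The sketch below is illustrative: only the csv entry mirrors the documented example, and the "parquet" and "delta" type strings are assumptions.

# Illustrative data_source blocks -- the "parquet" and "delta" type strings are assumptions
csv_source     = {"type": "csv", "options": {"header": "true", "inferSchema": "true"}}
parquet_source = {"type": "parquet"}  # hypothetical: Parquet carries its own schema, so no reader options
delta_source   = {"type": "delta"}    # hypothetical: reads a Delta Lake table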

Each FAOSTAT domain (FDI, LC, OER, CS, BE, etc.) has its own configuration directory under configs/domains/ defining the data mapping, aggregation rules, calculation definitions, and group overrides.
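
Based on the file names used in the examples below, a domain directory might look like this (the group-overrides file name is hypothetical):

configs/domains/FDI/
  data_mapping_fdi.json    # data mapping: data source, dimensions, columns
  aggregation.json         # aggregation rules and iterations
  calculations_fdi.json    # calculation definitions
  group_overrides.json     # group overrides (hypothetical file name)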

Features

  • fao_agg -- Geographic and dimensional aggregation engine
  • fao_calc -- Statistical indicator calculation engine (ratios, growth rates, transformations)
  • fao_common -- Shared data adapters (CSV, Parquet, Delta, SDMX) and configuration schemas

Installation

# From source (editable / development mode)
pip install -e .

# With SDMX support
pip install -e ".[sdmx]"

# With dev dependencies (pytest, coverage)
pip install -e ".[dev]"

Quick start

Configuration from file paths

from fao_agg import AggregationEngine
from fao_calc import CalculationEngine

# Aggregation -- load config from JSON files, data from a CSV
result = (
    AggregationEngine(
        data_mapping="configs/domains/FDI/data_mapping_fdi.json",
        aggregation_config="configs/domains/FDI/aggregation.json",
    )
    .load_data(path="data/domains/FDI/DataFDI.csv")
    .aggregate()
    .get_results()
)

# Calculation
result = (
    CalculationEngine(
        data_mapping="configs/domains/FDI/data_mapping_fdi.json",
        calculations="configs/domains/FDI/calculations_fdi.json",
    )
    .load_data(path="data/domains/FDI/DataFDI.csv")
    .calculate()
    .get_results()
)

Configuration from dictionaries

from fao_agg import AggregationEngine

data_mapping = {
    "data_source": {
        "type": "csv",
        "options": {"header": "true", "inferSchema": "true"},
    },
    "dimensions": [
        {"name": "area",    "column": "Var1Code", "var_position": 1},
        {"name": "item",    "column": "Var2Code", "var_position": 2},
        {"name": "element", "column": "Var3Code", "var_position": 3},
        {"name": "year",    "column": "Var4Code", "var_position": 4},
    ],
    "columns": {
        "value": "Value",
        "flag": "Flag",
        "agg_flag_int": "AggFlagInt",
        "agg_flag_ext": "AggFlagExt",
    },
}

aggregation_config = {
    "iterations": [
        {
            "iteration": 1,
            "agg_dimensions": ["area"],
        }
    ],
    "base_groups": "configs/groups/base_groups.json",
}

result = (
    AggregationEngine(
        data_mapping=data_mapping,
        aggregation_config=aggregation_config,
    )
    .load_data(path="data/domains/FDI/DataFDI.csv")
    .aggregate()
    .get_results()
)
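
The same dictionary-based data mapping works with the calculation engine. A minimal sketch, assuming (as the base_groups entry above suggests) that dict configs and file-path configs can be mixed:

from fao_calc import CalculationEngine

result = (
    CalculationEngine(
        data_mapping=data_mapping,  # the dict defined above
        calculations="configs/domains/FDI/calculations_fdi.json",
    )
    .load_data(path="data/domains/FDI/DataFDI.csv")
    .calculate()
    .get_results()
)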

Auto-generated configuration from SDMX

If you'd rather not define the data mapping by hand, the SdmxDataAdapter can build it automatically by querying the FAO SDMX registry for the dataflow schema:

from fao_agg import AggregationEngine
from fao_common.adapters.sdmx import SdmxDataAdapter
from fao_common.config.schema import SdmxDataSource

# Build the data mapping automatically from the SDMX registry
adapter = SdmxDataAdapter(
    SdmxDataSource(
        endpoint="https://private-fmr.aws.fao.org/sdmx/v2/",
        domain_code="FDI",
    )
)
data_mapping = adapter.build_data_mapping()

# Use the auto-generated mapping with the aggregation engine
result = (
    AggregationEngine(
        data_mapping=data_mapping,
        aggregation_config="configs/domains/FDI/aggregation.json",
    )
    .load_data()
    .aggregate()
    .get_results()
)
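
Because the generated mapping is passed to the engine in place of a dictionary, you can inspect or tweak it first. A minimal sketch, assuming it exposes the same dimensions structure as the dictionary example above:

# Peek at the registry-derived dimensions before aggregating (illustrative)
for dim in data_mapping["dimensions"]:
    print(dim["name"], dim["column"])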

SDMX configuration with a local SDMX CSV

If you have an SDMX-formatted CSV file and want the adapter to handle column mapping via the registry:

from fao_agg import AggregationEngine

result = (
    AggregationEngine(
        data_mapping="configs/domains/FDI/data_mapping_sdmx.json",
        aggregation_config="configs/domains/FDI/aggregation.json",
    )
    .load_data()  # data path is in the mapping config
    .aggregate()
    .get_results()
)
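
The contents of data_mapping_sdmx.json are not shown here, but since load_data() receives no path in this mode, the mapping itself must point at the file. A purely illustrative sketch, in which every key is an assumption apart from the endpoint and domain code taken from the adapter example above:

{
  "data_source": {
    "type": "sdmx",
    "endpoint": "https://private-fmr.aws.fao.org/sdmx/v2/",
    "domain_code": "FDI",
    "path": "data/domains/FDI/DataFDI_sdmx.csv"
  }
}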

Testing

# Run all tests
pytest

# Run only unit tests
pytest tests/fao_agg/

# Run only integration tests
pytest -m integration

# Run a single domain
pytest tests/domains/test_fdi.py -v
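
pytest only recognizes a custom marker such as integration once it is registered. A typical registration, assuming the project declares its pytest options in pyproject.toml, looks like:

# pyproject.toml (illustrative)
[tool.pytest.ini_options]
markers = [
    "integration: marks integration tests",
]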

See README_TESTING.md for detailed testing documentation.

Project structure

src/
  fao_agg/        # Aggregation engine
  fao_calc/       # Calculation engine
  fao_common/     # Shared adapters, schemas, Spark utilities
configs/           # JSON configuration files per FAOSTAT domain
data/              # Sample/test data files (CSV)
tests/             # Unit and integration tests

Publishing to PyPI

# Install build tools
pip install build twine

# Build source distribution and wheel
python -m build

# Check the package
twine check dist/*

# Upload to Test PyPI first
twine upload --repository testpypi dist/*

# Upload to PyPI
twine upload dist/*
