Tools to work with Data Commons. Part of the bblocks project.
bblocks-datacommons-tools
Manage and load data to custom Data Commons instances
A custom Data Commons instance requires that you provide your data in a specific schema, format, and file structure.
At a high level, you need to provide the following:
- All observation data must be in CSV format, using a predefined schema.
- You must also provide a JSON configuration file, named config.json, that specifies how to map and resolve the CSV contents to the Data Commons knowledge graph schema (a skeleton of this file appears after this list).
- Depending on how you define your statistical variables (metrics), you may need to provide MCF (Meta Content Framework) files.
- You may also need to define new custom entities.
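For orientation, these pieces come together in a single config.json. Below is an abridged, illustrative skeleton; the file names and values are placeholders, and the authoritative schema is defined in the Data Commons custom instance documentation:

{
  "includeInputSubdirs": true,
  "inputFiles": {
    "my_data.csv": {
      "entityType": "Country",
      "provenance": "My Provenance",
      "observationProperties": {"unit": "USDollar"}
    }
  },
  "variables": {
    "my_stat_var": {"name": "My variable"}
  },
  "sources": {
    "My Source": {
      "url": "https://example.org",
      "provenances": {"My Provenance": "https://example.org/data"}
    }
  }
}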
Managing this workflow by hand is tedious and easy to get wrong.
The bblocks.datacommons_tools package streamlines that process. It provides a Python API and command line utilities for building config files, generating MCF from CSV metadata, and running the data load pipeline on Google Cloud.
Use this package when you want to:
- Manage config.json files programmatically.
- Define statistical variables, entities, or groups using MCF files (see the sketch after this list).
- Programmatically upload CSVs, MCF files, and the config.json file to Cloud Storage, trigger the load job, and redeploy your custom Data Commons service with code.
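For reference, an MCF file defines nodes as blocks of property: value lines. A statistical variable node might look roughly like this; the identifier and property values are illustrative, using common Data Commons defaults:

Node: dcid:my_stat_var
typeOf: dcid:StatisticalVariable
name: "My variable"
populationType: dcid:Thing
measuredProperty: dcid:value
statType: dcid:measuredValue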
In short, bblocks-datacommons-tools removes much of the manual work involved in setting up and maintaining a custom Data Commons Knowledge Graph.
bblocks-datacommons-tools is part of the bblocks ecosystem,
a set of Python packages designed as building blocks for working with data in the international development
and humanitarian sectors.
Read the documentation for more details on how to use the package and the motivation for its creation.
Installation
The package can be installed in various ways.
Directly with pip:
pip install bblocks-datacommons-tools
Or from the main bblocks package with an extra:
pip install "bblocks[datacommons-tools]"
It can also be installed from GitHub:
pip install git+https://github.com/ONEcampaign/bblocks-datacommons-tools
Sample Usage
Here's a simple example showing how to use the "implicit" Data Commons schema to load a single dataset. Please see the full documentation for a thorough introduction to the package and how to use it.
1. Create a CustomDataManager object.
The CustomDataManager object handles generating the config.json file, and can optionally take pandas DataFrames and export them as correctly formatted CSVs for loading to the Knowledge Graph.
In this example, we assume a config.json does not yet exist.
from bblocks.datacommons_tools import CustomDataManager
# Create the object and call it "manager"
manager = CustomDataManager()
# Allow input files to live in subdirectories (sets includeInputSubdirs in the config)
manager.set_includeInputSubdirs(True)
2. Add the provenance information for our data
You can add or manage provenance information in the config.json file.
In this example, we will add a provenance for ONE Data's Climate Finance Files.
manager.add_provenance(
provenance_name="ONE Climate Finance",
provenance_url="https://datacommons.one.org/data/climate-finance-files",
source_name="ONE Data",
source_url="https://data.one.org",
)
3. Add the data to the CustomDataManager object.
Next, you need to register your data in the config.json file.
Adding the actual data to the CustomDataManager is an optional step.
For this example, we will assume a DataFrame is available via the data variable, along the lines of the sketch below.
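This sketch is illustrative only: the column names are assumptions for this walkthrough, following the implicit schema's layout of one entity column, one date column, and one column per variable.

import pandas as pd

# Hypothetical sample data for this walkthrough
data = pd.DataFrame(
    {
        "country": ["FRA", "DEU"],  # entity column (resolved as Country)
        "year": [2021, 2021],  # date column
        "oecd_provider_code": [4, 5],  # extra column, dropped via ignoreColumns
        "climateFinanceProvidedCommitments": [1_000_000, 2_000_000],
    }
)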
To add it to the CustomDataManager using the Implicit Schema:
manager.add_implicit_schema_file(
file_name="climate_finance/one_cf_provider_commitments.csv",
provenance="ONE Climate Finance",
entityType="Country",
data=data,
ignoreColumns=["oecd_provider_code"],
observationProperties={"unit": "USDollar"},
)
As noted, passing the data in the step above is optional. You can instead create the inputFile entry in the config and attach the data tied to that entry at a later stage by running:
manager.add_data(data=data, file_name='one_cf_provider_commitments.csv')
Or you can manually add the relevant CSV file (matching what you declared as file_name).
4. Add the indicators to the config
Next, you need to specify information about the StatVars (variables) contained in your data file(s).
When using the Implicit Schema, you can specify additional information such as a display name, group, description, and search descriptions.
For convenience, you could loop through a dictionary of indicators and their information (a sketch of that loop follows the example below). For this example, we'll add a single indicator.
manager.add_variable_to_config(
statVar="climateFinanceProvidedCommitments",
name="Climate Finance Commitments (bilateral)",
group="ONE/Environment/Climate finance/Provider perspective/Commitments",
description="Funding for climate adaptation and mitigation projects",
searchDescriptions=[
"Climate finance commitments provided",
"Adaptation and mitigation finance provided",
],
properties={"measurementMethod": "Commitment"},
)
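A minimal sketch of the loop approach, assuming a hypothetical indicators dictionary keyed by StatVar ID, whose values are keyword arguments for add_variable_to_config:

indicators = {
    "climateFinanceProvidedCommitments": {
        "name": "Climate Finance Commitments (bilateral)",
        "group": "ONE/Environment/Climate finance/Provider perspective/Commitments",
    },
    # ... more indicators ...
}

# Register each indicator in the config
for stat_var, info in indicators.items():
    manager.add_variable_to_config(statVar=stat_var, **info)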
5. Export the config.json and (optionally) data CSVs
Once all the data is added and the config is set up, you can export the config.json and data CSVs. The config.json is validated automatically on export.
manager.export_all("path/to/output/folder")
6. (Optionally) load to the Knowledge Graph
You can also programmatically push the data and config to a Google Cloud Storage Bucket, trigger the data load job, and redeploy your Data Commons instance.
To do this, you'll need to load information about your
project, Storage Bucket, etc. You can use .env or .json files,
or simply make the right information available as environment variables.
A detailed description of the required information can be found in the documentation.
Load the settings
First, load the settings using get_kg_settings. In this example, we will load them from a .env file available in our working directory.
from bblocks.datacommons_tools.gcp_utilities import (
upload_to_cloud_storage,
run_data_load,
redeploy_service,
get_kg_settings,
)
settings = get_kg_settings(source="env", env_file="customDC.env")
Second, we'll upload the directory that contains the config.json file and
any CSV and/or MCF files.
upload_to_cloud_storage(settings=settings, directory="path/to/output/folder")
Third, we'll run the data load job on Google Cloud Platform.
run_data_load(settings=settings)
Last, we need to redeploy the Custom Data Commons instance.
redeploy_service(settings=settings)
Visit the documentation page for the full package documentation and examples.
Contributing
Contributions are welcome! Please see the CONTRIBUTING page for details on how to get started, report bugs, fix issues, and submit enhancements.