A Python package for interfacing with the Mozilla Data Collective's API

Mozilla Data Collective Python Client Library

The official Python SDK for accessing and contributing to the Mozilla Data Collective platform.

[!WARNING] Our platform is evolving rapidly. Expect breaking changes while the Python SDK is on 0.X.X versions. Please ensure you are always on the latest version available.

Installation

pip install datacollective

Quick Start

IMPORTANT NOTE: Before trying to access any dataset, make sure you have thoroughly read and agreed to the specific dataset's conditions & licensing terms.

Get your API key from the Mozilla Data Collective dashboard
Set the API key as an environment variable:

Option A: Run this command in your terminal (replace your-api-key-here with your actual API key):

export MDC_API_KEY=your-api-key-here

Option B: Create a .env file in your project directory and add this line:

MDC_API_KEY=your-api-key-here

Get your dataset ID from the last segment of the dataset's URL on the MDC website.

[!TIP] You can find the dataset ID in the URL of the dataset's page on the MDC platform. It is the unique string of characters at the very end of the URL, after the /datasets/ path. For example, for the URL https://mozilladatacollective.com/datasets/cminc35no007no707hql26lzk, the dataset ID is cminc35no007no707hql26lzk.
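
If you have the dataset URL on hand, the ID can also be pulled out in code. This is a small sketch using only the standard library; `dataset_id_from_url` is a hypothetical helper name and is not provided by the SDK.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url):
    """Return the path segment that follows /datasets/ in an MDC dataset URL.

    Hypothetical helper for illustration; not part of the datacollective SDK.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        return segments[segments.index("datasets") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no dataset ID found in URL: {url!r}") from None


print(dataset_id_from_url(
    "https://mozilladatacollective.com/datasets/cminc35no007no707hql26lzk"
))
# prints: cminc35no007no707hql26lzk
```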

Save a dataset locally:

from datacollective import download_dataset

dataset_path = download_dataset("your-dataset-id")

[!NOTE] download_dataset was previously called save_dataset_to_disk. The old name still works for backward compatibility, but it is deprecated and new code should use download_dataset.

[!TIP] Automatic Resume: If a download is interrupted (e.g., due to a network error or because you stopped it manually), the next time you download the same dataset to the same folder location, we will automatically resume from where the download left off!

[!TIP] Set enable_logging=True to emit detailed SDK logs to the console and a local log file at ~/.mozdata/datacollective.log with timestamped entries, a per-session id, and retention of 5 backup files at 10 MB each.

Get information & metadata about a dataset:

from datacollective import get_dataset_details

details = get_dataset_details("your-dataset-id")

Load the dataset into a pandas DataFrame (Alpha version: Only certain MDC datasets are supported right now):

from datacollective import load_dataset

dataset = load_dataset("your-dataset-id")

Programmatic submissions and uploads

You can create dataset submissions and upload files to the MDC platform programmatically, with resumable uploads, using our Python SDK:

from datacollective import DatasetSubmission, License, Task, create_submission_with_upload

submission = DatasetSubmission(
    name="Dataset Name",
    longDescription="A detailed description of the dataset.",
    shortDescription="A brief description of the dataset.",
    locale="en-US",
    task=Task.ASR,
    format="TSV",
    licenseAbbreviation=License.CC_BY_4_0,
    other="This text should provide a detailed description of the dataset, "
          "including its contents, structure, and any relevant information "
          "that would help users understand what the dataset is about "
          "and how it can be used.",
    restrictions="Any restrictions you want to impose on the dataset",
    forbiddenUsage="Use cases that are not allowed with this dataset",
    additionalConditions="Any additional conditions for using the dataset",
    pointOfContactFullName="Jane Doe",
    pointOfContactEmail="jane@example.com",
    fundedByFullName="Funder Name",
    fundedByEmail="funder@example.com",
    legalContactFullName="Legal Name",
    legalContactEmail="legal@example.com",
    createdByFullName="Creator Name",
    createdByEmail="creator@example.com",
    intendedUsage="Describe the intended usage of the dataset, including "
                  "potential applications and use cases.",
    ethicalReviewProcess="Describe the ethical review process that was "
                         "followed for this dataset, including any approvals "
                         "or considerations related to data collection and usage.",
    exclusivityOptOut=False,  # True = This dataset is non-exclusive to Mozilla Data Collective, 
                              # False = Dataset is exclusively hosted in Mozilla Data Collective
    agreeToSubmit=True,  # True = You confirm that you have the right to submit this dataset and 
                         # that all information provided in the datasheet is accurate. 
                         # Required to be True to complete the submission process
)

response = create_submission_with_upload(
    file_path="/path/to/dataset.tar.gz",
    submission=submission
)

print(response)

For predefined licenses, pass licenseAbbreviation=License.<VALUE> and leave licenseUrl and license unset. For custom licenses, pass a custom string to license and optionally include licenseUrl and licenseAbbreviation.
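
The two cases above can be made explicit with a small helper that builds only the license-related keyword arguments. This is a sketch under the rule stated above, not SDK code: `license_kwargs` is a hypothetical function, and the dictionary keys simply mirror the field names used in this README.

```python
def license_kwargs(predefined=None, custom=None, url=None, abbreviation=None):
    """Build the license-related keyword arguments for a submission.

    Hypothetical helper; not part of the datacollective SDK.
    `predefined` is meant to be a License.<VALUE> member.
    """
    if (predefined is None) == (custom is None):
        raise ValueError("pass exactly one of predefined or custom")

    if predefined is not None:
        # Predefined license: licenseUrl and license stay unset.
        if url is not None or abbreviation is not None:
            raise ValueError("leave url and abbreviation unset for predefined licenses")
        return {"licenseAbbreviation": predefined}

    # Custom license: license text is required, the other two are optional.
    kwargs = {"license": custom}
    if url is not None:
        kwargs["licenseUrl"] = url
    if abbreviation is not None:
        kwargs["licenseAbbreviation"] = abbreviation
    return kwargs


# A plain string stands in for a License member so the sketch runs without the SDK.
print(license_kwargs(predefined="PREDEFINED-LICENSE-PLACEHOLDER"))
print(license_kwargs(custom="My Custom License", url="https://example.com/license"))
```

The result could then be unpacked into the constructor, for example `DatasetSubmission(..., **license_kwargs(predefined=License.CC_BY_4_0))`.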

[!TIP] To upload a new .tar.gz version to an already approved dataset, call upload_dataset_file(file_path=..., submission_id=...) directly. Find the submission under Profile → Uploads, open the approved dataset, and copy the value after /profile/submissions/ in the URL. Note that this value is the submission ID, which is different from the public dataset ID.

For more details, visit our docs

License

This project is released under MPL (Mozilla Public License) 2.0.

Release history

Version	Release date
0.5.1	May 21, 2026
0.5.0	Apr 9, 2026 (this version)
0.4.5	Mar 24, 2026
0.4.4	Mar 24, 2026
0.4.3	Mar 18, 2026
0.4.2	Mar 12, 2026
0.4.1	Mar 10, 2026
0.4.0	Feb 27, 2026
0.3.0	Jan 28, 2026
0.2.0	Jan 15, 2026
0.1.0	Dec 4, 2025
0.0.34	Nov 13, 2025
0.0.33	Nov 13, 2025
0.0.32	Oct 30, 2025
0.0.27	Oct 29, 2025
0.0.23	Oct 21, 2025
0.0.16	Sep 18, 2025
0.0.11	Sep 16, 2025
0.0.8	Sep 16, 2025

Download files

Source Distribution

datacollective-0.5.0.tar.gz (36.8 kB)

Uploaded Apr 9, 2026 Source

Built Distribution

datacollective-0.5.0-py3-none-any.whl (46.8 kB)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file datacollective-0.5.0.tar.gz.

File metadata

Download URL: datacollective-0.5.0.tar.gz
Upload date: Apr 9, 2026
Size: 36.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datacollective-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`4ddf6b797873696c76e791795d5179a2f14f2ff87c7e3e0337a900a7e6914351`
MD5	`3534fb5d0c4c50b5ea7e8c88af0a29d9`
BLAKE2b-256	`6107713c06d8366f83049bb531235cc0b0b3e4b637a3a00fbea676f7bde4d915`

File details

Details for the file datacollective-0.5.0-py3-none-any.whl.

File metadata

Download URL: datacollective-0.5.0-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 46.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datacollective-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`13e0c1a086622fd5a011cb73c3fcc650592c1873533c24bb38cac98493c22583`
MD5	`bcaad4ba48ff8345f83ce3702e43145f`
BLAKE2b-256	`b174f0d4c9b56bdc5cb7c350ace6c00498e7febf054869fb74e94a13808c0bd0`
