A Python package for interfacing with the Mozilla Data Collective's API
Mozilla Data Collective Python API Library
Python library for interfacing with the Mozilla Data Collective REST API.
Installation
pip install datacollective
Quick Start
IMPORTANT NOTE: Before trying to access any dataset, make sure you have thoroughly read and agreed to the specific dataset's conditions & licensing terms.
- Get your API key from the Mozilla Data Collective dashboard.
- Set the API key as an environment variable:
  - Option A: Run this command in your terminal (replace your-api-key-here with your actual API key):
    export MDC_API_KEY=your-api-key-here
  - Option B: Create a .env file in your project directory and add this line:
    MDC_API_KEY=your-api-key-here
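If you want a script to fail fast when the key is missing, a pre-flight check along these lines may help. This is a minimal sketch, not part of the datacollective package: get_mdc_api_key is a hypothetical helper that checks the MDC_API_KEY environment variable (Option A) and then, as an assumption, scans a .env file with a deliberately simple KEY=value parser (Option B). How the library itself loads .env files is not covered here.

```python
import os
from pathlib import Path


def get_mdc_api_key(env_file: str = ".env") -> str:
    """Hypothetical helper: return MDC_API_KEY from the environment or a .env file."""
    # 1. Prefer a key exported in the shell (Option A).
    key = os.environ.get("MDC_API_KEY", "").strip()
    if key:
        return key

    # 2. Fall back to a simple KEY=value scan of a .env file (Option B).
    #    This minimal parser skips blank lines and comments and strips
    #    optional surrounding quotes; it is not a full dotenv implementation.
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "MDC_API_KEY":
                value = value.strip().strip('"').strip("'")
                if value:
                    return value

    raise RuntimeError("MDC_API_KEY is not set; export it or add it to a .env file.")
```

Calling get_mdc_api_key() at the top of a script surfaces a clear error before any network request is attempted.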
- Get your dataset ID from the last segment of the dataset's URL on the MDC website.
[!TIP] You can find the dataset-id by looking at the URL of the dataset's page on the MDC platform. The ID is the unique string of characters at the very end of the URL, after the /datasets/ path. For example, for the URL https://datacollective.mozillafoundation.org/datasets/cminc35no007no707hql26lzk the dataset ID is cminc35no007no707hql26lzk.
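The rule in the tip (take the path segment that follows /datasets/) can be automated when you are pasting URLs from the browser. This is a small sketch using only the standard library; dataset_id_from_url is a hypothetical helper, not a function provided by datacollective. Parsing with urlparse means a trailing slash, query string, or fragment does not end up in the ID.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url: str) -> str:
    """Hypothetical helper: return the segment that follows /datasets/ in an MDC URL."""
    # Split the URL path into non-empty segments,
    # e.g. "/datasets/<id>/" -> ["datasets", "<id>"]
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        idx = segments.index("datasets")
    except ValueError:
        raise ValueError(f"No /datasets/ segment in URL: {url!r}") from None
    if idx + 1 >= len(segments):
        raise ValueError(f"URL has no dataset ID after /datasets/: {url!r}")
    return segments[idx + 1]


# prints cminc35no007no707hql26lzk
print(dataset_id_from_url(
    "https://datacollective.mozillafoundation.org/datasets/cminc35no007no707hql26lzk"
))
```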
- Save a dataset locally:
from datacollective import save_dataset_to_disk
dataset_path = save_dataset_to_disk("your-dataset-id")
[!TIP] Automatic Resume: If a download is interrupted (e.g., due to a network error or because you stopped it manually), the next time you download the same dataset to the same folder location, the download will automatically resume from where it left off.
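The README does not say how resuming is implemented. A common general technique for this kind of feature is an HTTP Range request that asks the server only for the bytes missing from the partial file. The sketch below illustrates that idea only; it is not taken from the datacollective source, and resume_range_header and write_mode_for are hypothetical names.

```python
import os


def resume_range_header(partial_path: str) -> dict[str, str]:
    """Hypothetical helper: build a Range header for a partially downloaded file."""
    # Nothing written yet (or an empty file): request the whole resource.
    if not os.path.exists(partial_path):
        return {}
    size = os.path.getsize(partial_path)
    if size == 0:
        return {}
    # "bytes=N-" asks the server for everything from byte offset N onward.
    return {"Range": f"bytes={size}-"}


def write_mode_for(status_code: int) -> str:
    """Append on 206 Partial Content; otherwise overwrite with the full body."""
    # A server that ignores Range replies 200 with the complete file,
    # so the partial file must be rewritten from the start.
    return "ab" if status_code == 206 else "wb"
```

The design hinges on the server's status code: only a 206 response means the body continues the existing bytes, so appending on a 200 would corrupt the file.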
- Get information & metadata about a dataset:
from datacollective import get_dataset_details
details = get_dataset_details("your-dataset-id")
- Load the dataset into a pandas DataFrame (only Common Voice datasets are currently supported):
from datacollective import load_dataset
dataset = load_dataset("your-dataset-id")
For more details, visit our docs
License
This project is released under MPL (Mozilla Public License) 2.0.
Download files
File details
Details for the file datacollective-0.3.0.tar.gz.
File metadata
- Download URL: datacollective-0.3.0.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 011e236f059750c3c9055d02749a918c3209290049a302d38d4f772c38d719f0 |
| MD5 | 5c236b58a932518b6851cde708a5bf56 |
| BLAKE2b-256 | 0fcc33833266164c27dd6a03eb0bdd5872f2ce5f25b68fbb65347c9df6216003 |
File details
Details for the file datacollective-0.3.0-py3-none-any.whl.
File metadata
- Download URL: datacollective-0.3.0-py3-none-any.whl
- Upload date:
- Size: 19.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12eef31fbb7a941873e5f0fad2202c4cc6c4fad5968eeb39ff877860dc543d35 |
| MD5 | 5f92ce4f813bfa0388fed9b4fc634c92 |
| BLAKE2b-256 | 7b4690c285b96e7d5f8d302744aa3eee831523fef59e8b311261504a70f7001c |
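To check a manually downloaded file against the published SHA256 digests, Python's hashlib is enough. This is a minimal sketch; file_digest_hex and verify_download are hypothetical helper names. The file is hashed in chunks so large archives need not fit in memory. (For installs, pip's own hash-checking mode, driven by --hash=sha256:... entries in a requirements file, is an alternative.)

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published value (case-insensitive)."""
    return file_digest_hex(path).lower() == expected_sha256.strip().lower()


# Example, using the SHA256 listed for the source distribution:
# verify_download(
#     "datacollective-0.3.0.tar.gz",
#     "011e236f059750c3c9055d02749a918c3209290049a302d38d4f772c38d719f0",
# )
```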