A Python package for interfacing with the Mozilla Data Collective's API
Mozilla Data Collective Python API Library
Python library for interfacing with the Mozilla Data Collective REST API.
Installation
Install the package using pip:
pip install datacollective
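To confirm that pip installed the package into the interpreter you are actually using, you can read its distribution metadata with the standard library (`pip show datacollective` reports the same information from the shell). The helper name `installed_version` is hypothetical; `importlib.metadata` itself ships with Python 3.8+.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "datacollective") -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed in this interpreter's environment
        return None
```

For example, `print(installed_version())` prints the installed version string, or `None` if pip installed the package into a different environment.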
Quick Start
- Get your API key from the Mozilla Data Collective dashboard.
- Set up your environment:
If you have cloned the repository, you can run the following command:
# Copy the example environment file
cp .env.example .env
Otherwise, create a file called .env in your current working directory with the following contents:
MDC_API_KEY=<MDC_API_KEY> # change to your MDC API Key
MDC_API_URL=https://datacollective.mozillafoundation.org/api # change to MDC API URL endpoint
MDC_DOWNLOAD_PATH=~/.mozdata/datasets # change to where you want to download datasets
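Each line of the .env file above is a KEY=VALUE pair, optionally followed by an inline ` #` comment. The sketch below parses that format with the standard library only, to show how the three lines are interpreted. It is an illustration of the file format, not the loader the datacollective package uses internally (which this README does not describe); `parse_env_text` and `parse_env_file` are hypothetical names.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments.

    A '#' preceded by whitespace starts an inline comment, matching how the
    example .env annotates each value.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Drop an inline comment such as "  # change to your MDC API Key"
        for marker in (" #", "\t#"):
            idx = value.find(marker)
            if idx != -1:
                value = value[:idx]
        # Trim whitespace and optional surrounding quotes
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def parse_env_file(path: Path) -> dict[str, str]:
    """Read and parse a .env file from disk."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))
```

Values such as the API URL contain `:` and `/` but no ` #`, so they pass through unchanged.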
- Configure your API key by editing .env:
# Required: Your MDC API key
MDC_API_KEY=your-api-key-here
# Optional: Download path for datasets (defaults to ~/.mozdata/datasets)
MDC_DOWNLOAD_PATH=~/.mozdata/datasets
- Start using the library:
from datacollective import DataCollective

# Initialize the client
client = DataCollective()

# Download a dataset
client.get_dataset('mdc-dataset-id')
Configuration
The client loads configuration from environment variables or .env files:
- MDC_API_KEY - Your Mozilla Data Collective API key (required)
- MDC_API_URL - API endpoint (defaults to production)
- MDC_DOWNLOAD_PATH - Where to download datasets (defaults to ~/.mozdata/datasets)
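The rules above (one required key, two optional keys with defaults) can be expressed as a small resolver. This is a hypothetical sketch, not the client's actual code: `resolve_config` and the constant names are invented, and it assumes that "production" means the URL shown in the example .env and that `~` in the download path is expanded to the user's home directory.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

# Assumption: the production endpoint is the URL used in the example .env
DEFAULT_API_URL = "https://datacollective.mozillafoundation.org/api"
DEFAULT_DOWNLOAD_PATH = "~/.mozdata/datasets"


def resolve_config(env: Mapping[str, str] | None = None) -> dict[str, object]:
    """Apply the documented rules: API key required, URL and path optional."""
    env = os.environ if env is None else env
    api_key = env.get("MDC_API_KEY", "").strip()
    if not api_key:
        raise ValueError("MDC_API_KEY is required but was not set")
    # Empty or missing values fall back to the documented defaults
    api_url = env.get("MDC_API_URL") or DEFAULT_API_URL
    download_path = Path(env.get("MDC_DOWNLOAD_PATH") or DEFAULT_DOWNLOAD_PATH).expanduser()
    return {"api_key": api_key, "api_url": api_url, "download_path": download_path}
```

Passing an explicit mapping (for example, the output of a .env parser) keeps the function easy to test; calling it with no argument reads the process environment.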
Environment Files
Create a .env file in your project root:
# MDC API Configuration
MDC_API_KEY=your-api-key-here
MDC_API_URL=https://datacollective.mozillafoundation.org/api
MDC_DOWNLOAD_PATH=~/.mozdata/datasets
Note: Never commit .env files to version control as they contain sensitive information.
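One way to follow this note is to make sure .env is listed in the repository's .gitignore. The sketch below appends the entry only if it is missing; `ensure_gitignored` is a hypothetical helper, not part of the datacollective package.

```python
from pathlib import Path


def ensure_gitignored(repo_root: Path, pattern: str = ".env") -> bool:
    """Append `pattern` to repo_root/.gitignore unless it is already listed.

    Returns True if the file was modified.
    """
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if pattern in (line.strip() for line in existing.splitlines()):
        return False  # already ignored; nothing to do
    # Start on a fresh line if the file does not end with a newline
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with open(gitignore, "a", encoding="utf-8") as fh:
        fh.write(f"{prefix}{pattern}\n")
    return True
```

Call it once per secret-bearing file (for example `.env`, `.env.development`, `.env.staging`). A broad pattern such as `.env*` would also ignore `.env.example`, which is meant to be committed.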
Basic Usage
from datacollective import DataCollective
# Initialize client (loads from .env automatically)
client = DataCollective()
# Verify your configuration
print(f"API URL: {client.api_url}")
print(f"Download path: {client.download_path}")
# Download a dataset
dataset = client.get_dataset('your-dataset-id')
Load and query datasets
Note: this feature currently works only with Mozilla Common Voice datasets.
from datacollective import DataCollective
client = DataCollective()
dataset = client.load_dataset("<dataset-id>")  # Load the dataset into memory
df = dataset.to_pandas()  # Convert to a pandas DataFrame for querying
dataset.splits # A list of all splits available in the dataset
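Once converted, the DataFrame can be queried with ordinary pandas operations. The README does not document the DataFrame's columns, so the sketch below builds a small stand-in frame with hypothetical columns (`sentence`, `up_votes`, `down_votes`, `split`) and shows a boolean filter followed by a group-by count. Inspect `df.columns` on a real dataset and substitute the actual names.

```python
import pandas as pd

# Hypothetical stand-in for dataset.to_pandas(); column names are assumptions
df = pd.DataFrame({
    "sentence": ["hello world", "good morning", "thank you", "see you soon"],
    "up_votes": [3, 1, 2, 0],
    "down_votes": [0, 1, 0, 2],
    "split": ["train", "train", "dev", "test"],
})


def well_rated(frame: pd.DataFrame, min_margin: int = 1) -> pd.DataFrame:
    """Keep rows whose up_votes exceed down_votes by at least min_margin."""
    return frame[(frame["up_votes"] - frame["down_votes"]) >= min_margin]


good = well_rated(df)
# Number of retained rows per (hypothetical) split label
counts = good.groupby("split").size().to_dict()
print(counts)  # {'dev': 1, 'train': 1}
```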
Multiple Environments
You can use different environment configurations:
# Production environment (default, uses .env)
client = DataCollective()
# Development environment (uses .env.development)
client = DataCollective(environment='development')
# Staging environment (uses .env.staging)
client = DataCollective(environment='staging')
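The comments above imply a naming convention: the default configuration lives in `.env`, and a named environment reads `.env.<name>`. The sketch below captures that mapping. `env_file_for` is a hypothetical helper, and treating an explicit `"production"` value as an alias for `.env` is an assumption; the README only says production is the default.

```python
from __future__ import annotations

from pathlib import Path


def env_file_for(environment: str | None = None, base_dir: Path = Path(".")) -> Path:
    """Map an environment name to its .env file.

    None (the default) and the assumed alias "production" use ".env";
    any other name uses ".env.<name>".
    """
    if environment in (None, "", "production"):
        return base_dir / ".env"
    return base_dir / f".env.{environment}"
```

For example, `env_file_for("staging")` yields `.env.staging`, matching the staging comment above.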
License
This project is released under MPL (Mozilla Public License) 2.0.
Download files
File details
Details for the file datacollective-0.0.23.tar.gz.
File metadata
- Download URL: datacollective-0.0.23.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e63f6ce33dc5563493a2b0d0c604584328fb786c9fa5851f03c861b808d45be |
| MD5 | 2f8c862c93d70c674d97dd38fffedb17 |
| BLAKE2b-256 | 3c66450a0cd72d2cfdbb360a8aaee32df036c53dda6139e115129f3e44046302 |
File details
Details for the file datacollective-0.0.23-py3-none-any.whl.
File metadata
- Download URL: datacollective-0.0.23-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0263854646bb320cc232113f0fb92b62bb6f9043562af8086fab05a0ebdb7d2e |
| MD5 | b5650c4817d6541e6ab29621a19aa571 |
| BLAKE2b-256 | b33201bc338bedb9a35ccf04c4ba39a24691ef60803bab5c1dd056afa0a40fca |
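The published SHA256 digests can be used to check a downloaded file before installing it. The sketch below hashes a file in chunks and compares the result with a published digest; `sha256_file`, `verify_sha256` and `SDIST_SHA256` are illustrative names, not part of the package.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read incrementally so large files are not loaded into memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    actual = sha256_file(path)
    # Normalise case and whitespace, then compare in constant time
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Published SHA256 for datacollective-0.0.23.tar.gz (from the first table)
SDIST_SHA256 = "1e63f6ce33dc5563493a2b0d0c604584328fb786c9fa5851f03c861b808d45be"
```

For example, `verify_sha256(Path("datacollective-0.0.23.tar.gz"), SDIST_SHA256)` returns True only for an unmodified download. For pinned installs, pip can enforce digests itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).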