A Python package for interfacing with the Mozilla Data Collective's API

Mozilla Data Collective Python API Library

Python library for interfacing with the Mozilla Data Collective REST API.

Installation

Install the package using pip:

pip install datacollective

Quick Start

  1. Get your API key from the Mozilla Data Collective dashboard

  2. Set up your environment.

If you have cloned the repository, you can run the following command:

# Copy the example environment file
cp .env.example .env

Otherwise, create a file called .env in your current working directory with the following contents:

MDC_API_KEY=<MDC_API_KEY> # change to your MDC API key
MDC_API_URL=https://datacollective.mozillafoundation.org/api # change to the MDC API URL endpoint
MDC_DOWNLOAD_PATH=~/.mozdata/datasets # change to where you want to download datasets

  3. Configure your API key by editing .env:

    # Required: Your MDC API key
    MDC_API_KEY=your-api-key-here
    
    # Optional: Download path for datasets (defaults to ~/.mozdata/datasets)
    MDC_DOWNLOAD_PATH=~/.mozdata/datasets
    
  4. Start using the library:

    from datacollective import DataCollective
    
    # Initialize the client
    client = DataCollective()
    
    # Download a dataset
    client.get_dataset('mdc-dataset-id')
    

Configuration

The client loads configuration from environment variables or .env files:

  • MDC_API_KEY - Your Mozilla Data Collective API key (required)
  • MDC_API_URL - API endpoint (defaults to production)
  • MDC_DOWNLOAD_PATH - Where to download datasets (defaults to ~/.mozdata/datasets)
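The precedence and defaults listed above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's actual loader: `resolve_config` is a hypothetical helper, and the default API URL is assumed to be the one shown in the example .env file in this README.

```python
import os

# Assumption: the production default matches the URL in this README's example .env.
DEFAULT_API_URL = "https://datacollective.mozillafoundation.org/api"
DEFAULT_DOWNLOAD_PATH = "~/.mozdata/datasets"


def resolve_config(env=None):
    """Hypothetical helper: resolve the three MDC_* settings from a mapping."""
    env = os.environ if env is None else env

    # MDC_API_KEY is the only required setting.
    api_key = env.get("MDC_API_KEY")
    if not api_key:
        raise ValueError("MDC_API_KEY is required but not set")

    # The other two settings fall back to their documented defaults.
    download_path = env.get("MDC_DOWNLOAD_PATH", DEFAULT_DOWNLOAD_PATH)
    return {
        "api_key": api_key,
        "api_url": env.get("MDC_API_URL", DEFAULT_API_URL),
        "download_path": os.path.expanduser(download_path),
    }


if __name__ == "__main__":
    print(resolve_config({"MDC_API_KEY": "your-api-key-here"}))
```

Passing a plain dictionary instead of the real environment makes it easy to check which values would be picked up before constructing a client.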

Environment Files

Create a .env file in your project root:

# MDC API Configuration
MDC_API_KEY=your-api-key-here
MDC_API_URL=https://datacollective.mozillafoundation.org/api
MDC_DOWNLOAD_PATH=~/.mozdata/datasets

Note: Never commit .env files to version control as they contain sensitive information.
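To make the .env file format above concrete, here is a minimal sketch of how such a file can be read into key/value pairs. The README does not say how the library parses .env files, so `parse_env_text` is a hypothetical helper; real loaders also handle quoting and other edge cases that this sketch ignores.

```python
def parse_env_text(text):
    """Hypothetical helper: parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, full-line comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat " #" as the start of an inline comment (unquoted values only).
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


EXAMPLE = """\
# MDC API Configuration
MDC_API_KEY=your-api-key-here
MDC_API_URL=https://datacollective.mozillafoundation.org/api
MDC_DOWNLOAD_PATH=~/.mozdata/datasets # where to download datasets
"""

if __name__ == "__main__":
    for key, value in parse_env_text(EXAMPLE).items():
        print(f"{key}={value}")
```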

Basic Usage

from datacollective import DataCollective

# Initialize client (loads from .env automatically)
client = DataCollective()

# Verify your configuration
print(f"API URL: {client.api_url}")
print(f"Download path: {client.download_path}")

# Download a dataset
dataset = client.get_dataset('your-dataset-id')

Load and query datasets

Note: This feature currently works only with Mozilla Common Voice datasets.

from datacollective import DataCollective

client = DataCollective()

dataset = client.load_dataset("<dataset-id>") # Load dataset into memory
df = dataset.to_pandas() # Convert to a pandas DataFrame for querying
dataset.splits # A list of all splits available in the dataset

Multiple Environments

Pass the environment argument to load configuration from a different .env file:

# Production environment (default, uses .env)
client = DataCollective()

# Development environment (uses .env.development)
client = DataCollective(environment='development')

# Staging environment (uses .env.staging)
client = DataCollective(environment='staging')
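The naming convention in the comments above (production uses .env, any other environment uses .env.<name>) can be written as a small function. This is a sketch of the convention only: `env_file_for` is a hypothetical helper, and treating "production" as the default value of the environment argument is an assumption based on the comments above.

```python
def env_file_for(environment="production"):
    """Hypothetical helper: map an environment name to its .env file name."""
    # Assumption: "production" is the default and maps to the plain .env file.
    if environment == "production":
        return ".env"
    # Every other environment gets its own suffixed file.
    return f".env.{environment}"


if __name__ == "__main__":
    for name in ("production", "development", "staging"):
        print(f"{name}: {env_file_for(name)}")
```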

License

This project is released under MPL (Mozilla Public License) 2.0.

