A Python package for interfacing with the Mozilla Data Collective's API

Mozilla Data Collective Python API Library

Python library for interfacing with the Mozilla Data Collective REST API.

Installation

Install the package using pip:

pip install datacollective

Quick Start

  1. Get your API key from the Mozilla Data Collective dashboard

  2. Set up your environment:

If you have cloned the repository, you can run the following command:

# Copy the example environment file
cp .env.example .env

Otherwise, create a file called .env in your current working directory and paste the following into it:

MDC_API_KEY=<MDC_API_KEY> # change to your MDC API Key
MDC_API_URL=https://datacollective.mozillafoundation.org/api # change to MDC API URL endpoint
MDC_DOWNLOAD_PATH=~/.mozdata/datasets # change to where you want to download datasets

  3. Configure your API key by editing .env:

    # Required: Your MDC API key
    MDC_API_KEY=your-api-key-here
    
    # Optional: Download path for datasets (defaults to ~/.mozdata/datasets)
    MDC_DOWNLOAD_PATH=~/.mozdata/datasets
    
  4. Start using the library:

    from datacollective import DataCollective
    
    # Initialize the client
    client = DataCollective()
    
    # Download a dataset
    client.get_dataset('mdc-dataset-id')
    

Configuration

The client loads configuration from environment variables or .env files:

  • MDC_API_KEY - Your Mozilla Data Collective API key (required)
  • MDC_API_URL - API endpoint (defaults to the production endpoint)
  • MDC_DOWNLOAD_PATH - Where to download datasets (defaults to ~/.mozdata/datasets)
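The rules above (one required key, two optional settings with defaults) can be illustrated with a small, self-contained sketch. `resolve_config` is a hypothetical helper, not part of the datacollective package, and the default API URL below is assumed from the example .env file in this README rather than confirmed as the library's built-in default.

```python
from __future__ import annotations

from collections.abc import Mapping
from pathlib import Path

# Assumption: production endpoint copied from the example .env in this README.
DEFAULT_API_URL = "https://datacollective.mozillafoundation.org/api"
# Documented default download location.
DEFAULT_DOWNLOAD_PATH = "~/.mozdata/datasets"


def resolve_config(env: Mapping[str, str]) -> dict[str, object]:
    """Hypothetical helper: validate settings and fill in defaults."""
    api_key = (env.get("MDC_API_KEY") or "").strip()
    if not api_key:
        # MDC_API_KEY is the only required setting.
        raise ValueError("MDC_API_KEY is required")
    return {
        "api_key": api_key,
        # Fall back to the assumed production endpoint when unset or empty.
        "api_url": env.get("MDC_API_URL") or DEFAULT_API_URL,
        # Expand "~" so the result can be passed straight to filesystem calls.
        "download_path": Path(
            env.get("MDC_DOWNLOAD_PATH") or DEFAULT_DOWNLOAD_PATH
        ).expanduser(),
    }
```

For example, `resolve_config(os.environ)` would check settings already exported in your shell and fail early if the API key is missing.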

Environment Files

Create a .env file in your project root:

# MDC API Configuration
MDC_API_KEY=your-api-key-here
MDC_API_URL=https://datacollective.mozillafoundation.org/api
MDC_DOWNLOAD_PATH=~/.mozdata/datasets

Note: Never commit .env files to version control, because they contain sensitive information such as your API key.
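The .env format shown above is a list of KEY=VALUE lines, with `#` starting a comment. The sketch below shows how such lines are typically read: full-line comments and blank lines are skipped, and text after ` #` is treated as an inline comment. `parse_env` is a hypothetical, simplified parser for illustration only; the library's own loader is not documented here and may behave differently (for example with quoted values or `export` prefixes, which this sketch does not handle).

```python
from __future__ import annotations


def parse_env(text: str) -> dict[str, str]:
    """Hypothetical, simplified parser for KEY=VALUE .env content."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not a KEY=VALUE line; ignore it.
        # Treat " #" (whitespace, then hash) as the start of an inline comment,
        # so a "#" inside a value such as a URL fragment is kept.
        hash_pos = value.find(" #")
        if hash_pos != -1:
            value = value[:hash_pos]
        values[key.strip()] = value.strip()
    return values
```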

Basic Usage

from datacollective import DataCollective

# Initialize client (loads from .env automatically)
client = DataCollective()

# Verify your configuration
print(f"API URL: {client.api_url}")
print(f"Download path: {client.download_path}")

# Download a dataset
dataset = client.get_dataset('your-dataset-id')

Load and query datasets

Note: currently, this feature only works with Mozilla Common Voice datasets.

from datacollective import DataCollective

client = DataCollective()

dataset = client.load_dataset("<dataset-id>") # Load the dataset into memory
df = dataset.to_pandas() # Convert to a pandas DataFrame for querying
dataset.splits # A list of all splits available in the dataset

Multiple Environments

You can use different environment configurations:

# Production environment (default, uses .env)
client = DataCollective()

# Development environment (uses .env.development)
client = DataCollective(environment='development')

# Staging environment (uses .env.staging)  
client = DataCollective(environment='staging')
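The comments above follow a simple naming pattern: no argument uses .env, and `environment='<name>'` uses .env.<name>. The sketch below expresses that pattern as a lookup, which can help when preparing one file per environment. `env_file_for` is a hypothetical helper, not part of the datacollective package, and treating `'production'` the same as the default is an assumption based on the "Production environment (default, uses .env)" comment.

```python
from __future__ import annotations

from pathlib import Path


def env_file_for(environment: str | None = None, root: Path | str = ".") -> Path:
    """Hypothetical helper: map an environment name to its .env file."""
    # No environment (or the assumed "production" name) uses the plain .env file.
    if environment in (None, "", "production"):
        name = ".env"
    else:
        # Any other name, e.g. "development" or "staging", uses .env.<name>.
        name = f".env.{environment}"
    return Path(root) / name
```

For example, `env_file_for("staging")` points at .env.staging in the current directory, matching the staging example above.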

License

This project is released under the Mozilla Public License 2.0 (MPL 2.0).

Download files

Source Distribution

datacollective-0.0.27.tar.gz (12.8 kB)

Built Distribution

datacollective-0.0.27-py3-none-any.whl (14.1 kB)

File details

Details for the file datacollective-0.0.27.tar.gz.

File metadata

  • Download URL: datacollective-0.0.27.tar.gz
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.12

File hashes

Hashes for datacollective-0.0.27.tar.gz
Algorithm Hash digest
SHA256 bf12f2a80af548ff04723a7429ea61f69da04c49191a3dc17155e6b0e8902797
MD5 0c6639ebffa2261a9638f941220023e7
BLAKE2b-256 a0b23368d0c5edc1a9015a7eeaf6fb88ed544fcdf0fe75053d46f33f4f7a5c42

File details

Details for the file datacollective-0.0.27-py3-none-any.whl.

File hashes

Hashes for datacollective-0.0.27-py3-none-any.whl
Algorithm Hash digest
SHA256 9c5b84f6326ae24c813fe95c5e999500a70de1a00af699be7d8c8285ddfd0baa
MD5 e65b26cf0986939d5f13ed54a94de1f4
BLAKE2b-256 7e2d8f82bc963a87d67e225dbc1320a51520625e1c2efcc71058dcda36200aa2
