Mozilla Data Collective Python API Library
Python library for interfacing with the Mozilla Data Collective REST API.
Prerequisites
This package uses uv to build and publish the distribution package. Please download and install uv before continuing.
Installing and Testing Locally
First, create a local virtual environment with uv:

```shell
uv venv
```

Activate your virtual environment:

```shell
source .venv/bin/activate
```

Install the package locally with uv:

```shell
uv pip install -e .
```

Now you can import the package and test functionality. You will need to rerun `uv pip install -e .` after any edits to the package for those updates to show up.
Environment Configuration
The DataCollective client supports multiple environments through environment-specific .env files. This allows you to easily switch between different API endpoints, API keys, and configurations.
Setting up Environment Files
1. Create your environment file(s):

   ```shell
   # For production (default)
   cp .env.example .env
   # For development
   cp .env.example .env.development
   # For staging
   cp .env.example .env.staging
   ```

   Note: If you don't have a `.env.example` file yet, create one with the following template:

   ```shell
   # MDC API Configuration
   MDC_API_KEY=your-api-key-here
   MDC_API_URL=https://datacollective.mozillafoundation.org/api
   MDC_DOWNLOAD_PATH=~/.mozdata/datasets
   ENVIRONMENT=production
   ```

2. Configure your environment variables:

   Edit your `.env` file (or environment-specific file) with your configuration:

   ```shell
   # Required: Your MDC API key
   MDC_API_KEY=your-api-key-here
   # Optional: API endpoint (defaults to production)
   MDC_API_URL=https://datacollective.mozillafoundation.org/api
   # Optional: Download path for datasets (defaults to ~/.mozdata/datasets)
   MDC_DOWNLOAD_PATH=~/.mozdata/datasets
   # Optional: Environment name (used for .env file selection)
   ENVIRONMENT=production
   ```
Using Different Environments
The client automatically loads the appropriate .env file based on the environment:
- Production (default): Uses `.env`
- Development: Uses `.env.development`
- Staging: Uses `.env.staging`
- Custom: Uses `.env.{environment_name}`
Example Usage
```python
from datacollective import DataCollective

# Use production environment (loads .env)
client = DataCollective()

# Use development environment (loads .env.development)
client = DataCollective(environment='development')

# Use staging environment (loads .env.staging)
client = DataCollective(environment='staging')

# Use custom environment (loads .env.custom)
client = DataCollective(environment='custom')
```
Testing Your Configuration
Test that your API key and configuration are being loaded correctly:
```python
>>> from datacollective import DataCollective
>>> client = DataCollective()
>>> client.api_key
'your-api-key-here'
>>> client.api_url
'https://datacollective.mozillafoundation.org/api'
>>> client.download_path
'/Users/username/.mozdata/datasets'
```
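The `download_path` shown is the expanded form of the configured `~/.mozdata/datasets`; the expansion itself is standard `os.path.expanduser` behavior:

```python
import os

# "~" in the configured download path expands to the user's home directory,
# which is how "~/.mozdata/datasets" becomes e.g. "/Users/username/.mozdata/datasets".
configured = "~/.mozdata/datasets"
expanded = os.path.expanduser(configured)
print(expanded)
```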
Environment File Priority
The client loads environment variables in the following order:
- Environment-specific file (e.g., `.env.development`)
- Default `.env` file (if the environment-specific file doesn't exist)
- System environment variables (highest priority)
Best Practices
- Never commit `.env` files to version control; they contain sensitive information
- Always commit `.env.example` as a template for other developers
- Use descriptive environment names (e.g., `development`, `staging`, `production`)
- Keep environment-specific configurations minimal; only override what's different
- Use system environment variables for CI/CD pipelines and production deployments
Once you're done, exit your virtual environment:

```shell
deactivate
```
Testing
Tests are run by first installing the dev dependencies:

```shell
uv pip install -e ".[dev]"
```

and then running the tests with pytest:

```shell
pytest
```

or

```shell
pytest --cov=datacollective
```
Development
This project uses modern Python development tools for code quality, formatting, and type checking.
Development Dependencies
Install all development dependencies:

```shell
uv pip install -e ".[dev]"
```
Code Formatting with Black
This project uses Black for consistent code formatting.
Format all code:

```shell
uv run black src/ tests/
```

Check formatting without making changes:

```shell
uv run black --check src/ tests/
```
Linting with Ruff
This project uses Ruff for fast linting and import sorting.
Lint all code:

```shell
uv run ruff check src/ tests/
```

Fix linting issues automatically:

```shell
uv run ruff check --fix src/ tests/
```

Sort imports:

```shell
uv run ruff check --select I --fix src/ tests/
```
Type Checking with MyPy
This project uses MyPy for static type checking.
Type check all code:

```shell
uv run mypy src/
```
Pre-commit Hooks
Set up automated formatting and linting on every commit:

```shell
# Install pre-commit hooks
uv run pre-commit install
# Run hooks manually on all files
uv run pre-commit run --all-files
# Run hooks on staged files only
uv run pre-commit run
```
Development Scripts
Use the convenient development script for common tasks:
```shell
# Format code
uv run python scripts/dev.py format
# Lint code
uv run python scripts/dev.py lint
# Fix linting issues
uv run python scripts/dev.py fix
# Type check
uv run python scripts/dev.py typecheck
# Run tests
uv run python scripts/dev.py test
# Run all checks (format, lint, type check, and test)
uv run python scripts/dev.py all

# Version management
uv run python scripts/dev.py version
uv run python scripts/dev.py bump-patch
uv run python scripts/dev.py bump-minor
uv run python scripts/dev.py bump-major

# Build and publishing (without version bump)
uv run python scripts/dev.py clean
uv run python scripts/dev.py build
uv run python scripts/dev.py publish-test
uv run python scripts/dev.py publish

# Publishing with automatic version bump (recommended)
uv run python scripts/dev.py publish-bump-test  # TestPyPI
uv run python scripts/dev.py publish-bump       # PyPI
```
Configuration
All tool configurations are defined in pyproject.toml:
- Black: 88-character line length, Python 3.9+ target
- Ruff: Comprehensive linting rules including pycodestyle, pyflakes, isort, and more
- MyPy: Strict type checking with proper error handling
- Pre-commit: Automated formatting and linting on every commit
IDE Integration
For the best development experience, configure your IDE to use these tools:
VS Code - Add to your `settings.json`:

```json
{
  "python.formatting.provider": "black",
  "python.linting.enabled": true,
  "python.linting.ruffEnabled": true,
  "python.linting.mypyEnabled": true,
  "python.linting.lintOnSave": true,
  "editor.formatOnSave": true
}
```
PyCharm - Install the Black and Ruff plugins and configure them to run on save.
Version Management
This project uses bump2version for automated version management. This tool automatically updates version numbers in all relevant files and creates git commits and tags.
Installing bump2version
bump2version is included in the dev dependencies. Install it with:

```shell
uv pip install -e ".[dev]"
```
Automated Publishing Workflow (Recommended)
The easiest way to publish is using the automated workflow that handles version bumping and publishing:
```shell
# For TestPyPI (testing)
uv run python scripts/dev.py publish-bump-test

# For PyPI (production)
uv run python scripts/dev.py publish-bump
```
This will automatically:
- Bump the patch version (0.0.3 → 0.0.4)
- Run all quality checks (format, lint, type check, tests)
- Clean, build, and publish the package
- Create a git commit and tag
Manual Version Management
If you prefer more control, you can manage versions manually:
Patch version (0.0.1 → 0.0.2) - Bug fixes:

```shell
uv run bump2version patch
```

Minor version (0.0.2 → 0.1.0) - New features (backward compatible):

```shell
uv run bump2version minor
```

Major version (0.1.0 → 1.0.0) - Breaking changes:

```shell
uv run bump2version major
```
Semantic Versioning
Follow semantic versioning principles:
- MAJOR (1.0.0): Breaking changes that are not backward compatible
- MINOR (0.1.0): New features that are backward compatible
- PATCH (0.0.1): Bug fixes that are backward compatible
Examples:
- `0.0.1` → `0.0.2`: Fixed a bug in dataset download
- `0.0.2` → `0.1.0`: Added new method for listing datasets
- `0.1.0` → `1.0.0`: Changed API interface (breaking change)
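The bumps in these examples can be expressed as a small helper (a sketch, not the actual bump2version implementation):

```python
# Sketch of semantic-version bumping, mirroring the examples above.
# Not bump2version's real logic; illustrative only.
def bump(version: str, part: str) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"        # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    return f"{major}.{minor}.{patch + 1}"  # bug fix
```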
Building
To create the distribution package, run:

```shell
uv build
```
This will create the distribution package in an auto-generated dist directory.
Publishing to PyPI
This section covers how to publish the datacollective package to both TestPyPI (for testing) and PyPI (for production releases).
Prerequisites
Before publishing, ensure you have:
- TestPyPI Account: Create an account at test.pypi.org
- PyPI Account: Create an account at pypi.org
- API Tokens: Generate API tokens for both services (recommended over passwords)
- uv: The package uses `uv` for building and publishing
Setting up API Tokens
1. TestPyPI Token:
   - Go to test.pypi.org/manage/account/token/
   - Create a new token with scope "Entire account" (for testing)
   - Save the token securely

2. PyPI Token:
   - Go to pypi.org/manage/account/token/
   - Create a new token with scope "Entire account" (or limit to specific projects)
   - Save the token securely
Pre-Publication Checklist
Before publishing, ensure you've completed these steps:
1. Update Version (if needed):

   ```shell
   # For bug fixes
   uv run python scripts/dev.py bump-patch
   # For new features
   uv run python scripts/dev.py bump-minor
   # For breaking changes
   uv run python scripts/dev.py bump-major
   ```

2. Run Quality Checks:

   ```shell
   # Run all checks (format, lint, type check, tests)
   uv run python scripts/dev.py all
   ```

3. Clean Build Artifacts:

   ```shell
   # Remove old build files to avoid conflicts
   uv run python scripts/dev.py clean
   ```

4. Build Package:

   ```shell
   # Build fresh package
   uv run python scripts/dev.py build
   ```

5. Review Package:

   ```shell
   # Check what files will be published
   ls -la dist/
   # Verify version
   uv run python scripts/dev.py version
   ```
Publishing to TestPyPI
TestPyPI is a separate instance of PyPI for testing package uploads. Always test here first!
1. Configure TestPyPI credentials:

   ```shell
   # Set your TestPyPI token
   export UV_PUBLISH_TOKEN_testpypi="your-testpypi-token-here"
   ```

2. Publish to TestPyPI (automated workflow):

   ```shell
   # This will clean, build, and publish in one command
   uv run python scripts/dev.py publish-test
   ```

   Or manually:

   ```shell
   # Clean old build artifacts
   uv run python scripts/dev.py clean
   # Build fresh package
   uv run python scripts/dev.py build
   # Publish to TestPyPI
   uv publish --index testpypi
   ```

3. Verify the upload:
   - Visit test.pypi.org/project/datacollective/
   - Check that your package appears correctly

4. Test installation from TestPyPI:

   ```shell
   # Create a fresh virtual environment
   uv venv test-env
   source test-env/bin/activate
   # Install from TestPyPI
   uv pip install --index-url https://test.pypi.org/simple/ datacollective
   # Test the package
   python -c "from datacollective import DataCollective; print('Installation successful!')"
   ```
Publishing to PyPI
Once you've successfully tested on TestPyPI, you can publish to the main PyPI:
1. Configure PyPI credentials:

   ```shell
   # Set your PyPI token
   export UV_PUBLISH_TOKEN_pypi="your-pypi-token-here"
   ```

2. Publish to PyPI (automated workflow):

   ```shell
   # This will clean, build, and publish in one command
   uv run python scripts/dev.py publish
   ```

   Or manually:

   ```shell
   # Clean old build artifacts
   uv run python scripts/dev.py clean
   # Build fresh package
   uv run python scripts/dev.py build
   # Publish to PyPI
   uv publish
   ```

3. Verify the upload:
   - Visit pypi.org/project/datacollective/
   - Check that your package appears correctly

4. Test installation from PyPI:

   ```shell
   # Create a fresh virtual environment
   uv venv prod-env
   source prod-env/bin/activate
   # Install from PyPI
   uv pip install datacollective
   # Test the package
   python -c "from datacollective import DataCollective; print('Installation successful!')"
   ```
Version Management
This project uses automated version management with bump2version to ensure version numbers stay synchronized across all files.
Automated Version Bumping
Use the development script to bump versions automatically:
```shell
# Show current version
uv run python scripts/dev.py version
# Bump patch version (0.0.1 -> 0.0.2) - for bug fixes
uv run python scripts/dev.py bump-patch
# Bump minor version (0.0.1 -> 0.1.0) - for new features
uv run python scripts/dev.py bump-minor
# Bump major version (0.0.1 -> 1.0.0) - for breaking changes
uv run python scripts/dev.py bump-major
```
These commands will automatically:
- Update version numbers in `pyproject.toml` and `src/datacollective/__init__.py`
- Create a git commit with the version bump
- Create a git tag for the new version
Manual Version Management
If you need to update versions manually:
1. Update version numbers:
   - `pyproject.toml`: Update the `version` field
   - `src/datacollective/__init__.py`: Update the `__version__` variable

2. Follow semantic versioning:
   - `MAJOR.MINOR.PATCH` (e.g., 1.0.0, 1.0.1, 1.1.0, 2.0.0)
   - MAJOR: Breaking changes
   - MINOR: New features (backward compatible)
   - PATCH: Bug fixes (backward compatible)

3. Create a git tag:

   ```shell
   git tag v1.0.0
   git push origin v1.0.0
   ```
Troubleshooting
Common Issues:
- Package already exists: PyPI doesn't allow overwriting existing versions. Increment the version number.
- Authentication failed: Verify your API tokens are correct and have the right permissions.
- Build errors: Ensure all dependencies are properly specified in `pyproject.toml`.
- TestPyPI vs PyPI: Remember that TestPyPI and PyPI are separate; you need to upload to both.
Useful Commands:

```shell
# Check build options and package metadata
uv build --help
# Validate package before upload
uv publish --dry-run
```
Security Notes
- Never commit API tokens to version control
- Use environment variables or secure credential storage
- Rotate tokens regularly for security
- Use scoped tokens when possible (limit to specific projects)
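In code, that means reading the token from the environment rather than hardcoding it; a minimal sketch (`require_token` is a hypothetical helper, not part of the library):

```python
import os

# Read the API token from the environment instead of embedding it in source.
# require_token is a hypothetical helper, not part of the datacollective API.
def require_token(name: str = "MDC_API_KEY") -> str:
    token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return token
```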
Automated Publishing (Optional)
For automated publishing, consider using GitHub Actions with secrets:
```yaml
# .github/workflows/publish.yml
name: Publish to PyPI

on:
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
      - name: Publish to PyPI
        run: uv publish
        env:
          UV_PUBLISH_TOKEN_pypi: ${{ secrets.PYPI_API_TOKEN }}
```
This workflow would automatically publish when you create a GitHub release.
License
This repository is released under the Mozilla Public License (MPL) 2.0.
File details

Details for the file `datacollective-0.0.8.tar.gz`.

File metadata

- Download URL: datacollective-0.0.8.tar.gz
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d3bc185248b1341b1ade40f580254bea22e9ffd38f611037cfcc3eb446498f8b` |
| MD5 | `846de6526181fdf1b2ac4713b7fa8870` |
| BLAKE2b-256 | `0d846e474491c76b96025f69712c19a182838e86b2d6c3a2af7bd544abe248ec` |
File details

Details for the file `datacollective-0.0.8-py3-none-any.whl`.

File metadata

- Download URL: datacollective-0.0.8-py3-none-any.whl
- Size: 14.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `a1841bf86fda3e301838ed4f04691f1219b56b491f885d872b327125fc9dfc7b` |
| MD5 | `aca8272f3d06546c08e46304d58dff6b` |
| BLAKE2b-256 | `8065606ddd7e7d9b372e3c6f25df0d31ac7734d88b2a5d7284262be89d332b57` |