Project description

Databús Python SDK

Python SDK and command-line toolkit for GTFS data processing, validation, and analysis. Provides programmatic access to Databús APIs, GTFS manipulation utilities, data conversion tools, and automated testing frameworks for transit data workflows and research applications.

Features

🚌 GTFS Data Processing

  • Load and manipulate GTFS feeds from ZIP files or directories
  • Filter feeds by geographic bounds or date ranges
  • Export processed feeds to various formats
  • Statistical analysis and reporting

✅ Data Validation

  • Comprehensive GTFS specification compliance checking
  • Custom validation rules and quality metrics
  • Detailed validation reports with scoring
  • Integration with standard validation tools

🌐 API Integration

  • Full access to Databús API endpoints
  • Automatic feed discovery and metadata retrieval
  • Bulk download and synchronization capabilities
  • Rate limiting and retry mechanisms

🛠️ Command-Line Tools

  • Intuitive CLI for common operations
  • Rich output formatting and progress indicators
  • Batch processing and automation support (see the sketch after this list)
  • Integration with shell scripts and workflows
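
For instance, batch processing can be a short Python loop driving the CLI command documented below; a minimal sketch (the feeds/ directory name is a placeholder):

import subprocess
from pathlib import Path

# Validate every GTFS ZIP in a directory via the documented
# "databus gtfs validate" command; check=False keeps the loop
# going even when one feed fails validation.
for feed in sorted(Path("feeds").glob("*.zip")):
    subprocess.run(["databus", "gtfs", "validate", str(feed)], check=False)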

Installation

Using uv (recommended)

# Install from PyPI (when published)
uv pip install databus

# Install from source
git clone https://github.com/fabianabarca/databus-py.git
cd databus-py
uv pip install -e .

Using pip

# Install from PyPI (when published)
pip install databus

# Install from source
git clone https://github.com/fabianabarca/databus-py.git
cd databus-py
pip install -e .
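
Either way, a quick smoke test is to import the package's top-level classes (the same ones used in the Quick Start below):

# These imports mirror the Quick Start and should succeed after
# either install method.
from databus import DatabusClient, GTFSProcessor, GTFSValidator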

Quick Start

Python API

from databus import DatabusClient, GTFSProcessor, GTFSValidator

# Connect to Databús API
client = DatabusClient("https://api.databus.cr")
feeds = client.get_feeds(country="CR")

# Process a GTFS feed
processor = GTFSProcessor("costa_rica_gtfs.zip")
processor.load_feed()

# Get feed statistics
stats = processor.get_feed_stats()
print(f"Routes: {stats['routes']}, Stops: {stats['stops']}")

# Validate the feed
validator = GTFSValidator(processor)
report = validator.validate()
print(f"Validation score: {report.score}/100")

# Filter by geographic area
san_jose_area = processor.filter_by_bounding_box(
    9.8, -84.2, 10.1, -83.9
)
san_jose_area.export_to_zip("san_jose_gtfs.zip")

Command Line Interface

# List available feeds
databus api feeds --country CR

# Download a feed
databus api download costa-rica-gtfs

# Get feed information
databus gtfs info costa_rica_gtfs.zip

# Validate a feed
databus gtfs validate costa_rica_gtfs.zip

# Filter feed by bounding box
databus gtfs filter costa_rica_gtfs.zip san_jose.zip \
    --bbox "-84.2,9.8,-83.9,10.1"

# Filter by date range
databus gtfs filter costa_rica_gtfs.zip current_service.zip \
    --dates "2024-01-01,2024-12-31"

Documentation

Core Classes

DatabusClient

The main interface for interacting with Databús APIs:

client = DatabusClient(
    base_url="https://api.databus.cr",
    api_key="your_api_key",  # Optional
    timeout=30
)

# Discover feeds
feeds = client.get_feeds()
costarica_feeds = client.get_feeds(country="CR")

# Get detailed feed information
feed = client.get_feed("costa-rica-gtfs")

# Access GTFS data
agencies = client.get_agencies("costa-rica-gtfs")
routes = client.get_routes("costa-rica-gtfs")
stops = client.get_stops("costa-rica-gtfs")

# Download feeds
client.download_feed("costa-rica-gtfs", "costa_rica.zip")
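
These calls compose into bulk workflows as well. A hedged sketch of downloading every Costa Rican feed (the "id" key on each feed record is an assumption; the README does not show the shape of get_feeds() results):

# Bulk download: loop over discovered feeds and save each one.
# NOTE: feed["id"] is an assumed field name; adjust it to the actual
# structure returned by get_feeds().
for feed in client.get_feeds(country="CR"):
    feed_id = feed["id"]
    client.download_feed(feed_id, f"{feed_id}.zip")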

GTFSProcessor

Load, manipulate, and analyze GTFS feeds:

processor = GTFSProcessor("feed.zip")
processor.load_feed()

# Access GTFS tables as DataFrames
routes = processor.get_routes()
stops = processor.get_stops(as_geodataframe=True)
trips = processor.get_trips(route_id="route_1")

# Get comprehensive statistics
stats = processor.get_feed_stats()
route_stats = processor.get_route_stats("route_1")

# Filter and transform
filtered = processor.filter_by_bounding_box(
    min_lat=9.8, min_lon=-84.2,
    max_lat=10.1, max_lon=-83.9
)
date_filtered = processor.filter_by_dates(
    "2024-01-01", "2024-12-31"
)

# Export results
processor.export_to_zip("processed_feed.zip")
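
Since filtering appears to return a new processor-like object (the Quick Start exports one directly), filters can plausibly be chained; a sketch under that assumption:

# Hedged sketch: assumes each filter returns an object exposing the
# same filtering and export methods, as the Quick Start suggests.
subset = processor.filter_by_bounding_box(
    min_lat=9.8, min_lon=-84.2, max_lat=10.1, max_lon=-83.9
).filter_by_dates("2024-01-01", "2024-06-30")
subset.export_to_zip("san_jose_h1_2024.zip")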

GTFSValidator

Validate GTFS feeds for compliance and quality:

validator = GTFSValidator(processor)
report = validator.validate()

print(f"Status: {report.status}")
print(f"Score: {report.score}/100")
print(f"Errors: {len(report.errors)}")
print(f"Warnings: {len(report.warnings)}")

# Access detailed issues
for error in report.errors:
    print(f"Error: {error['message']}")

# Save report
with open("validation_report.json", "w") as f:
    f.write(report.to_json())
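
In a pipeline, the report can gate publication. A small sketch using only the attributes shown above (the threshold of 80 is an arbitrary example):

MIN_SCORE = 80  # arbitrary quality bar for this example

report = GTFSValidator(processor).validate()
if report.score >= MIN_SCORE and not report.errors:
    processor.export_to_zip("published_feed.zip")
else:
    for error in report.errors:
        print(f"Blocking error: {error['message']}")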

Configuration

Configure the library using environment variables or a configuration file:

# Environment variables
export DATABUS_API_URL="https://api.databus.cr"
export DATABUS_API_KEY="your_api_key"
export DATABUS_LOG_LEVEL="INFO"

Or create a configuration file at ~/.databus/config.json:

{
  "api": {
    "base_url": "https://api.databus.cr",
    "api_key": "your_api_key",
    "timeout": 30
  },
  "logging": {
    "level": "INFO"
  },
  "processing": {
    "chunk_size": 10000
  }
}
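
The README does not state which source wins when both are set; a standalone sketch of the conventional precedence (environment variables override the file) looks like this:

import json
import os
from pathlib import Path

# Conventional precedence (an assumption, not documented behavior):
# environment variables override ~/.databus/config.json.
config_path = Path.home() / ".databus" / "config.json"
file_cfg = json.loads(config_path.read_text()) if config_path.exists() else {}

base_url = os.environ.get("DATABUS_API_URL",
                          file_cfg.get("api", {}).get("base_url"))
api_key = os.environ.get("DATABUS_API_KEY",
                         file_cfg.get("api", {}).get("api_key"))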

Development

Setup Development Environment

git clone https://github.com/fabianabarca/databus-py.git
cd databus-py

# Install with development dependencies
uv pip install -e ".[dev,test]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=databus --cov-report=html

# Run specific test file
pytest tests/test_gtfs_processor.py

Code Quality

# Format code
black src/databus tests/

# Sort imports
isort src/databus tests/

# Lint code
flake8 src/databus tests/

# Type checking
mypy src/databus

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please read CONTRIBUTING.md for details on our code of conduct and development process.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Related Projects

Download files

Download the file for your platform.

Source Distribution

databus-0.1.0.tar.gz (33.5 kB)

Uploaded Source

Built Distribution

databus-0.1.0-py3-none-any.whl (35.9 kB)

Uploaded Python 3

File details

Details for the file databus-0.1.0.tar.gz.

File metadata

  • Download URL: databus-0.1.0.tar.gz
  • Upload date:
  • Size: 33.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for databus-0.1.0.tar.gz:

  • SHA256: 3c164bea428730f20521b5985e5b34d8d60dd52e6779d8fff1fb502a6b75228a
  • MD5: 52027f7d6631515068c436a91a7f94cf
  • BLAKE2b-256: a5807ca138bdfcb55dc6325a03b988ee77a01a6e5539902f1d2b3c4d5d942449

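To check a downloaded file against the published digest, a short stdlib snippet suffices:

import hashlib
from pathlib import Path

# Compare the local file's SHA256 against the digest published above.
expected = "3c164bea428730f20521b5985e5b34d8d60dd52e6779d8fff1fb502a6b75228a"
actual = hashlib.sha256(Path("databus-0.1.0.tar.gz").read_bytes()).hexdigest()
assert actual == expected, "SHA256 mismatch: discard this file"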

File details

Details for the file databus-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: databus-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 35.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for databus-0.1.0-py3-none-any.whl:

  • SHA256: f443f8c2482a82713f2821eb9b8feb5490956cabcffaf6241eeff69758e09446
  • MD5: 7d397c38d403b39d25a0bd2f272162b3
  • BLAKE2b-256: f3e72d49270df8157881240eb346c9abdc938ad93e03b2f6fd2bc276ea9e0728

