TLPyTools
TransLink's Python Tools - A comprehensive toolkit for transportation modeling and forecasting developed by the TransLink Forecasting Team.
Overview
TLPyTools provides a suite of utilities and tools designed to support various aspects of transportation modeling workflows, from data management and processing to cloud synchronization and model orchestration. Built specifically for the TransLink Forecasting Team's modeling needs.
Installation
Note: TLPyTools requires Python 3.10 or higher. ActivitySim and PopulationSim are available for development installations using uv sync.
Using uv (Recommended)
TLPyTools uses uv for fast and reliable dependency management.
Installing uv
Option 1: Direct download (Recommended for Windows)
- Download the uv Windows executable: uv-x86_64-pc-windows-msvc.zip
- Move all files from the downloaded zip (including uv.exe, uvw.exe, and uvx.exe) into a new folder: C:\ProgramData\uv
- Add C:\ProgramData\uv to the system environment variable PATH
- Open a new command prompt and test the command:
uv --help
Option 2: Install script (Recommended for Linux/macOS)
curl -LsSf https://astral.sh/uv/install.sh | sh
After installation, restart your terminal or run:
source ~/.bashrc # or ~/.zshrc depending on your shell
Installing TLPyTools with uv
# Clone and install the package
git clone https://github.com/TransLinkForecasting/tlpytools.git
cd tlpytools
# Install core package only
uv sync
# Install with ORCA orchestrator support
uv sync --extra orca
# Install with full development environment (includes GIS tools, visualization, etc.)
uv sync --extra dev
# Install multiple extras (common combinations)
uv sync --extra dev --extra orca
# For development with ActivitySim and PopulationSim (git dependencies available via uv)
# Note: ActivitySim/PopulationSim are only available for development, not through PyPI
uv sync --extra dev --extra orca --group activitysim
Using pip (Alternative)
You can still use pip for installation, but ActivitySim and PopulationSim are not available as extras due to PyPI restrictions on git dependencies:
# Core package only
pip install tlpytools
# With ORCA orchestrator support
pip install tlpytools[orca]
# With full development environment
pip install tlpytools[dev]
# Multiple extras
pip install tlpytools[dev,orca]
# Development installation (without ActivitySim - use uv for that)
git clone https://github.com/TransLinkForecasting/tlpytools.git
cd tlpytools
pip install -e .[dev,orca]
Core Modules
Data Management (tlpytools.data)
Utilities for data processing and manipulation:
- DataFrame Operations: Enhanced pandas functionality for transportation data
- Spatial Data Support: Optional GIS operations (requires geopandas)
- Data Validation: Tools for checking data integrity and consistency
from tlpytools.data import read_spatial_data, validate_dataframe
# Load spatial data (if geopandas available)
gdf = read_spatial_data("zones.shp")
# Validate data structure
is_valid = validate_dataframe(df, required_columns=['zone_id', 'households'])
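A required-columns check of this kind can be sketched as follows. This is a hypothetical illustration (the helper name validate_columns is an illustrative stand-in, not the actual tlpytools.data implementation, which may check more than column presence):

```python
import pandas as pd

# Hypothetical sketch of a required-columns validator (illustrative only;
# the real tlpytools.data.validate_dataframe implementation may differ).
def validate_columns(df, required_columns):
    missing = [c for c in required_columns if c not in df.columns]
    return len(missing) == 0

df = pd.DataFrame({"zone_id": [1, 2], "households": [100, 250]})
```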
Data Storage (tlpytools.data_store)
Comprehensive data storage and retrieval functionality:
- Multiple Backends: Support for various storage formats
- Metadata Management: Automatic tracking of data lineage
- Version Control: Built-in data versioning capabilities
from tlpytools.data_store import DataStore
store = DataStore("my_project")
store.save_data(df, "travel_times", metadata={"source": "model_run_1"})
retrieved_df = store.load_data("travel_times")
SQL Server Integration (tlpytools.sql_server)
Tools for working with SQL Server databases:
- Connection Management: Simplified database connections with Azure AD authentication
- Query Utilities: Helper functions for common operations
- Bulk Operations: Efficient data loading and extraction
- Azure SQL Support: ActiveDirectoryInteractive authentication for secure cloud access
from tlpytools.sql_server import azure_sql_tables
# Azure SQL with interactive authentication (browser popup)
df_dict = azure_sql_tables.read_tables(schema="dbo", table="travel_data")
# Write tables to Azure SQL
table_spec = {"my_table": "dbo.destination_table"}
df_dict = {"my_table": dataframe}
azure_sql_tables.write_tables(table_spec, df_dict)
Environment Variables Required:
- TLPT_AZURE_SQL_URI: Your Azure SQL Server URI
- TLPT_AZURE_SQL_USER: Your Azure AD email (optional, prompts if not set)
Cloud Storage (tlpytools.adls_server)
Azure Data Lake Storage integration:
- File Synchronization: Upload/download with conflict resolution
- Batch Operations: Efficient handling of large datasets
- Authentication: Secure connection management
from tlpytools.adls_server import adls_util
# Upload files to cloud storage
adls_util.upload_files(local_path="data/", remote_path="project/data/")
# Download with pattern matching
adls_util.download_files(remote_pattern="outputs/*.csv", local_path="results/")
Configuration Management (tlpytools.config)
Centralized configuration handling:
- YAML Support: Human-readable configuration files
- Environment Variables: Runtime configuration overrides
- Validation: Schema validation for configuration files
from tlpytools.config import load_config, validate_config
config = load_config("model_config.yaml")
if validate_config(config, schema="model_schema.json"):
print("Configuration is valid")
Logging (tlpytools.log)
Enhanced logging capabilities:
- Structured Logging: Consistent log formatting across projects
- Multiple Outputs: Console and file logging with different levels
- Performance Tracking: Built-in timing and profiling support
from tlpytools.log import setup_logger
logger = setup_logger("my_model", log_file="model.log")
logger.info("Starting model run")
logger.performance("Model completed", execution_time=120.5)
ORCA Model Orchestration
TLPyTools includes the ORCA (Orchestrated Regional Comprehensive Analysis) transportation model orchestrator as an optional component.
Quick Start with ORCA
# Install with ORCA support
pip install tlpytools[orca]
# Initialize a new model scenario
python -m tlpytools.orca --action init_model --scenario db_example
# Run the complete model workflow
python -m tlpytools.orca --action run_model --scenario db_example
ORCA Features
- Multi-Model Coordination: Orchestrates ActivitySim, commercial vehicle models, and traffic assignment
- Cloud Integration: Automatic synchronization with Azure Data Lake Storage
- State Management: Resume interrupted model runs from any point
- Configurable Workflows: YAML-based configuration for complex modeling pipelines
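The resume-from-checkpoint behaviour above can be sketched in outline. The state file name, step names, and JSON layout below are illustrative assumptions, not ORCA's actual state format:

```python
import json
import os

# Illustrative checkpoint/resume sketch. STATE_FILE, STEPS, and the JSON
# layout are hypothetical; ORCA's real state management is internal.
STATE_FILE = "run_state.json"
STEPS = ["prepare_inputs", "run_activitysim", "assign_traffic", "publish_outputs"]

def load_completed():
    # Read the set of finished steps from the state file, if one exists.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return set(json.load(f)["completed"])
    return set()

def run_workflow(run_step):
    completed = load_completed()
    for step in STEPS:
        if step in completed:
            continue  # resume: skip steps that already finished
        run_step(step)
        completed.add(step)
        # Persist progress after every step so an interrupted run can resume.
        with open(STATE_FILE, "w") as f:
            json.dump({"completed": sorted(completed)}, f)
```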
For detailed ORCA documentation, see README_ORCA.md.
Key Features
🔧 Modular Design
- Independent modules with optional dependencies
- Use only what you need without heavy dependency chains
- Clear separation of concerns for easier maintenance
🚀 Performance Optimized
- Efficient data processing with pandas and NumPy
- Chunked operations for large datasets
- Optional performance monitoring and profiling
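The chunked-operations point can be illustrated with plain pandas (a generic sketch, not a TLPyTools-specific API):

```python
import pandas as pd

# Generic sketch of chunked processing (illustrative only): aggregate a
# large trip table without loading the whole file into memory at once.
def total_trips(csv_path, chunksize=100_000):
    total = 0
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        total += chunk["trips"].sum()
    return total
```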
☁️ Cloud Ready
- Native Azure integration for data storage and processing
- Secure authentication and connection management
- Efficient file synchronization with conflict resolution
🔄 Production Ready
- Comprehensive error handling and logging
- State management for long-running processes
- Configurable retry logic and timeout handling
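The kind of retry logic described above can be sketched generically (an illustration under stated assumptions, not the package's internal implementation):

```python
import time

# Generic retry-with-backoff sketch (illustrative only; TLPyTools' actual
# retry configuration is internal to the package).
def retry(fn, attempts=3, delay=0.1):
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:  # real code should catch specific exceptions
            last_exc = exc
            time.sleep(delay * (2 ** i))  # exponential backoff between attempts
    raise last_exc
```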
🧪 Testing Support
- Built-in validation utilities
- Mock objects for testing cloud operations
- Comprehensive test suite included
Usage Examples
Basic Data Pipeline
from tlpytools.data import process_survey_data
from tlpytools.data_store import DataStore
from tlpytools.log import setup_logger
# Setup logging
logger = setup_logger("data_pipeline")
# Process survey data
processed_data = process_survey_data("survey_2023.csv")
logger.info(f"Processed {len(processed_data)} survey records")
# Store results
store = DataStore("survey_analysis")
store.save_data(processed_data, "processed_survey_2023")
Cloud Synchronization Workflow
from tlpytools.adls_server import adls_util
from tlpytools.config import load_config
# Load cloud configuration
config = load_config("cloud_config.yaml")
# Sync local results to cloud
adls_util.upload_directory(
local_path="model_outputs/",
remote_path=f"projects/{config['project_name']}/outputs/",
conflict_resolution="timestamp"
)
SQL Server Integration
from tlpytools.sql_server import SQLServerConnection
from tlpytools.data import validate_dataframe
# Connect and validate data
with SQLServerConnection("prod_server", "transport_db") as conn:
# Load reference data
zones = conn.query("SELECT * FROM zones WHERE active = 1")
# Validate structure
if validate_dataframe(zones, required_columns=['zone_id', 'area_type']):
print(f"Loaded {len(zones)} valid zones")
Dependencies
Core Dependencies
- pandas>=1.1 - Data manipulation and analysis
- numpy>=1.18 - Numerical computing
- sqlalchemy>=1.4 - SQL toolkit and ORM
- pyodbc>=4.0 - SQL Server connectivity
- pyyaml>=5.4 - YAML configuration files
- azure-core>=1.34 - Azure SDK core functionality
- azure-identity>=1.23 - Azure authentication
- azure-storage-blob>=12.24 - Azure Blob Storage
- azure-storage-file-datalake>=12.18 - Azure Data Lake Storage
Optional Dependencies
ORCA Module (uv sync --extra orca or pip install tlpytools[orca]):
- psutil>=5.8.0 - System monitoring and performance tracking
- unittest-xml-reporting>=3.2.0 - Enhanced test reporting
ActivitySim Module (Development only - via uv sync --group activitysim):
- activitysim - TransLink's customized ActivitySim (from GitHub, not available on PyPI)
- populationsim - Synthetic population generation tool (from GitHub, not available on PyPI)
Note: ActivitySim and PopulationSim are only available through uv sync --group activitysim for development installations due to PyPI restrictions on git dependencies. They are not available as pip extras.
Development Environment (uv sync --extra dev or pip install tlpytools[dev]):
Geospatial Analysis Tools:
- geopandas>=0.13.0 - Geospatial data manipulation
- GDAL>=3.6.0 - Geospatial data abstraction library
- Shapely>=2.0.0 - Geometric operations
- Fiona>=1.9.0 - Vector data I/O
- pyproj>=3.4.0 - Cartographic projections
- Rtree>=1.0.0 - Spatial indexing
- Cartopy>=0.21.0 - Cartographic projections for matplotlib
- contextily>=1.5.0 - Web map tiles for matplotlib
- folium>=0.14.0 - Interactive maps
Visualization and Dashboard Tools:
- plotly>=5.17.0 - Interactive plotting
- dash>=2.14.0 - Web application framework
- dash-extensions>=1.0.0 - Additional Dash components
- dash-leaflet>=0.1.0 - Leaflet maps for Dash
- panel>=1.3.0 - High-level dashboard framework
Development and Code Quality:
- black>=23.0.0 - Code formatting
- ruff>=0.1.0 - Fast Python linter
- pytest>=7.4.0 - Testing framework
- pytest-cov>=4.1.0 - Coverage reporting
- mypy>=1.5.0 - Static type checking
- pre-commit>=3.4.0 - Git pre-commit hooks
Other Utilities:
- polyline>=2.0.0 - Polyline encoding/decoding
- jupyter>=1.0.0 - Jupyter ecosystem
- ipykernel>=6.25.0 - IPython kernel for Jupyter
Manual GDAL Installation
GDAL is not included in the default dependencies due to compilation complexity, but can be added manually for advanced geospatial operations:
# Add GDAL to your project
uv add "GDAL>=3.6.0"
Prerequisites for successful GDAL compilation:
Windows:
- Install Visual Studio Build Tools or Visual Studio Community
- Ensure C++ build tools are included
- Add Visual Studio tools to your PATH environment variable
Linux (Ubuntu/Debian):
# Install build dependencies
sudo apt-get update
sudo apt-get install build-essential libgdal-dev gdal-bin
Linux (RHEL/CentOS/Fedora):
# Install build dependencies
sudo dnf install gcc-c++ gdal-devel gdal
# or for older systems: sudo yum install gcc-c++ gdal-devel gdal
Note: GDAL compilation can be time-consuming and may fail on some systems due to missing system libraries. If you encounter issues, consider using system package managers or Docker environments for more reliable installations.
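To confirm whether the GDAL Python bindings installed correctly, a quick check like the following helps; it degrades gracefully when GDAL is absent:

```python
def gdal_version():
    """Return the installed GDAL version string, or None if GDAL is unavailable."""
    try:
        from osgeo import gdal  # provided by the GDAL package
    except ImportError:
        return None
    return gdal.VersionInfo()

print(gdal_version() or "GDAL not installed")
```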
Configuration
TLPyTools uses YAML configuration files for most components. Example configuration:
# tlpytools_config.yaml
data_store:
backend: "local"
base_path: "data/"
versioning: true
cloud:
provider: "azure"
storage_account: "your_account"
container: "your_container"
logging:
level: "INFO"
file_output: true
console_output: true
Load configuration in your code:
from tlpytools.config import load_config
config = load_config("tlpytools_config.yaml")
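The environment-variable override behaviour mentioned above can be sketched as follows. Both apply_env_overrides and the TLPT_ prefix are illustrative assumptions, not the actual TLPyTools API:

```python
import os

# Hypothetical sketch of runtime configuration overrides (illustrative only):
# an environment variable such as TLPT_LEVEL takes precedence over the
# corresponding value loaded from the YAML file.
def apply_env_overrides(config, prefix="TLPT_"):
    for key in list(config):
        env_key = prefix + key.upper()
        if env_key in os.environ:
            config[key] = os.environ[env_key]
    return config
```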
Error Handling
TLPyTools provides graceful error handling for optional dependencies:
# This works even if geopandas is not installed
from tlpytools.data import read_spatial_data
try:
gdf = read_spatial_data("zones.shp")
except ImportError as e:
print(f"Spatial operations not available: {e}")
# Fallback to regular CSV reading
df = pd.read_csv("zones.csv")
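Internally, this graceful degradation typically follows the optional-import pattern; a minimal sketch (illustrative, not the TLPyTools source):

```python
# Minimal optional-import sketch (illustrative, not the TLPyTools source).
try:
    import geopandas as gpd
    HAS_GEOPANDAS = True
except ImportError:
    HAS_GEOPANDAS = False

def read_spatial_data(path):
    # Raise a clear ImportError at call time rather than at import time,
    # so the rest of the module stays usable without geopandas.
    if not HAS_GEOPANDAS:
        raise ImportError("geopandas is required for spatial operations")
    return gpd.read_file(path)
```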
Testing
Run the test suite:
# Using uv (recommended)
uv run pytest
# Run all tests with coverage
uv run pytest --cov=tlpytools
# Run specific module tests
uv run pytest tests/test_data.py
# Run ORCA tests specifically
uv run pytest src/tlpytools/orca/tests/
# Using pip (alternative)
python -m pytest
# Run with coverage
python -m pytest --cov=tlpytools
Contributing
We welcome contributions to TLPyTools! Please follow these guidelines:
- Fork the repository and create a feature branch
- Install development dependencies: pip install -e .[dev]
- Write tests for new functionality
- Follow code style guidelines (run black and ruff)
- Update documentation as needed
- Submit a pull request with a clear description
Development Setup
Using uv (Recommended)
# Install uv if not already installed
pip install uv
# Clone the repository
git clone https://github.com/TransLinkForecasting/tlpytools.git
cd tlpytools
# Quick setup using Makefile
make dev-setup
# Or manual setup:
# Install full development environment (includes GIS tools, visualization, etc.)
uv sync --extra dev --extra orca
# Note: ActivitySim and PopulationSim can be added with --group activitysim
# when working in a development setup (they're not PyPI extras but are uv dependency groups)
# Activate the virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Run tests
uv run pytest
# Run code formatting
uv run black src/
uv run ruff check src/
# Run type checking
uv run mypy src/
Common Development Tasks
# Using the provided Makefile (recommended)
make help # Show all available commands
make install-all # Install all dependencies
make test # Run tests
make test-cov # Run tests with coverage
make lint # Check code style
make format # Format code
make type-check # Run type checking
make check-all # Run all quality checks
Using pip (Alternative)
# Clone the repository
git clone https://github.com/TransLinkForecasting/tlpytools.git
cd tlpytools
# Create development environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode (without ActivitySim - use uv for ActivitySim)
pip install -e .[orca,dev]
# Run tests
python -m pytest
Support
- Documentation: Comprehensive module documentation available
- Examples: See the examples/ directory for usage examples
- Issues: Report bugs and feature requests on GitHub Issues
- Email: Contact the TransLink Forecasting Team at forecasting@translink.ca
License
This project is proprietary software developed by TransLink. All rights reserved.
Version History
- 0.1.19:
  - ORCA enhancements and utilities:
    - Added the OrcaUtil module with utilities for full-cycle model runs
    - Added the unpack_omx_from_zip function for efficient OMX file extraction from archives
    - Improved OMX handling: now uses temporary files instead of BytesIO for better compatibility with openmatrix
    - Enhanced zip file operations with automatic cleanup of existing OMX files before extraction
    - Added zipfile compression level support throughout the ORCA module
  - Advanced workflow controls:
    - Added support for advanced runs by step or iteration
    - Added resume-from-earlier-checkpoint functionality
    - Added configuration reload and safety checks for iteration handling
    - Added safety checks to prevent model_steps and total_iterations changes during a mid-run resume
    - Refactored iteration control logic and updated integration tests for sub-component execution
  - State management improvements:
    - Added an output archive registry to the model run state (stored in JSON)
    - Enhanced scenario summary reporting
  - Bug fixes:
    - Fixed Python command path resolution issues
    - Normalized path separators for consistent comparison in path filtering operations
    - Removed non-standard characters from scripts for better compatibility
- 0.1.18:
  - ORCA CLI simplification and workflow improvements:
    - Removed the --overwrite parameter: conflicting scenario folders must be manually deleted before initialization
    - Removed the --dry-run parameter: all commands now execute actions directly
    - Removed the --iterations and --step parameters: model runs now resume from the beginning or the last checkpoint only
    - Active state and config file names within scenario folders are now hardcoded for consistency
    - Changed the Docker container working directory and VM mount point to /home/ubuntu/wd
  - Code reorganization:
    - Moved land use unpacking functionality to activitysim/simulation.py and related utilities in the ORCA repository
  - Logging enhancements:
    - Standardized sub-component logging with consistent naming conventions and dedicated log files
    - Added console output suppression for sub-components to simplify tracking of overall ORCA model run status
    - Improved log message clarity for file upload/download operations and post-run archiving
- 0.1.17:
  - Terminology consistency improvements in the ORCA module:
    - Renamed model_run_name to scenario_name throughout the codebase for clarity
    - Updated ORCA actions to use init_model and run_model (previously init_scenario/run_scenario)
    - Changed the ADLS folder reference from orca_model_runs to orca_scenarios in documentation
    - Removed all references to "databanks" in the ORCA module
  - Cloud synchronization performance improvements:
    - Added caching for remote state files to eliminate redundant ADLS downloads during sync operations
    - Enhanced file existence checking: now uses ADLS list operations before attempting downloads
    - Improved logging: reduced ERROR messages to DEBUG level when files don't exist (normal for new scenarios)
    - Upload operations are now 3x faster by checking file existence once instead of downloading multiple times
  - Updated documentation and diagrams to reflect terminology changes
- 0.1.16:
  - Updated available Azure auth options and their defaults:
    - developer desktop is the default and uses Azure CLI
    - production servers should use managed identity (requires updates to .env)
  - Updated dependency versions
- 0.1.15:
  - Add user options for Azure SQL Server
  - Breaking change: class azure_td_tables has been renamed to azure_sql_tables
- 0.1.14:
  - Fix credential timeout issues with Azure
- 0.1.13:
  - Add polars, openpyxl, dash-bootstrap-components as dev dependencies
  - Bump versions:
    - all dependency versions
    - activitysim version to https://github.com/TransLinkForecasting/activitysim/commit/5963d038c5c05798ef061ea9ad469a62326e363c
    - populationsim version to https://github.com/TransLinkForecasting/populationsim/commit/cc22d25499e7c54ee5ea184a7ecd0f9ee7f20231
- 0.1.12:
  - Improve the ORCA logger to be instance-only, and update the log filename to include a timestamp
  - Fix a multi-run bug when tasks are queued back to back on the same VM; some changes were made to the ORCA implementation in conjunction with this change
- 0.1.11:
  - Bump versions:
    - all dependency versions
    - activitysim version to https://github.com/TransLinkForecasting/activitysim/commit/5e781a705800fecc627a4a580781d596cd1a54e3
    - populationsim version to https://github.com/TransLinkForecasting/populationsim/commit/2d274d72985024096cb4637573261718c6684a13
  - Fix --project argument inconsistency across the ADLS util and batch API caller
  - Test different Azure auth methods; added notes recommending AZ CLI in .env.examples
- 0.1.10:
  - Add support for uploading the inputs folder to ADLS; this allows orca commands to access shared inputs data
- 0.1.9:
  - Add package-level .env file support to simplify setup and provide more flexibility
- 0.1.8:
  - Set Python version to 3.10 to align with activitysim
  - Add Azure Batch API caller as part of orca
  - Add unified logger for better maintainability
  - Improve Azure credential handling to allow for differences between local testing and cloud production
  - Fix minor bugs with workflows and release pipelines
- 0.1.7:
  - Migration to uv for dependency management
  - Fixed PyPI deployment: moved ActivitySim and PopulationSim to uv dependency groups (no longer PyPI extras)
  - ActivitySim now available via uv sync --group activitysim for development only
  - Comprehensive dev dependencies for geospatial analysis tools (GDAL, Shapely, GeoPandas, etc.)
  - Added visualization tools (Plotly, Dash, Panel, Folium)
  - Enhanced development tooling (Black, Ruff, MyPy, Pre-commit)
  - ORCA namespace reorganization, improved modularity
- 0.1.6.1: Add data store support for RESUME_AFTER functionality, add ADLS and Azure SQL Server support
- 0.1.6.0: Enhanced cloud synchronization, performance monitoring
- 0.1.5.x: Core module stabilization, testing improvements
- 0.1.4.x: Initial SQL Server integration, configuration management
- 0.1.3.x: Data storage utilities, logging enhancements
- 0.1.2.x: Cloud storage integration, ADLS support
- 0.1.1.x: Core data processing utilities
- 0.1.0.x: Initial release with basic functionality