Canonada
⚠️ Canonada is currently under development and is not ready for production use.
Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python.
Why Canonada?
- Standardized: Canonada provides a standardized way to build your data projects
- Modular: Canonada is modular and allows you to build and visualize data pipelines with ease
- Memory Efficient: Canonada is memory efficient and can handle large datasets by streaming data through the pipeline instead of loading it all at once
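The memory-efficiency point above can be illustrated with plain Python generators. This is a minimal sketch of the general streaming idea and not Canonada's own API: `stream_rows`, `total_value`, and the inline `RAW_CSV` sample are hypothetical names chosen for illustration.

```python
import csv
import io
from typing import Iterator

# Stand-in for a large file on disk (assumption: a real project would open a file).
RAW_CSV = "id,value\n1,10\n2,20\n3,30\n"


def stream_rows(handle: io.TextIOBase) -> Iterator[dict]:
    """Yield one parsed row at a time instead of building a full list."""
    for row in csv.DictReader(handle):
        yield row


def total_value(rows: Iterator[dict]) -> int:
    """Consume the stream while keeping only the running total in memory."""
    total = 0
    for row in rows:
        total += int(row["value"])
    return total


streamed_total = total_value(stream_rows(io.StringIO(RAW_CSV)))
print(streamed_total)
```

Because each row is discarded once it has been processed, peak memory stays roughly constant regardless of how many rows the source contains.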
Features
- Centralized control of data sources: Manage all your data sources in one place, enabling you to keep your team in sync
- Centralized control of the project configuration: Manage all your project configurations in one place
- Easy dataloading: Load data from various sources like CSV, JSON, Parquet, etc.
- Use functions as nodes: Functions are the building blocks of Canonada. You can use any function as a node in your pipeline
- Create streaming data pipelines: Create parallel and sequential data pipelines with ease
- Visualize your data pipeline: Inspect the structure of any pipeline or system with the view command
- Documentation: Collect and display the documentation of your project [⚠️ under development]
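To make the "functions as nodes" idea concrete, here is a small sketch in plain Python. It does not use Canonada's classes or signatures, which are not shown in this README; `parse`, `double`, and `run_pipeline` are hypothetical names that only illustrate how ordinary functions can be chained into a sequential, streaming pipeline.

```python
from typing import Callable, Iterable, Iterator


def parse(record: str) -> int:
    """Node 1: turn a raw text record into an integer."""
    return int(record.strip())


def double(value: int) -> int:
    """Node 2: transform the parsed value."""
    return value * 2


def run_pipeline(source: Iterable, nodes: list[Callable]) -> Iterator:
    """Hypothetical helper: pass each record through every node in order."""
    for record in source:
        for node in nodes:
            record = node(record)
        # Emit the result immediately so downstream code can consume it lazily.
        yield record


results = list(run_pipeline([" 1", "2 ", "3"], [parse, double]))
print(results)
```

Each node is an ordinary function that can be unit-tested on its own, which matches the tests/ directory in the project layout.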
Project Structure
canonada.toml
config/
    catalog.toml
    parameters.toml
    credentials.toml
data/
    ...
datahandlers/
    __init__.py
    custom_datahandler_1.py
    custom_datahandler_2.py
    ...
notebooks/
    ...
pipelines/
    __init__.py
    pipeline_1.py
    pipeline_2.py
    nodes_1/
        __init__.py
        node_1.py
        node_2.py
        ...
    nodes_2/
        __init__.py
        node_3.py
        node_4.py
        ...
    ...
systems/
    __init__.py
    system_1.py
    system_2.py
    ...
tests/
    test_node_group_1.py
    test_node_group_2.py
    ...
Usage
Available commands:
Usage: canonada <command> <args>
Commands:
    new <project_name>                 - Create a new project
    catalog [list/params]              - List all available datasets or get the project parameters
    registry [pipelines/systems]       - List all available pipelines or systems
    run [pipelines/systems] <name(s)>  - Run a pipeline or system
    view [pipelines/systems] <name(s)> - View a pipeline or system
    docs                               - Generate and serve documentation [not implemented]
    version                            - Print the version of Canonada
Installation
Canonada is available on PyPI and can be installed using pip:
pip install canonada
Check out the Getting Started guide to learn how to create a new project with Canonada.
Documentation
Check out the project's documentation here
Contributing
Coming soon...