Manage your dataflows seamlessly
Dataflow Awesome Managing Engine
The easiest dataflow managing framework - currently under construction.
DAME solves/facilitates:
- Building datasets from files / folders
- Transforming data in the right order
- Saving transformed data - once computed, never computed again
- Choosing the best transformation from a few configurations
Great for working with NumPy, PyTorch and more.
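As an illustration of "choosing the best transformation from a few configurations", here is a minimal sketch in plain Python. It does not use DAME's API (which is still under construction); the names `scale`, `score`, and `configs` are hypothetical, and mean absolute error is just one possible scoring choice.

```python
from statistics import mean

# Toy dataset and the values we would like the transform to produce.
data = [1.0, 2.0, 3.0, 4.0]
targets = [2.1, 3.9, 6.2, 7.8]


def scale(x, factor):
    """A toy transform with one tunable parameter."""
    return x * factor


def score(factor):
    """Score a configuration over the whole dataset (lower is better)."""
    return mean(abs(scale(x, factor) - t) for x, t in zip(data, targets))


# Candidate configurations (hypothetical), ranked from best to worst.
configs = [{"factor": 1.0}, {"factor": 2.0}, {"factor": 3.0}]
ranking = sorted(configs, key=lambda cfg: score(**cfg))
best = ranking[0]

print(best)  # {'factor': 2.0}
```

The scoring function plays the role of a reducer: it looks at the whole dataset rather than a single element.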
Vision
Technically:
- Compute stages:
  - Sources - get data element
  - Transforms - compute something out of available data
  - Reducers - compute something on the whole dataset
- Combining data sources
- Compute only what you need - optimized performance via DAGs
- Backup and cache after stages, with support for custom serializers
- Ranking various configurations
- (Optional) Parallel processing
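The "compute only what you need" idea can be sketched with the standard library alone. This is an assumption-laden illustration, not DAME's implementation: the stage table `STAGES` and the helpers `required` and `compute` are hypothetical names, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

executed = []  # records which stages actually ran


def raw():
    # Acts like a Source: no dependencies.
    executed.append("raw")
    return 3


def doubled(raw):
    executed.append("doubled")
    return raw * 2


def squared(raw):
    executed.append("squared")
    return raw ** 2


def total(doubled, squared):
    executed.append("total")
    return doubled + squared


# Hypothetical stage table: name -> (function, dependency names).
STAGES = {
    "raw": (raw, ()),
    "doubled": (doubled, ("raw",)),
    "squared": (squared, ("raw",)),
    "total": (total, ("doubled", "squared")),
}


def required(target):
    """Collect the target and all of its ancestors."""
    needed, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(STAGES[name][1])
    return needed


def compute(target):
    """Run only the stages the target depends on, in dependency order."""
    needed = required(target)
    graph = {name: set(STAGES[name][1]) for name in needed}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = STAGES[name]
        results[name] = func(*(results[d] for d in deps))
    return results[target]


print(compute("doubled"))  # 6
print(executed)            # ['raw', 'doubled']
```

Requesting `doubled` never touches `squared` or `total`, which is the saving a DAG makes possible compared with running every stage in a fixed order.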
Priorities:
- Easy to use
- Batteries included
- Little overhead - take advantage of fastest tools available
- Integrates seamlessly with other tools
- Expandable
Nice to have:
- Few python dependencies
- Integrate tqdm
- DAG output
Backlog:
1.0.0:
- Dataset - compute items via Sources and Transforms
- Dataset - compute stage by stage (assequence)
- Dataset - validate Transforms
- Dataset - (_Stages) DAG computations
- Dataset - Automatic (Transform) versioning based on source and attrs
- Workers - Multithreading / multiprocessing
- Dataset - Building context for transforms
- Storage - SQLite
- WIP - Dataset - Enable Storage & Caching
- Reducer - Scoring
- Reducer - Ranking configurations, finding optimal parameters
- Stages - Make an actual DAG instead of a topological sort
- Cache - Ring
- Dataset - Compute by chunks for efficient caching
- Transform - Mapping Transform, Sequential Transform
- Transform - Delete intermediate results
- Dataset - Automatically delete unneeded objects from memory (Autosequential)
- Docs - DAME tutorial & more tests
- TODOs - Resolve the remaining TODOs in the code
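One way the backlog item "automatic versioning based on source and attrs" could work is to hash a transform's code together with its parameters, so that a cached result is invalidated whenever either changes. The sketch below is an assumption, not DAME's mechanism: `transform_version` is a hypothetical helper, and it hashes compiled bytecode as a stand-in for source text (`inspect.getsource` could be used instead where source files are available). Bytecode differs between Python versions, so such tags are only comparable within one interpreter version, and `attrs` must be JSON-serializable here.

```python
import hashlib
import json


def transform_version(func, attrs):
    """Derive a short version tag from a function's compiled code and its attrs."""
    code = func.__code__
    payload = json.dumps(
        {
            "bytecode": code.co_code.hex(),
            # Skip nested code objects: their repr is not stable between runs.
            "constants": [
                repr(c) for c in code.co_consts if not hasattr(c, "co_code")
            ],
            "names": list(code.co_names),
            "attrs": attrs,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def double(x):
    return x * 2


def triple(x):
    return x * 3


v_double = transform_version(double, {"dtype": "float32"})
print(v_double == transform_version(double, {"dtype": "float32"}))  # True
print(v_double == transform_version(double, {"dtype": "float64"}))  # False
```

A tag like this could be stored next to each cached result, and a mismatch would signal that the result needs recomputing.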
Storage/Cache options:
- Pickle
- Joblib
- Redis
- SQLite
- PyTables
- Parquet/Dask
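To make the storage idea concrete, here is a minimal sketch that combines two of the options above, Pickle for serialization and SQLite for persistence, using only the standard library. `SQLiteStore` and `get_or_compute` are hypothetical names chosen for this example and are not part of DAME.

```python
import pickle
import sqlite3


class SQLiteStore:
    """Hypothetical key-value store: pickled values in a single SQLite table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    def get_or_compute(self, key, compute):
        """Return the stored value for key, computing and saving it on a miss."""
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return pickle.loads(row[0])
        value = compute()
        self.conn.execute(
            "INSERT INTO cache (key, value) VALUES (?, ?)",
            (key, pickle.dumps(value)),
        )
        self.conn.commit()
        return value


calls = []  # counts how often the expensive function really runs


def expensive():
    calls.append(1)
    return [x * x for x in range(5)]


store = SQLiteStore()
first = store.get_or_compute("squares", expensive)
second = store.get_or_compute("squares", expensive)
print(first == second, len(calls))  # True 1
```

Passing a file path instead of `":memory:"` would keep results across runs. Pickle should only be used to load data from trusted sources.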
2.0.0 Ideas:
- Easy reuse of DAME transforms in Luigi/Dask/Apache Hadoop
- More built-in storage and cache options
- Built-in datasets like torchvision.MNIST etc
- Module for managing on-disk datasets. GUI? Conversion between:
  - PyTorch ImageFolder
  - Images + CSV
  - Other formats
Development:
- tox - build
- tox - publish
- hosting docs on readthedocs
- tox - publish docs
- coverage
- badges
Download files:
- Source distribution: dame-0.0.2.tar.gz (13.3 kB)
- Built distribution: dame-0.0.2-py3-none-any.whl (15.4 kB)