Manage your dataflows seamlessly
Project description
Dataflow Awesome Managing Engine
The easiest dataflow managing framework - currently under construction.
DAME solves/facilitates:
- Building datasets from files / folders
- Transforming data in the right order
- Saving transformed data - once computed never compute it again
- Choosing the best transformation from a few configurations
Great for working with numpy, pyTorch and more.
Vision
Technically:
- Compute stages:
- Sources - get data element
- Transforms - compute something out of available data
- Reducers - compute something on the whole dataset
- Combining data sources
- Compute only what you need - optimized performance via DAGs
- Backup and cache, after stages, support for custom serializers
- Ranking various configurations
- (Optional) Parallel processing
Priorities:
- Easy to use
- Batteries included
- Little overhead - take advantage of fastest tools available
- Integrates seamlessly with other tools
- Expandable
Nice to have:
- Few python dependencies
- Integrate tqdm
- DAG output
Backlog:
1.0.0:
- [x] - Dataset - compute items via Sources and Transforms
- [x] - Dataset - compute stage by stage, (assequence)
- [x] - Dataset - validate Transforms
- [x] - Dataset - (_Stages) DAG computations
- [x] - Dataset - Automatic (Transform) versioning based on source and attrs
- [x] - Workers - MultiThreading / MultiProcessing
- [x] - Dataset - Building context for transforms
- [x] - Storage - SQLite
- [ ] - WIP - Dataset - Enable Storage & Caching
- [ ] - Reducer - Scoring
- [ ] - Reducer - Ranking configurations, Find optimal parameters
- [ ] - Stages - Make an actual DAG instead of topsort
- [ ] - Cache - Ring
- [ ] - Dataset - Compute by chunks for efficient cache
- [ ] - Transform - Mapping Transform, Sequential transform
- [ ] - Transform - Delete intermediate result
- [ ] - Dataset - Autodelete unrequired objects form memory (Autosequential)
- [ ] - Docs - Dame tutorial & more tests
- [ ] - TODOS - Solve left todos from the code
Storage/Cache options:
- [ ] Pickle
- [ ] Joblib
- [ ] Redis
- [ ] Sqlite
- [ ] PyTables
- [ ] Parquet/Dask
2.0.0 Ideas:
- Easy reuse Dame transforms in Luigi/Dask/Apache Hadoop
- More built-in storage and cache options
- Built-in datasets like torchvision.MNIST etc
- Module for managing on disk datasets. GUI? Conversion between:
- Pytorch ImageFolder
- Images + csv
- Some Other
Development:
- [ ] - tox - build
- [ ] - tox - publish
- [ ] - hosting docs on readthedocs
- [ ] - tox - publish docs
- [ ] - coverage
- [ ] - badges
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename, size | File type | Python version | Upload date | Hashes |
---|---|---|---|---|
Filename, size dame-0.0.2-py3-none-any.whl (15.4 kB) | File type Wheel | Python version py3 | Upload date | Hashes View |
Filename, size dame-0.0.2.tar.gz (13.3 kB) | File type Source | Python version None | Upload date | Hashes View |