Manage your dataflows seamlessly
Dataflow Awesome Managing Engine
The easiest dataflow management framework, currently under construction.
DAME helps with:
- Building datasets from files and folders
- Transforming data in the right order
- Saving transformed data: once something is computed, it is never computed again
- Choosing the best transformation from several configurations
Great for working with NumPy, PyTorch and more.
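Choosing the best transformation from several configurations comes down to scoring each candidate on the data and keeping the winner. The sketch below assumes a plain exhaustive grid search. `rank_configurations`, `grid` and `score` are hypothetical names chosen for illustration and are not DAME's API.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def rank_configurations(
    transform: Callable[..., Any],
    data: List[Any],
    grid: Dict[str, List[Any]],
    score: Callable[[List[Any]], float],
) -> List[Tuple[float, Dict[str, Any]]]:
    """Try every parameter combination in `grid` (exhaustive grid search),
    score the transformed dataset and return (score, params), best first."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        transformed = [transform(x, **params) for x in data]
        results.append((score(transformed), params))
    # Sort on the score only, so the params dicts are never compared
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Usage: prefer the configuration whose output is centred on zero
def scale_and_shift(x, scale, shift):
    return x * scale + shift


def closeness_to_zero(values):
    return -abs(sum(values) / len(values))


ranking = rank_configurations(
    scale_and_shift, [1.0, 2.0, 3.0], {"scale": [1, 2], "shift": [-2, 0]}, closeness_to_zero
)
print(ranking[0][1])  # {'scale': 1, 'shift': -2}
```

Keeping the full ranking instead of only the best entry makes it easy to compare how close the runner-up configurations come.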
Vision
Technically:
- Compute stages:
  - Sources - get data elements
  - Transforms - compute something from the available data
  - Reducers - compute something on the whole dataset
- Combining data sources
- Compute only what you need - optimized performance via DAGs
- Backup and caching after stages, with support for custom serializers
- Ranking various configurations
- (Optional) Parallel processing
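Computing only what you need can be pictured as pruning a dependency graph down to the ancestors of the requested outputs, then running those stages in topological order. This is a minimal sketch. It assumes each stage is a plain function paired with a list of input stage names, which is a hypothetical layout and not DAME's own classes. It uses the standard library `graphlib` module, which requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Set, Tuple

# Hypothetical pipeline description: stage name -> (function, names of its inputs)
Stages = Dict[str, Tuple[Callable[..., Any], List[str]]]


def required_stages(stages: Stages, targets: List[str]) -> Set[str]:
    """Walk the dependencies backwards from the targets; nothing else runs."""
    needed: Set[str] = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(stages[name][1])
    return needed


def compute(stages: Stages, targets: List[str]) -> Dict[str, Any]:
    needed = required_stages(stages, targets)
    # graphlib expects a mapping of node -> set of its predecessors
    graph = {name: set(stages[name][1]) for name in needed}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        func, inputs = stages[name]
        results[name] = func(*(results[i] for i in inputs))
    return {target: results[target] for target in targets}


def never_needed(xs):
    raise RuntimeError("this stage should not run when only 'total' is requested")


stages: Stages = {
    "raw": (lambda: [1, 2, 3, 4], []),                     # Source
    "squared": (lambda xs: [x * x for x in xs], ["raw"]),  # Transform
    "total": (sum, ["squared"]),                           # Reducer
    "unused": (never_needed, ["raw"]),                     # pruned away
}
print(compute(stages, ["total"]))  # {'total': 30}
```

Unrequested stages such as `unused` are never called. If the stages accidentally form a cycle, `TopologicalSorter` raises `graphlib.CycleError`.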
Priorities:
- Easy to use
- Batteries included
- Little overhead - take advantage of the fastest tools available
- Integrates seamlessly with other tools
- Expandable
Nice to have:
- Few Python dependencies
- Integrate tqdm
- DAG output
Backlog:
1.0.0:
- Dataset - compute items via Sources and Transforms
- Dataset - compute stage by stage (assequence)
- Dataset - validate Transforms
- Dataset - (_Stages) DAG computations
- Dataset - Automatic (Transform) versioning based on source and attrs
- Workers - Multithreading / Multiprocessing
- Dataset - Building context for transforms
- Storage - SQLite
- WIP - Dataset - Enable Storage & Caching
- Reducer - Scoring
- Reducer - Ranking configurations, finding optimal parameters
- Stages - Make an actual DAG instead of a topological sort
- Cache - Ring
- Dataset - Compute by chunks for efficient caching
- Transform - Mapping Transform, Sequential Transform
- Transform - Delete intermediate results
- Dataset - Autodelete unneeded objects from memory (Autosequential)
- Docs - DAME tutorial & more tests
- TODOs - Resolve the remaining TODOs in the code
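The backlog items on automatic Transform versioning and on Storage & Caching fit together naturally. If a transform's version key changes whenever its code or attributes change, that key can name a cache entry, so a result is computed once and reloaded afterwards. The sketch below is an assumption about how this could work, not DAME's implementation. `transform_version` and `cached_transform` are hypothetical. The version is derived from the function's bytecode and constants rather than its source text. The input is keyed by hashing its pickle bytes, which is only reliable for simple data such as lists and numbers.

```python
import hashlib
import pickle
import tempfile
import types
from pathlib import Path
from typing import Any, Callable, Dict


def transform_version(func: Callable[..., Any], attrs: Dict[str, Any]) -> str:
    """Hypothetical version key: changes when the function's bytecode, its simple
    constants or its attributes change (nested code objects are skipped because
    their repr contains a memory address)."""
    code = func.__code__
    consts = [c for c in code.co_consts if not isinstance(c, types.CodeType)]
    h = hashlib.sha256()
    h.update(func.__qualname__.encode())
    h.update(code.co_code)
    h.update(repr(consts).encode())
    h.update(repr(sorted(attrs.items())).encode())
    return h.hexdigest()[:16]


def cached_transform(func: Callable[..., Any], attrs: Dict[str, Any], data: Any, cache_dir: Path) -> Any:
    """Compute func(data, **attrs) once; later calls with the same transform
    version and the same input load the pickled result instead."""
    input_key = hashlib.sha256(pickle.dumps(data)).hexdigest()[:16]
    path = cache_dir / f"{func.__qualname__}-{transform_version(func, attrs)}-{input_key}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = func(data, **attrs)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def scale(xs, factor):
    calls.append(factor)  # records every real (non-cached) computation
    return [x * factor for x in xs]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_transform(scale, {"factor": 2}, [1, 2, 3], Path(tmp))
    again = cached_transform(scale, {"factor": 2}, [1, 2, 3], Path(tmp))  # loaded from disk
    other = cached_transform(scale, {"factor": 3}, [1, 2, 3], Path(tmp))  # new version, recomputed
print(first, again, other, calls)  # [2, 4, 6] [2, 4, 6] [3, 6, 9] [2, 3]
```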
Storage/Cache options:
- Pickle
- Joblib
- Redis
- SQLite
- PyTables
- Parquet/Dask
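Of the storage options listed, SQLite ships with Python's standard library, so a key-value store for stage results needs no extra dependency. This is a minimal sketch that assumes results are pickled into a single table. The `SQLiteStorage` class and its `save`/`load` methods are hypothetical.

```python
import pickle
import sqlite3
from typing import Any, Optional


class SQLiteStorage:
    """Hypothetical key-value store for stage results, pickled into one table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, value BLOB)"
        )

    def save(self, key: str, value: Any) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO results (key, value) VALUES (?, ?)",
                (key, pickle.dumps(value)),
            )

    def load(self, key: str) -> Optional[Any]:
        row = self.conn.execute(
            "SELECT value FROM results WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else pickle.loads(row[0])

    def __contains__(self, key: str) -> bool:
        row = self.conn.execute("SELECT 1 FROM results WHERE key = ?", (key,)).fetchone()
        return row is not None


store = SQLiteStorage()
store.save("squared/v1", [1, 4, 9])
print(store.load("squared/v1"))  # [1, 4, 9]
print("missing" in store)  # False
```

Pickle serves as the default serializer here. A custom serializer could replace it by swapping out the `pickle.dumps`/`pickle.loads` calls. Because `load` returns `None` for a missing key, use `in` to tell a missing entry apart from a stored `None`.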
2.0.0 Ideas:
- Easily reuse DAME transforms in Luigi/Dask/Apache Hadoop
- More built-in storage and cache options
- Built-in datasets such as torchvision.datasets.MNIST
- A module for managing on-disk datasets (GUI?), with conversion between:
  - PyTorch ImageFolder
  - Images + CSV
  - Other formats
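PyTorch's `ImageFolder` expects one subfolder per class under a root directory (`root/<class>/<image>`). An images + CSV layout instead keeps the images and lists each path with its label in a table. The sketch below covers the one-way conversion. It assumes the images stay where they are and that the CSV columns are named `path` and `label`. Both function names and the suffix list are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumption: extend as needed


def imagefolder_rows(root: Path) -> List[Tuple[str, str]]:
    """Collect (relative image path, label) pairs from a root/<label>/<image> tree."""
    rows = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
                rows.append((image.relative_to(root).as_posix(), class_dir.name))
    return rows


def imagefolder_to_csv(root: Path, csv_path: Path) -> int:
    """Write an images + CSV index (columns: path, label); returns the row count."""
    rows = imagefolder_rows(root)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])
        writer.writerows(rows)
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "train"
    for rel in ["cat/a.png", "dog/b.jpg", "dog/notes.txt"]:  # notes.txt is skipped
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    count = imagefolder_to_csv(root, Path(tmp) / "labels.csv")
    csv_text = (Path(tmp) / "labels.csv").read_text()
    rows = imagefolder_rows(root)
print(count, rows)  # 2 [('cat/a.png', 'cat'), ('dog/b.jpg', 'dog')]
```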
Development:
- tox - build
- tox - publish
- hosting docs on Read the Docs
- tox - publish docs
- coverage
- badges
Download files
Source Distribution
dame-0.0.2.tar.gz (13.3 kB)
Built Distribution
dame-0.0.2-py3-none-any.whl (15.4 kB)
File details
Details for the file dame-0.0.2.tar.gz
File metadata
- Download URL: dame-0.0.2.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | d2b510cbb5b8f0d400a6bc4bc39362527721b747cc81fa0e53d1a2741bbe67e4
MD5 | de8cfb6d062cb899377838424546f52c
BLAKE2b-256 | 9866295b62ea15d051c7db732679564d48cdcc46c9584d9b9ceffb7162f48bc5
File details
Details for the file dame-0.0.2-py3-none-any.whl
File metadata
- Download URL: dame-0.0.2-py3-none-any.whl
- Upload date:
- Size: 15.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6ed2e16120c9583663b0a94ab2a53e80d939f778e8fca2074f84879703042c52
MD5 | 245d24cc8739a27695c6285a49972d96
BLAKE2b-256 | 2f4f3e84f88b3bc0616b1f434fabac4d7c6b0aa72cd1693aaafe787de9751e92
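The hash tables above make it possible to check a downloaded file before installing it. The following small sketch uses only `hashlib` from the standard library. The helper names are hypothetical. The expected digests are the SHA256 values listed for the two dame 0.0.2 files.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA256 digests copied from the file details above
EXPECTED_SHA256 = {
    "dame-0.0.2.tar.gz": "d2b510cbb5b8f0d400a6bc4bc39362527721b747cc81fa0e53d1a2741bbe67e4",
    "dame-0.0.2-py3-none-any.whl": "6ed2e16120c9583663b0a94ab2a53e80d939f778e8fca2074f84879703042c52",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


# Demonstration with a deliberately wrong (empty) file under the sdist's name
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "dame-0.0.2.tar.gz"
    fake.write_bytes(b"")
    empty_digest = sha256_of(fake)
    empty_ok = verify_download(fake)
print(empty_digest == hashlib.sha256(b"").hexdigest(), empty_ok)  # True False
```

pip can enforce the same check at install time through its hash-checking mode: list `dame==0.0.2 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`.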