End-to-end machine learning on your desktop or server.
📚 Documentation
AIQC accelerates research teams with an open source framework for deep learning pipelines.
A simple Python framework for conducting rapid, rigorous, and reproducible experiments.
Deep learning is difficult to implement because leading tools skip the following data wrangling challenges:
- Preprocessing - Data must be encoded into a machine-readable format, but off-the-shelf encoders don't handle multiple dimensions, columns, and types. Leakage occurs if splits/folds aren't encoded separately, and the lack of a validation split biases evaluation. And which samples were actually used for training?
- Experiment Tracking - Tuning parameters and architectures requires evaluating many training runs with metrics and charts. However, leading tools are designed around a single run and don't track performance across runs. Validation splits and/or cross-validation folds compound these problems.
- Postprocessing - If the encoder-decoder pairs weren't saved, then how should new samples be encoded and predictions be decoded? Do new samples have the same schema as the training samples? Did encoders spawn extra columns? Multiple encoders compound these problems.
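As a rough illustration of the preprocessing and postprocessing discipline described above, here is a minimal sketch using scikit-learn directly rather than AIQC's own API. The data, scalers, and the dummy prediction value are hypothetical; the point is the ordering: split first, fit encoders on the training split only, and keep the fitted encoder-decoder pairs around so new samples can be encoded and predictions decoded later.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix and continuous target.
X = np.random.rand(100, 3)
y = np.random.rand(100) * 50.0

# Split BEFORE fitting any encoder, so training statistics never leak
# into the evaluation split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit encoders on the training split only; transform the other splits.
feature_scaler = StandardScaler().fit(X_train)
label_scaler = StandardScaler().fit(y_train.reshape(-1, 1))

X_train_enc = feature_scaler.transform(X_train)
X_test_enc = feature_scaler.transform(X_test)

# Keep the fitted encoder-decoder pairs. At inference time the same
# objects encode new samples and decode model outputs back to raw units.
new_sample_enc = feature_scaler.transform(X_test[:1])
raw_prediction = label_scaler.inverse_transform(np.array([[0.3]]))
```

Without persisting `feature_scaler` and `label_scaler`, new samples cannot be encoded consistently with the training data, which is exactly the postprocessing gap the bullet above describes.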
Adding to the complexity, different protocols are required based on: analysis type (e.g. categorize, quantify, generate), data type (e.g. spreadsheet, sequence, image), and data dimensionality (e.g. timepoints per sample).
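To make the experiment-tracking gap concrete, here is a minimal sketch of what tracking many runs entails, in plain Python rather than AIQC itself. The hyperparameter grid, the `train_and_score` stand-in, and the metric are invented for illustration; the point is that every run's parameters and metrics must be recorded together so runs can be compared afterwards.

```python
import itertools

# Hypothetical hyperparameter grid; in practice these come from a tuner.
grid = {"learning_rate": [0.01, 0.001], "batch_size": [16, 32]}

def train_and_score(learning_rate, batch_size):
    # Stand-in for a real training loop; returns a fake validation metric.
    return 1.0 - learning_rate * batch_size / 10

# Record every run's parameters alongside its metrics, not just the last run.
runs = []
for lr, bs in itertools.product(grid["learning_rate"], grid["batch_size"]):
    runs.append({
        "learning_rate": lr,
        "batch_size": bs,
        "val_score": train_and_score(lr, bs),
    })

best = max(runs, key=lambda r: r["val_score"])
```

Once validation splits or cross-validation folds enter the picture, each run multiplies into several evaluations, which is why ad hoc bookkeeping like this quickly becomes unmanageable without a framework.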
In attempting to solve these problems ad hoc, individuals end up writing lots of tangled code and stitching together a Frankenstein set of tools. Doing so requires knowledge of not only data science but also software engineering, which places a skillset burden on the research team. The DIY approach is not maintainable.
As seen at PyData Global 2021.