Investigation into using AutoML and Topological Data Analysis for Automated Annotation
Project description
Create an MVP as fast as possible. The goal is to investigate whether topological data analysis (TDA) can be used to build an unsupervised algorithm that automatically produces annotations and labels for relatively simple data. Key features include:
Find datasets that can be used for classification and/or clustering. [X] I am going to use the Taskmaster-2 dataset.
Clean and explore the data
Wrangle the data into a form compatible with tools such as word2vec or DeepWalk, the former most likely accessed via gensim or PyTorch.
Use these tools to encode data as vectors.
Use simple models as baselines for classification and clustering (KNN, other algorithms).
Apply topological data analysis to these vectors to produce clusters (the ToMATo algorithm, or maybe a custom algorithm).
Compare the TDA results with the baselines using statistical/error analysis. This could use external clustering metrics or more familiar metrics such as the F-measure.
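The clustering step can be illustrated with a minimal sketch of a ToMATo-style mode-seeking procedure. This is a simplified, pure-Python illustration and not the project's implementation or any library's API: the function names (`tomato_sketch`, `knn`), the inverse-mean-distance density estimate, the parameters `k` and `tau`, and the toy 2-D points (standing in for word2vec or DeepWalk vectors) are all assumptions made for the example.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    out = []
    for i, p in enumerate(points):
        dists = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        out.append(dists[:k])
    return out

def tomato_sketch(points, k=3, tau=0.5):
    """Simplified ToMATo-style clustering (hypothetical helper, not a library call)."""
    neighbours = knn(points, k)
    # Assumed density estimate: inverse of the mean distance to the k nearest neighbours.
    density = [1.0 / (1e-12 + sum(d for d, _ in nb) / len(nb)) for nb in neighbours]
    # Symmetric k-nearest-neighbour graph.
    graph = [set() for _ in points]
    for i, nb in enumerate(neighbours):
        for _, j in nb:
            graph[i].add(j)
            graph[j].add(i)

    parent = {}  # union-find; the root of each cluster is its density peak

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visit points from highest to lowest density.
    for i in sorted(range(len(points)), key=lambda idx: -density[idx]):
        seen = [j for j in graph[i] if j in parent]
        if not seen:
            parent[i] = i  # no denser neighbour: i starts a new cluster
            continue
        # Attach i to the cluster of its densest already-visited neighbour.
        parent[i] = find(max(seen, key=lambda j: density[j]))
        # Merge adjacent clusters whose peak rises less than tau above i.
        for j in seen:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            low, high = sorted((ri, rj), key=lambda r: density[r])
            if density[low] - density[i] < tau:
                parent[low] = high

    roots = sorted({find(i) for i in range(len(points))})
    label_of = {r: n for n, r in enumerate(roots)}
    return [label_of[find(i)] for i in range(len(points))]

# Toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = tomato_sketch(points, k=3, tau=0.5)
```

On this toy input the two groups are disconnected in the neighbour graph, so each ends up in its own cluster. On real embeddings, `k` and `tau` would need tuning, and a maintained implementation would likely be preferable to this sketch.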
As the main goal here is to investigate whether unsupervised and/or automated machine learning can be used to annotate data, the model will be graded as a classification algorithm. Success will be defined by:
The TDA algorithm has better results (in some metric to be determined) than other unsupervised learning algorithms
The clustering/annotation is accurate enough to be deployed as a business solution, i.e. it meets an accuracy threshold.
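One way to grade a clustering as a classifier, sketched here under stated assumptions, is to map each cluster to the most common gold label among its members and then compute accuracy and a macro-averaged F-measure. The helper names, the example labels, and the value of `ACCURACY_THRESHOLD` are hypothetical; the project leaves both the metric and the threshold to be determined.

```python
from collections import Counter, defaultdict

def map_clusters_to_labels(cluster_ids, gold_labels):
    """Give each cluster the most common gold label among its members."""
    members = defaultdict(list)
    for cluster, gold in zip(cluster_ids, gold_labels):
        members[cluster].append(gold)
    return {c: Counter(g).most_common(1)[0][0] for c, g in members.items()}

def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def macro_f1(predicted, gold):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(p == label and g == label for p, g in zip(predicted, gold))
        fp = sum(p == label and g != label for p, g in zip(predicted, gold))
        fn = sum(p != label and g == label for p, g in zip(predicted, gold))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)

# Hypothetical cluster assignments and gold labels, for illustration only.
cluster_ids = [0, 0, 0, 1, 1, 1]
gold = ["flights", "flights", "hotels", "hotels", "hotels", "hotels"]

mapping = map_clusters_to_labels(cluster_ids, gold)
predicted = [mapping[c] for c in cluster_ids]

ACCURACY_THRESHOLD = 0.8  # assumed value; the real threshold is not specified
acc = accuracy(predicted, gold)
f1 = macro_f1(predicted, gold)
meets_threshold = acc >= ACCURACY_THRESHOLD
```

Because the cluster-to-label mapping is built from the gold labels themselves, these scores are optimistic. A held-out split for the mapping would give a fairer estimate.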
For speed, this will primarily be done in notebook format.
Download files
Source Distribution
Built Distribution
File details
Details for the file autoanno-0.0.1.dev1.tar.gz.
File metadata
- Download URL: autoanno-0.0.1.dev1.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.9 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 16a08fe5b97d388ed3158bdea5cff3a8a000d36b1454b297859e64f645567559
MD5 | 6c3c4325c0377a4b5fa869f4e0ebfb88
BLAKE2b-256 | 3d79f9f4adf3efa80a363ada54d78d7d4365c885c52e5bce3e9d9f90c0aea925
File details
Details for the file autoanno-0.0.1.dev1-py3-none-any.whl.
File metadata
- Download URL: autoanno-0.0.1.dev1-py3-none-any.whl
- Upload date:
- Size: 3.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.9 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 504c75d827111f58b3d84fd7e88267bff04aa81b5d0023213a1b36058b8399a8
MD5 | 40ad080678f67a9ac397a1c80159d45b
BLAKE2b-256 | 89ec4523c6be0f0ce2c1600f45fc67f8d0ee526a764c6bc2f532997c881525d6