Investigation into using AutoML and Topological Data Analysis for Automated Annotation

Project description

Create an MVP as fast as possible. The goal is to investigate whether topological data analysis (TDA) can be used to build an unsupervised algorithm that automatically produces annotations and labels for relatively simple data. Key features include:

  • Find datasets that can be used for classification and/or clustering. [X] I am going to use the Taskmaster 2 dataset.

  • Clean and explore the data

  • Wrangle the data into a form compatible with tools such as word2vec or DeepWalk, the former most likely accessed via gensim or PyTorch.

  • Use these tools to encode data as vectors.

  • Use simple models as baselines for classification and clustering (KNN and other algorithms).

  • Apply topological data analysis to these vectors to produce clusters (the ToMATo algorithm, or maybe a custom algorithm).

  • Compare TDA results with the baselines using statistical/error analysis. This could use external clustering-based metrics or more familiar metrics such as the F-measure.
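
As a rough illustration of the clustering step in the list above, here is a minimal, pure-Python sketch of a ToMATo-style procedure: estimate a density at each point, climb a neighborhood graph toward density peaks, and merge peaks whose prominence falls below a threshold. This is a simplified stand-in written for readability, not the project's implementation and not a TDA library's API. The function name `tomato_like_clusters`, the ball-count density estimate, and the parameters `radius` and `tau` are all assumptions made for the example.

```python
import math


def _dist(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def tomato_like_clusters(points, radius, tau):
    """Simplified ToMATo-style clustering (illustrative sketch only).

    points: list of coordinate tuples (e.g. embedding vectors)
    radius: two points are neighbors if they are within this distance
    tau:    minimum prominence a density peak needs to stay a separate cluster
    Returns one label per point; the label is the index of the cluster's peak.
    """
    n = len(points)
    # Neighborhood graph: indices of all points within `radius`.
    neighbors = [
        [j for j in range(n) if j != i and _dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    # Crude density estimate (assumption): points in the ball, including itself.
    density = [len(nb) + 1 for nb in neighbors]

    parent = list(range(n))  # union-find forest; each root is a density peak

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-density[i], i))
    rank = {idx: pos for pos, idx in enumerate(order)}
    for i in order:
        higher = [j for j in neighbors[i] if rank[j] < rank[i]]
        if not higher:
            continue  # local density peak: starts its own cluster
        # Attach to the cluster of the densest neighbor already visited.
        best = min(higher, key=lambda j: rank[j])
        parent[i] = find(best)
        # Point i connects neighboring clusters; merge the ones whose peak
        # rises less than tau above the density at i.
        for j in higher:
            a, b = find(i), find(j)
            if a == b:
                continue
            low, high = (a, b) if rank[a] > rank[b] else (b, a)
            if density[low] - density[i] < tau:
                parent[low] = high
    return [find(i) for i in range(n)]


# Two well-separated blobs of hypothetical 2-D vectors.
blobs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = tomato_like_clusters(blobs, radius=0.5, tau=1)
print(len(set(labels)))  # number of clusters found
```

Raising `tau` merges weaker peaks into stronger ones, so it acts as the knob that controls how many clusters survive. For real experiments an established implementation is likely a better choice than this sketch, and the density estimator in particular should be chosen to suit the embedding space.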

As the main goal here is to investigate whether unsupervised and/or automated machine learning can be used to annotate data, the model will be graded as a classification algorithm. Success will be defined by:

  1. The TDA algorithm has better results (in some metric to be determined) than other unsupervised learning algorithms

  2. The clustering/annotation is accurate enough to be deployed as a business solution, i.e., it meets an accuracy threshold.
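
One way the two success criteria could be checked is sketched below, using a pairwise F-measure as the external clustering metric. The metric choice is only one of the options the description leaves open, and the names `pairwise_f_measure` and `meets_success_criteria`, the example scores, and the threshold value are hypothetical placeholders rather than project results.

```python
from itertools import combinations


def pairwise_f_measure(true_labels, predicted_clusters):
    """Pairwise F-measure: an external metric comparing clusters to known labels.

    A pair of items is a true positive when both the reference labels and the
    predicted clustering place the two items together.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_clusters[i] == predicted_clusters[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1  # grouped together by the clustering, but labels differ
        elif same_true:
            fn += 1  # same label, but the clustering split them
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def meets_success_criteria(tda_score, baseline_scores, threshold):
    """Criterion 1: beat every baseline. Criterion 2: reach the threshold."""
    beats_baselines = all(tda_score > score for score in baseline_scores.values())
    return beats_baselines and tda_score >= threshold


# Hypothetical toy data: four items with two reference labels.
reference = ["a", "a", "b", "b"]
tda_clusters = [0, 0, 1, 1]
baseline_clusters = [0, 0, 0, 1]

tda_f = pairwise_f_measure(reference, tda_clusters)
baseline_f = pairwise_f_measure(reference, baseline_clusters)
# The 0.9 threshold is a placeholder; the description does not fix a value.
print(meets_success_criteria(tda_f, {"baseline": baseline_f}, threshold=0.9))
```

Because the pairwise F-measure only looks at whether pairs of items are grouped together, it does not require cluster IDs to match label names, which makes it convenient for comparing unsupervised output against annotated data.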

For speed, this will primarily be done in notebook format.

Download files

Source Distribution

autoanno-0.0.1.dev1.tar.gz (3.7 kB)

Uploaded Source

Built Distribution

autoanno-0.0.1.dev1-py3-none-any.whl (3.7 kB)

Uploaded Python 3

File details

Details for the file autoanno-0.0.1.dev1.tar.gz.

File metadata

  • Download URL: autoanno-0.0.1.dev1.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.9 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.6

File hashes

Hashes for autoanno-0.0.1.dev1.tar.gz
Algorithm Hash digest
SHA256 16a08fe5b97d388ed3158bdea5cff3a8a000d36b1454b297859e64f645567559
MD5 6c3c4325c0377a4b5fa869f4e0ebfb88
BLAKE2b-256 3d79f9f4adf3efa80a363ada54d78d7d4365c885c52e5bce3e9d9f90c0aea925

File details

Details for the file autoanno-0.0.1.dev1-py3-none-any.whl.

File metadata

  • Download URL: autoanno-0.0.1.dev1-py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.9 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.6

File hashes

Hashes for autoanno-0.0.1.dev1-py3-none-any.whl
Algorithm Hash digest
SHA256 504c75d827111f58b3d84fd7e88267bff04aa81b5d0023213a1b36058b8399a8
MD5 40ad080678f67a9ac397a1c80159d45b
BLAKE2b-256 89ec4523c6be0f0ce2c1600f45fc67f8d0ee526a764c6bc2f532997c881525d6
