
Multithreaded TF-IDF vectorization for similarity search, using sparse matrices for the computations.

Project description

Threaded-Sparse-TFIDF

This repository implements multithreaded TF-IDF vectorization for similarity search, using sparse matrices for the computations.
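The core idea of TF-IDF similarity search over sparse vectors can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names (`tokenize`, `fit_idf`, `tfidf_vector`, `cosine`) are hypothetical, tokenization is plain whitespace splitting, and each sparse vector is a `{term: weight}` dict rather than a sparse-matrix row. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is also scikit-learn's default.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(docs):
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Sparse vector as a dict: only in-vocabulary terms that occur get an entry.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    # L2-normalize so that a dot product equals cosine similarity.
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Iterate over the smaller vector; terms missing from the other contribute zero.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


docs = [
    "a science fiction movie",
    "a romantic comedy",
    "super hero movie with science fiction elements",
]
idf = fit_idf(docs)
doc_vectors = [tfidf_vector(d, idf) for d in docs]
query = tfidf_vector("science fiction super hero movie", idf)
scores = [cosine(query, v) for v in doc_vectors]
# Document indices ordered from most to least similar.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]
```

With SciPy, the same computation is typically done by stacking the document vectors into a `scipy.sparse.csr_matrix` and scoring all documents with one sparse matrix-vector product.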

Usage:

from TF_IDF import TF_IDF_Vectorizer

# Number of worker threads to use.
k = 4
tf_idf = TF_IDF_Vectorizer(use_cached=True, print_output=False)
_, ranking = tf_idf.get_similarity_score("science fiction super hero movie", num_workers=k)
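The `num_workers` parameter suggests that the documents are split into partitions that are scored concurrently. A minimal sketch of that pattern with the standard library's `concurrent.futures.ThreadPoolExecutor` follows. The function names are hypothetical, documents are assumed to be pre-normalized `{term: weight}` dicts, and this is not necessarily how `get_similarity_score` works internally.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(a, b):
    # Dot product of two sparse {term: weight} vectors.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def partition(n_items, num_workers):
    # Split range(n_items) into num_workers contiguous (start, end) slices
    # whose lengths differ by at most one.
    size, extra = divmod(n_items, num_workers)
    bounds, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def score_slice(query, doc_vectors, start, end):
    # Score one partition and keep each document's global index.
    return [(i, dot(query, doc_vectors[i])) for i in range(start, end)]


def threaded_ranking(query, doc_vectors, num_workers=4):
    # Hypothetical helper: score partitions in worker threads, then merge.
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    bounds = partition(len(doc_vectors), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(score_slice, query, doc_vectors, s, e) for s, e in bounds]
        # Collect in submission order so ties keep document order after the stable sort.
        scored = [pair for f in futures for pair in f.result()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    ranking = [i for i, _ in scored]
    return scored, ranking


docs = [{"science": 0.6, "fiction": 0.8}, {"comedy": 1.0}, {"hero": 0.6, "science": 0.8}]
_, ranking = threaded_ranking({"science": 1.0}, docs, num_workers=2)
print(ranking)  # [2, 0, 1]
```

Because of CPython's global interpreter lock, pure-Python scoring like this gains little from extra threads; threads mainly help when the per-partition work runs in native code that releases the lock, such as many NumPy routines.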

Performance:

num_workers  time    partition_size
1            1.1118  6.7785
2            0.8195  3.4149
3            0.7357  2.2773
4            0.7233  1.7081
5            0.7376  1.3556
6            0.7682  1.1307
7            0.7641  0.9618
8            0.7513  0.8506
9            0.7795  0.7587
10           0.8141  0.6807
11           0.8003  0.6195
12           0.8441  0.5697
13           0.8491  0.5258
14           0.9322  0.4874
15           0.8824  0.4573

The lowest time was measured with 4 workers (0.7233); beyond that, adding workers did not reduce the time further, even though the partition size kept shrinking.
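The speedup S(n) = T(1) / T(n) and the parallel efficiency E(n) = S(n) / n can be computed directly from the measured times. The sketch below uses the table's times rounded to four decimal places; `times`, `speedup`, and `efficiency` are illustrative names.

```python
# Measured times from the table above (num_workers -> time), rounded to 4 decimals.
times = {
    1: 1.1118, 2: 0.8195, 3: 0.7357, 4: 0.7233, 5: 0.7376,
    6: 0.7682, 7: 0.7641, 8: 0.7513, 9: 0.7795, 10: 0.8141,
    11: 0.8003, 12: 0.8441, 13: 0.8491, 14: 0.9322, 15: 0.8824,
}

# Speedup relative to a single worker, and the share of that speedup per worker.
speedup = {n: times[1] / t for n, t in times.items()}
efficiency = {n: s / n for n, s in speedup.items()}

for n in sorted(times):
    print(f"{n:>2} workers: speedup {speedup[n]:.2f}x, efficiency {efficiency[n]:.0%}")
```

For example, 4 workers give a speedup of about 1.54x, which is roughly 38% efficiency.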

Data

A subset of the Information Retrieval Dataset - Internet Movie Database (IMDB), specifically movies released after 2007.


Download files

Source Distribution

Threaded_Sparse_TFIDF-0.2.tar.gz (4.9 kB)

Built Distribution

Threaded_Sparse_TFIDF-0.2-py2.py3-none-any.whl (7.4 kB)

File details

Details for the file Threaded_Sparse_TFIDF-0.2.tar.gz.

File metadata

  • Download URL: Threaded_Sparse_TFIDF-0.2.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.5 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.5

File hashes

Hashes for Threaded_Sparse_TFIDF-0.2.tar.gz
Algorithm Hash digest
SHA256 cf0491f15cb60f8460092e62dbcac699583d9bae154cbf6d74ff9ac6d46367f0
MD5 5f6e987edf34301ddc92c8fee52eb9b1
BLAKE2b-256 2a1e5dbf77455132525d214cb71ec44550cd3546188e3a163c06c47fea4ea21d

File details

Details for the file Threaded_Sparse_TFIDF-0.2-py2.py3-none-any.whl.

File metadata

  • Download URL: Threaded_Sparse_TFIDF-0.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 7.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.5 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.5

File hashes

Hashes for Threaded_Sparse_TFIDF-0.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b509f44fa51a97eae25b1974e46c488ec8a3b02bb71d8799d452a63c606e0478
MD5 9db27031a4122526ca8fab0b841942cd
BLAKE2b-256 b23e7e80051362febe646470602249448278add08129c0876c8dc36718dcf56c
