A simple implementation of TF-IDF

Project description

TF_pyDF

TF_pyDF is a Python package that provides a module for calculating TF-IDF (Term Frequency-Inverse Document Frequency). It allows you to create a document model and perform various operations such as adding and removing documents, searching for documents based on queries, and computing TF-IDF scores.

Usage Warning

Document contents and search queries must be pre-tokenized, either as a list[str] or as a Tokenizer iterator. You can use your own tokenizer or the related LexiPy package, which provides a simple way to split a string of text into tokens.
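If you want to supply your own tokenizer, a minimal sketch could look like the following. The helper name `simple_tokenize` is hypothetical and is not part of TF_pyDF or LexiPy; it only shows one way to turn a raw string into the list[str] form the model expects.

```python
import re


def simple_tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphanumeric word tokens.

    Hypothetical helper for illustration; not part of TF_pyDF or LexiPy.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


# Punctuation is dropped and case is normalized.
tokens = simple_tokenize("Apple, banana and ORANGE!")
print(tokens)
```

The resulting list can be passed wherever the package expects pre-tokenized content. A real tokenizer may also need stop-word removal or stemming, which this sketch leaves out.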

Installation

You can install TF_pyDF using pip:

pip install tf_pydf

Usage

from tf_pydf import Model

documents = {
    "fruits":
        ["apple", "banana", "orange"],
    "vegetables":
        ["tomato", "cucumber", "radish"],
    "pasta":
        ["tagliatelle", "rotini", "rigatoni"],
    }

# Create a new instance of the document model
model = Model()

# Add documents to the model
for doc_id, doc_content in documents.items():
    model.add_doc(doc_id, doc_content)

# Or use convenience method "from_dict"
model = Model.from_dict(documents)

# Remove a document from the model
doc_id = "pasta"
model.remove_doc(doc_id)

# Check if a document is in the model
if doc_id in model:
    ...

# Search the model for documents matching a pre-tokenized query
query = ["apple"]  # example query; any pre-tokenized list[str] works
results = model.search_query(query)

results
>>> [('fruits', 0.10034333188799373), ('vegetables', 0.0)]
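The score shown above is consistent with a common TF-IDF formulation, sketched below under stated assumptions: term frequency is the term count divided by the document length, inverse document frequency is log10(N / df), and the query is a single term such as ["apple"]. The function `tf_idf_scores` is a hypothetical stand-in for illustration; the package's actual internals may differ.

```python
import math


def tf_idf_scores(
    documents: dict[str, list[str]], query: list[str]
) -> list[tuple[str, float]]:
    """Rank documents by summed TF-IDF over the query terms.

    Assumed formulation (not confirmed by the package):
    tf = count / doc_length, idf = log10(n_docs / doc_frequency).
    """
    n_docs = len(documents)
    scores: dict[str, float] = {}
    for doc_id, tokens in documents.items():
        score = 0.0
        for term in query:
            # Number of documents that contain the term at least once.
            df = sum(1 for toks in documents.values() if term in toks)
            if df == 0 or not tokens:
                continue  # unseen term or empty document contributes nothing
            tf = tokens.count(term) / len(tokens)
            idf = math.log10(n_docs / df)
            score += tf * idf
        scores[doc_id] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two documents remain after "pasta" is removed.
docs = {
    "fruits": ["apple", "banana", "orange"],
    "vegetables": ["tomato", "cucumber", "radish"],
}
ranked = tf_idf_scores(docs, ["apple"])
print(ranked)
```

Under these assumptions "fruits" scores (1/3) * log10(2), roughly 0.1003, and "vegetables" scores 0.0, which agrees with the output above.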

Contributing

Contributions to TF_pyDF are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

License

This package is licensed under the MIT License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tf_pydf-0.1.0.tar.gz (4.4 kB)

Uploaded Source

Built Distribution

tf_pydf-0.1.0-py3-none-any.whl (5.0 kB)

Uploaded Python 3

File details

Details for the file tf_pydf-0.1.0.tar.gz.

File metadata

  • Download URL: tf_pydf-0.1.0.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10

File hashes

Hashes for tf_pydf-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       7ba0ee61a8f2b1bb7a1d45914ab18424886156a41d0021d7c6a4fc071d297eee
MD5          9655a31946dca9229e07b304a130b9c9
BLAKE2b-256  29fb28b1d23f2bc4cdadf9409b23ccb4c5a028ba4e5b4369c2196c9f379f33a8

File details

Details for the file tf_pydf-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tf_pydf-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10

File hashes

Hashes for tf_pydf-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       d64adb72e0fdae6852e155b97fd71a183faabc5c3b6ca61cd343890875a9847d
MD5          2cb1704390d8b78aec585e0e4c370616
BLAKE2b-256  6cc8ee612769231d6aa7b998153a32d361a44d933ffb2b6590952ea3a619022c
