A simple implementation of TF-IDF
Project description
TF_pyDF
TF_pyDF is a Python package that provides a module for calculating TF-IDF (Term Frequency-Inverse Document Frequency). It allows you to create a document model and perform various operations such as adding and removing documents, searching for documents based on queries, and computing TF-IDF scores.
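As a rough illustration of what a TF-IDF score is, the sketch below ranks documents against a pre-tokenized query in plain Python. It is not the package's code: the function name `tf_idf_scores`, the sample documents, and the particular weighting (term count divided by document length, times log10 of total documents over documents containing the term) are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tf_idf_scores(documents, query):
    """Rank documents against a pre-tokenized query (hypothetical helper).

    Assumed variant: tf = count / document length, idf = log10(N / df).
    """
    n_docs = len(documents)
    scores = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query:
            # Number of documents that contain the term at least once
            df = sum(1 for doc in documents.values() if term in doc)
            if df == 0 or not tokens:
                continue  # unseen term or empty document adds nothing
            tf = counts[term] / len(tokens)
            idf = math.log10(n_docs / df)
            score += tf * idf
        scores[doc_id] = score
    # Highest score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": ["cat", "sat", "mat"],
    "d2": ["dog", "sat", "log"],
    "d3": ["cat", "cat", "hat"],
}
ranked = tf_idf_scores(docs, ["cat"])
```

Here `d3` ranks first because "cat" makes up two thirds of its tokens, `d1` follows with one third, and `d2` scores zero because it never mentions the term.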
Usage Warning
Document contents and search queries must be pre-tokenized, either as a list[str] or as a Tokenizer iterator. You can use your own tokenizer or LexiPy, a related package that provides a simple way to split a string of text into tokens.
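If you want to bring your own tokenizer, something as small as the following may be enough to produce the required list[str]. This is a minimal sketch using only the standard library; `simple_tokenize` is a hypothetical helper and does not reflect LexiPy's API.

```python
import re


def simple_tokenize(text):
    """Lowercase the text and return its alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


tokens = simple_tokenize("Apple, banana and ORANGE!")
```

The resulting list can be used as a document's contents or as a search query.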
Installation
You can install TF_pyDF using pip:
pip install tf_pydf
Usage
from tf_pydf import Model

documents = {
    "fruits":
        ["apple", "banana", "orange"],
    "vegetables":
        ["tomato", "cucumber", "radish"],
    "pasta":
        ["tagliatelle", "rotini", "rigatoni"],
}

# Create a new instance of the document model
model = Model()

# Add documents to the model
for doc_id, doc_content in documents.items():
    model.add_doc(doc_id, doc_content)

# Or use the convenience method "from_dict"
model = Model.from_dict(documents)

# Remove a document from the model
doc_id = "pasta"
model.remove_doc(doc_id)

# Check if a document is in the model
if doc_id in model:
    ...

# Search the model for documents matching a query
query = ["apple"]  # example pre-tokenized query (an assumed value for illustration)
results = model.search_query(query)
results
>>> [('fruits', 0.10034333188799373), ('vegetables', 0.0)]
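The score reported for "fruits" can be reproduced by hand under one plausible reading, which the package documentation does not confirm: a single-term query whose term occurs once among the three tokens of "fruits" and in no other document, with a base-10 logarithm for the inverse document frequency. The variable names below are illustrative.

```python
import math

n_docs = 2          # documents left after "pasta" was removed
docs_with_term = 1  # assumption: the query term occurs only in "fruits"
tf = 1 / 3          # assumption: one occurrence among three tokens
idf = math.log10(n_docs / docs_with_term)
score = tf * idf    # approximately 0.1003
```

The "vegetables" document does not contain the term, so its term frequency, and therefore its score, is 0.0.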
Contributing
Contributions to TF_pyDF are welcome! If you find an issue or have a suggestion for improvement, please open an issue or submit a pull request on the GitHub repository.
License
This package is licensed under the MIT License.
References
- GitHub repository
- PyPI package
- LexiPy - tokenizer
- TF-IDF - Wikipedia
File details
Details for the file tf_pydf-0.1.0.tar.gz.
File metadata
- Download URL: tf_pydf-0.1.0.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ba0ee61a8f2b1bb7a1d45914ab18424886156a41d0021d7c6a4fc071d297eee
MD5 | 9655a31946dca9229e07b304a130b9c9
BLAKE2b-256 | 29fb28b1d23f2bc4cdadf9409b23ccb4c5a028ba4e5b4369c2196c9f379f33a8
File details
Details for the file tf_pydf-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tf_pydf-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.6 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | d64adb72e0fdae6852e155b97fd71a183faabc5c3b6ca61cd343890875a9847d
MD5 | 2cb1704390d8b78aec585e0e4c370616
BLAKE2b-256 | 6cc8ee612769231d6aa7b998153a32d361a44d933ffb2b6590952ea3a619022c