A package for performing TF-IDF transformation on text data
Project description
TFIDF Transformer Afiniti
A package for performing TF-IDF transformation on text data. This project was developed for an assessment with the following brief:
Create a framework that performs TF-IDF transformation; you can use sklearn's tfidf function. Keep the following in mind:
- Handle edge cases: what happens when new text data arrives
- Create unit tests: check failure scenarios
- Add docstrings: assume you will hand this code to another SWE
- Obey engineering best practices
- Use inheritance where necessary
Create a (PyPI) package out of this framework.
Installation
Use the pip package manager to install the package:
pip install tfidf-transformer-afiniti
Usage
from tfidf_transformer_afiniti.main import TfidfFramework
framework = TfidfFramework()
# Append some data to the data list
data = ["This is the first document.", "This document is the second document.", "And this is the third one."]
for d in data:
    framework.append_data(d)
# Print the tf-idf matrix
print(framework.tfidf_matrix.toarray())
# Add new one
new_data = "this is a new test document"
framework.append_data(new_data)
# Print the tf-idf matrix
print(framework.tfidf_matrix.toarray())
# Add new list
new_list_data = ["And this is the really fifth one.", "And this is the finally sixth one."]
framework.append_list_data(new_list_data)
# Print the tf-idf matrix
print(framework.tfidf_matrix.toarray())
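The printed matrix changes after every append, not only by gaining a row: a new document can introduce new vocabulary (new columns) and it shifts the document frequencies, so the weights of earlier rows move too. The sketch below illustrates this in plain Python. It is not the package's implementation; the function name `tfidf_rows` is hypothetical, and the whitespace tokenizer is a simplification (sklearn's default tokenizer behaves differently). The weighting follows the smoothed formula that scikit-learn documents for its default settings, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each row.

```python
import math
from collections import Counter


def tfidf_rows(documents):
    """Return (vocabulary, rows) for a list of strings.

    Hypothetical helper for illustration only. Tokenization is a plain
    lowercase whitespace split, which is simpler than sklearn's default.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}

    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocabulary]
        # L2-normalize the row; guard against an all-zero row.
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocabulary, rows


corpus = ["the cat sat", "the dog sat"]
vocab_before, rows_before = tfidf_rows(corpus)

# A new document arrives: recompute over the whole corpus.
corpus.append("a new bird arrives")
vocab_after, rows_after = tfidf_rows(corpus)

print(len(vocab_before), len(vocab_after))  # 4 8
```

Recomputing over the full corpus is the simplest way to keep the matrix consistent when data arrives; whether the package does exactly this internally is not stated here.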
Tests
python -m unittest tfidf_transformer_afiniti/tfidf_test.py
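The contents of tfidf_test.py are not reproduced here. As a rough illustration of what failure-scenario tests can look like with `unittest`, the sketch below tests a stand-in class. `TextStore` and its validation rules are hypothetical and are not part of this package; the real `TfidfFramework` may accept or reject inputs differently.

```python
import unittest


class TextStore:
    """Hypothetical stand-in for a framework that accumulates documents."""

    def __init__(self):
        self.data = []

    def append_data(self, text):
        """Append one document, rejecting non-strings and blank strings."""
        if not isinstance(text, str):
            raise TypeError("text must be a string")
        if not text.strip():
            raise ValueError("text must not be empty")
        self.data.append(text)


class TextStoreFailureTests(unittest.TestCase):
    def setUp(self):
        # Each test starts from a fresh, empty store.
        self.store = TextStore()

    def test_rejects_non_string(self):
        with self.assertRaises(TypeError):
            self.store.append_data(123)

    def test_rejects_blank_string(self):
        with self.assertRaises(ValueError):
            self.store.append_data("   ")

    def test_accepts_valid_text(self):
        self.store.append_data("a new document")
        self.assertEqual(self.store.data, ["a new document"])
```

Saved as a module, a file like this can be run with `python -m unittest` in the same way as the command above.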
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file tfidf_transformer_afiniti-0.4-py3-none-any.whl.
File metadata
- Download URL: tfidf_transformer_afiniti-0.4-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5f7d0b6a31bb80dbde99473a236fc58144b1b80c35c2f9d977e943e464369319
MD5 | b2ad2512deeaeb9220ac361d30415090
BLAKE2b-256 | 15eae7909c869da6ba6924528b554d62433d95a15aed11f24c6f7331af0a956b