A toolkit to find or remove similar items in a CSV file
Project description
CSV-Similarity
Intro
A toolkit to find similar items in a CSV file and, if needed, remove them.
Example
from csv_similarity.similarity import get_similar, remove_similar

# Step 1: compare the 'title' field of each row and save a similarity report.
get_similar(
    input_path='data/list_company_news1.csv',
    similarity=0.8,
    save_path='data/similarity_report1.csv',
    # stopwords_path=f'{root_path}/stopwords/stopwords',
    stopwords_path='',
    analyze_field='title'
)

# Step 2: use the report to write a copy of the input CSV without the similar items.
remove_similar(
    similarity_report_path='data/similarity_report1.csv',
    input_csv_path='data/list_company_news1.csv',
    output_path='data/list_company_news_without_similar.csv',
)
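To make the idea behind the two calls above concrete, here is a minimal standard-library sketch of a "report, then remove" workflow. It is not the toolkit's implementation: the scoring uses `difflib.SequenceMatcher` as a stand-in, and the helper names `find_similar_pairs` and `drop_similar`, along with the sample titles, are hypothetical and chosen only for illustration.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_pairs(rows, field, threshold):
    """Return (i, j, score) for each pair of rows whose `field` values score >= threshold.

    Hypothetical helper: SequenceMatcher is an assumed stand-in for whatever
    similarity measure csv-similarity actually uses.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        score = SequenceMatcher(None, a[field], b[field]).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


def drop_similar(rows, pairs):
    """Keep the earlier row of each similar pair and drop the later one."""
    to_drop = {j for _, j, _ in pairs}
    return [row for idx, row in enumerate(rows) if idx not in to_drop]


# Made-up in-memory CSV standing in for data/list_company_news1.csv.
sample = io.StringIO(
    "title\n"
    "Acme Corp reports record profits\n"
    "Acme Corp reports record profit\n"
    "Local bakery opens second shop\n"
)
rows = list(csv.DictReader(sample))

pairs = find_similar_pairs(rows, field="title", threshold=0.8)
kept = drop_similar(rows, pairs)
```

In this sketch the first two titles differ by one character and are reported as a pair, so only the first of them survives in `kept`. Comparing every pair of rows is quadratic in the number of rows, which is fine for a small illustration but may be slow on large files.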
License
The csv-similarity toolkit is developed by Donghua Chen.
Download files
Source Distribution
csv-similarity-0.0.1a0.tar.gz (15.8 kB)
Built Distribution
csv_similarity-0.0.1a0-py3-none-any.whl (13.5 kB)
File details
Details for the file csv-similarity-0.0.1a0.tar.gz.
File metadata
- Download URL: csv-similarity-0.0.1a0.tar.gz
- Upload date:
- Size: 15.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 246e47b43adc751cb1aa96b3392180e0073dafff4ed03b03e7cc2156f1ea65b4
MD5 | a700d2b54c825aa714549717d7ccd950
BLAKE2b-256 | c12fab4d17096b94abfacc9e31241fbe83600a90ca755390383cd48d3f9a1a18
File details
Details for the file csv_similarity-0.0.1a0-py3-none-any.whl.
File metadata
- Download URL: csv_similarity-0.0.1a0-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9f3169ab663f4ff57d1f1b6662a120d10abe05e0d779f56c485642ed549b58f4
MD5 | 07934199c869447334221e7c7c113dc4
BLAKE2b-256 | 7d87b6f2c8d8c18970d788aec4113e15d5b56e6044f743872f765352390c9bcb