A toolkit to get or remove similar items from a CSV file
Project description
CSV-Similarity
Intro
CSV-Similarity is a toolkit for finding similar items in a CSV file and removing them.
Example
from csv_similarity.similarity import get_similar, remove_similar

# Compare the 'title' field of the input CSV and save a similarity report.
get_similar(
    input_path='data/list_company_news1.csv',
    similarity=0.8,
    save_path='data/similarity_report1.csv',
    # stopwords_path=f'{root_path}/stopwords/stopwords',
    stopwords_path='',
    analyze_field='title'
)

# Use the saved report to write a copy of the input CSV without the similar items.
remove_similar(
    similarity_report_path='data/similarity_report1.csv',
    input_csv_path='data/list_company_news1.csv',
    output_path='data/list_company_news_without_similar.csv',
)
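The README does not say how the package measures similarity, so the following is only a minimal sketch of the general idea. It assumes a pairwise comparison of one text field against a threshold. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in metric, and the helper names `find_similar_pairs` and `drop_similar`, as well as the sample data, are hypothetical and not part of csv-similarity.

```python
import csv
import io
from difflib import SequenceMatcher


def find_similar_pairs(rows, field, threshold):
    """Return (i, j, ratio) for each pair of rows whose `field` values
    have a SequenceMatcher ratio of at least `threshold`."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            ratio = SequenceMatcher(None, rows[i][field], rows[j][field]).ratio()
            if ratio >= threshold:
                pairs.append((i, j, ratio))
    return pairs


def drop_similar(rows, pairs):
    """Keep the earlier row of each similar pair and drop the later one."""
    to_drop = {j for _, j, _ in pairs}
    return [row for idx, row in enumerate(rows) if idx not in to_drop]


# Hypothetical sample data standing in for a news CSV with a 'title' column.
sample_csv = """id,title
1,Acme Corp reports record quarterly profit
2,Acme Corp reports record quarterly profits
3,New bakery opens downtown
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
pairs = find_similar_pairs(rows, field="title", threshold=0.8)
unique_rows = drop_similar(rows, pairs)
```

In this sketch the first two titles differ by a single character, so they form the only pair above the 0.8 threshold, and the later row is dropped. The pairwise loop is quadratic in the number of rows, which is fine for small files but may be slow for large ones.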
License
The csv-similarity toolkit is developed by Donghua Chen.
Download files
Download the file for your platform.
Source Distribution
csv-similarity-0.0.1a0.tar.gz (15.8 kB)
Built Distribution
Hashes for csv_similarity-0.0.1a0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9f3169ab663f4ff57d1f1b6662a120d10abe05e0d779f56c485642ed549b58f4
MD5 | 07934199c869447334221e7c7c113dc4
BLAKE2b-256 | 7d87b6f2c8d8c18970d788aec4113e15d5b56e6044f743872f765352390c9bcb