
CSV-Similarity

Intro

A toolkit for finding or removing similar items in a CSV file.

Example

from csv_similarity.similarity import get_similar, remove_similar

# Compare the rows of the input CSV on the 'title' field and save a similarity report.
get_similar(
    input_path='data/list_company_news1.csv',
    similarity=0.8,
    save_path='data/similarity_report1.csv',
    # stopwords_path=f'{root_path}/stopwords/stopwords',
    stopwords_path='',
    analyze_field='title'
)

# Use the saved report to write a copy of the input CSV without the similar items.
remove_similar(
    similarity_report_path='data/similarity_report1.csv',
    input_csv_path='data/list_company_news1.csv',
    output_path='data/list_company_news_without_similar.csv',
)
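
The README does not say how similarity is computed, so the following is only a rough sketch of the general idea behind a "find similar, then remove" workflow. It is not the toolkit's implementation. It assumes that similarity is a ratio between 0 and 1 computed on one text field, and it uses the standard library's difflib.SequenceMatcher as a stand-in measure. The helper names find_similar_pairs and drop_similar and the sample titles are hypothetical.

```python
import csv
import io
from difflib import SequenceMatcher


def find_similar_pairs(rows, field, threshold):
    """Return (i, j, score) for each pair of rows whose `field` values
    have a SequenceMatcher ratio of at least `threshold` (assumed measure)."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            score = SequenceMatcher(None, rows[i][field], rows[j][field]).ratio()
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


def drop_similar(rows, pairs):
    """Keep the earlier row of each similar pair and drop the later one."""
    to_drop = {j for _, j, _ in pairs}
    return [row for idx, row in enumerate(rows) if idx not in to_drop]


# Hypothetical in-memory CSV with a 'title' column, standing in for a real file.
sample = io.StringIO(
    "title\n"
    "Acme Corp reports record profit\n"
    "Acme Corp reports record profits\n"
    "Globex opens new office in Berlin\n"
)
rows = list(csv.DictReader(sample))

pairs = find_similar_pairs(rows, field="title", threshold=0.8)
kept = drop_similar(rows, pairs)

for row in kept:
    print(row["title"])
```

The pairwise loop compares every row with every other row, so this sketch is only practical for small files. The toolkit itself may use a different measure and a different strategy.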

License

The csv-similarity toolkit is developed by Donghua Chen.

