
Removes rows containing blacklisted words from a CSV file.

Project description

CSV Cleaner is an Apache 2.0 licensed Python library that removes rows containing blacklisted words from a CSV file.

Instructions

```python
>>> import csvcleaner
>>> f = csvcleaner.CSVCleaner()
>>> f.run('/path/to/file.csv')
```

When run is called, CSV Cleaner will loop through each row within the CSV file and search for blacklisted words.

When a row is rejected because it contains a blacklisted word, it’s moved to a [name]-rejected.csv file. Accepted rows are moved to a [name]-accepted.csv file. Both files are saved in the same directory as the original CSV file.
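The accept/reject split described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not CSV Cleaner's actual implementation: the function name `split_rows` is hypothetical, matching is done on whole lowercase words only, and a header row is treated like any other row.

```python
import csv
from pathlib import Path


def split_rows(csv_path, blacklist):
    """Hypothetical sketch: sort rows into <name>-accepted.csv and <name>-rejected.csv."""
    source = Path(csv_path)
    # Both output files live next to the original CSV file.
    accepted_path = source.with_name(f"{source.stem}-accepted.csv")
    rejected_path = source.with_name(f"{source.stem}-rejected.csv")
    banned = {word.lower() for word in blacklist}

    with source.open(newline="") as src, \
            accepted_path.open("w", newline="") as acc, \
            rejected_path.open("w", newline="") as rej:
        accepted = csv.writer(acc)
        rejected = csv.writer(rej)
        for row in csv.reader(src):
            # Collect every whitespace-separated word in the row, lowercased.
            words = {word for cell in row for word in cell.lower().split()}
            # Reject the row if any of its words is blacklisted.
            target = rejected if words & banned else accepted
            # The row is written unchanged, so its original case is preserved.
            target.writerow(row)
    return accepted_path, rejected_path
```

Calling `split_rows('/path/to/file.csv', ['spam'])` would produce `file-accepted.csv` and `file-rejected.csv` alongside the input file.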

Installation

To install CSV Cleaner, simply run:

```bash
$ pip install csvcleaner
```

Parameters

CSVCleaner accepts several parameters:

```python
>>> import csvcleaner
>>> f = csvcleaner.CSVCleaner(blacklist=[], replace_chars=[], configure=True, lowercase=True, strict=False)
```

#### blacklist

A list of characters or words that are used to determine if a row is rejected.

Default: [] (unless configure is True)

#### replace_chars

A list of words or characters that are replaced by a space in order to make word detection more accurate and effective.

Default: [] (unless configure is True)

#### configure

When True, CSV Cleaner will use recommended lists for blacklist and replace_chars. These recommended lists are only used if blacklist and replace_chars are omitted during class instantiation or contain an empty list. Set to False if you intend to supply custom lists for blacklist and replace_chars.

Default: True.
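The fallback behaviour of configure can be pictured roughly as follows. This is a sketch under assumptions: `resolve_lists` and the two `RECOMMENDED_*` constants are hypothetical names with placeholder values, and the sketch resolves each list independently, which the description above does not spell out.

```python
# Placeholder values; the real recommended lists ship with the library.
RECOMMENDED_BLACKLIST = ["badword"]
RECOMMENDED_REPLACE_CHARS = [",", "."]


def resolve_lists(blacklist=None, replace_chars=None, configure=True):
    """Hypothetical sketch: pick the lists that will actually be used."""
    blacklist = list(blacklist or [])
    replace_chars = list(replace_chars or [])
    if configure:
        # Fall back to the recommended list only when none was supplied.
        if not blacklist:
            blacklist = list(RECOMMENDED_BLACKLIST)
        if not replace_chars:
            replace_chars = list(RECOMMENDED_REPLACE_CHARS)
    return blacklist, replace_chars
```

With `configure=False` and no lists supplied, both results stay empty, so no row would be rejected.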

#### lowercase

When True, all characters and strings will be converted to lowercase for more accurate word detection. When a row is inserted into [name]-accepted.csv or [name]-rejected.csv, its original case remains. Set to False if case matching is important.

Default: True.
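To see how replace_chars and lowercase work together, consider this small sketch of the normalization step. The helper name `normalize` is hypothetical and the order of operations (replace first, then lowercase) is an assumption; the library may do this differently.

```python
def normalize(text, replace_chars, lowercase=True):
    """Hypothetical sketch: turn a cell into a list of comparable words."""
    for token in replace_chars:
        if token:  # skip empty strings, which would match everywhere
            # Replacing with a space keeps neighbouring words separate.
            text = text.replace(token, " ")
    if lowercase:
        text = text.lower()
    return text.split()
```

For example, `normalize("Hello,World!", [",", "!"])` yields `["hello", "world"]`, so a blacklisted word glued to punctuation can still be detected. Only the copy used for matching is changed; the row written to the output file keeps its original text.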

#### strict

When True, rows that may contain blacklisted words or characters (e.g., fuzzy matches) are also rejected, not only rows with exact matches.

Default: False.
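The documentation does not say how fuzzy matches are detected. One possible approach, shown purely as an illustration, uses `difflib.get_close_matches` from the standard library; the function name `is_rejected` and the `cutoff` value are assumptions, not part of CSV Cleaner's API.

```python
from difflib import get_close_matches


def is_rejected(words, blacklist, strict=False, cutoff=0.8):
    """Hypothetical sketch: decide whether a row's words trigger rejection."""
    for word in words:
        # Exact matches are always rejected.
        if word in blacklist:
            return True
        # In strict mode, near matches (similarity >= cutoff) are rejected too.
        if strict and get_close_matches(word, blacklist, n=1, cutoff=cutoff):
            return True
    return False
```

Under this sketch, a misspelling such as `"spamm"` passes when strict is False but is rejected against a blacklist containing `"spam"` when strict is True.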

Blacklist

CSV Cleaner includes a blacklist that’s used when configure is True and blacklist is left empty. This blacklist is maintained by [Shutterstock](https://github.com/shutterstock/) on [GitHub](https://github.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words).


