Removes rows containing blacklisted words from a CSV file.
Project description
CSV Cleaner is an Apache 2.0 licensed Python library that removes rows containing blacklisted words from a CSV file.
Instructions
```python
>>> import csvcleaner
>>> f = csvcleaner.CSVCleaner()
>>> f.run('/path/to/file.csv')
```
When run is called, CSV Cleaner will loop through each row within the CSV file and search for blacklisted words.
When a row is rejected because it contains a blacklisted word, it’s moved to a [name]-rejected.csv file. Accepted rows are moved to a [name]-accepted.csv file. Both files are saved in the same directory as the original CSV file.
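The accept/reject flow described above can be sketched with the standard library alone. This is a minimal illustration, not CSV Cleaner's actual implementation: the function name `split_rows`, the whole-word matching rule, and the way the output file names are derived from the input path are all assumptions. It also treats every row the same way, including any header row.

```python
import csv
import os


def split_rows(path, blacklist):
    """Hypothetical sketch: write each row to <name>-accepted.csv or <name>-rejected.csv."""
    base, _ = os.path.splitext(path)
    accepted_path = base + "-accepted.csv"
    rejected_path = base + "-rejected.csv"
    with open(path, newline="") as src, \
            open(accepted_path, "w", newline="") as acc, \
            open(rejected_path, "w", newline="") as rej:
        accepted, rejected = csv.writer(acc), csv.writer(rej)
        for row in csv.reader(src):
            # Assumption: a row is rejected when any whitespace-separated
            # word in any of its cells is in the blacklist.
            words = " ".join(row).lower().split()
            target = rejected if any(w in blacklist for w in words) else accepted
            target.writerow(row)
    return accepted_path, rejected_path
```

Both output files end up next to the input file because their paths are built from the input path itself.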
Installation
To install CSV Cleaner, simply run:
```bash
$ pip install csvcleaner
```
Parameters
CSVCleaner accepts several parameters:
```python
>>> import csvcleaner
>>> f = csvcleaner.CSVCleaner(blacklist=[], replace_chars=[], configure=True, lowercase=True, strict=False)
```
#### blacklist
A list of characters or words that are used to determine if a row is rejected.
Default: [] (unless configure is True)
#### replace_chars
A list of words or characters that are replaced by a space in order to make word detection more accurate and effective.
Default: [] (unless configure is True)
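To illustrate why replacing characters with a space helps word detection, here is a small sketch. The helper name `normalize` is hypothetical and is not part of CSV Cleaner; it only shows the general idea of turning separators into spaces so that words can be split apart.

```python
def normalize(text, replace_chars):
    """Hypothetical helper: replace each listed string with a single space."""
    for chars in replace_chars:
        if chars:  # skip empty strings, which would otherwise match everywhere
            text = text.replace(chars, " ")
    return text


# Without replacement, "bad-word,here" is a single token; with it, each
# word becomes its own token and can be compared to the blacklist.
tokens = normalize("bad-word,here", ["-", ","]).split()
```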
#### configure
When True, CSV Cleaner will use recommended lists for blacklist and replace_chars. These recommended lists will only be used if blacklist and replace_chars are omitted during class instantiation or contain an empty list. Set to False if you intend to supply custom lists for blacklist and replace_chars.
Default: True.
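The fallback rule for configure could look roughly like the sketch below. The function `resolve_lists` and the two placeholder lists are hypothetical, and the sketch assumes each list falls back independently of the other, which the description above does not state explicitly.

```python
# Placeholder values for illustration only; these are not the lists
# shipped with CSV Cleaner.
RECOMMENDED_BLACKLIST = ["exampleword"]
RECOMMENDED_REPLACE_CHARS = [",", ".", "-"]


def resolve_lists(blacklist=None, replace_chars=None, configure=True):
    """Hypothetical sketch: use a recommended list only when none was supplied."""
    blacklist = list(blacklist or [])
    replace_chars = list(replace_chars or [])
    if configure:
        # Assumption: each list is resolved on its own.
        if not blacklist:
            blacklist = list(RECOMMENDED_BLACKLIST)
        if not replace_chars:
            replace_chars = list(RECOMMENDED_REPLACE_CHARS)
    return blacklist, replace_chars
```

With configure=False and no lists supplied, both lists stay empty under this sketch, so no row would be rejected.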
#### lowercase
When True, all characters and strings will be converted to lowercase for more accurate word detection. When a row is inserted into [name]-accepted.csv or [name]-rejected.csv, its original case remains. Set to False if case matching is important.
Default: True.
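A minimal sketch of case-insensitive detection that leaves the row itself untouched is shown below. The function `contains_blacklisted` is a hypothetical name, and whole-word matching is an assumption; the point is only that lowercasing happens on a copy used for comparison, so the original row can be written out unchanged.

```python
def contains_blacklisted(row, blacklist, lowercase=True):
    """Hypothetical check: compare on a (possibly lowercased) copy of the row."""
    text = " ".join(row)
    words = list(blacklist)
    if lowercase:
        # Lowercase both sides of the comparison; `row` is never modified.
        text = text.lower()
        words = [w.lower() for w in words]
    tokens = text.split()
    return any(w in tokens for w in words)


row = ["Alice", "SPAM offer"]
matched = contains_blacklisted(row, ["spam"])                      # case-insensitive
matched_exact = contains_blacklisted(row, ["spam"], lowercase=False)  # case-sensitive
```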
#### strict
When True, rows that might contain blacklisted words or characters (for example, fuzzy matches) are also rejected.
Default: False.
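How CSV Cleaner decides that a row "might" contain a blacklisted word is not described here. As one possible interpretation only, the sketch below uses `difflib.SequenceMatcher` from the standard library to flag near-matches; the function names and the 0.8 threshold are assumptions made for illustration, not the library's behavior.

```python
from difflib import SequenceMatcher


def is_fuzzy_match(token, blacklisted, threshold=0.8):
    """Hypothetical rule: similarity ratio at or above the threshold counts as a match."""
    return SequenceMatcher(None, token, blacklisted).ratio() >= threshold


def row_may_contain(row, blacklist, threshold=0.8):
    """Return True if any word in any cell is close to a blacklisted word."""
    for cell in row:
        for token in cell.lower().split():
            if any(is_fuzzy_match(token, word, threshold) for word in blacklist):
                return True
    return False
```

A stricter or looser threshold trades false rejections against missed variants such as deliberate misspellings.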
Blacklist
CSV Cleaner includes a blacklist that’s used when configure is True and blacklist is left empty. This blacklist is maintained by [Shutterstock](https://github.com/shutterstock/) on [GitHub](https://github.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words).