A script to scan HTML documents for forbidden phrases stored in a CSV.
Project description
About Docscanner
Docscanner streamlines your documentation workflow by scanning an .html file for the words or phrases listed in a .csv file. For each match, Docscanner returns the word or phrase together with the line number on which it appears. This helps reduce human error when applying style guide content standards in a documentation project.
The script is case-insensitive and also reports duplicate words/phrases that appear on the same line of the .html file.
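The matching behavior described above can be illustrated with a minimal sketch. This is not Docscanner's actual source: the function name `find_phrases` and the sample lines are hypothetical, and the sketch assumes plain substring matching, so "easy" would also match inside "uneasy".

```python
import re


def find_phrases(lines, phrases):
    """Return one (line_number, phrase) pair per occurrence found."""
    hits = []
    for line_number, line in enumerate(lines, start=1):
        for phrase in phrases:
            if not phrase:
                continue  # an empty phrase would match at every position
            # re.escape treats the phrase literally; IGNORECASE makes the
            # search case-insensitive.
            pattern = re.compile(re.escape(phrase), re.IGNORECASE)
            # finditer yields every occurrence, so a phrase that appears
            # twice on one line is reported twice.
            for _match in pattern.finditer(line):
                hits.append((line_number, phrase))
    return hits


# Hypothetical sample input standing in for the lines of an .html file.
sample = ["<p>Simply click Save.</p>", "<p>Easy, easy steps.</p>"]
hits = find_phrases(sample, ["simply", "easy"])
for line_number, phrase in hits:
    print(f"line {line_number}: {phrase}")
```

Line numbers start at 1 here so that they line up with what a text editor shows.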
Installing Docscanner
If you haven't already, download and install Python from python.org.
NOTE: If installing Python for the first time, make sure you select the Add Python 3.7 to PATH checkbox.
Install Docscanner by running the following in your Command Prompt terminal:
pip install docscanner
Using Docscanner
- Start Docscanner by running the following in your Command Prompt terminal:
docscanner
You will be prompted with the following:
Use default csv file of forbidden phrases? (respond Y or N):
- Choose whether to use the default .csv file or to add a custom path:
- To use the default csv file, enter "Y". The script will jump to the next argument where you input the .html file path.
- To use a custom csv file, enter "N". You will be prompted with the following:
Enter path of your custom csv file:
Copy the path from File Explorer by holding <SHIFT>, right-clicking your desired file, and selecting Copy as path.
NOTE: The path is automatically stripped of unnecessary characters such as quotation marks or spaces. You do not have to format your file paths after pasting them.
- After choosing to use either the default or custom .csv file, you will be prompted with the following:
Enter path of html file:
Copy the path from File Explorer by holding <SHIFT>, right-clicking your desired file, and selecting Copy as path.
Docscanner will return whatever words/phrases it found in your .html file along with the line numbers on which they were found.
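The path clean-up mentioned in the note above might look something like the following sketch. The helper name `clean_path` is hypothetical and is not taken from Docscanner's source; it assumes that only surrounding whitespace and quotation marks need to be removed, which is what Copy as path adds on Windows.

```python
def clean_path(raw):
    """Strip surrounding whitespace and quotation marks from a pasted path."""
    # First drop outer whitespace, then any wrapping quotes, then any
    # whitespace that was sitting inside the quotes.
    return raw.strip().strip("\"'").strip()


# Copy as path wraps the path in double quotes.
pasted = ' "C:\\docs\\page.html" '
print(clean_path(pasted))
```

Spaces inside the path itself are left alone, so folder names such as "My Documents" still resolve.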
Formatting Data
Ensure that your .csv file has no header row and that each word/phrase occupies its own row in the first column.
Example .csv file:
word_1
phrase 1
word_2
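As a rough illustration of why this layout matters, the sketch below reads such a file with Python's standard `csv` module. The function name `load_phrases` is hypothetical and this is not necessarily how Docscanner loads its list. Every row is treated as data, so a header row would be picked up as a forbidden phrase, and only the first column is read.

```python
import csv
import io


def load_phrases(csv_file):
    """Return the non-empty first-column values of a header-less CSV."""
    phrases = []
    for row in csv.reader(csv_file):
        # Skip blank rows and rows whose first cell is empty.
        if row and row[0].strip():
            phrases.append(row[0].strip())
    return phrases


# An in-memory stand-in for the example file shown above.
example = io.StringIO("word_1\nphrase 1\nword_2\n")
phrases = load_phrases(example)
print(phrases)
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object, as the `csv` module documentation recommends opening files with `newline=""`.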
Changing the Default File
- Locate the root directory storing docscanner.py.
- Open the data folder.
- Delete or alter the existing .csv file and save your changes.
- Open the src folder.
- Change the file names in the get_forbidden_phrases_path function to match the name of your new default file in the data folder. Alternatively, you can edit the default .csv file directly.
- Save your changes.
Troubleshooting
If Docscanner is not running properly:
- Ensure that all file paths are correct.
- Verify you are using a Command Prompt terminal, not PowerShell.
- Check whether the docscanner directory structure on your workstation matches what is on GitHub.
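To rule out the first item in the list above, a quick check like the following sketch can confirm that each path points to an existing file before you run Docscanner. The helper `missing_files` is hypothetical and not part of Docscanner, and the two paths in the usage lines are placeholders.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Placeholder paths; substitute the .csv and .html files you plan to use.
to_check = ["forbidden_phrases.csv", "page.html"]
for path in missing_files(to_check):
    print(f"Not found: {path}")
```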
Download files
File details
Details for the file doc_scanner-0.5.tar.gz.
File metadata
- Download URL: doc_scanner-0.5.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 554bde9781d68bebbaa91600ace63c10f8ff72c96c4cb42c5471ddc0e46d7bc7 |
| MD5 | b08adb03bfa8aaf2b2e7385acfbb089b |
| BLAKE2b-256 | 6f6608629f3d8230f48e2c1a052954abdef8a413015a88b26ffa9529182234b6 |
File details
Details for the file doc_scanner-0.5-py3-none-any.whl.
File metadata
- Download URL: doc_scanner-0.5-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c745477d4382f62145d787df9ffd404fba9b7e29eb2f6f5c9790fa9fb6305aa |
| MD5 | b5a946eca03c9fd6a30ebe0a3662e9bf |
| BLAKE2b-256 | 941a432f01c6c932d8fb695e5b9c12022373ef453e9b7806e33641fe8bc102d2 |