A script to scan HTML documents for forbidden phrases stored in a CSV.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

About Docscanner

Docscanner streamlines your documentation workflow by scanning an .html file for the words or phrases listed in a .csv file. For each match, Docscanner returns both the word or phrase and the number of the line on which it appears. This helps eliminate human error when applying style guide content standards in a documentation project.

The script is case-insensitive and also reports a word or phrase each time it appears, even when it occurs more than once on the same line of the .html file.
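The matching behavior described above can be sketched as follows. This is a minimal illustration, not Docscanner's actual source: the function name find_phrases and the sample lines are assumptions made for the example.

```python
import re


def find_phrases(lines, phrases):
    """Return (line_number, phrase) pairs for every case-insensitive match."""
    hits = []
    for line_number, line in enumerate(lines, start=1):
        for phrase in phrases:
            # re.escape treats the phrase as literal text, not as a regex pattern.
            pattern = re.compile(re.escape(phrase), re.IGNORECASE)
            # finditer yields one match per occurrence, so a phrase that
            # appears twice on the same line is reported twice.
            for _ in pattern.finditer(line):
                hits.append((line_number, phrase))
    return hits


html_lines = [
    "<p>Please click here to continue.</p>",
    "<p>Simply CLICK HERE, then click here again.</p>",
]
for line_number, phrase in find_phrases(html_lines, ["click here", "simply"]):
    print(f"line {line_number}: {phrase}")
```

In this sketch, line numbers start at 1 and the second sample line produces two hits for "click here" plus one for "simply".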

Installing Docscanner

If you haven't already, download and install Python.

NOTE: If installing Python for the first time, make sure you select the Add Python 3.7 to PATH checkbox.

Install Docscanner by running the following in your Command Prompt terminal:
pip install docscanner

Using Docscanner

Start Docscanner by running the following in your Command Prompt terminal:
docscanner
You will be prompted with the following:
Use default csv file of forbidden phrases? (respond Y or N):
Choose whether to use the default .csv file or to add a custom path:
- To use the default .csv file, enter "Y". The script skips ahead to the prompt for the .html file path.
- To use a custom .csv file, enter "N". You will be prompted with the following:
  Enter path of your custom csv file:
  Copy the path from File Explorer by holding <SHIFT>, right-clicking your desired file, and selecting Copy as path.
  
  NOTE: The path is automatically stripped of unnecessary characters such as quotation marks or spaces. You do not have to format your file paths after pasting them.
After choosing to use either the default or custom .csv file, you will be prompted with the following:
Enter path of html file:
Copy the path from File Explorer by holding <SHIFT>, right-clicking your desired file, and selecting Copy as path.

Docscanner will return whatever words/phrases it found in your .html file along with the line numbers on which they were found.
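The path clean-up mentioned in the NOTE above might look something like the sketch below. The helper names clean_path and wants_default_csv are hypothetical, and the sketch only strips surrounding whitespace and quotation marks; Docscanner's real clean-up may differ.

```python
def clean_path(raw):
    """Strip surrounding whitespace and quotation marks from a pasted path."""
    # "Copy as path" in File Explorer wraps the path in double quotes,
    # so the quotes are removed after the outer whitespace.
    return raw.strip().strip("\"'").strip()


def wants_default_csv(answer):
    """Interpret the Y/N answer, ignoring case and stray spaces."""
    return answer.strip().upper() == "Y"


pasted = r'  "C:\docs\page.html" '
print(clean_path(pasted))
print(wants_default_csv(" y "))
```

Keeping the clean-up in a small function like this makes it easy to apply the same rule to both the .csv path and the .html path.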

Formatting Data

Ensure that your .csv file has no header row and that each word or phrase occupies its own row in the first column.

Example .csv file:

word_1
phrase 1
word_2
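A file in this format can be read with Python's standard csv module. The sketch below is an assumption about how such a file could be loaded, not Docscanner's own loader; load_phrases is a hypothetical name, and an in-memory string stands in for the file.

```python
import csv
import io


def load_phrases(csv_file):
    """Read the first column of every non-empty row; no header row is skipped."""
    phrases = []
    for row in csv.reader(csv_file):
        # Ignore blank rows and rows whose first cell is empty.
        if row and row[0].strip():
            phrases.append(row[0].strip())
    return phrases


sample = io.StringIO("word_1\nphrase 1\nword_2\n")
print(load_phrases(sample))  # ['word_1', 'phrase 1', 'word_2']
```

To read a real file, pass an open file object instead, for example one created with open(path, newline="", encoding="utf-8").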

Changing the Default File

Locate the root directory storing docscanner.py.
Open the data folder.
Delete or alter the existing .csv file and save your changes.
Open the src folder.
In the get_forbidden_phrases_path function, change the file name to match the name of your new default file in the data folder. Alternatively, you can edit the contents of the existing default .csv file directly and keep its name.
Save your changes.
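For orientation, a function that resolves the default file could be shaped roughly like the sketch below. Only the function name get_forbidden_phrases_path and the data folder come from the steps above; the parameters, the DEFAULT_CSV_NAME constant, and the file name "forbidden_phrases.csv" are assumptions, so check the actual source before editing.

```python
from pathlib import Path

# Hypothetical file name: replace it with the name of the .csv file
# that actually sits in your data folder.
DEFAULT_CSV_NAME = "forbidden_phrases.csv"


def get_forbidden_phrases_path(root_dir, file_name=DEFAULT_CSV_NAME):
    """Build the path to the default .csv file inside <root_dir>/data."""
    return Path(root_dir) / "data" / file_name


print(get_forbidden_phrases_path("docscanner"))
print(get_forbidden_phrases_path("docscanner", "my_style_guide.csv"))
```

In a layout like this, renaming the default file means changing a single string, which is the kind of edit the steps above describe.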

Troubleshooting

If Docscanner is not running properly:

Ensure that all file paths are correct.
Verify you are using a Command Prompt terminal, not PowerShell.
Check whether the docscanner directory structure on your workstation matches what is on GitHub.


Release history

- 0.5 (this version): Jul 22, 2024
- 0.4: Jul 22, 2024
- 0.3: Jul 22, 2024
- 0.2: Jul 22, 2024
- 0.1: Jul 18, 2024

Download files

Source Distribution

doc_scanner-0.5.tar.gz (5.7 kB)

Uploaded Jul 22, 2024 Source

Built Distribution

doc_scanner-0.5-py3-none-any.whl (5.8 kB)

Uploaded Jul 22, 2024 Python 3

File details

Details for the file doc_scanner-0.5.tar.gz.

File metadata

Download URL: doc_scanner-0.5.tar.gz
Upload date: Jul 22, 2024
Size: 5.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.1

File hashes

Hashes for doc_scanner-0.5.tar.gz
Algorithm	Hash digest
SHA256	`554bde9781d68bebbaa91600ace63c10f8ff72c96c4cb42c5471ddc0e46d7bc7`
MD5	`b08adb03bfa8aaf2b2e7385acfbb089b`
BLAKE2b-256	`6f6608629f3d8230f48e2c1a052954abdef8a413015a88b26ffa9529182234b6`
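A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below is one way to do that; the function names and the chunk size are illustrative choices, and the expected digest is the SHA256 value listed above for doc_scanner-0.5.tar.gz.

```python
import hashlib

# SHA256 digest published above for doc_scanner-0.5.tar.gz.
EXPECTED_SHA256 = "554bde9781d68bebbaa91600ace63c10f8ff72c96c4cb42c5471ddc0e46d7bc7"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_published_hash("doc_scanner-0.5.tar.gz") from the download folder should return True for an intact copy of the file.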

File details

Details for the file doc_scanner-0.5-py3-none-any.whl.

File metadata

Download URL: doc_scanner-0.5-py3-none-any.whl
Upload date: Jul 22, 2024
Size: 5.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.1

File hashes

Hashes for doc_scanner-0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c745477d4382f62145d787df9ffd404fba9b7e29eb2f6f5c9790fa9fb6305aa`
MD5	`b5a946eca03c9fd6a30ebe0a3662e9bf`
BLAKE2b-256	`941a432f01c6c932d8fb695e5b9c12022373ef453e9b7806e33641fe8bc102d2`

doc-scanner 0.5
