Validate the integrity of UTF-8 text files based on language-specific character sets.
Text Integrity Inspector
The Text Integrity Inspector package provides a tool for validating the integrity of UTF-8 text files based on language-specific character sets.
Installation
pip install textIntegrityInspector
Usage
On the command line
textIntegrityInspector path_to_inspect_dir --extensions py txt
# usage
textIntegrityInspector --help
The first command validates all files under path_to_inspect_dir with the .py or .txt extension.
In a Python script
from textIntegrityInspector.validator import TextIntegrityChar
validator = TextIntegrityChar()
validator.validate_directory(
    root=".",
    extensions=["py"],
    exclude_dirs=[],
    exclude_files=[],
    language="fr",
    additional_chars="",
)
This will validate all files in the current directory with the .py extension.
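To illustrate what checking a file against a language-specific character set can look like, here is a minimal standalone sketch. It does not reproduce the package's internals: the `ALLOWED_FR` table and the `find_unexpected_chars` and `check_bytes` helpers are hypothetical names invented for this example, and the real French character table may well differ.

```python
import string

# Hypothetical allow-list for French: printable ASCII plus common accented
# letters and punctuation. This is an assumption, not the package's table.
FRENCH_EXTRA = "àâæçéèêëîïôœùûüÿÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ«»’€"
ALLOWED_FR = set(string.printable) | set(FRENCH_EXTRA)


def find_unexpected_chars(text, allowed=ALLOWED_FR, additional_chars=""):
    """Return (line, column, char) for each character outside the allowed set."""
    permitted = set(allowed) | set(additional_chars)
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch not in permitted:
                problems.append((line_no, col, ch))
    return problems


def check_bytes(data, **kwargs):
    """Decode bytes as UTF-8 first; a decode failure is reported as one problem."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [(0, exc.start, "<invalid UTF-8>")]
    return find_unexpected_chars(text, **kwargs)
```

Under these assumptions, `find_unexpected_chars("café\nstraße")` flags only the `ß` on line 2, and passing `additional_chars="ß"` accepts it. Typical mojibake such as `cafÃ©` is flagged because neither `Ã` nor `©` belongs to the assumed French set.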
Configuration
By default, textIntegrityInspector looks for the configuration file .textIntegrityInspector.[yaml|toml] in the current directory. The file format is as follows.
YAML format
roots:
- dir1
- dir2
extensions:
- txt
- md
exclude-dirs:
- tests
- '**/temp*'
exclude-files:
- example.txt
language: fr
additional-chars: 'ü,ö,ß'
verbose: true
It is possible to specify a different configuration file with the --config-file option. The TOML format is also supported.
TOML format
roots = ["dir1", "dir2"]
extensions = ["txt", "md"]
exclude-dirs = ["tests", "**/temp*"]
exclude-files = ["example.txt"]
language = "fr"
additional-chars = "ü,ö,ß"
verbose = true
silence = false
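Both configuration examples list a plain directory name (`tests`) and a glob pattern (`**/temp*`) under exclude-dirs. How the tool matches these patterns is not documented here, so the following is only a sketch of one plausible interpretation using the standard library's `fnmatch`. The `is_excluded` helper is a hypothetical name, and matching against both the full relative path and each path component is an assumption.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_excluded(rel_path, patterns):
    """True if the relative path, or any single component of it, matches a pattern."""
    path = PurePosixPath(rel_path)
    for pattern in patterns:
        # Match against the whole relative path (e.g. "src/temp_build").
        if fnmatch(path.as_posix(), pattern):
            return True
        # Match against each component, so "tests" also excludes "pkg/tests".
        if any(fnmatch(part, pattern) for part in path.parts):
            return True
    return False


patterns = ["tests", "**/temp*"]
print(is_excluded("pkg/tests", patterns))
print(is_excluded("src/temp_build", patterns))
print(is_excluded("src/main", patterns))
```

Note that `fnmatch` gives `**` no special meaning: it behaves like a single `*`, which already matches across `/`. A tool that implements recursive globbing differently may treat edge cases differently from this sketch.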
Important: The paths passed as arguments replace the roots list in the configuration file, while the other options are combined.
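The precedence rule can be sketched as a small merge function. This is an illustration rather than the package's code: `merge_options` is a hypothetical helper, and two details are assumptions. "Combined" is taken to mean that list options are concatenated without duplicates, and a scalar option given on the command line is taken to override the configured value.

```python
def merge_options(config, cli_paths, cli_options):
    """Merge command-line input over a config mapping without mutating it."""
    merged = dict(config)
    if cli_paths:
        # Paths given as arguments replace the configured roots entirely.
        merged["roots"] = list(cli_paths)
    for key, value in cli_options.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # Assumption: list options are combined, order kept, duplicates dropped.
            merged[key] = list(dict.fromkeys(merged[key] + value))
        else:
            # Assumption: a scalar given on the command line wins.
            merged[key] = value
    return merged


config = {"roots": ["dir1", "dir2"], "extensions": ["txt", "md"], "language": "fr"}
merged = merge_options(config, ["src"], {"extensions": ["py", "txt"]})
print(merged)
```

With the configuration above, passing `src` as a path and `--extensions py txt` would, under these assumptions, inspect only `src` while checking the `txt`, `md`, and `py` extensions.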
Docker
Usage
docker run -it -v path_to_inspect_dir:/data text_integrity_inspector --extensions py txt
GitLab CI integration
check-utf-8:
  image:
    name: text_integrity_inspector
    entrypoint: [""]
  stage: test
  script:
    - textIntegrityInspector . --extensions py txt json conf --language fr
Contributing
Contributions are welcome! Please open a pull request or issue if you have any feedback or suggestions.
License
The Text Integrity Inspector package is licensed under the MIT License.