Validate the integrity of UTF-8 text files based on language-specific character sets.

Project description

Text Integrity Inspector

The Text Integrity Inspector package provides a tool for validating the integrity of UTF-8 text files based on language-specific character sets.

Installation

pip install textIntegrityInspector

Usage

On the command line

textIntegrityInspector path_to_inspect_dir --extensions py txt 

# show all available options
textIntegrityInspector --help

This validates all files under path_to_inspect_dir that have the .py or .txt extension.
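
When the check has to be launched from another Python program (a build script, for instance), the same command can be started with subprocess. The sketch below is illustrative only: the helpers build_command and run_inspector are hypothetical and not part of the package, and only the path argument and the --extensions and --language options shown in this README are used.

```python
import subprocess
from typing import Optional, Sequence


def build_command(path: str, extensions: Sequence[str],
                  language: Optional[str] = None) -> list[str]:
    """Assemble the argument list for the textIntegrityInspector CLI."""
    cmd = ["textIntegrityInspector", path, "--extensions", *extensions]
    if language is not None:
        cmd += ["--language", language]
    return cmd


def run_inspector(path: str, extensions: Sequence[str],
                  language: Optional[str] = None) -> int:
    """Run the CLI (the package must be installed) and return its exit code."""
    return subprocess.run(build_command(path, extensions, language)).returncode


# Show the command that would be executed, without running it.
print(build_command("path_to_inspect_dir", ["py", "txt"]))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.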

In a Python script

from textIntegrityInspector.validator import TextIntegrityChar

validator = TextIntegrityChar()
validator.validate_directory(
    root=".",
    extensions=["py"],
    exclude_dirs=[],
    exclude_files=[],
    language="fr",
    additional_chars="",
)

This will validate all files in the current directory with the .py extension.
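
To make the idea of checking text against a language-specific character set concrete, here is a minimal sketch of that kind of check. It is not the package's implementation: the FRENCH_EXTRA set, the find_unexpected_chars helper, and the rule that printable ASCII is always allowed are assumptions made for illustration.

```python
import string

# Assumed allow-list for French: accented letters, ligatures, guillemets.
FRENCH_EXTRA = set("àâäçéèêëîïôöùûüÿæœÀÂÄÇÉÈÊËÎÏÔÖÙÛÜŸÆŒ«»’")


def find_unexpected_chars(text, extra_allowed=frozenset(), additional_chars=""):
    """Return (line, column, char) for every character outside the allowed set."""
    # Assumption: printable ASCII (including whitespace) is always accepted.
    allowed = set(string.printable) | set(extra_allowed) | set(additional_chars)
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch not in allowed:
                problems.append((line_no, col, ch))
    return problems


sample = "café\nstraße\n"
print(find_unexpected_chars(sample, FRENCH_EXTRA))                        # [(2, 5, 'ß')]
print(find_unexpected_chars(sample, FRENCH_EXTRA, additional_chars="ß"))  # []
```

The second call shows the role an option like additional_chars can play: it widens the allowed set without changing the language.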

Configuration

By default, textIntegrityInspector looks for the configuration file .textIntegrityInspector.[yaml|toml] in the current directory. The file format is as follows.

YAML format

roots:
  - dir1
  - dir2
extensions:
  - txt
  - md
exclude-dirs:
  - tests
  - '**/temp*'
exclude-files:
  - example.txt
language: fr
additional-chars: 'ü,ö,ß'
verbose: true

A different configuration file can be specified with the --config-file option. The TOML format is also supported.

TOML format

roots = ["dir1", "dir2"]
extensions = ["txt", "md"]
exclude-dirs = ["tests", "**/temp*"]
exclude-files = ["example.txt"]
language = "fr"
additional-chars = "ü,ö,ß"
verbose = true
silence = false

Important: paths passed as command-line arguments replace the roots list from the configuration file, while the other options are combined.
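
One possible reading of this rule is sketched below. The merge_options function is hypothetical, and the exact meaning of "combined" is an assumption: here list options are concatenated without duplicates and scalar options given on the command line win over the file. The tool's actual behavior may differ.

```python
def merge_options(file_cfg: dict, cli_paths: list, cli_opts: dict) -> dict:
    """Merge config-file options with command-line input (assumed semantics)."""
    merged = dict(file_cfg)
    # Paths given on the command line replace the configured roots entirely.
    if cli_paths:
        merged["roots"] = list(cli_paths)
    for key, value in cli_opts.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # Assumption: list options are unioned, keeping first-seen order.
            merged[key] = list(dict.fromkeys(merged[key] + value))
        else:
            # Assumption: scalar options from the command line take precedence.
            merged[key] = value
    return merged


file_cfg = {"roots": ["dir1", "dir2"], "extensions": ["txt", "md"], "language": "fr"}
merged = merge_options(file_cfg, ["src"], {"extensions": ["py", "txt"]})
print(merged)
# {'roots': ['src'], 'extensions': ['txt', 'md', 'py'], 'language': 'fr'}
```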

Docker

Usage

docker run -it -v path_to_inspect_dir:/data text_integrity_inspector --extensions py txt 

GitLab CI integration

check-utf-8:
  image:
    name: text_integrity_inspector
    entrypoint: [""]
  stage: test
  script:
    - textIntegrityInspector . --extensions py txt json conf --language fr

Contributing

Contributions are welcome! Please open a pull request or issue if you have any feedback or suggestions.

License

The Text Integrity Inspector package is licensed under the MIT License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

textintegrityinspector-0.1.5.tar.gz (216.9 kB)

Uploaded Source

Built Distribution

textIntegrityInspector-0.1.5-py3-none-any.whl (9.5 kB)

Uploaded Python 3

File details

Details for the file textintegrityinspector-0.1.5.tar.gz.

File metadata

  • Download URL: textintegrityinspector-0.1.5.tar.gz
  • Upload date:
  • Size: 216.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for textintegrityinspector-0.1.5.tar.gz
Algorithm    Hash digest
SHA256       6815f1518c72e7b7ddd814630cc6453c1a06abc8ebd45d651535dd25c00f06ab
MD5          5187bdf3ff1de3d51919f094366d5cb3
BLAKE2b-256  9d6933d8ef2fb54d3c2a5b3e328c45c13765e8123326f2c8e7b6c97900a2a62d
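
A downloaded archive can be compared against the SHA256 digest listed above using Python's hashlib. This is a minimal sketch, assuming the file sits in the current directory under its published name; sha256_of is a hypothetical helper.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for textintegrityinspector-0.1.5.tar.gz.
EXPECTED_SHA256 = "6815f1518c72e7b7ddd814630cc6453c1a06abc8ebd45d651535dd25c00f06ab"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("textintegrityinspector-0.1.5.tar.gz")
if archive.exists():
    print("OK" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
else:
    print("archive not found, nothing to verify")
```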

File details

Details for the file textIntegrityInspector-0.1.5-py3-none-any.whl.

File hashes

Hashes for textIntegrityInspector-0.1.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       3493e6a6b04a9dca29dcb0efe34bdd4fca0e03734310826b4476fb8c03fb5176
MD5          4f7ef2c87484d8fe44ad637878bd6a0c
BLAKE2b-256  09acb90aaa7aa37008b60c3e5454b05c576687d935c02f39d31c0843272f2581
