NLP Processor & SBERT Training Tool

🧠 Vatrix

Vatrix is an NLP log processor that renders natural-language descriptions from machine data. It serves several use cases:

  • streaming NLP & vector embedding
  • batch NDJSON file processing
  • augmented data injection
  • generating training pairs for fine-tuning Sentence Transformers (SBERT)
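For readers new to NDJSON (newline-delimited JSON): each line of the input is a standalone JSON object. The sketch below is not Vatrix's own code. It uses only the Python standard library to show how such a stream might be parsed and turned into a sentence. The field names (`user`, `action`, `host`), the helpers `iter_ndjson` and `describe`, and the use of `string.Template` in place of Jinja2 are all assumptions made for illustration.

```python
import io
import json
from string import Template

# Hypothetical sentence template; Vatrix itself uses Jinja2 templates.
SENTENCE = Template("User $user performed $action on host $host.")


def iter_ndjson(stream):
    """Yield one dict per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def describe(record):
    """Render a record as a sentence, or return None if a field is missing."""
    try:
        return SENTENCE.substitute(record)
    except KeyError:
        return None


# In-memory stand-in for a log file or stdin.
sample = io.StringIO(
    '{"user": "alice", "action": "login", "host": "web-01"}\n'
    "\n"
    '{"user": "bob", "action": "logout"}\n'
)

for record in iter_ndjson(sample):
    print(describe(record))
```

The same generator works unchanged on `sys.stdin` or an open file, which is why a line-oriented format suits both batch and streaming input.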

✨ Features

  • CLI-powered NDJSON log processing
  • Modular template system powered by Jinja2
  • SBERT data generation and similarity scoring
  • Supports file mode, stream mode, and CLI flags
  • Exports training pairs to CSV
  • Exports highly similar sentence pairs for SBERT fine-tuning
  • Flexible and colorful logging with log rotation
  • Direct integration with Qdrant vector database (OSAI-Demo Stack)
  • Unit & integration testing
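To make "similarity scoring" and "exports training pairs to CSV" more concrete, here is a minimal sketch under stated assumptions. It is not how Vatrix is implemented: the 3-dimensional vectors are toy stand-ins for real sentence embeddings, and the column names, the 0.8 threshold, and the helpers `cosine_similarity`, `similar_pairs`, and `write_pairs` are hypothetical.

```python
import csv
import io
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(items, threshold):
    """Return (sentence1, sentence2, score) for every pair scoring at or above threshold."""
    pairs = []
    for (s1, v1), (s2, v2) in combinations(items, 2):
        score = cosine_similarity(v1, v2)
        if score >= threshold:
            pairs.append((s1, s2, score))
    return pairs


def write_pairs(pairs, stream):
    """Write pairs as CSV with a header row (column names are an assumption)."""
    writer = csv.writer(stream)
    writer.writerow(["sentence1", "sentence2", "score"])
    for s1, s2, score in pairs:
        writer.writerow([s1, s2, f"{score:.4f}"])


# Toy data: made-up sentences with made-up vectors, not real SBERT embeddings.
items = [
    ("User alice logged in to web-01.", [0.9, 0.1, 0.0]),
    ("alice signed in on host web-01.", [0.8, 0.2, 0.1]),
    ("Disk usage on db-02 is high.", [0.0, 0.1, 0.9]),
]

buffer = io.StringIO()
write_pairs(similar_pairs(items, threshold=0.8), buffer)
print(buffer.getvalue())
```

With a real model, the vectors would come from the encoder, and the threshold would need tuning against the data.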

📦 Installation

pip install vatrix

Or install the latest from source:

git clone https://github.com/brianbatesactual/vatrix.git
cd vatrix
make setup

🛠️ Usage

vatrix --mode file \
       --render-mode all \
       --input data/input_logs.json \
       --output data/processed_logs.csv \
       --unmatched data/unmatched_logs.json \
       --generate-sbert-data \
       --log-level DEBUG \
       --log-file logs/vatrix_debug.log

Makefile Commands

make setup         # Create venv and install dependencies
make run           # Run log processor on default file
make stream        # Start reading NDJSON from stdin
make retrain       # Export SBERT sentence pairs
make freeze        # Regenerate requirements.txt
make clean         # Clean environment and build artifacts
make nuke          # Full reset of the project environment

🧠 Example


🧪 Testing

make test

📁 Logs

All logs are saved to the logs/ directory with daily rotation.
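For context, daily rotation is available in the Python standard library through `logging.handlers.TimedRotatingFileHandler`. Whether Vatrix uses this handler is an assumption; the sketch below only illustrates the general technique. The file name `app.log`, the logger name, the retention count, and the `build_logger` helper are hypothetical.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler


def build_logger(log_dir, name="vatrix_sketch", level=logging.DEBUG, keep_days=7):
    """Return a logger that writes to log_dir/app.log and rolls over at midnight."""
    os.makedirs(log_dir, exist_ok=True)
    handler = TimedRotatingFileHandler(
        os.path.join(log_dir, "app.log"),
        when="midnight",        # start a new file each day
        backupCount=keep_days,  # keep this many rotated files, delete older ones
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = os.path.join(tmp, "logs")
    logger = build_logger(log_dir)
    logger.debug("processed 3 records")
    # Close handlers before the directory is removed.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
    print(sorted(os.listdir(log_dir)))
```

Rotated files get a date suffix appended to the base name, so older logs stay alongside the current one until `backupCount` is exceeded.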


🧼 Cleanup

make clean    # Clean temp data
make nuke     # Wipe and rebuild virtualenv

📚 License

MIT © Brian Bates

Built with ❤️ for log intelligibility and NLP adventures.
