
AutoPX – Automatic NLP Preprocessing with Explainable Reports


AutoPX — Automatic Preprocessing with eXplainability

AutoPX is an intelligent Python library designed to automatically preprocess raw text data and transform it into model-ready representations while providing complete explainability for every preprocessing decision.

The library removes the need to write repetitive preprocessing logic by hand: it analyzes the input data, adapts its preprocessing rules dynamically, and selects the most suitable transformation strategy. Unlike traditional preprocessing tools that act as black boxes, AutoPX generates human-readable reports that explain which actions were applied, why they were chosen, and how they affect the final output.
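AutoPX's internal decision logic is not documented on this page, so the following is only a minimal sketch of the general "analyze, decide, explain" pattern described above, in plain Python. Decision, DecisionLog, analyze_and_decide, and the 20-word threshold are hypothetical names and values chosen for illustration; they are not part of AutoPX's API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Decision:
    """One preprocessing decision plus the reason it was made (hypothetical structure)."""
    step: str
    action: str
    reason: str


@dataclass
class DecisionLog:
    """Collects decisions so they can be rendered as a JSON or Markdown report."""
    decisions: list[Decision] = field(default_factory=list)

    def record(self, step: str, action: str, reason: str) -> None:
        self.decisions.append(Decision(step, action, reason))

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-Latin text (e.g. Urdu) readable in the report
        return json.dumps([asdict(d) for d in self.decisions], indent=2, ensure_ascii=False)

    def to_markdown(self) -> str:
        rows = ["| Step | Action | Reason |", "| --- | --- | --- |"]
        rows += [f"| {d.step} | {d.action} | {d.reason} |" for d in self.decisions]
        return "\n".join(rows)


def analyze_and_decide(texts: list[str]) -> DecisionLog:
    """Inspect the corpus, choose actions, and log why (thresholds are illustrative only)."""
    log = DecisionLog()
    if any("http://" in t or "https://" in t for t in texts):
        log.record("cleaning", "keep URLs as single tokens", "input contains URLs")
    avg_words = sum(len(t.split()) for t in texts) / max(len(texts), 1)
    if avg_words < 20:
        log.record("stopwords", "keep stopwords",
                   f"short texts (avg {avg_words:.1f} words); words like 'not' may carry sentiment")
    else:
        log.record("stopwords", "remove stopwords", f"longer texts (avg {avg_words:.1f} words)")
    return log
```

Calling print(analyze_and_decide(texts).to_markdown()) renders the reasoning as a table; the point of the design is that every action is stored together with its justification, so the report is a by-product of the decisions rather than a separate step.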


🚀 Features

  • Automatic Language Detection: Support for English, Urdu, Roman Urdu, and more, with language-specific preprocessing rules.
  • Task Inference: Automatically detects the intended NLP task (Sentiment Analysis, Topic Modeling, Chatbot/Dialog) based on text characteristics.
  • Adaptive Text Cleaning: Intelligent lowercasing, symbol handling, emoji/URL preservation, and context-aware normalization.
  • Stopword & Token Management: Automatically decides whether to keep or remove stopwords and selects a tokenization strategy.
  • Vectorization & Output Preparation: Chooses between TF-IDF, CountVectorizer, Word2Vec, FastText, or transformer embeddings; handles padding/truncation for ML/DL models.
  • Fail-Safe & Reliability: Detects preprocessing failures and applies fallback strategies transparently.
  • Explainable Report Generation: Generates step-by-step reasoning reports in JSON, Markdown, or PDF formats.
  • Real-Time Adaptive Learning: Designed to improve task inference and preprocessing accuracy over repeated runs.
  • Framework Compatibility: Integrates with scikit-learn, TensorFlow, PyTorch, and Hugging Face Transformers.
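The language-detection feature does not say how English, Urdu, and Roman Urdu are told apart. One simple heuristic, sketched here as an assumption rather than AutoPX's actual method, is to look for Arabic-script characters (the script Urdu is written in) and otherwise measure how many words come from a small Roman Urdu word list. detect_language, ROMAN_URDU_HINTS, and the 25% threshold are hypothetical; note that a script check alone cannot distinguish Urdu from Arabic or Persian.

```python
import re

# Arabic-script Unicode block (U+0600 to U+06FF); Urdu letters fall inside it.
ARABIC_SCRIPT = re.compile(r"[\u0600-\u06FF]")

# Tiny hypothetical lexicon of frequent Roman Urdu words. A real system would
# need a much larger lexicon or a trained classifier.
ROMAN_URDU_HINTS = {"main", "hoon", "hai", "hain", "bohat", "nahi", "kya", "aap", "mera", "khush"}


def detect_language(text: str, threshold: float = 0.25) -> str:
    """Return 'urdu', 'roman_urdu', or 'english' using script and lexicon cues."""
    if ARABIC_SCRIPT.search(text):
        return "urdu"
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(w in ROMAN_URDU_HINTS for w in words)
    # Require a share of hint words, so one English "main" does not flip the result.
    if words and hits / len(words) >= threshold:
        return "roman_urdu"
    return "english"
```

Using a ratio rather than a single match keeps mixed-language sentences such as "Main bohat khush hoon today!" classified as Roman Urdu while leaving ordinary English untouched.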

🛠 Installation

You can install AutoPX from PyPI with pip:

pip install AutoPX==1.0.1

Or, for development, install from source in editable mode:

git clone https://github.com/MudassarGill/AutoPX.git
cd AutoPX
pip install -e .

📖 Usage Example

Preprocessing your text takes only a few lines; the core step is a single fit_transform call:

from autopx import AutoPX

# Initialize AutoPX (automatically infers task and language)
auto = AutoPX()

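# Multilingual raw data: English, Urdu, and Roman Urdu
texts = [
    "I absolutely love this product! 😄 Visit http://example.com for more info.",
    "یہ ایک بہترین کتاب ہے!",          # Urdu: "This is an excellent book!"
    "Main bohat khush hoon today! 😄"  # Roman Urdu: "I am very happy today!"
]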

# Process data
vectors = auto.fit_transform(texts)

# Generate an explainable report
report_path = auto.report(format="markdown")
print(f"Report generated at: {report_path}")
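The example texts mix URLs, emoji, and three languages. As a hypothetical, stdlib-only illustration of the cleaning and padding/truncation steps listed under Features (not AutoPX's implementation), this sketch tokenizes each text into URLs kept verbatim, single emoji, and words in any script, lowercases ordinary words, and pads or truncates the token list to a fixed length. clean_tokens, pad_or_truncate, the "<pad>" token, and the emoji ranges are assumptions.

```python
import re

# One token = a URL, a single emoji, or a run of word characters.
# The emoji ranges are a rough approximation, not a complete emoji definition.
TOKEN_RE = re.compile(
    r"https?://\S+"                             # URLs, kept verbatim
    r"|[\U0001F300-\U0001FAFF\u2600-\u27BF]"    # common emoji and pictographs
    r"|\w+"                                     # words in any script (Latin, Urdu, ...)
)


def clean_tokens(text: str) -> list[str]:
    """Tokenize, drop punctuation, lowercase words, and preserve URLs and emoji."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        # URLs are left as-is because paths can be case-sensitive.
        tokens.append(tok if tok.startswith(("http://", "https://")) else tok.lower())
    return tokens


def pad_or_truncate(tokens: list[str], max_len: int, pad: str = "<pad>") -> list[str]:
    """Return exactly max_len tokens: extra tokens are cut, short lists are padded."""
    return (tokens + [pad] * max_len)[:max_len]
```

Because the URL alternative comes first in the pattern, "http://example.com" survives as one token instead of being split into "http", "example", and "com"; a trailing period directly after a URL would, however, stay attached to it in this simplified version.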

📁 Folder Structure

AutoPX/
│
├── autopx/                         # Main package
│   ├── core/                       # Core decision-making logic (DataAnalysis, DecisionEngine)
│   ├── preprocessing/              # Text preprocessing (Cleaner, Tokenizer, Stopwords)
│   ├── vectorizers/                # Vectorization strategies (TF-IDF, Count, Embeddings)
│   ├── reports/                    # Explainable reporting system (JSON, Markdown, PDF)
│   ├── fallback/                   # Fail-safe logic
│   ├── utils/                      # Helper utilities, constants, and logging
│   └── config/                     # Configuration management
│
├── examples/                       # Usage examples
├── tests/                          # Unit & integration tests
├── setup.py                        # Package installation script
└── README.md                       # Main documentation
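The fallback/ and vectorizers/ packages are described only by name. As a hypothetical sketch of how a fail-safe step could work (the function names and strategy list are assumptions, not AutoPX code), each vectorization strategy is tried in order; a strategy that raises or returns nothing is skipped, and every attempt is written to a notes list that a report could later include.

```python
from collections import Counter
from typing import Callable


def run_with_fallback(
    texts: list[str],
    strategies: list[tuple[str, Callable[[list[str]], list]]],
) -> tuple[str, list, list[str]]:
    """Try (name, function) strategies in order; return the first usable result."""
    notes: list[str] = []
    for name, fn in strategies:
        try:
            result = fn(texts)
        except Exception as exc:  # any failure moves on to the next strategy
            notes.append(f"{name} failed ({exc}); falling back")
            continue
        if not result:
            notes.append(f"{name} returned empty output; falling back")
            continue
        notes.append(f"{name} succeeded")
        return name, result, notes
    raise RuntimeError("all strategies failed: " + "; ".join(notes))


def bag_of_words(texts: list[str]) -> list[list[int]]:
    """Plain count vectors over a sorted vocabulary (a simple last-resort strategy)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for t in texts:
        row = [0] * len(vocab)
        for word, count in Counter(t.lower().split()).items():
            row[index[word]] = count
        matrix.append(row)
    return matrix


def pretrained_embeddings(texts: list[str]) -> list:
    # Stand-in for a strategy whose model cannot be loaded in this environment.
    raise RuntimeError("embedding model not available")
```

For example, run_with_fallback(texts, [("embeddings", pretrained_embeddings), ("bag_of_words", bag_of_words)]) falls back to count vectors and returns notes explaining why, which is the "transparent" part of the fail-safe claim.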

🤝 Contact Information

For any queries, feedback, or collaboration, feel free to reach out:


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files


Source Distribution

autopx-1.0.1.tar.gz (5.9 kB)

Uploaded Source

Built Distribution


autopx-1.0.1-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file autopx-1.0.1.tar.gz.

File metadata

  • Download URL: autopx-1.0.1.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for autopx-1.0.1.tar.gz
Algorithm Hash digest
SHA256 0df47b2b6448605c42991543bace89c6a055e396cc6f54b32d9a18b29cc9cde6
MD5 67420287259a4a6af4b3f39d164e1e0e
BLAKE2b-256 75f1c8f9d7914c978a5ac0ae8487313ee4234d55dc8686e1bc2e74d272da5977


File details

Details for the file autopx-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: autopx-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for autopx-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2a119efbddbc24b61781b1a966f4e9a8ead630049e9197a2361c912a01542821
MD5 bd0aaa5c61a6f283ffb3e09d49957503
BLAKE2b-256 79d922701d0ae4a3cf7432eca99f42b61c41c2c603174d90c8614c0aff861ef4
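To check a downloaded file against the SHA256 digests listed above, a short script using Python's standard hashlib module is enough. sha256_of_file and verify_download are illustrative helper names, not part of AutoPX.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

pip can enforce the same check itself in hash-checking mode: put a line such as autopx==1.0.1 --hash=sha256:<digest> in a requirements file and run pip install --require-hashes -r requirements.txt. In that mode every requirement, including dependencies, must be pinned with a hash, and pip refuses to install any file whose digest does not match.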

