
A tool for working with text data


LANCETNIC 2.0.2

License: MIT

LANCETNIC is a library with built-in neural network models for working with text and numeric data. Lancetnic provides convenient tools for:

  • Data preparation and vectorization
  • Training classification models
  • Visualization of metrics
  • Making predictions on new data

The library works with purely textual data as well as with combinations of textual and numerical features, including trend and price analysis. Example uses: text classification, detection of spam and fraudulent messages, and work with numerical series and time-based features.

🚀 Installation

Install with CUDA

To run on a GPU, first install PyTorch with CUDA support (optional):

pip install torch==2.5.1+cu124 torchaudio==2.5.1+cu124 torchvision==0.20.1+cu124 --index-url https://download.pytorch.org/whl/cu124

Then install lancetnic:

pip install lancetnic
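
After installation, it can help to confirm whether PyTorch actually sees a GPU. The following is a minimal sketch; the helper name `describe_device` is hypothetical and not part of lancetnic. It uses only `importlib` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_device() -> str:
    """Return 'cuda', 'cpu', or 'torch-missing' depending on the local setup."""
    # Avoid an ImportError on machines where PyTorch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch

    # True only when PyTorch was built with CUDA and a usable GPU is present.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_device())
```

If this prints `cpu` after installing the CUDA wheels, the driver or the wheel variant is the likely place to look.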

👥 Authors

📄 Documentation

Documentation in Russian

Documentation in English

Quick start

Text classification example

from lancetnic.models import LancetMC
from lancetnic import TextClass

text_model = TextClass(
            text_column='description',  # Column name containing text data
            label_column='category',    # Column name containing labels
            split_ratio=0.2,            # Train/validation split ratio (if no val_path)
            random_state=42             # Random seed for reproducibility
            )

text_model.train(model_name=LancetMC,   # Model architecture for text classification
                train_path="train.csv", # Path to training data (CSV format)
                val_path="val.csv",     # Path to validation data (None for auto-split)
                num_epochs=50,          # Total training epochs
                hidden_size=256,        # Size of hidden layers
                num_layers=1,           # Number of hidden layers
                batch_size=256,         # Batch size for training
                learning_rate=0.001,    # Learning rate for optimizer
                dropout=0,              # Dropout rate (0-1)
                optim_name='Adam',      # Optimizer ('Adam', 'SGD', 'RAdam', etc.)
                crit_name='CELoss'      # Loss function ('CELoss' or 'BCELoss')
                )
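
The training call above expects CSV files containing the columns named in `text_column` and `label_column`. As a sketch of what such files might look like, the snippet below writes a tiny `train.csv` and `val.csv` using only the standard library. The sample rows and the helper `write_dataset` are hypothetical, and a comma-delimited file with a header row is an assumption about the expected format.

```python
import csv
from pathlib import Path

# Hypothetical sample rows: (description, category)
ROWS = [
    ("Win a free prize now, click here", "spam"),
    ("Meeting moved to 3 pm tomorrow", "ham"),
    ("Your account is locked, send your password", "spam"),
    ("Lunch on Friday?", "ham"),
]


def write_dataset(path, rows):
    """Write rows to a CSV file with 'description' and 'category' columns."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header names match text_column and label_column in the example above.
        writer.writerow(["description", "category"])
        writer.writerows(rows)
    return path


train_file = write_dataset("train.csv", ROWS[:3])
val_file = write_dataset("val.csv", ROWS[3:])
```

A real dataset needs far more rows per class; these files only illustrate the column layout.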
           

Making predictions

from lancetnic import TextClass

text_model = TextClass()
text_pred = text_model.predict(
                model_path="model.pth", # Path to saved model
                text="Sample text to classify" # Text input for prediction
                )
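
To classify several texts, the single-text call above can be wrapped in a loop. This is a sketch under stated assumptions: `classify_many` is a hypothetical helper, and `fake_predict` is a stand-in that only mimics the keyword arguments shown above, so the example runs without a trained model. The return type of the real `predict` is not documented here, so the wrapper stores whatever it returns.

```python
from typing import Any, Callable, Dict, Iterable


def classify_many(predict: Callable[..., Any], model_path: str,
                  texts: Iterable[str]) -> Dict[str, Any]:
    """Call predict once per text and collect the results keyed by text."""
    results = {}
    for text in texts:
        results[text] = predict(model_path=model_path, text=text)
    return results


def fake_predict(model_path: str, text: str) -> str:
    """Stand-in for text_model.predict; NOT the library's behavior."""
    return "spam" if "free" in text.lower() else "ham"


labels = classify_many(fake_predict, "model.pth",
                       ["Free tickets!", "See you at noon"])
print(labels)
```

With a trained model, `text_model.predict` would be passed in place of `fake_predict`.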

Combined text and numeric features example

from lancetnic.models import LancetMC
from lancetnic import TextScalarClass

mixed_model = TextScalarClass(
                text_column='description',  # Text column name (None if only numeric)
                data_column=['feat1', 'feat2'], # List of numeric feature columns
                label_column='target',     # Target variable column
                split_ratio=0.2,            # Train/val split ratio
                random_state=42             # Random seed
                )

mixed_model.train(model_name=LancetMC,   # Model architecture for text classification
                train_path="train.csv", # Path to training data (CSV format)
                val_path="val.csv",     # Path to validation data (None for auto-split)
                num_epochs=50,          # Total training epochs
                hidden_size=256,        # Size of hidden layers
                num_layers=1,           # Number of hidden layers
                batch_size=256,         # Batch size for training
                learning_rate=0.001,    # Learning rate for optimizer
                dropout=0,              # Dropout rate (0-1)
                optim_name='Adam',      # Optimizer ('Adam', 'SGD', 'RAdam', etc.)
                crit_name='CELoss'      # Loss function ('CELoss' or 'BCELoss')
                )

Making predictions

from lancetnic import TextScalarClass

mixed_model = TextScalarClass()
mixed_pred = mixed_model.predict(
                model_path="mixed_model.pth", # Path to saved model
                text="Product description",  # Text input (None if only numeric)
                numeric=[0.5, 1.2]            # Numeric features as list
                )
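
The `numeric` list carries no column names, so its order matters. Assuming (this is not confirmed by the text above) that values must follow the same order as `data_column` at training time, a small helper can build the list from a named record. `numeric_vector` is a hypothetical name, not part of lancetnic.

```python
from typing import List, Mapping, Sequence


def numeric_vector(record: Mapping[str, float],
                   data_column: Sequence[str]) -> List[float]:
    """Return feature values ordered exactly as in data_column."""
    missing = [name for name in data_column if name not in record]
    if missing:
        # Fail early instead of silently passing a short or shifted vector.
        raise KeyError(f"missing numeric features: {missing}")
    return [float(record[name]) for name in data_column]


record = {"feat2": 1.2, "feat1": 0.5}
numeric = numeric_vector(record, ["feat1", "feat2"])
print(numeric)  # [0.5, 1.2]
```

The resulting list can then be passed as the `numeric` argument shown above.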

LANCETNIC provides two base model classes:

  • LancetMC
from lancetnic.models import LancetMC
  • LancetMCA
from lancetnic.models import LancetMCA

Key differences between the models:

Feature             LancetMC                  LancetMCA
Core architecture   Basic LSTM                LSTM + Attention
Complexity          Lower                     Higher
Computational cost  Less resource-intensive   More resource-intensive
Best for            Pure text classification  Mixed data or complex patterns
Interpretability    Standard                  Provides attention weights
Sequence handling   Good                      Excellent for long sequences
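
To illustrate what "LSTM + Attention" and "attention weights" mean in the comparison above, here is a generic sketch of dot-product attention pooling over a sequence of hidden states, written in pure Python. It is not LancetMCA's implementation. The function names, the fixed `query` vector, and the toy numbers are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Weight each time step by softmax(h_t . query) and sum the states."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights


# Toy hidden states for three time steps (as an LSTM might emit).
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]
query = [1.0, 0.5]  # in a real model this vector would be learned
context, weights = attention_pool(hidden_states, query)
print(weights)
```

The weights sum to 1 and show which time steps contributed most to the pooled vector, which is the sense in which an attention model is more interpretable than a plain LSTM that uses only its last hidden state.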

