A multilingual text and voice processing toolkit

LinguaLab

LinguaLab is a Python toolkit for natural language processing and linguistic analysis. It provides tools for text processing, language analysis, corpus management, and statistical analysis of text data in Python applications.

Features

  • Text Processing:
    • Text cleaning and normalization
    • Tokenization and lemmatization
    • Stop word removal and filtering
  • Language Analysis:
    • Syntax analysis
    • Morphological analysis
    • Semantic analysis
  • Corpus Management:
    • Corpus creation and management
    • Text collection and organization
    • Metadata handling
  • Statistical Analysis:
    • Frequency analysis
    • Word distribution analysis
    • Text similarity measures
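
To make the feature list concrete, here is a minimal standard-library sketch of what cleaning, tokenization, stop-word removal, and frequency analysis typically involve. It does not use LinguaLab's API; the function names and the tiny stop-word list are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def clean(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenize(text: str) -> list[str]:
    """Extract word tokens made of lowercase letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def word_frequency(tokens: list[str]) -> Counter:
    """Count how often each token occurs."""
    return Counter(tokens)


tokens = remove_stop_words(tokenize(clean("The cat sat in  the hat. The CAT!")))
freq = word_frequency(tokens)
print(tokens)               # ['cat', 'sat', 'hat', 'cat']
print(freq.most_common(1))  # [('cat', 2)]
```

Lemmatization is left out because it needs a language model or dictionary, which is presumably why libraries such as nltk and spacy appear in the prerequisites.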

Installation

Prerequisites

Before installing, please ensure the following dependencies are available on your system:

  • Required Third-Party Libraries:

    pip install nltk spacy pandas numpy scikit-learn
    

    Or via Anaconda (recommended channel: conda-forge):

    conda install -c conda-forge nltk spacy pandas numpy scikit-learn
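
If it is unclear whether the prerequisites are already present, a short check like the following can help. This is a sketch using only the standard library; the helper name `missing_packages` is hypothetical. Note that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Maps the name used with pip/conda to the module name used in `import`.
REQUIRED = {
    "nltk": "nltk",
    "spacy": "spacy",
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the install names of packages whose module cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites are importable.")
```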
    

Installation (from PyPI)

Install the package using pip:

pip install lingualab

Development Installation

For development purposes, you can install the package in editable mode:

git clone https://github.com/yourusername/lingualab.git
cd lingualab
pip install -e .

Usage

Basic Example

from lingualab.processing import TextProcessor
from lingualab.analysis import LanguageAnalyzer

# Process text
processor = TextProcessor("This is a sample text.")
processed_text = processor.clean().tokenize().lemmatize()

# Analyze language
analyzer = LanguageAnalyzer(processed_text)
syntax_tree = analyzer.parse_syntax()

Advanced Example

from lingualab.corpus import CorpusManager
from lingualab.statistics import TextAnalyzer

# Manage corpus
corpus = CorpusManager("my_corpus")
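corpus.add_document("doc1.txt", metadata={"author": "John Doe"})
# Add a second document so the similarity comparison below has two inputs
# (placeholder file name and metadata).
corpus.add_document("doc2.txt", metadata={"author": "Jane Doe"})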

# Analyze text statistics
analyzer = TextAnalyzer(corpus)
word_freq = analyzer.word_frequency()
similarity = analyzer.text_similarity("doc1.txt", "doc2.txt")
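
For intuition about text similarity measures, the sketch below computes cosine similarity over bag-of-words counts using only the standard library. Which measure `text_similarity` actually uses is not stated in this README, so treat this as one common option and not as LinguaLab's implementation; the function name `cosine_similarity` is an assumption for illustration.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Cosine similarity between two token lists, treated as count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is defined here as similar to nothing
    return dot / (norm_a * norm_b)


doc1 = "the cat sat on the mat".split()
doc2 = "the cat lay on the rug".split()
score = cosine_similarity(doc1, doc2)
print(round(score, 2))  # 0.75
```

The score ranges from 0 (no shared tokens) to 1 (identical token proportions). Because raw counts are used, frequent words such as "the" dominate, which is one reason stop-word removal usually comes first.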

Project Structure

The package is organised into several sub-packages:

LinguaLab/
├── processing/
│   ├── text_processor.py
│   └── tokenizer.py
├── analysis/
│   ├── syntax_analyzer.py
│   └── semantic_analyzer.py
├── corpus/
│   ├── manager.py
│   └── collector.py
└── statistics/
    ├── frequency_analyzer.py
    └── similarity_analyzer.py

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Natural Language Processing community
  • Open-source contributors
  • Linguistic research community

Contact

For any questions or suggestions, please open an issue on GitHub or contact the maintainers.
