A multilingual text and voice processing toolkit
LinguaLab
LinguaLab is a Python toolkit for natural language processing and linguistic analysis. It provides tools for text processing, language analysis, corpus management, and statistical analysis, making it easier to work with text data in Python applications and in linguistic research.
Features
- Text Processing:
  - Text cleaning and normalization
  - Tokenization and lemmatization
  - Stop word removal and filtering
- Language Analysis:
  - Syntax analysis
  - Morphological analysis
  - Semantic analysis
- Corpus Management:
  - Corpus creation and management
  - Text collection and organization
  - Metadata handling
- Statistical Analysis:
  - Frequency analysis
  - Word distribution analysis
  - Text similarity measures
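Several of the text processing and statistical features rest on a few standard steps: normalization, tokenization, stop word filtering, and frequency counting. The sketch below shows those steps with the Python standard library only. It is not LinguaLab's implementation; the tiny `STOP_WORDS` set and the regex tokenizer are simplifying assumptions (real toolkits such as NLTK ship much larger, language-specific stop word lists).

```python
import re
import unicodedata
from collections import Counter

# Hypothetical minimal stop word list (assumption); real lists are far larger.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "this", "to"}


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization, then lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text: str) -> list[str]:
    """Split into runs of word characters (a crude regex tokenizer)."""
    return re.findall(r"\w+", text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def word_frequency(text: str) -> Counter:
    """Count content words after normalization and stop word removal."""
    return Counter(remove_stop_words(tokenize(normalize(text))))


freq = word_frequency("This is a sample text. The text is short.")
print(freq.most_common(2))  # [('text', 2), ('sample', 1)]
```

Normalizing before tokenizing matters: NFKC folds variants such as full-width letters into their plain forms, so they are counted as the same word.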
Installation
Prerequisites
Before installing, please ensure the following required third-party libraries are available on your system. Install them with pip:
pip install nltk spacy pandas numpy scikit-learn
Or via Anaconda (recommended channel: conda-forge):
conda install -c conda-forge nltk spacy pandas numpy scikit-learn
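To confirm the prerequisites are importable before installing LinguaLab, you can probe each module with `importlib.util.find_spec`, which locates a module without importing it. This is a convenience sketch, not part of the package. Note that scikit-learn is installed under one name but imported as `sklearn`.

```python
import importlib.util

# Distribution names (as passed to pip/conda) mapped to their import names.
REQUIRED = {
    "nltk": "nltk",
    "spacy": "spacy",
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_dependencies(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the distribution names whose top-level module cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", " ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All prerequisites are available.")
```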
Installation (from PyPI)
Install the package using pip:
pip install lingualab
Development Installation
For development purposes, you can install the package in editable mode:
git clone https://github.com/yourusername/lingualab.git
cd lingualab
pip install -e .
Usage
Basic Example
from lingualab.processing import TextProcessor
from lingualab.analysis import LanguageAnalyzer
# Process text
processor = TextProcessor("This is a sample text.")
processed_text = processor.clean().tokenize().lemmatize()
# Analyze language
analyzer = LanguageAnalyzer(processed_text)
syntax_tree = analyzer.parse_syntax()
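The basic example chains `clean()`, `tokenize()`, and `lemmatize()` on a single object. That fluent style works when every step returns the processor itself. The class below is a hypothetical, stdlib-only illustration of the pattern (`ToyTextProcessor` is not a LinguaLab class), and its dictionary lookup is a deliberately tiny stand-in for a real lemmatizer such as spaCy's or NLTK's WordNet lemmatizer.

```python
import re


class ToyTextProcessor:
    """Hypothetical fluent processor: each step updates state and returns self."""

    # Tiny lookup table standing in for a real lemmatizer (assumption).
    LEMMAS = {"is": "be", "are": "be", "was": "be", "were": "be",
              "texts": "text", "studies": "study", "running": "run"}

    def __init__(self, text: str) -> None:
        self.text = text
        self.tokens: list[str] = []

    def clean(self) -> "ToyTextProcessor":
        # Lowercase, replace punctuation with spaces, collapse whitespace.
        stripped = re.sub(r"[^\w\s]", " ", self.text.lower())
        self.text = " ".join(stripped.split())
        return self

    def tokenize(self) -> "ToyTextProcessor":
        # Whitespace splitting is enough once clean() has run.
        self.tokens = self.text.split()
        return self

    def lemmatize(self) -> "ToyTextProcessor":
        # Map each token to its lemma; unknown words are left unchanged.
        self.tokens = [self.LEMMAS.get(t, t) for t in self.tokens]
        return self


processed = ToyTextProcessor("This is a sample text.").clean().tokenize().lemmatize()
print(processed.tokens)  # ['this', 'be', 'a', 'sample', 'text']
```

Returning `self` is what makes the chain possible; the trade-off is that the object is mutated in place, so processing a second text calls for a fresh instance.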
Advanced Example
from lingualab.corpus import CorpusManager
from lingualab.statistics import TextAnalyzer
# Create a corpus and add the documents to compare
corpus = CorpusManager("my_corpus")
corpus.add_document("doc1.txt", metadata={"author": "John Doe"})
corpus.add_document("doc2.txt")
# Analyze text statistics
analyzer = TextAnalyzer(corpus)
word_freq = analyzer.word_frequency()
similarity = analyzer.text_similarity("doc1.txt", "doc2.txt")
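The advanced example stores documents with metadata and then compares two of them. A common way to score text similarity is the cosine of the angle between bag-of-words count vectors. The sketch below shows that measure over an in-memory corpus; `ToyCorpus` and its methods are hypothetical, and LinguaLab's `text_similarity` may well use a different measure (for instance TF-IDF weighting).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    name: str
    text: str
    metadata: dict = field(default_factory=dict)


class ToyCorpus:
    """Hypothetical in-memory corpus keyed by document name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.documents: dict[str, Document] = {}

    def add_document(self, name: str, text: str, metadata: Optional[dict] = None) -> None:
        self.documents[name] = Document(name, text, metadata or {})

    def bag_of_words(self, name: str) -> Counter:
        # Lowercased word counts for one document.
        return Counter(re.findall(r"\w+", self.documents[name].text.lower()))

    def cosine_similarity(self, a: str, b: str) -> float:
        """Cosine of the angle between two documents' word-count vectors."""
        va, vb = self.bag_of_words(a), self.bag_of_words(b)
        dot = sum(count * vb[word] for word, count in va.items())
        norm_a = math.sqrt(sum(c * c for c in va.values()))
        norm_b = math.sqrt(sum(c * c for c in vb.values()))
        # Guard against empty documents (zero-length vectors).
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = ToyCorpus("my_corpus")
corpus.add_document("doc1.txt", "the cat sat on the mat", metadata={"author": "John Doe"})
corpus.add_document("doc2.txt", "the cat lay on the rug")
print(round(corpus.cosine_similarity("doc1.txt", "doc2.txt"), 3))  # 0.75
```

Here the shared words "the" (twice in each), "cat", and "on" give a dot product of 6, and both vectors have squared length 8, so the score is 6 / 8 = 0.75.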
Project Structure
The package is organised into four sub-packages:
LinguaLab/
├── processing/
│ ├── text_processor.py
│ └── tokenizer.py
├── analysis/
│ ├── syntax_analyzer.py
│ └── semantic_analyzer.py
├── corpus/
│ ├── manager.py
│ └── collector.py
└── statistics/
├── frequency_analyzer.py
└── similarity_analyzer.py
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Natural Language Processing community
- Open-source contributors
- Linguistic research community
Contact
For any questions or suggestions, please open an issue on GitHub or contact the maintainers.
Download files
File details
Details for the file lingualab-3.4.3.tar.gz.
File metadata
- Download URL: lingualab-3.4.3.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17b7d218fa3e84a1c67a8b1ef9bb934e457a50c70283b4fef5c8cbf4890b46f1 |
| MD5 | f5cf3cd43c367f7b58c460319927d7ce |
| BLAKE2b-256 | ac0ff22bdd22ebbda8237a488d448f2396868c7b6ab56d9d5d246374e37eecef |
File details
Details for the file lingualab-3.4.3-py3-none-any.whl.
File metadata
- Download URL: lingualab-3.4.3-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a612862aab8e5b238736cf75319753bc4e289ed40fa046144508568bd1f99fd |
| MD5 | 101103adfd55c5ea4a46d5d462051a18 |
| BLAKE2b-256 | ccd9b23c980b91e6e66d15153f359202df4289508f9c8dce03e8699ee128c94d |
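The SHA256 digests listed for each file let you check that a download is intact. A minimal sketch using `hashlib`; the local file path is an assumption, so point it at wherever the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Published SHA256 digest for lingualab-3.4.3.tar.gz.
    expected = "17b7d218fa3e84a1c67a8b1ef9bb934e457a50c70283b4fef5c8cbf4890b46f1"
    archive = Path("lingualab-3.4.3.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).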