A Python text analysis library for relevance and subtheme detection
TextScope
TextScope is a Python package that scores how relevant a text is to predefined profiles of interest and identifies which subthemes it aligns with. The package is designed to be flexible and is configured through a config.py file.
Installation
You can install TextScope using pip:
pip install textscope
Configuration
Before using TextScope, define your profiles of interest and subthemes in the config.py file. Example:
THEMES = ['technology', 'AI', 'machine learning', 'software']
SUBTHEMES = {
    '1': 'programming',
    '2': 'data science',
    '3': 'cybersecurity',
}
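A malformed config.py is easy to overlook, so a small sanity check before running an analysis can help. The sketch below is not part of TextScope: `validate_config` is a hypothetical helper, and it assumes only what the example above shows, namely that THEMES is a list of strings and SUBTHEMES maps string IDs to subtheme names.

```python
# Hypothetical helper, not part of TextScope: checks the shape of the
# THEMES / SUBTHEMES values shown in the example config.

def validate_config(themes, subthemes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not isinstance(themes, list) or not themes:
        problems.append("THEMES must be a non-empty list")
    else:
        for theme in themes:
            if not isinstance(theme, str) or not theme.strip():
                problems.append(f"invalid theme: {theme!r}")
    if not isinstance(subthemes, dict) or not subthemes:
        problems.append("SUBTHEMES must be a non-empty dict")
    else:
        for key, name in subthemes.items():
            if not isinstance(key, str):
                problems.append(f"subtheme key must be a string: {key!r}")
            if not isinstance(name, str) or not name.strip():
                problems.append(f"invalid subtheme name for key {key!r}: {name!r}")
    return problems


THEMES = ['technology', 'AI', 'machine learning', 'software']
SUBTHEMES = {'1': 'programming', '2': 'data science', '3': 'cybersecurity'}

print(validate_config(THEMES, SUBTHEMES))  # prints []
```

An empty list means the values have the expected shape; each string in a non-empty list names one problem to fix.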
Relevance Analysis
To determine if a text is relevant to any of the predefined profiles:
from textscope.relevance_analyzer import RelevanceAnalyzer
model_name = 'intfloat/multilingual-e5-large-instruct'
text = "This article discusses the latest advancements in AI and machine learning."
analyzer = RelevanceAnalyzer(model_name)
rel_score = analyzer.analyze(text)
print(rel_score)
NOTE: TextScope supports different embedding models, but we strongly recommend an E5 model such as the intfloat/multilingual-e5-large-instruct model used above. A filter_corpus method is under development and is planned for a future version of the package.
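To give a feel for what an embedding-based relevance score can mean, here is a minimal, dependency-free sketch. It assumes relevance is the best cosine similarity between the text and any configured theme, which is a common approach but is not confirmed to be what RelevanceAnalyzer does internally. The `toy_embed` function is a bag-of-words stand-in for a real embedding model, and every name in the block is hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for a real embedding model: a bag-of-words count vector.
# This only keeps the example dependency-free; it is NOT how TextScope
# or an E5 model embeds text.
def toy_embed(text):
    cleaned = text.lower().replace('.', ' ').replace(',', ' ')
    return Counter(cleaned.split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def relevance_score(text, themes):
    """Assumed scoring rule: best cosine similarity against any theme."""
    text_vec = toy_embed(text)
    return max(cosine_similarity(text_vec, toy_embed(t)) for t in themes)

THEMES = ['technology', 'AI', 'machine learning', 'software']
text = "This article discusses the latest advancements in AI and machine learning."
print(round(relevance_score(text, THEMES), 3))
```

With a real sentence-embedding model the vectors are dense and capture meaning rather than word overlap, but the comparison step would look much the same under this assumption.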
Subtheme Analysis
To find which subthemes within a profile a text aligns with:
from textscope.subtheme_analyzer import SubthemeAnalyzer
model_name = 'hiiamsid/sentence_similarity_spanish_es'
text = "LA IA es un campo en auge"
analyzer = SubthemeAnalyzer(model_name)
subth_pres = analyzer.analyze_bin(text)
print(subth_pres)
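The name analyze_bin suggests a binary (present or absent) result per subtheme. The sketch below shows one way such flags could be derived from similarity scores by thresholding. It is an illustration only: `binarize_scores`, the 0.5 threshold, and the scores themselves are all hypothetical and are not taken from TextScope.

```python
# Hypothetical illustration, not TextScope's implementation: turn
# per-subtheme similarity scores into 0/1 presence flags.

def binarize_scores(scores, threshold=0.5):
    """Map each subtheme to 1 if its score reaches the threshold, else 0."""
    return {name: int(score >= threshold) for name, score in scores.items()}

# Made-up scores for illustration only; real values would come from a model.
example_scores = {'programming': 0.31, 'data science': 0.72, 'cybersecurity': 0.18}
print(binarize_scores(example_scores))
# prints {'programming': 0, 'data science': 1, 'cybersecurity': 0}
```

A lower threshold marks more subthemes as present at the cost of more false positives, so the value is worth tuning on a few labelled texts.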
Testing
To run tests for TextScope, use the following command:
pytest tests/
Hashes for textscope-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8b68fce789e8d9c8e6add7cf21e78de5cf60596382be302cdb2c5e3241204f2a
MD5 | c0da30bf5f113538f765efdd6f2e9f28
BLAKE2b-256 | 59ae6b48864385e46d3bf5d6eeff2c5d4aa3504c4834dcb18019ad97a9b88a0e