A text analysis library for relevance and subtheme detection
TextScope
TextScope is a Python package that helps determine the relevance of a text to predefined profiles of interest and aligns it with specific subthemes. The package is designed to be flexible and configurable via a config.py file.
Installation
You can install TextScope using pip:
pip install textscope
Configuration
Before using TextScope, define your profiles of interest and subthemes in the config.py file. Example:
THEMES = ['technology', 'AI', 'machine learning', 'software']
SUBTHEMES = {
    '1': 'programming',
    '2': 'data science',
    '3': 'cybersecurity',
}
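A malformed config.py (for example, a missing comma between dictionary entries, or an empty label) is easy to miss. The sketch below shows one way to sanity-check the two settings before running an analysis. The `validate_config` helper is hypothetical and not part of TextScope; it only assumes that THEMES is a list of strings and SUBTHEMES a dict of string keys to string labels, as in the example above.

```python
# Hypothetical helper, NOT part of TextScope: sanity-check config values.
THEMES = ['technology', 'AI', 'machine learning', 'software']
SUBTHEMES = {
    '1': 'programming',
    '2': 'data science',
    '3': 'cybersecurity',
}


def validate_config(themes, subthemes):
    """Return a list of problems found; an empty list means the config looks sane."""
    problems = []
    if not isinstance(themes, list) or not themes:
        problems.append("THEMES must be a non-empty list")
    elif not all(isinstance(t, str) and t.strip() for t in themes):
        problems.append("every entry in THEMES must be a non-empty string")
    if not isinstance(subthemes, dict) or not subthemes:
        problems.append("SUBTHEMES must be a non-empty dict")
    else:
        for key, label in subthemes.items():
            valid_key = isinstance(key, str)
            valid_label = isinstance(label, str) and bool(label.strip())
            if not (valid_key and valid_label):
                problems.append(
                    f"SUBTHEMES[{key!r}] must map a string key to a non-empty string"
                )
    return problems


print(validate_config(THEMES, SUBTHEMES))  # prints []
```

Returning a list of problems rather than raising on the first one makes it possible to report every issue in the file at once.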
Relevance Analysis
To determine if a text is relevant to any of the predefined profiles:
from textscope.relevance_analyzer import RelevanceAnalyzer
model_name = 'intfloat/multilingual-e5-large-instruct'
text = "This article discusses the latest advancements in AI and machine learning."
analyzer = RelevanceAnalyzer(model_name)
rel_score = analyzer.analyze(text)
print(rel_score)
A filter_corpus method is currently under development and will be included in a future version of the package. NOTE: Different embedding models are supported, but we highly recommend using e5.
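This README does not describe how RelevanceAnalyzer computes its score. A common approach with embedding models is to compare the text embedding against each theme embedding using cosine similarity and keep the best match. The sketch below illustrates that idea with toy vectors in place of real model output; the function names and the "take the maximum" rule are assumptions for illustration, not TextScope's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(text_vec, theme_vecs):
    """Assumed scoring rule: the highest similarity between the text and any theme."""
    return max(cosine_similarity(text_vec, vec) for vec in theme_vecs.values())


# Toy 3-dimensional "embeddings"; a real model such as e5 produces much longer vectors.
theme_vecs = {
    'technology': [1.0, 0.0, 0.0],
    'AI': [0.0, 1.0, 0.0],
}
text_vec = [0.0, 2.0, 0.0]  # points in the same direction as the 'AI' vector

print(relevance_score(text_vec, theme_vecs))  # prints 1.0
```

Cosine similarity ignores vector length, which is why the toy text vector scores 1.0 against 'AI' even though its magnitude differs.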
Subtheme Analysis
To find which subthemes within a profile a text aligns with:
from textscope.subtheme_analyzer import SubthemeAnalyzer
model_name = 'hiiamsid/sentence_similarity_spanish_es'
text = "La IA es un campo en auge"  # Spanish for "AI is a booming field"
analyzer = SubthemeAnalyzer(model_name)
subth_pres = analyzer.analyze_bin(text)
print(subth_pres)
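The name analyze_bin suggests a binary (present or absent) result per subtheme, but the return format is not documented here. As a rough illustration of how per-subtheme similarity scores could be turned into binary flags, the sketch below applies a fixed threshold. The `binarize_scores` helper, the score values, and the 0.5 threshold are all hypothetical and not taken from TextScope.

```python
def binarize_scores(scores, threshold=0.5):
    """Map each subtheme key to 1 if its score reaches the threshold, else 0."""
    return {key: int(score >= threshold) for key, score in scores.items()}


# Made-up similarity scores keyed like the SUBTHEMES dict in config.py.
scores = {'1': 0.82, '2': 0.47, '3': 0.10}

print(binarize_scores(scores))  # prints {'1': 1, '2': 0, '3': 0}
```

A suitable threshold depends on the embedding model, since different models produce differently distributed similarity values.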
Testing
To run tests for TextScope, use the following command:
pytest tests/
Built Distribution
Hashes for textscope-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 04587ce1e825363662d054e8ebcbbb133f7968cf5df2b1ea201ffef5ffd04d89
MD5 | 4ce7a36a10d9a85be5aab00484974ac5
BLAKE2b-256 | eb0f2d876108386f39a1a4b6e264813c093f7223aa0e1bd80b1541aec5702ebc