A text analysis library for relevance and subtheme detection
TextScope
TextScope is a Python package that helps determine the relevance of a text to predefined profiles of interest and aligns it with specific subthemes. The package is designed to be flexible and configurable via a config.py file.
Installation
You can install TextScope using pip:
pip install textscope
Configuration
Before using TextScope, define your profiles of interest and subthemes in the config.py file. Example:
THEMES = ['technology', 'AI', 'machine learning', 'software']
SUBTHEMES = {
    '1': 'programming',
    '2': 'data science',
    '3': 'cybersecurity'
}
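A malformed config.py (for instance, a missing comma inside SUBTHEMES) is easy to miss, so a quick sanity check can help before running an analysis. The sketch below is not part of TextScope: `check_config` is a hypothetical helper, and the expected shapes (a list of non-empty strings, and a dict mapping string keys to non-empty string labels) are assumptions inferred from the example above.

```python
def check_config(themes, subthemes):
    """Raise ValueError if the settings do not match the assumed shapes."""
    # Assumed shape: THEMES is a list of non-empty strings.
    if not isinstance(themes, list) or not all(
        isinstance(t, str) and t for t in themes
    ):
        raise ValueError("THEMES must be a list of non-empty strings")
    # Assumed shape: SUBTHEMES maps string keys to non-empty string labels.
    if not isinstance(subthemes, dict):
        raise ValueError("SUBTHEMES must be a dict")
    for key, label in subthemes.items():
        if not isinstance(key, str) or not isinstance(label, str) or not label:
            raise ValueError(f"Invalid SUBTHEMES entry: {key!r}: {label!r}")
    return True


THEMES = ['technology', 'AI', 'machine learning', 'software']
SUBTHEMES = {
    '1': 'programming',
    '2': 'data science',
    '3': 'cybersecurity',
}

config_ok = check_config(THEMES, SUBTHEMES)
print(config_ok)
```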
Relevance Analysis
To determine if a text is relevant to any of the predefined profiles:
from textscope.relevance_analyzer import RelevanceAnalyzer

# Embedding model to load (see the note below on recommended models)
model_name = 'intfloat/multilingual-e5-large-instruct'
text = "This article discusses the latest advancements in AI and machine learning."

# Score the text against the profiles defined in config.py
analyzer = RelevanceAnalyzer(model_name)
rel_score = analyzer.analyze(text)
print(rel_score)
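To give an intuition for embedding-based relevance scoring, here is a conceptual sketch. It does not reproduce TextScope's internals, which are not documented here. It assumes relevance is the highest cosine similarity between the text and any theme, and it swaps the real embedding model for a toy bag-of-words vector (`toy_embed`, a hypothetical stand-in) so that it runs without downloading anything.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(text, themes):
    """Assumed scoring rule: best similarity between the text and any theme."""
    text_vec = toy_embed(text)
    return max(cosine(text_vec, toy_embed(theme)) for theme in themes)


THEMES = ['technology', 'AI', 'machine learning', 'software']
text = "This article discusses the latest advancements in AI and machine learning."
score = relevance(text, THEMES)
print(round(score, 3))
```

A real embedding model captures meaning rather than exact word overlap, so a text can score well against a theme without sharing any words with it. The toy vector above cannot do that.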
A filter_corpus method is currently under development and will be included in a future version of the package. NOTE: We support different embedding models, but we highly recommend using e5.
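Until filter_corpus is available, a corpus can be filtered by hand with a small helper. The sketch below is hypothetical and is not the planned filter_corpus API: `filter_by_score`, `keyword_score`, and the 0.5 threshold are illustrative names and values. It takes any scoring function, so it makes no claim about what `analyze` returns. If `analyze` does return a single number, `analyzer.analyze` could be passed in as `score_fn`.

```python
def filter_by_score(texts, score_fn, threshold=0.5):
    """Keep the texts whose score is at or above the threshold."""
    return [t for t in texts if score_fn(t) >= threshold]


def keyword_score(text):
    """Hypothetical scorer used only to make this sketch runnable."""
    return 1.0 if "ai" in text.lower().split() else 0.0


corpus = ["AI is changing software", "A recipe for tomato soup"]
kept = filter_by_score(corpus, keyword_score)
print(kept)
```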
Subtheme Analysis
To find which subthemes within a profile a text aligns with:
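from textscope.subtheme_analyzer import SubthemeAnalyzer

# Spanish sentence-similarity model, matching the language of the sample text
model_name = 'hiiamsid/sentence_similarity_spanish_es'
# Sample text in Spanish: "AI is a booming field"
text = "LA IA es un campo en auge"

# Check which of the subthemes defined in config.py the text aligns with
analyzer = SubthemeAnalyzer(model_name)
subth_pres = analyzer.analyze_bin(text)
print(subth_pres)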
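The name analyze_bin suggests a binary (present or absent) result per subtheme. One common way to get such flags is to threshold a similarity score for each subtheme. The sketch below illustrates that idea only. The helper `to_binary_presence`, the 0.5 threshold, and the similarity values are made up for illustration and do not describe the actual output format of analyze_bin.

```python
SUBTHEMES = {
    '1': 'programming',
    '2': 'data science',
    '3': 'cybersecurity',
}


def to_binary_presence(similarities, threshold=0.5):
    """Map each subtheme key to 1 if its similarity reaches the threshold, else 0."""
    return {key: int(score >= threshold) for key, score in similarities.items()}


# Made-up similarity scores, one per subtheme key (illustrative only).
similarities = {'1': 0.72, '2': 0.41, '3': 0.08}
presence = to_binary_presence(similarities)
for key, flag in presence.items():
    print(SUBTHEMES[key], flag)
```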
Testing
To run tests for TextScope, use the following command:
pytest tests/
Hashes for textscope-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6cd117ebb78c2ce68b9ef4fae32c9f9a306eceafd0f86a2a96e51b95059a6de7
MD5 | 1e653176dd21e22cdbdbbb087b67024a
BLAKE2b-256 | 8f89073f4a73626df209491acb77394d52904bbe5026b68a4322263e921bd799