A Python text analysis library for relevance and subtheme detection

TextScope 📖🔍

TextScope is a Python package that helps determine the relevance of a text to predefined profiles of interest and aligns it with specific subthemes. The package is designed to be flexible and configurable via a config.yaml file. Features:

  • Multilingual support 🗣️
  • Powered by Transformers technology 🤖
  • Easily customizable for other profiles of interest ⚙️🔧

Installation

You can install TextScope using pip (🐍):

pip install textscope

Configuration

Default Configuration

By default, TextScope looks for a configuration file included in the package. You can create your own configuration file if you need to customize the profiles and subthemes. The default config.yaml included with the package provides the following profiles:

  • gambling 🎲 in Spanish. Subthemes are DSM-V questionnaire items.
  • mental_health 🧠 in English. Subthemes are BDI-II questionnaire items.
  • climat 🌦️ in French. Subthemes are related to climate change phenomena.

You can find more info in the textscope/data/config.yaml file.

Custom Configuration

If you want to use a custom configuration file, you can specify the file path in your code:

from textscope.config_loader import load_config

# Load custom configuration
load_config('path/to/custom_config.yaml')

Example of custom config.yaml:

PROFILES:
    ai: ['technology', 'AI', 'machine learning', 'software']

SUBTHEMES:
    ai: ['Natural Language Processing', 'Transformer-based architecture', 'Computer Vision and multimodality']
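
The loader's validation rules are not described here, so the following is only a minimal sketch. It assumes that every key under PROFILES should have a matching key under SUBTHEMES and that both map to lists of strings. `check_config` is a hypothetical helper, not part of TextScope, and the dict mirrors the example above as it would look once the YAML is parsed.

```python
def check_config(config):
    """Return a list of problems found in a parsed config dict (empty if none)."""
    problems = []
    profiles = config.get("PROFILES", {})
    subthemes = config.get("SUBTHEMES", {})

    # Assumption: every entry in both sections is a list of strings.
    for section_name, section in (("PROFILES", profiles), ("SUBTHEMES", subthemes)):
        for key, values in section.items():
            if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
                problems.append(f"{section_name}.{key} should be a list of strings")

    # Assumption: profile names and subtheme names must match one to one.
    for key in profiles:
        if key not in subthemes:
            problems.append(f"profile '{key}' has no SUBTHEMES entry")
    for key in subthemes:
        if key not in profiles:
            problems.append(f"SUBTHEMES entry '{key}' has no matching profile")
    return problems


config = {
    "PROFILES": {"ai": ["technology", "AI", "machine learning", "software"]},
    "SUBTHEMES": {
        "ai": [
            "Natural Language Processing",
            "Transformer-based architecture",
            "Computer Vision and multimodality",
        ]
    },
}
print(check_config(config))
```

Running a check like this before calling load_config may catch a misspelled profile key early, rather than at analysis time.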

Relevance Analysis

The relevance analysis determines whether a text is relevant to any of the predefined profiles. One possible application is filtering out texts that are not highly relevant to the topic. Future versions of TextScope will include a filter_corpus method, currently under development, that removes out-of-scope texts from a corpus.

Default Configuration

from textscope.relevance_analyzer import RelevanceAnalyzer

text = "La adicción al juego es una enfermedad, pero es la única enfermedad que te puede hacer rico. La artritis no te va a hacer ganar un centavo"
profile = 'gambling'

analyzer = RelevanceAnalyzer()
rel_score = analyzer.analyze(text, profile)
print(rel_score)  # returns a high relevance score for this profile (> 86.)

Custom Configuration

from textscope.config_loader import load_config
from textscope.relevance_analyzer import RelevanceAnalyzer

load_config('path/to/custom_config.yaml')

# Run a relevance analysis
text = "Transformers based architecture is the sota in NLP."
profile = 'ai'

relevance_analyzer = RelevanceAnalyzer()
rel_score = relevance_analyzer.analyze(text, profile)
print(rel_score)  # returns a high relevance score for this profile (> 86.)
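
Until filter_corpus is available, the filtering use case can be approximated by hand. The sketch below is not the planned TextScope method: `filter_by_relevance` and `toy_score` are hypothetical names, and the 86.0 cut-off is an assumption borrowed from the default threshold mentioned for analyze_bin. The scorer is passed in as a function so the example runs without loading a model.

```python
def filter_by_relevance(texts, score_fn, threshold=86.0):
    """Keep only the texts whose relevance score reaches the threshold."""
    return [t for t in texts if score_fn(t) >= threshold]


# Stand-in scorer so the sketch runs without downloading a model.
# With TextScope installed, something like
#   lambda t: analyzer.analyze(t, profile)
# could be passed instead.
def toy_score(text):
    return 90.0 if "apuesta" in text.lower() else 70.0


corpus = ["Otra apuesta perdida esta noche.", "Hoy llueve en Santiago."]
kept = filter_by_relevance(corpus, toy_score)
print(kept)
```

Passing the scorer as an argument keeps the filter independent of any particular profile or model.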

Subtheme Analysis

This class tests whether a text discusses the subthemes defined in the profile.

Default Configuration

from textscope.subtheme_analyzer import SubthemeAnalyzer

text = 'Perdía el raciocinio apostando cantidades cada vez mayores para sentir estímulos más intensos. He mentido a mi familia.'
profile = 'gambling'

analyzer = SubthemeAnalyzer()
subth_pres = analyzer.analyze_bin(text, profile) # default threshold set to 86.
print(subth_pres)  # For this sentence and subthemes it should output [0, 1, 0, 0, 0, 1, 0, 0, 0]

Custom Configuration

from textscope.config_loader import load_config
from textscope.subtheme_analyzer import SubthemeAnalyzer

load_config('path/to/custom_config.yaml')
text = "Transformer-based architecture is the state-of-the-art in NLP."
profile = 'ai'

analyzer = SubthemeAnalyzer()
subth_pres = analyzer.analyze_bin(text, profile) # default threshold set to 86.
print(subth_pres)  # For this sentence and subthemes it should output [1,1,0]

If you do not want a binary output, the analyze method returns the similarity scores instead:

from textscope.subtheme_analyzer import SubthemeAnalyzer

text = 'Perdía el raciocinio apostando cantidades cada vez mayores para sentir estímulos más intensos. He mentido a mi familia.'
profile = 'gambling'

analyzer = SubthemeAnalyzer()
subth_scoring = analyzer.analyze(text, profile)  # one similarity score per subtheme
print(subth_scoring)  # For this sentence and subthemes it should output [82.50125885009766, 87.5889663696289, 82.89108276367188, 81.27981567382812, 84.01229095458984, 86.728271484375, 82.63910675048828, 82.18984985351562, 82.15728759765625]
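
The two outputs appear to be related by a simple threshold. As a sketch, assuming analyze_bin marks a subtheme as present when its score reaches the threshold (the exact comparison used internally is not stated here), the scores above can be binarized by hand. `binarize` is a hypothetical helper, not a TextScope function.

```python
def binarize(scores, threshold=86.0):
    """Map each similarity score to 1 if it reaches the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]


# Scores copied from the example output above.
scores = [
    82.50125885009766, 87.5889663696289, 82.89108276367188,
    81.27981567382812, 84.01229095458984, 86.728271484375,
    82.63910675048828, 82.18984985351562, 82.15728759765625,
]

flags = binarize(scores)
print(flags)  # [0, 1, 0, 0, 0, 1, 0, 0, 0]

# Positions of the subthemes considered present.
present = [i for i, flag in enumerate(flags) if flag]
print(present)  # [1, 5]
```

With the default of 86, this reproduces the binary output shown for the same sentence. Working from the raw scores makes it possible to try other thresholds without re-running the model.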

Testing

To run tests for TextScope, use the following command:

pytest -s tests/

Collaborate

This project is under development. PRs are welcome, and feel free to contact me at marcosfernandez.pichel@usc.es.
