construct-tracker

Track and measure constructs, concepts or categories in text documents. Built on top of the litellm package to use most Generative AI models.

If you use this package, please cite: Low DM, Rankin O, Coppersmith DDL, Bentley KH, Nock MK, Ghosh SS (2024). Building lexicons with generative AI result in lightweight and interpretable text models with high content validity. arXiv.

Installation

pip install construct-tracker

Quick usage


Create a lexicon: keywords prototypically associated with a construct

We want to know whether these documents mention a certain construct, "insight":

documents = [
	"Every time I speak with my cousin Bob, I have great moments of clarity and wisdom", # mention of insight
	"He meditates a lot, but he's not super smart", # related to mindfulness, only somewhat related to insight
	"He is too competitive", # not very related
]

Choose a model and obtain an API key from its provider. Cohere offers a free trial API key (5 requests per minute). I'm going to choose GPT-4o:

import os
from construct_tracker import lexicon

os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'  # replace with your own key
gpt4o = "gpt-4o-2024-05-13"

Two lines of code to create a lexicon

l = lexicon.Lexicon()         # Initialize lexicon
l.add('Insight', section = 'tokens', value = 'create', source = gpt4o)

See results:

print(l.constructs['Insight']['tokens'])
['acuity', 'acumen', 'analysis', 'apprehension', 'awareness', 'clarity', 'comprehension', 'contemplation', 'depth', 'discernment', 'enlightenment', 'epiphany', 'foresight', 'grasp', 'illumination', 'insightfulness', 'interpretation', 'introspection', 'intuition', 'meditation', 'perception', 'perceptiveness', 'perspicacity', 'profoundness', 'realization', 'recognition', 'reflection', 'revelation', 'shrewdness', 'thoughtfulness', 'understanding', 'vision', 'wisdom']

We'll repeat this for other constructs ("Mindfulness", "Compassion"). Now count how often each construct's tokens appear in each document:

feature_vectors, matches_counter_d, matches_per_doc, matches_per_construct  = lexicon.extract(
	documents,
	l.constructs,
	normalize = False)

display(feature_vectors)
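To make the counting step concrete, here is a minimal standalone sketch of whole-word token counting. It is not the implementation behind `lexicon.extract()`; the function `count_matches` and the three-token `toy_constructs` dict are hypothetical and only mirror the `constructs[name]['tokens']` shape shown above.

```python
import re

def count_matches(documents, constructs):
    """Count whole-word, case-insensitive token matches per construct and document."""
    rows = []
    for doc in documents:
        row = {}
        for name, info in constructs.items():
            total = 0
            for token in info["tokens"]:
                # \b keeps "clarity" from matching inside a longer word
                pattern = r"\b" + re.escape(token) + r"\b"
                total += len(re.findall(pattern, doc, flags=re.IGNORECASE))
            row[name] = total
        rows.append(row)
    return rows

documents = [
    "Every time I speak with my cousin Bob, I have great moments of clarity and wisdom",
    "He meditates a lot, but he's not super smart",
    "He is too competitive",
]

# Hypothetical toy lexicon (assumption): a small subset of tokens for illustration
toy_constructs = {"Insight": {"tokens": ["clarity", "wisdom", "meditation"]}}

counts = count_matches(documents, toy_constructs)
print(counts)
```

In this sketch "meditates" does not match the token "meditation", which is the kind of miss that lemmatizing tokens and documents is meant to reduce.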

This traditional approach is perfectly interpretable. The first document contains three matches related to insight. Let's see which ones with highlight_matches():

lexicon.highlight_matches(documents, 'Insight', matches_per_construct, max_matches = 1)
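As a rough illustration of what highlighting matched tokens can look like, the sketch below wraps each match in square brackets. It is an assumption-laden stand-in, not the behavior of `lexicon.highlight_matches()`; the helper name `highlight_tokens` and the bracket markup are hypothetical.

```python
import re

def highlight_tokens(document, tokens):
    """Wrap every whole-word, case-insensitive token match in square brackets."""
    if not tokens:
        return document
    # Try longer tokens first so multi-word phrases win over their sub-words
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "[" + m.group(0) + "]", document)

highlighted = highlight_tokens(
    "I have great moments of clarity and wisdom",
    ["clarity", "wisdom"],
)
print(highlighted)
```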



We provide many features to add/remove tokens, generate definitions, validate with human ratings, and much more (see tutorials/construct_tracker.ipynb).


Suicide Risk Lexicon

The lexicon is available in multiple formats:

  • https://github.com/danielmlow/construct-tracker/blob/main/src/construct_tracker/data/lexicons/suicide_risk_lexicon_v1-0/suicide_risk_lexicon_validated_24-08-02T21-27-35.csv
  • https://github.com/danielmlow/construct-tracker/blob/main/src/construct_tracker/data/lexicons/suicide_risk_lexicon_v1-0/suicide_risk_lexicon_validated_24-08-02T21-27-35.json

Alternatively, you can load the lexicon object from the pickle file to extract features from new documents.


We have created a lexicon with 49 risk factors for suicidal thoughts and behaviors validated by clinicians who are experts in suicide research.

from construct_tracker import lexicon
# Load lexicon
srl = lexicon.load_lexicon(name = 'srl_v1-0')
# Load only tokens that are highly prototypical of each construct
srl_prototypes = lexicon.load_lexicon(name = 'srl_prototypes_v1-0')

Structure of the lexicon.Lexicon() object

# Save general info on the lexicon
my_lexicon = lexicon.Lexicon()			# Initialize lexicon
my_lexicon.name = 'Insight'		# Set lexicon name
my_lexicon.description = 'Insight lexicon with constructs related to insight, mindfulness, and compassion'
my_lexicon.creator = 'DML' 				# your name or initials for transparency in logging who made changes
my_lexicon.version = '1.0'				# Set version. Over time, others may modify your lexicon, so good to keep track. MAJOR.MINOR. (e.g., MAJOR: new constructs or big changes to a construct, Minor: small changes to a construct)

# Each construct is a dict. You can save a lot of metadata depending on what you provide for each construct, for instance:
print(my_lexicon.constructs)
{
 'Insight': {
	'variable_name': 'insight', # a name that is not sensitive to case with no spaces
	'prompt_name': 'insight',
	'domain': 'psychology', 	 # to guide Gen AI model as to sense of the construct (depression has different senses in psychology, geology, and economics)
	'examples': ['clarity', 'enlightenment', 'wise'], # to guide Gen AI model
	'definition': "the clarity of understanding of one's thoughts, feelings and behavior", # can be used in prompt and/or human validation
	'definition_references': 'Grant, A. M., Franklin, J., & Langford, P. (2002). The self-reflection and insight scale: A new measure of private self-consciousness. Social Behavior and Personality: an international journal, 30(8), 821-835.',
	'tokens': ['acknowledgment',
	'acuity',
	'acumen',
	'analytical',
	'astute',
	'awareness',
	'clarity',
	...],
	'tokens_lemmatized': [], # when counting you can lemmatize all tokens for better results
	'remove': [], #which tokens to remove
	'tokens_metadata': {'gpt-4o-2024-05-13, temperature-0, ...': {
								'action': 'create',
								'tokens': [...],
								'prompt': 'Provide many single words and some short phrases ...',
								'time_elapsed': 14.21},
						'gpt-4o-2024-05-13, temperature-1, ...': { ... },
						}
	},
'Mindfulness': {...},
'Compassion': {...},
}
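Because each construct is a plain dict, it can be inspected with ordinary Python. The sketch below builds a small dict of the same shape and lists each construct's tokens after dropping those in `remove`. Treating `remove` as a list to subtract from `tokens` is an assumption based on the comment above, and `effective_tokens` is a hypothetical helper, not part of construct-tracker.

```python
# Hypothetical data mirroring the structure printed above (values are illustrative)
constructs = {
    "Insight": {
        "variable_name": "insight",
        "domain": "psychology",
        "tokens": ["acuity", "acumen", "analytical", "clarity"],
        "remove": ["analytical"],
    },
}

def effective_tokens(construct):
    """Return the construct's tokens minus those listed under 'remove' (assumed semantics)."""
    removed = set(construct.get("remove", []))
    return [token for token in construct["tokens"] if token not in removed]

summary = {name: effective_tokens(info) for name, info in constructs.items()}
print(summary)
```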

Contributing

See docs/contributing.md
