Track and measure constructs, concepts or categories in text documents.

construct-tracker

Track and measure constructs, concepts or categories in text documents. Build interpretable lexicon models quickly by using LLMs. Built on top of the OpenRouterAI package so you can use most Generative AI models.

Why build lexicons?

They can be used to build models that are:

interpretable: you can see why the model outputs a given score, which can help avoid biases and guarantee that the model will detect certain phrases (important in high-risk scenarios, where lexicons can be used in tandem with LLMs)
lightweight: no GPU needed (unlike LLMs)
private and free: they run on your local computer, so you do not need to submit data to a cloud API (e.g., OpenAI) that may not be secure
high in content validity: they measure what you actually want to measure (unlike existing lexicons or models that measure something only slightly related)

If you use it, please cite

Low DM, Rankin O, Coppersmith DDL, Bentley KH, Nock MK, Ghosh SS (2024). Using Generative AI to create lexicons for interpretable text models with high content validity. PsyArXiv.

Installation

pip install construct-tracker

Measure 49 suicide risk factors in text data

Tutorial

We have created a lexicon with 49 risk factors for suicidal thoughts and behaviors (plus one construct for kinship) validated by clinicians who are experts in suicide research.

from construct_tracker import lexicon

srl = lexicon.load_lexicon(name = 'srl_v1-0') # Load lexicon

documents = [
    "I've been thinking about ending it all. I've been cutting. I just don't want to wake up.",
    "I've been feeling all alone. No one cares about me. I've been hospitalized multiple times. I just want out. I'm pretty hopeless",
]

# Extract
counts, matches_by_construct, matches_doc2construct, matches_construct2doc = srl.extract(documents, normalize = False)

counts
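
To make concrete what a count-based lexicon does, here is a minimal pure-Python sketch. It is not the package's implementation: the `toy_lexicon` dict, the `count_matches` and `extract_counts` helpers, and the choice to normalize by word count are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon: construct name -> list of tokens (not the real SRL).
toy_lexicon = {
    "loneliness": ["all alone", "no one cares"],
    "hopelessness": ["hopeless", "want out"],
}

def count_matches(document, tokens):
    """Return {token: count} for tokens found in document (case-insensitive)."""
    found = {}
    for token in tokens:
        # Word boundaries so that "hopeless" does not match inside "hopelessness".
        pattern = r"\b" + re.escape(token) + r"\b"
        n = len(re.findall(pattern, document, flags=re.IGNORECASE))
        if n:
            found[token] = n
    return found

def extract_counts(documents, lexicon_dict, normalize=False):
    """One row per document: construct -> raw count, or count / word count."""
    rows = []
    for doc in documents:
        n_words = max(len(doc.split()), 1)  # assumed normalizer: whitespace word count
        row = {}
        for construct, tokens in lexicon_dict.items():
            total = sum(count_matches(doc, tokens).values())
            row[construct] = total / n_words if normalize else total
        rows.append(row)
    return rows

docs = ["I've been feeling all alone. No one cares about me. I'm pretty hopeless"]
raw = extract_counts(docs, toy_lexicon)
norm = extract_counts(docs, toy_lexicon, normalize=True)
```

Because every score is a sum of explicit token matches, the output of `count_matches` tells you exactly which phrases produced a given count.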

You can also access the Suicide Risk Lexicon in CSV and JSON formats.

Create your own lexicon using generative AI

Create a lexicon: keywords prototypically associated with a construct

We want to know whether these documents mention a certain construct, "insight":

documents = [
    "Every time I speak with my cousin Bob, I have great moments of clarity and wisdom",  # mention of insight
    "He meditates a lot, but he's not super smart",  # related to mindfulness, only somewhat related to insight
    "He is too competitive",  # not very related
]

Choose a model and obtain an API key from its provider. Cohere offers a free trial API key limited to 5 requests per minute. I'm going to choose GPT-4o:

import os

os.environ["api_key"] = 'YOUR_OPENAI_API_KEY' # This one might work for free models if no submissions have been tested:  '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'
model = 'gpt-4o'
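
Hardcoding a key in a script makes it easy to leak. One possible alternative, assuming the package reads the key from the `api_key` environment variable as in the snippet above, is to export that variable in your shell and only check for it in Python. The `get_api_key` helper below is hypothetical and not part of construct-tracker.

```python
import os

def get_api_key(var_name="api_key"):
    """Return the key stored in an environment variable, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "export it in your shell before running this script."
        )
    return key
```

Calling `get_api_key()` at the top of a script fails early with a readable message instead of a provider-side authentication error later on.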

Two lines of code to create a lexicon

l = lexicon.Lexicon()         # Initialize lexicon
l.add('Insight', section = 'tokens', value = 'create', source = model)

See results:

print(l.constructs['Insight']['tokens'])

['acuity', 'acumen', 'analysis', 'apprehension', 'awareness', 'clarity', 'comprehension', 'contemplation', 'depth', 'discernment', 'enlightenment', 'epiphany', 'foresight', 'grasp', 'illumination', 'insightfulness', 'interpretation', 'introspection', 'intuition', 'meditation', 'perception', 'perceptiveness', 'perspicacity', 'profoundness', 'realization', 'recognition', 'reflection', 'revelation', 'shrewdness', 'thoughtfulness', 'understanding', 'vision', 'wisdom']

We'll repeat this for other constructs ("Mindfulness", "Compassion"). Now count how many times the tokens appear in each document:

feature_vectors, matches_counter_d, matches_per_doc, matches_per_construct  = lexicon.extract(
	documents,
	l.constructs,
	normalize = False)

display(feature_vectors)

This traditional approach is perfectly interpretable. The first document contains three matches related to insight. Let's see which ones with highlight_matches():

# matches_per_construct is the fourth value returned by lexicon.extract() above
lexicon.highlight_matches(documents, 'Insight', matches_per_construct, max_matches = 1)
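
For intuition about what highlighting involves, here is a small standalone sketch using only the standard library. It does not reproduce how `highlight_matches()` works internally; the `highlight` function, its bracket markers, and the short token list are assumptions chosen for illustration.

```python
import re

def highlight(document, tokens, left="[", right="]"):
    """Wrap each lexicon token found in document with left/right markers."""
    if not tokens:
        return document
    # Longest tokens first so multi-word phrases win over their substrings.
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(
        pattern,
        lambda m: left + m.group(0) + right,
        document,
        flags=re.IGNORECASE,
    )

doc = "Every time I speak with my cousin Bob, I have great moments of clarity and wisdom"
highlighted = highlight(doc, ["clarity", "wisdom", "insight"])
print(highlighted)
```

Seeing the matched tokens in context is a quick way to spot false positives, such as a token that is used in a different sense than the construct intends.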

We provide many features to add or remove tokens, generate definitions, validate with human ratings, and much more (see tutorials/construct_tracker.ipynb).

Structure of the `lexicon.Lexicon()` object

# Save general info on the lexicon
my_lexicon = lexicon.Lexicon()			# Initialize lexicon
my_lexicon.name = 'Insight'		# Set lexicon name
my_lexicon.description = 'Insight lexicon with constructs related to insight, mindfulness, and compassion'
my_lexicon.creator = 'DML' 				# your name or initials for transparency in logging who made changes
my_lexicon.version = '1.0'				# Set version. Over time, others may modify your lexicon, so it is good to keep track. Format: MAJOR.MINOR (e.g., MAJOR: new constructs or big changes to a construct; MINOR: small changes to a construct)

# Each construct is a dict. You can save a lot of metadata depending on what you provide for each construct, for instance:
print(my_lexicon.constructs)
{
    'Insight': {
        'variable_name': 'insight',  # a case-insensitive name with no spaces
        'prompt_name': 'insight',
        'domain': 'psychology',  # to guide Gen AI model as to sense of the construct (depression has different senses in psychology, geology, and economics)
        'examples': ['clarity', 'enlightenment', 'wise'],  # to guide Gen AI model
        'definition': "the clarity of understanding of one's thoughts, feelings and behavior",  # can be used in prompt and/or human validation
        'definition_references': 'Grant, A. M., Franklin, J., & Langford, P. (2002). The self-reflection and insight scale: A new measure of private self-consciousness. Social Behavior and Personality: an international journal, 30(8), 821-835.',
        'tokens': [
            'acknowledgment',
            'acuity',
            'acumen',
            'analytical',
            'astute',
            'awareness',
            'clarity',
            ...],
        'tokens_lemmatized': [],  # when counting you can lemmatize all tokens for better results
        'remove': [],  # which tokens to remove
        'tokens_metadata': {
            'gpt-4o-2024-05-13, temperature-0, ...': {
                'action': 'create',
                'tokens': [...],
                'prompt': 'Provide many single words and some short phrases ...',
                'time_elapsed': 14.21},
            'gpt-4o-2024-05-13, temperature-1, ...': {...},
        },
    },
    'Mindfulness': {...},
    'Compassion': {...},
}
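
Because each construct is a plain dict, it can be inspected and serialized with the standard library. The sketch below assumes (this is not confirmed by the package) that `remove` lists tokens to exclude from `tokens` at counting time. The `constructs` dict is a hand-written stand-in for `my_lexicon.constructs`, and `effective_tokens` is a hypothetical helper.

```python
import json

# Hand-written stand-in for my_lexicon.constructs, using keys from the structure above.
constructs = {
    "Insight": {
        "variable_name": "insight",
        "tokens": ["acuity", "acumen", "clarity", "meditation"],
        "remove": ["meditation"],
    }
}

def effective_tokens(construct):
    """Tokens minus those listed under 'remove', preserving order (assumed semantics)."""
    removed = set(construct.get("remove", []))
    return [t for t in construct.get("tokens", []) if t not in removed]

kept = effective_tokens(constructs["Insight"])

# Round-trip through JSON, e.g. to share the lexicon or keep it under version control.
serialized = json.dumps(constructs, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Keeping removed tokens in a separate list, rather than deleting them, preserves a record of what was excluded and makes the decision reversible.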

Contributing

See docs/contributing.md

Release history

Version	Release date
1.0.16 (this version)	May 6, 2025
1.0.15	Apr 23, 2025
1.0.14	Jan 21, 2025
1.0.13	Jan 21, 2025
1.0.12	Jan 21, 2025
1.0.11	Jan 21, 2025
1.0.10	Jan 21, 2025
1.0.9	Sep 17, 2024
1.0.7	Sep 16, 2024
1.0.0	Sep 16, 2024
1.0.0b0 (pre-release)	Sep 3, 2024
0.0.6	Sep 16, 2024
0.0.5	Sep 16, 2024
0.0.4	Sep 16, 2024
0.0.2	Jun 14, 2024
0.0.1	Jun 14, 2024
0.0.0	Sep 16, 2024

Download files

Source Distribution

construct_tracker-1.0.16.tar.gz (12.6 MB)

Uploaded May 6, 2025 Source

Built Distribution

construct_tracker-1.0.16-py3-none-any.whl (12.7 MB)

Uploaded May 6, 2025 Python 3

File details

Details for the file construct_tracker-1.0.16.tar.gz.

File metadata

Download URL: construct_tracker-1.0.16.tar.gz
Upload date: May 6, 2025
Size: 12.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.12.4 Darwin/23.6.0

File hashes

Hashes for construct_tracker-1.0.16.tar.gz
Algorithm	Hash digest
SHA256	`6b11c691b5a3aca5691c27244c12447eba4ce1863d55c16b442cf833d944dcf4`
MD5	`c1cb0747e05f79c9113f25a5b9c3802b`
BLAKE2b-256	`05c805cf5b2e54dfd3474d365fa93ea12cd167565a0dd351b8b790a9da7e60fa`

File details

Details for the file construct_tracker-1.0.16-py3-none-any.whl.

File metadata

Download URL: construct_tracker-1.0.16-py3-none-any.whl
Upload date: May 6, 2025
Size: 12.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.12.4 Darwin/23.6.0

File hashes

Hashes for construct_tracker-1.0.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`be23277ca17daa8a3af5a2ec73af4e47fb90d0ec75085c4d3a0f75c4f2241437`
MD5	`4a6e2e235f8f7fe24e74621955e12538`
BLAKE2b-256	`33032be7bf0c25d6a60dc39612475f5511543bb800f784cf532caae00cd6da93`
