Lexicogrammatical tagging and tag counting tool

Lexicogrammatical Tagger (LxGrTgr)

Note that LxGrTgr is currently being beta tested and should not be used in research. Once the beta testing concludes, this message will change.

Quick Start Guide

LxGrTgr was developed using spaCy (version 3.5; en_core_web_trf model). Before installing LxGrTgr, follow the instructions on spaCy's website to install spaCy for your specific system and to download the en_core_web_trf model.

Once you have spaCy installed and have downloaded the en_core_web_trf model, you can use LxGrTgr. To install LxGrTgr, use pip:

pip install lxgrtgr
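
Before importing LxGrTgr, it can help to confirm that the prerequisites are visible to your Python environment. The sketch below uses only the standard library; missing_packages is a hypothetical helper written for this illustration and is not part of LxGrTgr or spaCy.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# spaCy models are installed as ordinary Python packages, so the model can be
# checked the same way as spaCy itself.
missing = missing_packages(["spacy", "en_core_web_trf"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("spaCy and the en_core_web_trf model were found.")
```

If anything is reported as not installed, return to the spaCy installation instructions before continuing.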

Demo site

In addition to using the code below, a demo web app (which uses a faster but slightly less accurate NLP backend) is also available.

Import LxGrTgr

First, import LxGrTgr:

import lxgrtgr as lxgr

Tag Strings and Print Output

Then, strings can be tagged and printed:

sample1 = lxgr.tag("This is a very important opportunity that only comes once in a lifetime.")
lxgr.printer(sample1)

#sentid = 0
0 This this None
1 is be None
2 a a None
3 very very rb+jjrbmod
4 important important attr+npremod
5 opportunity opportunity None
6 that that None
7 only only rb+advl
8 comes come finitecls+rel
9 once once rb+advl
10 in in None
11 a a None
12 lifetime lifetime None
13 . . None

These commands can also be combined for efficiency's sake:

lxgr.printer(lxgr.tag("This is a very important opportunity that only comes once in a lifetime."))
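
If you copy the printed output into another script, each row can be split into its four columns. The sketch below assumes the layout shown above (index, token, lemma, tag, separated by whitespace) and assumes that "+" joins the parts of a tag; parse_printed_rows is a hypothetical helper, not an LxGrTgr function.

```python
def parse_printed_rows(text):
    """Parse rows such as '3 very very rb+jjrbmod' into (index, token, lemma, tag_parts)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and sentence headers such as '#sentid = 0'.
        if not line or line.startswith("#"):
            continue
        idx, token, lemma, tag = line.split()
        # 'None' marks a token with no tag; otherwise split the tag on '+'.
        tag_parts = [] if tag == "None" else tag.split("+")
        rows.append((int(idx), token, lemma, tag_parts))
    return rows

sample_output = """#sentid = 0
3 very very rb+jjrbmod
4 important important attr+npremod
5 opportunity opportunity None
"""
for row in parse_printed_rows(sample_output):
    print(row)
```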

Write Output to File

Output can also be written to a file:

lxgr.writer("sample_results/sample1.tsv",sample1)
sample2 = lxgr.tag("I like pizza. I also enjoy eating it because it gives me a reason to drink beer.")
lxgr.writer("sample_results/sample2.tsv",sample2)

Batch Processing Corpora

Corpora come in all shapes and sizes. By default LxGrTgr presumes that each corpus file is represented as a UTF-8 text file and that all corpus files are in the same folder/directory.

Step 1: Tag Corpus Files

To tag a corpus with LxGrTgr, simply use the tagFolder() function.

tagFolder(targetDir,outputDir,suff = ".txt")

targetDir is the folder/directory where your corpus files are. outputDir is the folder where the tagged versions of your corpus files will be written.

An additional optional argument (suff) can also be used. By default, suff = ".txt". If your corpus filenames end in something other than ".txt", be sure to include the suff argument with the correct filename ending.

lxgr.tagFolder("folderWithCorpusFiles/","folderWhereTaggedVersionsWillBeWritten/")
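
Because tagFolder() presumes UTF-8 text files with a shared filename ending, it can be useful to check a folder before tagging it. The sketch below uses only the standard library; find_corpus_files is a hypothetical helper and does not reflect how LxGrTgr itself reads files.

```python
from pathlib import Path

def find_corpus_files(target_dir, suff=".txt"):
    """Return (readable, unreadable) filename lists for files ending in `suff`."""
    readable, unreadable = [], []
    for path in sorted(Path(target_dir).iterdir()):
        # Ignore subfolders and files with a different ending.
        if not path.is_file() or not path.name.endswith(suff):
            continue
        try:
            path.read_text(encoding="utf-8")
            readable.append(path.name)
        except UnicodeDecodeError:
            # The file is not valid UTF-8 and may need to be re-encoded first.
            unreadable.append(path.name)
    return readable, unreadable

# Example call (the folder name is the placeholder used above):
# readable, unreadable = find_corpus_files("folderWithCorpusFiles/")
```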

Step 2: Check and Edit Tagged Corpus Files

Next, tagging should be checked and edited as appropriate.

Step 3: Counting Tags

After checking and editing the tags in your corpus, it is time to get tag counts for each document in your corpus using the countTagsFolder() function.

countTagsFolder(targetDir,tagList = None,suff = ".txt")

By default, complexity tags are counted. The countTagsFolder() function returns a dictionary with filenames as keys and feature counts as values.

sampleCountDictionary = lxgr.countTagsFolder("folderWhereTaggedVersionsWereWritten/")
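
The returned dictionary can be processed with ordinary Python. The sketch below assumes, for illustration only, that each filename maps to a dictionary of tag names and counts; the filenames, tag names, and values are made up, and the real structure should be checked by inspecting sampleCountDictionary. total_counts is a hypothetical helper.

```python
from collections import Counter

def total_counts(count_dict):
    """Sum per-file tag counts into one corpus-wide Counter."""
    totals = Counter()
    for tag_counts in count_dict.values():
        totals.update(tag_counts)
    return totals

# Made-up stand-in for the dictionary returned by countTagsFolder().
example_counts = {
    "doc1.txt": {"rb+advl": 2, "finitecls+rel": 1},
    "doc2.txt": {"rb+advl": 1, "attr+npremod": 3},
}
totals = total_counts(example_counts)
for tag, count in sorted(totals.items()):
    print(tag, count)
```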

Step 4: Writing Tag Counts to a File

The writeCounts() function can be used to write the results to a file. By default, counts are normed as the incidence per 10,000 words, though this can be changed using the norming argument. Raw counts can be obtained by including normed = False.

writeCounts(outputD,outName, tagList = None, sep = "\t", normed = True,norming = 10000)

If the default options are desired, the writeCounts() function only needs two arguments: the dictionary of filenames and tag counts returned by countTagsFolder(), and a filename for the output spreadsheet file:

lxgr.writeCounts(sampleCountDictionary,"sampleOutputFile.txt")
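
The arithmetic behind norming can be sketched as follows. This illustrates incidence per 10,000 words in general and is not LxGrTgr's own code; norm_count is a hypothetical helper.

```python
def norm_count(raw_count, n_words, norming=10000):
    """Convert a raw count into an incidence per `norming` words."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return raw_count * norming / n_words

# A tag that occurs 5 times in a 2,000-word document:
print(norm_count(5, 2000))                # 25.0 per 10,000 words
print(norm_count(5, 2000, norming=1000))  # 2.5 per 1,000 words
```

Norming makes counts comparable across documents of different lengths, which is why raw counts (normed = False) are mainly useful when document length is handled separately.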

Future Directions

Add more functions for random sampling and tag-fixing.

Tag Descriptions

We are currently developing tag descriptions and detailed annotation guidelines for complexity features. Click here to access the document (updated/revised weekly)

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Release history

0.5.77

Jun 13, 2025

0.5.76

Jun 12, 2025

0.5.75

Jun 10, 2025

0.5.72

May 20, 2025

0.5.71 (this version)

May 20, 2025

0.5.70

May 20, 2025

0.5.69

May 19, 2025

0.5.68

May 17, 2025

0.5.67

May 13, 2025

0.5.66

May 13, 2025

0.5.65.5

Apr 15, 2025

0.5.65.4

Apr 15, 2025

0.5.65.2

Apr 8, 2025

0.5.65.1

Mar 22, 2025

0.5.65

Mar 22, 2025

0.5.63

Feb 13, 2025

0.5.59

Jan 17, 2025

0.5.46

Oct 29, 2024

0.5.41.1

Oct 17, 2024

0.5.41

Oct 17, 2024

0.5.33

Sep 4, 2024

0.5.32

Sep 4, 2024

0.5.31

Sep 3, 2024

0.5.30

Sep 3, 2024

0.5.29

Sep 2, 2024

0.5.28

Sep 2, 2024

Download files

Download the file for your platform.

Source Distribution

lxgrtgr-0.5.71.tar.gz (34.7 kB view details)

Uploaded May 20, 2025 Source

Built Distribution

lxgrtgr-0.5.71-py3-none-any.whl (32.6 kB view details)

Uploaded May 20, 2025 Python 3

File details

Details for the file lxgrtgr-0.5.71.tar.gz.

File metadata

Download URL: lxgrtgr-0.5.71.tar.gz
Upload date: May 20, 2025
Size: 34.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.3

File hashes

Hashes for lxgrtgr-0.5.71.tar.gz
Algorithm	Hash digest
SHA256	`3d5ae9bc74c963109f76f1cd54aadef9f5d46408fa26db95e430d00062e1e187`
MD5	`323f05f74623b2d5d752e5c055695ae3`
BLAKE2b-256	`46438f3f9e43530d13e8e3420e93a395566e919512c9b0d9a9c6c4a2df59331e`

File details

Details for the file lxgrtgr-0.5.71-py3-none-any.whl.

File metadata

Download URL: lxgrtgr-0.5.71-py3-none-any.whl
Upload date: May 20, 2025
Size: 32.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.3

File hashes

Hashes for lxgrtgr-0.5.71-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5671f4fc6635672d96aa564dd6b629ee5caa443b942642cf4aab5eac8ffa27fe`
MD5	`d5bef64e6aaf72bb2990e59f2886c9ef`
BLAKE2b-256	`c52367c1ef66e69acec1d07716f651aa70e5fe9a49704600fd2d1814398a40bc`
