Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.
posextract
posextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts of speech and returns sequences of words that share a grammatical relationship. See our article for more. You can also install posextract from PyPI with pip.
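To make the idea of traversing dependency relations concrete, here is a minimal sketch that pulls a subject-verb-object triple out of a hand-written dependency parse. It is not posextract's implementation: the `Token` class, the `extract_svo` function, and the dependency labels (`nsubj`, `dobj`, `aux`, `ROOT`) are assumptions chosen for illustration, and a real pipeline would get the parse from a dependency parser rather than from a literal list.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label relative to the head (illustrative labels)
    head: int  # index of the head token; the root points to itself


def extract_svo(tokens):
    """Return (subject, verb, object) triples from one hand-parsed sentence."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.dep != "ROOT":
            continue
        # Direct dependents of the root verb, excluding the root itself.
        children = [t for t in tokens if t.head == i and t is not tok]
        subjects = [t.text for t in children if t.dep == "nsubj"]
        objects = [t.text for t in children if t.dep == "dobj"]
        triples.extend((s, tok.text, o) for s in subjects for o in objects)
    return triples


# Hand-written parse of "Landlords may exercise oppression."
sentence = [
    Token("Landlords", "nsubj", 2),
    Token("may", "aux", 2),
    Token("exercise", "ROOT", 2),
    Token("oppression", "dobj", 2),
]

print(extract_svo(sentence))
```

The auxiliary "may" is attached to the verb but is neither a subject nor an object, so this sketch leaves it out of the triple.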
Usage
- extract_triples to extract subject-verb-object (SVO) and subject-verb-adjective complement (SVA) triples
- extract_adj_noun_pairs to extract adjective-noun pairs
- extract_subj_verb_pairs to extract subject-verb pairs
Required Parameters:
- input: the name of a csv file or an input string
- output: the name of the output file
Optional Parameters:
- --data_column: specify the column to extract triples from.
- --id_column: specify a unique ID field if a csv file is given.
- --file-delimiter: specify comma, pipe, or tab. Default is comma.
- --post-combine-adj: combine triples (adjective predicate with object).
- --add-auxiliary: extract future and past tense triples.
- --prep-phrase: extract the prepositional phrase. Default is false.
- --no-compound-noun: extract just the subject or object noun (e.g. "Indian Government" is extracted as just "Government").
- --lemma: specify whether to lemmatize parts of speech. Default is non-lemmatized.
- --verbose: print verbose output.
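The effect described for --no-compound-noun can be pictured with a toy parse. The sketch below is an illustration only, not posextract's code: the `noun_phrase` helper, the tuple layout, the `compound` label, and the sample sentence "Indian Government acted" are all assumptions.

```python
def noun_phrase(tokens, index, keep_compound=True):
    """Return the noun at `index`, optionally prefixed by its compound modifiers.

    `tokens` is a list of (text, dep, head_index) tuples from a hand-written parse.
    """
    text = tokens[index][0]
    if not keep_compound:
        return text
    # Collect tokens labelled as compound modifiers of this noun, in sentence order.
    modifiers = [t for (t, dep, head) in tokens if dep == "compound" and head == index]
    return " ".join(modifiers + [text])


# Hand-written parse of "Indian Government acted" (labels are illustrative).
tokens = [
    ("Indian", "compound", 1),
    ("Government", "nsubj", 2),
    ("acted", "ROOT", 2),
]

print(noun_phrase(tokens, 1))                       # with the compound modifier
print(noun_phrase(tokens, 1, keep_compound=False))  # head noun only
```

Keeping the modifier is the default here because the option is phrased as a negative flag, which suggests that compound nouns are retained unless it is passed.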
Examples
Interactive:
Extract grammatical triples.
from posextract import grammatical_triples
triples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])
for triple in triples:
    print(triple)
# Output: Landlords exercise oppression, soldiers were ill
Extract grammatical triples using different options from default:
from posextract.util import TripleExtractorOptions
sent = ['Landlords may exercise oppression.']  # example input; any list of sentences
triples = grammatical_triples.extract(sent, TripleExtractorOptions(prep_phrase=True))
Or extract adjectives and the nouns they modify.
from posextract import adj_noun_pairs
adj_noun = adj_noun_pairs.extract()
Or extract subjects and their verbs.
from posextract import subj_verb_pairs
subj_verb = subj_verb_pairs.extract()
Over CLI:
posextract can extract grammatical triples from text:
python -m posextract.extract_triples "Landlords may exercise oppression." output.csv
# Output: Landlords exercise oppression
posextract can extract SVO/SVA relationships separately or it can combine the adjective as part of a SVO triple:
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv
# Output: soldiers were terminally, soldiers were ill
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj
# Output: soldiers were terminally ill
If provided a .csv file:
python -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv
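For the csv case, the input file needs the two columns named on the command line. The following is a minimal sketch of preparing such a file with Python's standard csv module. The `write_input_csv` helper is hypothetical, and the assumption that posextract expects a header row with these column names is inferred from the --data_column and --id_column options rather than confirmed by the project.

```python
import csv


def write_input_csv(path, sentences):
    """Write sentences to a comma-delimited csv with an ID column and a text column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["sentence_id", "sentence"])
        writer.writeheader()
        for i, sentence in enumerate(sentences, start=1):
            writer.writerow({"sentence_id": i, "sentence": sentence})


write_input_csv(
    "input.csv",
    ["Landlords may exercise oppression.", "The soldiers were ill."],
)
```

Comma is used as the delimiter because it is the stated default for --file-delimiter.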
For More Information...
See our Wiki.
File details
Details for the file posextract-1.2.3.tar.gz.
File metadata
- Download URL: posextract-1.2.3.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59a2e4bdef272be81d524a24003c2b6d66a1fd561b3f2f4df0b017b0676fbaf4 |
| MD5 | 4c0779817c764003a18f60f7d5157885 |
| BLAKE2b-256 | bb8d94accbd509c2dca90b15d8a103cc8f0ca9c4f0f9d6cc5f5b6991bbbbfe88 |
File details
Details for the file posextract-1.2.3-py3-none-any.whl.
File metadata
- Download URL: posextract-1.2.3-py3-none-any.whl
- Upload date:
- Size: 19.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8cf42f61c3edfe2f711b3b59ce45baaa818dd1530574e08d893ddae080057dc2 |
| MD5 | dc461540644f51ce448afb4c76ffec9b |
| BLAKE2b-256 | 50dcc05708e6398efc456caf4a872df084c59eac3fc66d053d6f08aac2974df8 |