Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.

posextract

posextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts-of-speech and returns sequences of words that share a grammatical relationship. See our article for more. You can also install posextract from PyPI with pip.
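To make the idea of traversing dependency relations concrete, here is a minimal toy sketch. It is not posextract's implementation: the `Token` class, the `svo_triples` function, and the hand-written parse are all hypothetical, and the dependency labels (`nsubj`, `aux`, `dobj`) are assumed to follow the naming a typical dependency parser would produce. It only shows how a subject-verb-object triple can be read off a parse by collecting each verb's subject and object children.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label relative to the head (assumed label names)
    head: int  # index of the head token; the root points to itself


# Hand-written parse of "Landlords may exercise oppression."
# A real parser would supply these labels; they are written out here by hand.
sentence = [
    Token("Landlords", "nsubj", 2),
    Token("may", "aux", 2),
    Token("exercise", "ROOT", 2),
    Token("oppression", "dobj", 2),
]


def svo_triples(tokens):
    """Return (subject, verb, object) for every token that has both
    a subject child and a direct-object child."""
    triples = []
    for i, verb in enumerate(tokens):
        # Children are the tokens whose head index points at this token.
        children = [t for t in tokens if t.head == i and t is not verb]
        subjects = [t.text for t in children if t.dep == "nsubj"]
        objects = [t.text for t in children if t.dep == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((s, verb.text, o))
    return triples


print(svo_triples(sentence))
# [('Landlords', 'exercise', 'oppression')]
```

The auxiliary "may" is skipped because only subject and object children are collected, which matches the shape of the "Landlords exercise oppression" output shown in the examples.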

Usage

  • extract_triples to extract subject-verb-object (SVO) and subject-verb-adjective complement (SVA) triples
  • extract_adj_noun_pairs to extract adjective-noun pairs
  • extract_subj_verb_pairs to extract subject-verb pairs

Required Parameters:

  • input can be the name of a csv file or an input string
  • output is the name of the output file

Optional Parameters:

  • --data_column specify the column to extract triples from.
  • --id_column specify a unique ID field if a csv file is given.
  • --file-delimiter specify comma, pipe, or tab. Default is comma.
  • --post-combine-adj combine triples (adjective predicate with object).
  • --add-auxiliary extract future and past tense triples.
  • --prep-phrase extract the prepositional phrase. Default is false.
  • --no-compound-noun extract just the subject or object (e.g. "Indian Government" is extracted as just "Government").
  • --lemma specify whether to lemmatize parts-of-speech. Default is non-lemmatized.
  • --verbose print verbose output.
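When calling the command-line tool from a script, the required and optional parameters above can be assembled into an argument list. The sketch below is only an illustration: `build_command` is a hypothetical helper, not part of posextract. It uses only flags listed above, places options before the two positional arguments, and assumes `--post-combine-adj` is a bare switch that takes no value.

```python
import shlex


def build_command(input_value, output_file, data_column=None, id_column=None, flags=()):
    """Assemble the argument list for `python -m posextract.extract_triples`.

    Hypothetical helper: options come first, then the required input and
    output arguments.
    """
    cmd = ["python", "-m", "posextract.extract_triples"]
    if data_column is not None:
        cmd += ["--data_column", data_column]
    if id_column is not None:
        cmd += ["--id_column", id_column]
    cmd += list(flags)  # bare switches such as --post-combine-adj
    cmd += [input_value, output_file]
    return cmd


cmd = build_command(
    "input.csv",
    "output.csv",
    data_column="sentence",
    id_column="sentence_id",
    flags=["--post-combine-adj"],
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)  (requires posextract to be installed)
```

Building a list rather than a single string avoids shell-quoting problems when the input is a sentence containing spaces or punctuation.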

Examples

Interactive:

Extract grammatical triples.

from posextract import grammatical_triples

triples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])

for triple in triples:
    print(triple)

# Output: Landlords exercise oppression, soldiers were ill

Extract grammatical triples using different options from default:

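from posextract import grammatical_triples
from posextract.util import TripleExtractorOptions

# sent is a list of input sentences, as in the previous example
sent = ['Landlords may exercise oppression.']
triples = grammatical_triples.extract(sent, TripleExtractorOptions(prep_phrase=True))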

Or extract adjectives and the nouns they modify.

from posextract import adj_noun_pairs

adj_noun = adj_noun_pairs.extract()

Or extract subjects and their verbs.

from posextract import subj_verb_pairs

subj_verb = subj_verb_pairs.extract()

Over CLI:

posextract can extract grammatical triples from text:

python -m posextract.extract_triples "Landlords may exercise oppression." output.csv

# Output: Landlords exercise oppression

posextract can extract SVO/SVA relationships separately or it can combine the adjective as part of a SVO triple:

python -m posextract.extract_triples "The soldiers were terminally ill." output.csv

# Output: soldiers were terminally, soldiers were ill

python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj

# Output: soldiers were terminally ill

If provided a .csv file:

python -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv
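For reference, an input file matching the command above could be prepared as follows. This is a sketch under stated assumptions: the column names `sentence` and `sentence_id` and the file name `input.csv` are taken from the example command, the file uses the default comma delimiter, and the two rows reuse sentences from the earlier examples.

```python
import csv

# Sample rows; the IDs are arbitrary but must be unique per sentence.
rows = [
    {"sentence_id": 1, "sentence": "Landlords may exercise oppression."},
    {"sentence_id": 2, "sentence": "The soldiers were ill."},
]

# newline="" lets the csv module control line endings itself.
with open("input.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["sentence_id", "sentence"])
    writer.writeheader()
    writer.writerows(rows)
```

The header row is what `--data_column` and `--id_column` refer to, so the names passed on the command line have to match it exactly.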

For More Information...

... see our Wiki.
