Skip to main content

Package for retrieving collocations from text with Spacy

Project description

Collocater

Collocater is a Python library for retrieving the collocations to be found in a message. The ontology it operates on has been scraped from the Online OXFORD Collocation Dictionary.

Collocater can be added as a pipeline component to SpaCy's preprocessing pipeline, so that a messages' collocations can be retrireved the same way its named entities can.

Check out Collocations Finder to learn more about the project.

Installation

pip install collocater --no-deps

Usage

import collocater
import spacy

collie = collocater.Collocater.loader()
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(collie)

text = "If this isn't a bunch of beautiful flowers I don't know what is!"
doc = nlp(text)
print(doc._.collocs) # returns [bunch of, bunch of beautiful flowers, beautiful flowers]

#Tokens with associated collocations in text:
colls = [(col.text, col.start_char, col.end_char, col.label_) for col in found_collocations1._.collocs]
pprint(colls) # returns [
#                          ('bunch of', 16, 24, 'bunch_noun__prep'),
#                          ('bunch of beautiful flowers', 16, 42, 'flower_noun__quant'),
#                          ('beautiful flowers', 25, 42, 'flower_noun__adj')
#                          ]

print(collie(text))
#{'beautiful flowers': {'coll_type': 'flower_noun__adj', 'location': [7, 9]},
# 'bunch of': {'coll_type': 'bunch_noun__prep', 'location': [5, 7]},
# 'bunch of beautiful flowers': {'coll_type': 'flower_noun__quant',
#                                'location': [5, 9]}}

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for collocater, version 0.2
Filename, size File type Python version Upload date Hashes
Filename, size collocater-0.2-py3-none-any.whl (3.1 MB) File type Wheel Python version py3 Upload date Hashes View
Filename, size collocater-0.2.tar.gz (3.0 MB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page