Package for retrieving collocations from text with spaCy

Collocater

Collocater is a Python library for retrieving the collocations found in a text. The ontology it relies on was scraped from the Online OXFORD Collocation Dictionary.

Collocater can be added as a component to spaCy's processing pipeline, so that a message's collocations can be retrieved the same way its named entities can.

Check out Collocations Finder to learn more about the project.

Installation

pip install collocater --no-deps
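
Because --no-deps tells pip to skip dependency installation, spaCy and the en_core_web_sm model used in the usage example have to be installed separately. The following is a minimal sketch for checking that both are importable before loading Collocater. The helper name missing_modules is hypothetical and not part of the package.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


# spaCy models are installed as ordinary Python packages, so the model
# can be checked the same way as the library itself.
missing = missing_modules(["spacy", "en_core_web_sm"])
if missing:
    print("Install these before using collocater:", ", ".join(missing))
else:
    print("spaCy and the en_core_web_sm model are available.")
```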

Usage

from collocater import collocater
import spacy
from pprint import pprint

collie = collocater.Collocater.loader()
nlp = spacy.load('en_core_web_sm')
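# spaCy v2-style call: the component object is passed directly.
# spaCy v3's add_pipe expects the string name of a registered component instead.
nlp.add_pipe(collie)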

text = "If this isn't a bunch of beautiful flowers I don't know what is!"
doc = nlp(text)
print(doc._.collocs)  # prints [bunch of, bunch of beautiful flowers, beautiful flowers]

# Collocation spans with their character offsets and labels:
colls = [(col.text, col.start_char, col.end_char, col.label_) for col in doc._.collocs]
pprint(colls)  # prints [
#                          ('bunch of', 16, 24, 'bunch_noun__prep'),
#                          ('bunch of beautiful flowers', 16, 42, 'flower_noun__quant'),
#                          ('beautiful flowers', 25, 42, 'flower_noun__adj')
#                          ]

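# The component can also be called directly on a raw string.
# pprint produces the sorted, wrapped layout shown below.
pprint(collie(text))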
#{'beautiful flowers': {'coll_type': 'flower_noun__adj', 'location': [7, 9]},
# 'bunch of': {'coll_type': 'bunch_noun__prep', 'location': [5, 7]},
# 'bunch of beautiful flowers': {'coll_type': 'flower_noun__quant',
#                                'location': [5, 9]}}
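
The dictionary shown above can be turned into sorted, structured records for further processing. The sketch below runs on a literal copy of that output, so it does not need Collocater or spaCy installed. It assumes, based only on the three labels in the example, that coll_type follows the layout headword_pos__relation and that location holds a token start index and an exclusive end index. The Colloc type and the parse_result helper are hypothetical names, not part of the package.

```python
from typing import Dict, List, NamedTuple


class Colloc(NamedTuple):
    text: str
    headword: str
    pos: str
    relation: str
    start: int  # token index of the first token (assumed inclusive)
    end: int    # token index after the last token (assumed exclusive)


def parse_result(result: Dict[str, dict]) -> List[Colloc]:
    """Flatten a Collocater-style result dict into records sorted by position."""
    rows = []
    for text, info in result.items():
        # Assumed label layout: "<headword>_<pos>__<relation>".
        head_pos, _, relation = info["coll_type"].partition("__")
        headword, _, pos = head_pos.rpartition("_")
        start, end = info["location"]
        rows.append(Colloc(text, headword, pos, relation, start, end))
    # Order by where the collocation starts, shorter spans first.
    return sorted(rows, key=lambda c: (c.start, c.end))


# Literal copy of the output in the usage example.
result = {
    "beautiful flowers": {"coll_type": "flower_noun__adj", "location": [7, 9]},
    "bunch of": {"coll_type": "bunch_noun__prep", "location": [5, 7]},
    "bunch of beautiful flowers": {"coll_type": "flower_noun__quant", "location": [5, 9]},
}

for colloc in parse_result(result):
    print(colloc.text, colloc.headword, colloc.pos, colloc.relation)
```

If real labels turn out to use a different layout, only the two partition calls need to change.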

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Download files

Source Distribution

collocater-0.3.tar.gz (3.0 MB)

Uploaded Source

Built Distribution

collocater-0.3-py3-none-any.whl (3.1 MB)

Uploaded Python 3

File details

Details for the file collocater-0.3.tar.gz.

File metadata

  • Download URL: collocater-0.3.tar.gz
  • Size: 3.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.1

File hashes

Hashes for collocater-0.3.tar.gz
Algorithm Hash digest
SHA256 155a53cff5b0d371968d3a0b9df13bdbd4823398578a9721c08ae8f145134a0d
MD5 50b8f971cdf3270e2233b0b2eb8f15ce
BLAKE2b-256 84b89baceb184e180ec3d858c2822e61ac81dfcdf5992360ff8957d1d6370fa2

See more details on using hashes here.
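
A downloaded file can be checked against the digests listed above with the standard library. This is a minimal sketch: the helper name sha256_of_file and the local path are assumptions, and the expected value is the SHA256 digest listed for collocater-0.3.tar.gz.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for the source distribution.
EXPECTED = "155a53cff5b0d371968d3a0b9df13bdbd4823398578a9721c08ae8f145134a0d"

# Assumed location: the archive sits in the current working directory.
archive = Path("collocater-0.3.tar.gz")
if archive.exists():
    print("digest matches" if sha256_of_file(archive) == EXPECTED else "digest MISMATCH")
else:
    print("archive not found, nothing to verify")
```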

File details

Details for the file collocater-0.3-py3-none-any.whl.

File metadata

  • Download URL: collocater-0.3-py3-none-any.whl
  • Size: 3.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.1

File hashes

Hashes for collocater-0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a81b2f772b17995625abc14e21bf716ab184236cce461fbb94d89f7469ba0ac8
MD5 495bbcfe6e01ce0f9181b135ff3b4c09
BLAKE2b-256 1412ab8f758614d743f8101d3099577b1b3d3b837ccdb477b39d50279d682095
