skipchunk

Instant Knowledge Graphs from text documents.

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language
- Python :: 3.8

Project description

Skipchunk

Easy search autosuggest with NLP magic.

Out of the box, it provides hassle-free autosuggest for any corpus from scratch, plus latent knowledge graph extraction and exploration.

Free software: MIT License
Documentation: https://skipchunk.readthedocs.io.

Install

pip install skipchunk

python -m spacy download 'en_core_web_lg'

python -m nltk.downloader wordnet

You also need to have Solr or Elasticsearch installed and running somewhere!

The currently supported Solr version is 8.4.1, but other versions might work.

The currently supported Elasticsearch version is 7.6.2, but other versions might work.

Use It!

See the ./example/ folder for an end-to-end load of the OpenSource Connections (OSC) blog:

Solr

Start Solr first! Skipchunk doesn’t work with SolrCloud yet, but we’re working on it.

For now, you’ll need to start Solr using skipchunk’s solr_home directory.

Then run this: python solr-blog-example.py

Elasticsearch

Start Elasticsearch first!

Then run this: python elasticsearch-blog-example.py

Features

Identifies and groups the noun phrases and verb phrases in a corpus
Indexes these phrases in Solr or Elasticsearch for a really good out-of-the-box autosuggest
Structures the phrases as a graph so that concept-relationship-concept can be easily found
Meant to handle batched updates as part of a full stack search platform

Library API

Engine configuration

You need an engine_config dict to create a Skipchunk instance.

The dict must contain the following entries:

host (the fully qualified URL of the engine web API endpoint)
name (the name of the graph)
path (the on-disk location of stateful data that will be kept)
engine_name (either “solr” or “elasticsearch”)

Solr engine config example

engine_config_solr = {
    "host": "http://localhost:8983/solr/",
    "name": "osc-blog",
    "path": "./skipchunk_data",
    "engine_name": "solr"
}

Elasticsearch engine config example

engine_config_elasticsearch = {
    "host": "http://localhost:9200/",
    "name": "osc-blog",
    "path": "./skipchunk_data",
    "engine_name": "elasticsearch"
}
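
A config with a missing or misspelled entry is easier to catch before it reaches the engine. The following is a minimal sketch of such a check, based only on the four required entries listed above. The helper name validate_engine_config and its rules are hypothetical and not part of skipchunk.

```python
# Hypothetical helper (not part of skipchunk): checks the four entries
# this README lists as required in an engine_config dict.
REQUIRED_KEYS = ("host", "name", "path", "engine_name")
SUPPORTED_ENGINES = ("solr", "elasticsearch")


def validate_engine_config(config):
    """Return a list of problems found in the config (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required entry: {key}")
    engine = config.get("engine_name")
    if engine is not None and engine not in SUPPORTED_ENGINES:
        problems.append(f"unsupported engine_name: {engine!r}")
    host = config.get("host", "")
    # Assumption: "fully qualified URL" means an http(s) URL.
    if host and not host.startswith(("http://", "https://")):
        problems.append("host should be a fully qualified URL")
    return problems


engine_config_solr = {
    "host": "http://localhost:8983/solr/",
    "name": "osc-blog",
    "path": "./skipchunk_data",
    "engine_name": "solr",
}

print(validate_engine_config(engine_config_solr))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.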

Skipchunk Initialization

When initializing Skipchunk, you will need to provide the constructor with the following parameters:

engine_config (the dict containing search engine connection details)
spacy_model="en_core_web_lg" (the spacy model to use to parse text)
minconceptlength=1 (the minimum number of words that can appear in a noun phrase)
maxconceptlength=3 (the maximum number of words that can appear in a noun phrase)
minpredicatelength=1 (the minimum number of words that can appear in a verb phrase)
maxpredicatelength=3 (the maximum number of words that can appear in a verb phrase)
minlabels=1 (the number of times a concept/predicate must appear before it is recognized and kept. The lower this number, the more concepts will be kept - so be careful with large content sets!)
cache_documents=False
cache_pickle=False
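
The parameters above can be gathered in one place and sanity-checked before a Skipchunk instance is built. This sketch only mirrors the documented names and defaults. The helper skipchunk_kwargs and its min/max check are assumptions, and the constructor call appears only as a comment because this document does not show the import path.

```python
# Hypothetical helper (not part of skipchunk): collects the documented
# constructor parameters with their documented defaults.
def skipchunk_kwargs(
    spacy_model="en_core_web_lg",
    minconceptlength=1,
    maxconceptlength=3,
    minpredicatelength=1,
    maxpredicatelength=3,
    minlabels=1,
    cache_documents=False,
    cache_pickle=False,
):
    # Assumption: a minimum above its maximum is a caller mistake.
    if minconceptlength > maxconceptlength:
        raise ValueError("minconceptlength exceeds maxconceptlength")
    if minpredicatelength > maxpredicatelength:
        raise ValueError("minpredicatelength exceeds maxpredicatelength")
    return {
        "spacy_model": spacy_model,
        "minconceptlength": minconceptlength,
        "maxconceptlength": maxconceptlength,
        "minpredicatelength": minpredicatelength,
        "maxpredicatelength": maxpredicatelength,
        "minlabels": minlabels,
        "cache_documents": cache_documents,
        "cache_pickle": cache_pickle,
    }


# A higher minlabels keeps fewer concepts, which helps on large content sets.
kwargs = skipchunk_kwargs(minlabels=2)
print(kwargs)

# With skipchunk installed, the call would look roughly like:
#   s = Skipchunk(engine_config, **kwargs)
```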

Skipchunk Methods

tuplize(filename=source,fields=['title','content',...]) (Produces a list of (text,document) tuples ready for processing by the enrichment.)
enrich(tuples) (Enriching can take a long time if you provide lots of text. Consider batching at 10k docs at a time.)
save (Saves to pickle)
load (Loads from pickle)
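
The batching advice for enrich can be followed with a small generator that slices the tuple list into fixed-size chunks. This is a sketch under assumptions: the batches helper is hypothetical, the sample tuples are made up, and the enrich call is shown only as a comment.

```python
from itertools import islice


def batches(items, size=10_000):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for the (text, document) tuples that tuplize produces.
tuples = [(f"text {i}", {"id": i}) for i in range(25)]

for batch in batches(tuples, size=10):
    # With a Skipchunk instance `s`, each batch would be passed along:
    #   s.enrich(batch)
    print(len(batch))
```

Because the generator consumes an iterator, it also works when the tuples come from a stream rather than a list held in memory.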

Graph API

After enrichment, you can then index the graph into the engine

index(skipchunk:Skipchunk) (Updates the knowledge graph in the search engine)
delete (Deletes a knowledge graph - be careful!)

After indexing, you can call these methods to get autocompleted concepts or walk the knowledge graph

conceptVerbConcepts(concept:str,verb:str,mincount=1,limit=100) -> list (Accepts a concept and a verb to find the concepts appearing in the same context)
conceptsNearVerb(verb:str,mincount=1,limit=100) -> list (Accepts a verb to find the concepts appearing in the same context)
verbsNearConcept(concept:str,mincount=1,limit=100) -> list (Accepts a concept to find the verbs appearing in the same context)
suggestConcepts(prefix:str,build=False) -> list (Suggests a list of concepts given a prefix)
suggestPredicates(prefix:str,build=False) -> list (Suggests a list of predicates given a prefix)
summarize(mincount=1,limit=100) -> list (Summarizes a core)
graph(subject:str,objects=5,branches=10) -> list (Gets the subject-predicate-object neighborhood graph for a subject)
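
To make the query semantics above concrete, here is a toy in-memory illustration of concept-verb-concept lookups with mincount and limit. It is not how skipchunk works internally (the library queries Solr or Elasticsearch), and the ToyGraph class, its method names, and its return shapes are assumptions made for illustration.

```python
from collections import Counter


class ToyGraph:
    """In-memory stand-in for a concept-verb-concept graph."""

    def __init__(self, triples):
        # Count how often each (subject, verb, object) triple occurs.
        self.counts = Counter(triples)

    @staticmethod
    def _rank(totals, mincount, limit):
        # Most frequent first, drop rare entries, then cap the list.
        ranked = [(k, n) for k, n in totals.most_common() if n >= mincount]
        return ranked[:limit]

    def verbs_near_concept(self, concept, mincount=1, limit=100):
        totals = Counter()
        for (subj, verb, obj), n in self.counts.items():
            if concept in (subj, obj):
                totals[verb] += n
        return self._rank(totals, mincount, limit)

    def concepts_near_verb(self, verb, mincount=1, limit=100):
        totals = Counter()
        for (subj, v, obj), n in self.counts.items():
            if v == verb:
                totals[subj] += n
                totals[obj] += n
        return self._rank(totals, mincount, limit)

    def suggest_concepts(self, prefix):
        concepts = {c for s, _, o in self.counts for c in (s, o)}
        return sorted(c for c in concepts if c.startswith(prefix))


g = ToyGraph([
    ("solr", "indexes", "documents"),
    ("solr", "indexes", "documents"),
    ("elasticsearch", "indexes", "documents"),
    ("solr", "returns", "results"),
])
print(g.verbs_near_concept("solr"))
print(g.concepts_near_verb("indexes", limit=2))
print(g.suggest_concepts("so"))
```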

Credits

Developed by Max Irwin, OpenSource Connections https://opensourceconnections.com

All the blog posts contained in the example directory are copyright OpenSource Connections, and may not be used or redistributed without permission.

History

0.1.0 (2019-06-18)

Cookie-cutted.

0.9.0 (2020-09-25)

First release on PyPI.

1.0.0 (2020-12-10)

Stable API.

1.1.0 (2020-12-10)

Beta Release.

1.1.1 (2020-12-10)

Basic Readme doc.

1.2.2 (2020-12-14)

Configset path fix.
Static assets moved.

2.0.0 (2021-09-09)

SpaCy upgraded to 3.1.2.
Batch multiprocess processing for better throughput

2.0.1 (2021-09-09)

Fixed bulk operator

Release history

2.0.1 (Sep 9, 2021), this version
2.0.0 (Sep 9, 2021)
1.3.2 (Dec 15, 2020)
1.2.2 (Dec 14, 2020)
1.2.1 (Dec 14, 2020)
1.2.0 (Dec 14, 2020)
1.1.2 (Dec 10, 2020)
1.1.0 (Dec 10, 2020)
1.0.0 (Dec 10, 2020)
0.9.10 (Sep 30, 2020)
0.9.9 (Sep 29, 2020)
0.9.8 (Sep 26, 2020)
0.9.7 (Sep 26, 2020)
0.9.6 (Sep 25, 2020)
0.9.5 (Sep 25, 2020)
0.9.3 (Sep 25, 2020)
0.9.2 (Sep 25, 2020)
0.9.1 (Sep 25, 2020)
0.9.0 (Sep 25, 2020)

Download files

Source Distribution

skipchunk-2.0.1.tar.gz (1.3 MB)

Uploaded Sep 9, 2021 Source

Built Distribution

skipchunk-2.0.1-py2.py3-none-any.whl (251.1 kB)

Uploaded Sep 9, 2021 Python 2, Python 3

File details

Details for the file skipchunk-2.0.1.tar.gz.

File metadata

Download URL: skipchunk-2.0.1.tar.gz
Upload date: Sep 9, 2021
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1.post20200529 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.3

File hashes

Hashes for skipchunk-2.0.1.tar.gz
Algorithm	Hash digest
SHA256	`1931a557a16120839db8594060fd06fdd0a73873827f00e3d16720e1a9c2c0f3`
MD5	`897d8f8365d84c1fcb778ff5f667d429`
BLAKE2b-256	`bccd8d5e7fa1e3d910ac680bbe7e1038e55d4f6d3fa2684b85b369d0e4b4d470`

File details

Details for the file skipchunk-2.0.1-py2.py3-none-any.whl.

File metadata

Download URL: skipchunk-2.0.1-py2.py3-none-any.whl
Upload date: Sep 9, 2021
Size: 251.1 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1.post20200529 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.3

File hashes

Hashes for skipchunk-2.0.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`b35c28a208774aef9dd622ca39f6b0fb3df41ff5608e6fcbaf829fce3f4f80c6`
MD5	`fca9c7fd5a248631087adbcec8cd8e6f`
BLAKE2b-256	`a4e94b4eeafb7705aa74b22ef7deab5a360e715591e975b33b55756f981e58ef`
