Graph Language Models

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

The snippet below processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the results as pandas dataframes on the shell and produces the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
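The tables printed above can also be inspected programmatically. The following is a minimal sketch that filters rows by type. It assumes a table is available as a dictionary with a "headers" list and a "data" list of rows, which is an assumption about the result layout and is not confirmed on this page. The sample rows are copied from the output above, and select_rows is a hypothetical helper, not part of deepsearch-glm.

```python
# Hypothetical helper; the {"headers": [...], "data": [...]} layout is an assumption.
def select_rows(table, column, value):
    """Return the rows of `table` whose `column` equals `value`, as dicts."""
    idx = table["headers"].index(column)
    return [
        dict(zip(table["headers"], row))
        for row in table["data"]
        if row[idx] == value
    ]


# Sample rows copied from the printed "instances" table above.
instances = {
    "headers": ["type", "subtype", "subj_path", "char_i", "char_j", "original"],
    "data": [
        ["term", "single-term", "#", 53, 68, "French Republic"],
        ["parenthesis", "reference", "#", 126, 130, "[14]"],
        ["numval", "ival", "#", 127, 129, "14"],
        ["term", "single-term", "#", 165, 179, "Western Europe"],
    ],
}

terms = [row["original"] for row in select_rows(instances, "type", "term")]
print(terms)  # ['French Republic', 'Western Europe']
```

Returning each row as a dictionary keeps later code independent of the column order.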

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below:

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

# download the pre-trained models if needed and initialise the NLP model
load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# load a document that was converted to JSON by Deep Search
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
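
When several converted documents need to be processed, the same steps can be wrapped in a loop. The sketch below is illustrative only: enrich_files is a hypothetical helper, the ".nlp.json" output suffix is an assumption modelled on the file pattern used in the command-line examples, and it assumes that the enriched document can be serialised with json.dump. A stand-in function replaces mdl.apply_on_doc so that the example runs without the models.

```python
import json
import tempfile
from pathlib import Path


def enrich_files(paths, apply_fn, suffix=".nlp.json"):
    """Load each JSON document, pass it through `apply_fn`, and save the result.

    In practice `apply_fn` would be `mdl.apply_on_doc`. The output name
    (input stem plus `suffix`) is an assumption made for this sketch.
    """
    written = []
    for path in map(Path, paths):
        with open(path, "r", encoding="utf-8") as fr:
            doc = json.load(fr)
        enriched = apply_fn(doc)
        out_path = path.with_name(path.stem + suffix)
        with open(out_path, "w", encoding="utf-8") as fw:
            json.dump(enriched, fw, indent=2)
        written.append(out_path)
    return written


# Stand-in for mdl.apply_on_doc so the sketch runs without the models.
def fake_apply(doc):
    return {**doc, "enriched": True}


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "report.json"
    src.write_text(json.dumps({"title": "demo"}), encoding="utf-8")
    out = enrich_files([src], fake_apply)
    result = json.loads(out[0].read_text(encoding="utf-8"))
    print(out[0].name, result)
```

Passing the processing step in as a callable keeps the file handling testable on its own.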

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients:

  1. a collection of texts or documents
  2. a set of NLP models that provide entities and relations

The code snippet below creates a graph from these ingredients:

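# NOTE: import create_glm_from_docs from the deepsearch_glm package first
# (the module path is not shown here)
odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)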
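
The inputs to the call above can be assembled programmatically. The following is a minimal sketch, assuming the model names are passed as a single semicolon-separated string as in the placeholder above. build_glm_inputs is a hypothetical helper and the glob pattern is an illustrative path, not one prescribed by the package.

```python
import glob


def build_glm_inputs(pattern, models):
    """Collect sorted JSON paths matching `pattern` and join model names with ';'."""
    json_files = sorted(glob.glob(pattern))
    model_names = ";".join(models)
    return json_files, model_names


json_files, model_names = build_glm_inputs(
    "./data/documents/articles/*.json",  # hypothetical location of converted documents
    ["language", "term", "verb", "abbreviation"],
)
print(model_names)  # language;term;verb;abbreviation
```

Sorting the file list keeps the order in which documents are passed on stable between runs.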

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all dependent Python packages and build the Python bindings, execute:

poetry install

CXX compilation

To compile from scratch, first run the following command in the deepsearch-glm root folder to create the build directory:

cmake -B ./build

Next, compile the code:

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are:

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones approach, you can also use the NLP and GLM executables directly. In general, they follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed with the -h or --help option,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs
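
The invocation scheme above can also be driven from Python, for example in a batch script. The sketch below only assembles the -m, -c and -h arguments shown above. build_command and run_tool are hypothetical helpers, not part of deepsearch-glm, and the executable paths assume that the binaries are in the current directory.

```python
import subprocess


def build_command(exe, mode=None, config=None, show_help=False):
    """Build the argument list for ./nlp.exe or ./glm.exe."""
    cmd = [exe]
    if show_help:
        return cmd + ["-h"]
    if mode is not None:
        cmd += ["-m", mode]
    if config is not None:
        cmd += ["-c", config]
    return cmd


def run_tool(exe, mode, config=None):
    """Run the executable and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(exe, mode, config), check=True)


print(build_command("./nlp.exe", "create-configs"))
print(build_command("./glm.exe", "create", "glm_config_create.json"))
```

Keeping the argument assembly separate from the subprocess call makes it possible to check the command line without the executables being present.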

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests after installation, execute:

poetry run pytest ./tests -vvv -s
