Graph Language Models

License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

The following snippet processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the pandas dataframes to the shell and produces the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
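The tables printed above can also be handled programmatically. The sketch below is only an illustration under stated assumptions: it supposes that each table is available as a dict with `headers` and `data` keys (not confirmed by this README), it works on a hand-copied excerpt of the `instances` output instead of a real model call, and it treats `char_i`/`char_j` as UTF-8 byte offsets because that is what the sample spans suggest (`[fʁɑ̃s]` covers 18 to 28, which is ten bytes but only seven code points). The helper names and the padded `sample` string are hypothetical.

```python
from typing import Any, Dict, List

IPA = "[f\u0281\u0251\u0303s]"  # "[fʁɑ̃s]" spelled with explicit code points

# Hand-copied excerpt of the "instances" table shown above. The
# {"headers": ..., "data": ...} layout is an assumption for illustration.
instances: Dict[str, List[Any]] = {
    "headers": ["type", "subtype", "char_i", "char_j", "original"],
    "data": [
        ["parenthesis", "round brackets", 9, 36, "(French: " + IPA + " Listen)"],
        ["expression", "wtoken-concatenation", 18, 28, IPA],
        ["term", "single-term", 29, 35, "Listen"],
        ["term", "single-term", 53, 68, "French Republic"],
    ],
}

# Hypothetical input string, padded so that the offsets above line up.
sample = "\n#France (French: " + IPA + " Listen), officially the French Republic"


def rows_as_dicts(table: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Pair every data row with the column headers."""
    return [dict(zip(table["headers"], row)) for row in table["data"]]


def span_text(text: str, start: int, end: int) -> str:
    """Slice by UTF-8 byte offsets (the assumed meaning of char_i/char_j)."""
    return text.encode("utf-8")[start:end].decode("utf-8")


rows = rows_as_dicts(instances)

# Keep only the rows whose type is "term".
terms = [row["original"] for row in rows if row["type"] == "term"]
print(terms)

# Each byte-offset slice reproduces the "original" column of its row.
for row in rows:
    assert span_text(sample, row["char_i"], row["char_j"]) == row["original"]
```

Slicing the Python string directly (`sample[18:28]`) would give a different result here, because the IPA transcription contains multi-byte characters.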

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below,

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
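
To process several converted documents, the call above can be wrapped in a small helper. This is a minimal sketch and not part of the package: `enrich_file` is a hypothetical name, `apply` stands for any callable such as `mdl.apply_on_doc`, and saving the result next to the input with a `.nlp.json` suffix is an assumption modelled on the file names used in the command-line examples of this README.

```python
import json
from pathlib import Path
from typing import Callable


def enrich_file(path: Path, apply: Callable[[dict], dict]) -> Path:
    """Load a converted document, enrich it, and save it as <name>.nlp.json."""
    with open(path, "r", encoding="utf-8") as fr:
        doc = json.load(fr)

    enriched = apply(doc)  # e.g. apply=mdl.apply_on_doc

    # "report.json" becomes "report.nlp.json" (naming is an assumption).
    out_path = path.with_suffix(".nlp.json")
    with open(out_path, "w", encoding="utf-8") as fw:
        json.dump(enriched, fw, ensure_ascii=False, indent=2)
    return out_path
```

With a model loaded as shown earlier, `enrich_file(Path("<path-to-json-file-of-converted-pdf-doc>"), mdl.apply_on_doc)` would write the enriched document and return its path.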

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients, namely,

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

Below is a code snippet to create the graph using these basic ingredients,

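# module path assumed; adjust it if your installed version differs
from deepsearch_glm.glm_utils import create_glm_from_docs

odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)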
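
The inputs for this call can be assembled from a directory of converted documents. The sketch below is an illustration only: `collect_json_files` and `join_model_names` are hypothetical helpers, and the one thing taken from this README is that the model names are passed as a single `;`-separated string.

```python
from pathlib import Path
from typing import Iterable, List


def collect_json_files(root: str) -> List[str]:
    """Return the sorted paths of the *.json files directly under root."""
    return sorted(str(p) for p in Path(root).glob("*.json"))


def join_model_names(names: Iterable[str]) -> str:
    """Build the ';'-separated model string, skipping blanks and duplicates."""
    seen: List[str] = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return ";".join(seen)


model_names = join_model_names(["language", "term", "verb", "abbreviation"])
print(model_names)  # language;term;verb;abbreviation
```

The resulting list and string can then be passed as the `json_files` and `model_names` arguments shown above.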

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all the dependent Python packages and get the Python bindings, simply execute,

poetry install

CXX compilation

To compile from scratch, first run the following command in the deepsearch-glm root folder to create the build directory,

cmake -B ./build

Next, compile the code,

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are,

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones approach, you can also use the NLP and GLM executables directly. In general, we follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed with the -h or --help flag,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs
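
When the executables are driven from a script, the `-m <mode> -c <JSON-config file>` scheme maps directly onto an argument list. The following is a minimal sketch: `build_command` and `run` are hypothetical helpers that only assemble and launch the arguments described above, without assuming anything else about the executables.

```python
import subprocess
from typing import List, Optional


def build_command(exe: str, mode: str, config: Optional[str] = None) -> List[str]:
    """Assemble `<exe> -m <mode> [-c <config>]` as an argument list."""
    cmd = [exe, "-m", mode]
    if config is not None:
        cmd += ["-c", config]
    return cmd


def run(cmd: List[str]) -> int:
    """Run the command and return its exit code (needs the built executable)."""
    return subprocess.run(cmd, check=False).returncode


print(build_command("./nlp.exe", "create-configs"))
print(build_command("./nlp.exe", "train", "nlp_train_config.json"))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.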

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests, simply execute (after installation),

poetry run pytest ./tests -vvv -s
