Graph Language Models

License MIT Code style: black

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

The following code snippet processes a piece of text,

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the resulting pandas dataframes to the shell and produces the following output,

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
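The instances table above can also be filtered in code rather than read off the shell. The following is a minimal sketch, assuming the table is available as a list of column headers plus a list of rows. That layout, the `instances_of_type` helper and the hand-copied sample rows are assumptions made for illustration, not the documented structure of `res`.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for the "instances" table printed above.
# The headers/data layout is an assumption; inspect `res` to confirm
# how your installed version actually stores it.
sample_instances: Dict[str, Any] = {
    "headers": ["type", "subtype", "subj_path", "char_i", "char_j", "original"],
    "data": [
        ["term", "single-term", "#", 53, 68, "French Republic"],
        ["parenthesis", "reference", "#", 126, 130, "[14]"],
        ["numval", "ival", "#", 127, 129, "14"],
        ["term", "single-term", "#", 165, 179, "Western Europe"],
    ],
}


def instances_of_type(table: Dict[str, Any], wanted: str) -> List[Dict[str, Any]]:
    """Return the rows whose "type" column equals `wanted`, as dicts."""
    headers = table["headers"]
    rows = [dict(zip(headers, row)) for row in table["data"]]
    return [row for row in rows if row["type"] == wanted]


terms = instances_of_type(sample_instances, "term")
print([row["original"] for row in terms])  # ['French Republic', 'Western Europe']
```

Converting each row to a dict keyed by header keeps the filter independent of the column order.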

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below,

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model

# load the pretrained models and initialise the NLP model
load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# read a document that was converted to JSON by Deep Search
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

# apply the NLP model to the whole document
enriched_doc = mdl.apply_on_doc(doc)
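The same pattern extends to several converted documents. The sketch below is one possible way to do that; the `enrich_json_files` helper is hypothetical and not part of deepsearch-glm. It takes the enrichment step as a plain callable, so `mdl.apply_on_doc` from the snippet above can be passed in.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List


def enrich_json_files(
    paths: Iterable[str],
    apply_on_doc: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Load each JSON document and pass it through `apply_on_doc`."""
    enriched = []
    for path in paths:
        with Path(path).open("r", encoding="utf-8") as fr:
            doc = json.load(fr)
        enriched.append(apply_on_doc(doc))
    return enriched


# Usage (assumes `mdl` was created with init_nlp_model() as shown above):
# enriched_docs = enrich_json_files(["doc-1.json", "doc-2.json"], mdl.apply_on_doc)
```

Passing the callable in, rather than importing the model inside the helper, keeps the helper easy to test with a stub function.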

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients, namely,

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

Below is a code snippet to create the graph using these basic ingredients,

# NOTE: create_glm_from_docs must be imported from the deepsearch_glm package first
odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)
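The model list in the placeholder above is written as a single semicolon-separated string. As a small sketch, such a string could be assembled from a Python list as follows; `join_model_names` is a hypothetical helper, not a function of the package.

```python
from typing import Sequence


def join_model_names(names: Sequence[str]) -> str:
    """Join model names into the semicolon-separated form, e.g. "language;term"."""
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [name.strip() for name in names if name.strip()]
    return ";".join(cleaned)


model_names = join_model_names(["language", "term", "verb", "abbreviation"])
print(model_names)  # language;term;verb;abbreviation
```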

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all the dependent Python packages and get the Python bindings, simply execute,

poetry install

CXX compilation

To compile from scratch, simply run the following command in the deepsearch-glm root folder to create the build directory,

cmake -B ./build

Next, compile the code from scratch,

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are,

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'
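The nested quotes in the query above are easy to get wrong when typed by hand. When launching the script from Python, one option is to pass the arguments as a list so that no manual escaping is needed. This is a sketch using only the standard library; the script path, index and query are taken from the example above.

```python
import shlex

# The query keeps its inner double quotes; no backslash escaping is required
# because each list element is passed to the program as one argument.
query = '"global warming potential" AND "etching"'
argv = [
    "poetry", "run", "python",
    "./deepsearch_glm/utils/ds_query.py",
    "--index", "patent-uspto",
    "--query", query,
]

# shlex.join renders a shell-safe command line, useful for logging.
print(shlex.join(argv))

# To actually run it (requires the repository and its dependencies):
# import subprocess
# subprocess.run(argv, check=True)
```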

Run using CXX executables

If you prefer to work bare-bones, you can also use the NLP and GLM executables directly. In general, they follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed via the -h or --help option,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json
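Both executables share the same -m <mode> -c <JSON-config file> scheme, so they can be driven from a script with one small wrapper. The sketch below only builds the argument list; `build_command` is a hypothetical helper, and the executable and config names are the ones used in the examples above.

```python
import shlex
from typing import List, Optional


def build_command(executable: str, mode: str, config: Optional[str] = None) -> List[str]:
    """Build the argument list for `<executable> -m <mode> [-c <config>]`."""
    argv = [executable, "-m", mode]
    if config is not None:
        argv += ["-c", config]
    return argv


cmd = build_command("./nlp.exe", "predict", "nlp_predict.example.json")
print(shlex.join(cmd))  # ./nlp.exe -m predict -c nlp_predict.example.json

# To actually run it (requires the compiled executable):
# import subprocess
# subprocess.run(cmd, check=True)
```

Leaving the config optional covers modes such as create-configs, which are invoked without a -c argument in the examples above.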

Testing

To run the tests, simply execute (after installation),

poetry run pytest ./tests -vvv -s

