Graph Language Models

License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

Below, you can find a code snippet that processes a piece of text,

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the results as pandas dataframes on the shell and produces the following output,

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
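The instances table above can be filtered by its type column, for example to keep only the terms. The sketch below does not call deepsearch-glm; it works on a hand-written sample that copies a few rows from the output above. The `headers`/`data` layout and the helper name `select_instances` are assumptions made for illustration, so check the actual structure of `res` before relying on it.

```python
from typing import Any, Dict, List


def select_instances(table: Dict[str, Any], wanted_type: str) -> List[Dict[str, Any]]:
    """Return the rows of a headers/data table whose 'type' equals wanted_type."""
    headers = table["headers"]
    rows = [dict(zip(headers, row)) for row in table["data"]]
    return [row for row in rows if row["type"] == wanted_type]


# Hand-written sample (assumed layout); the rows are copied from the output above.
sample = {
    "headers": ["type", "subtype", "subj_path", "char_i", "char_j", "original"],
    "data": [
        ["term", "single-term", "#", 53, 68, "French Republic"],
        ["parenthesis", "reference", "#", 126, 130, "[14]"],
        ["numval", "ival", "#", 127, 129, "14"],
        ["term", "single-term", "#", 165, 179, "Western Europe"],
    ],
}

terms = select_instances(sample, "term")
print([row["original"] for row in terms])
```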

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below,

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
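When several converted documents need to be processed, it may help to load them in one pass first. The helper below is a minimal sketch using only the standard library; `load_converted_docs` is a hypothetical name and not part of deepsearch-glm.

```python
import glob
import json
from typing import Any, Dict


def load_converted_docs(pattern: str) -> Dict[str, Any]:
    """Load every JSON file matching the glob pattern, keyed by file path."""
    docs: Dict[str, Any] = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as fr:
            docs[path] = json.load(fr)
    return docs


# Usage sketch (needs deepsearch-glm and an initialised model `mdl`):
# docs = load_converted_docs("<dir-with-converted-json-files>/*.json")
# enriched = {path: mdl.apply_on_doc(doc) for path, doc in docs.items()}
```

Keying the result by path keeps each enriched document traceable to its source file.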

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients, namely,

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

Below is a code snippet to create the graph using these basic ingredients,

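# import path assumed; verify it against your installed version
from deepsearch_glm.glm_utils import create_glm_from_docs

odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models: language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)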
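The model names are passed as a single semicolon-separated string, as in language;term;verb;abbreviation. A small helper can build that string from a list and catch obvious mistakes early. This is only a sketch; `join_model_names` is a hypothetical name and not part of deepsearch-glm.

```python
from typing import Iterable


def join_model_names(models: Iterable[str]) -> str:
    """Join model names into the semicolon-separated form used above."""
    cleaned = [name.strip() for name in models if name.strip()]
    if not cleaned:
        raise ValueError("at least one model name is required")
    for name in cleaned:
        if ";" in name:
            raise ValueError(f"model name must not contain ';': {name!r}")
    return ";".join(cleaned)


model_names = join_model_names(["language", "term", "verb", "abbreviation"])
print(model_names)  # language;term;verb;abbreviation
```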

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all the dependent Python packages and get the Python bindings, simply execute,

poetry install

CXX compilation

To compile from scratch, simply run the following command in the deepsearch-glm root folder to create the build directory,

cmake -B ./build

Next, compile the code from scratch,

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are,

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
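The --pdf and --json arguments above are quoted, so the shell passes the pattern to the script unexpanded. Assuming the scripts expand the pattern with glob-style rules (an assumption; this page does not say how they do it), the snippet below shows which file names a pattern such as 2305.*.pdf would select. The candidate file names are made up for illustration.

```python
from fnmatch import fnmatch

pattern = "2305.*.pdf"

# Hypothetical file names, used only to illustrate the matching rules.
candidates = [
    "2305.02334.pdf",
    "2305.02334.nlp.json",
    "2304.00001.pdf",
    "2305.notes.pdf",
]

matches = [name for name in candidates if fnmatch(name, pattern)]
print(matches)  # ['2305.02334.pdf', '2305.notes.pdf']
```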

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones approach, you can also use the NLP and GLM executables directly. In general, we follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed with the -h or --help flag,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs
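If the executables are driven from Python, the -m <mode> -c <JSON-config file> scheme maps naturally onto an argument list. The helper below is a sketch that only assembles the command; `build_command` is a hypothetical name, and actually running it would need the compiled executables.

```python
import shlex
from typing import List, Optional


def build_command(executable: str, mode: str, config: Optional[str] = None) -> List[str]:
    """Assemble '<executable> -m <mode> [-c <config>]' as an argument list."""
    cmd = [executable, "-m", mode]
    if config is not None:
        cmd += ["-c", config]
    return cmd


cmd = build_command("./nlp.exe", "predict", "nlp_predict.example.json")
print(shlex.join(cmd))  # ./nlp.exe -m predict -c nlp_predict.example.json

# To run it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues with paths that contain spaces.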

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests, simply execute (after installation),

poetry run pytest ./tests -vvv -s
