Graph Language Models

License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

The following snippet processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the results as pandas dataframes on the shell and produces the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
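The instances table above can also be filtered programmatically, for example to keep only the rows of type term. The sketch below does not call the library: it works on a hand-copied subset of the rows shown above and assumes that the result stores each table as a dictionary with a "headers" list and a "data" list of rows. That layout and the helper name rows_of_type are assumptions for illustration, so check them against the actual structure of res before relying on them.

```python
# Hypothetical stand-in for the instances table of `res`.
# Assumption: a table is stored as {"headers": [...], "data": [[...], ...]}.
# The rows are copied from the output shown above.
instances = {
    "headers": ["type", "subtype", "subj_path", "char_i", "char_j", "original"],
    "data": [
        ["term", "single-term", "#", 53, 68, "French Republic"],
        ["parenthesis", "reference", "#", 126, 130, "[14]"],
        ["numval", "ival", "#", 127, 129, "14"],
        ["term", "single-term", "#", 165, 179, "Western Europe"],
        ["term", "enum-term-mark-4", "#", 255, 290,
         "Atlantic, Pacific and Indian Oceans"],
    ],
}


def rows_of_type(table, wanted_type):
    """Return the rows whose 'type' column equals wanted_type, as dicts."""
    type_col = table["headers"].index("type")
    return [
        dict(zip(table["headers"], row))
        for row in table["data"]
        if row[type_col] == wanted_type
    ]


terms = rows_of_type(instances, "term")
term_texts = [row["original"] for row in terms]
print(term_texts)
```

Looking up the column position by header name, rather than hard-coding an index, keeps the filter working if the column order changes.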

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below:

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# load the JSON file of a document converted with Deep Search
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

# apply the NLP models on the whole document
enriched_doc = mdl.apply_on_doc(doc)
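In practice the enriched document is usually written back to disk. The following is a minimal sketch of that round trip, assuming the value returned by apply_on_doc is JSON-serializable. The helper enrich_file and the StubModel class are hypothetical: the stub only stands in for the real model so that the sketch runs without deepsearch-glm installed.

```python
import json
import tempfile
from pathlib import Path


def enrich_file(model, in_path, out_path):
    """Load a converted document, apply the model, and save the result."""
    with open(in_path, "r", encoding="utf-8") as fr:
        doc = json.load(fr)

    enriched = model.apply_on_doc(doc)

    # Assumption: the enriched document is JSON-serializable.
    with open(out_path, "w", encoding="utf-8") as fw:
        json.dump(enriched, fw, indent=2, ensure_ascii=False)
    return enriched


class StubModel:
    """Hypothetical stand-in for the object returned by init_nlp_model()."""

    def apply_on_doc(self, doc):
        return {**doc, "enriched": True}


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "doc.json"
    dst = Path(tmp) / "doc.nlp.json"
    src.write_text(json.dumps({"title": "example"}), encoding="utf-8")

    enrich_file(StubModel(), src, dst)
    saved = json.loads(dst.read_text(encoding="utf-8"))

print(saved)
```

With the real package, StubModel() would be replaced by the mdl object created above.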

Creating Graphs from NLP entities and relations in document collections

To create a graph, you need two ingredients:

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

The following snippet creates the graph from these ingredients:

odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)
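The placeholders above can be filled in with a few lines of standard Python. The sketch below only prepares the three arguments and does not call create_glm_from_docs. It assumes that model_names is a single string with the model names separated by semicolons, as the placeholder and the --models 'language;term' option used later in this document suggest. The directory paths are hypothetical.

```python
import glob

# Assumption: the models are passed as one semicolon-separated string.
nlp_models = ["language", "term", "verb", "abbreviation"]
model_names = ";".join(nlp_models)

# Hypothetical location of the converted JSON documents.
json_files = sorted(glob.glob("./data/documents/articles/*.json"))

# Hypothetical output directory for the graph.
odir = "./glm-output"

print(model_names)
print(len(json_files), "document(s) found")
```

Sorting the file list keeps the input order reproducible between runs, since glob does not guarantee any ordering.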

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. The project uses poetry for this. To install all dependent Python packages and get the Python bindings, run:

poetry install

CXX compilation

To compile from scratch, first run the following command in the deepsearch-glm root folder to create the build directory:

cmake -B ./build

Next, compile the code:

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are:

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
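When these commands are launched from another Python script, it can help to build the argument list explicitly instead of concatenating strings. The sketch below rebuilds the first command from the list above. The helper build_command is hypothetical, and the snippet only prints the command rather than executing it. Note that the file pattern and the model list are quoted in the shell commands above, so each one reaches the script as a single, unexpanded argument.

```python
import shlex


def build_command(script, **options):
    """Build the argv list for `poetry run python <script> --flag value ...`."""
    cmd = ["poetry", "run", "python", script]
    for flag, value in options.items():
        cmd += [f"--{flag}", value]
    return cmd


cmd = build_command(
    "./deepsearch_glm/nlp_apply_on_docs.py",
    pdf="./data/documents/articles/2305.*.pdf",
    models="language;term",
)

# shlex.join quotes the arguments that a shell would otherwise interpret.
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be passed to subprocess.run(cmd, check=True), which starts the process without a shell, so the glob pattern and the semicolon are handed to the script unchanged.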

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones approach, you can also use the NLP and GLM executables directly. In general, they follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be queried directly via the -h or --help option,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests, execute the following command (after installation):

poetry run pytest ./tests -vvv -s
