Skip to main content

Graph Language Models

Project description

Graph Language Models

build PyPI version PyPI - Python Version License MIT

Getting Started

Finding entities and relations via NLP on text and documents

To get easily started, simply install the deepsearch-glm package from PyPi. This can be done using the traditional pip install deepsearch-glm or via poetry poetry add deepsearch-glm.

Below, you can find the code-snippet to process pieces of text,

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command will print the pandas dataframes on the shell and provides the following output,

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world

The NLP can also be applied on entire documents which were converted using Deep Search. A simple example is shown below,

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients, namely,

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

Below is a code snippet to create the graph using these basic ingredients,

odir = "<ouput-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:langauge;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)	

Querying Graphs

TBD

Install for development

Python installation

To use the python interface, first make sure all dependencies are installed. We use poetry for that. To install all the dependent python packages and get the python bindings, simply execute,

poetry install

CXX compilation

To compile from scratch, simply run the following command in the deepsearch-glm root folder to create the build directory,

cmake -B ./build; 

Next, compile the code from scratch,

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, simply do execute the scripts as poetry run python <script> <input>. Examples are,

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  1. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json' 
  1. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  1. Converting PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'"

Run using CXX executables

If you like to be bare-bones, you can also use the executables for NLP and GLM's directly. In general, we follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the modes can be queried directly via the -h or --help

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  1. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  1. explore interactively the GLM
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests, simply execute (after installation),

poetry run pytest ./tests -vvv -s

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

deepsearch_glm-0.6.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

deepsearch_glm-0.6.3-cp311-cp311-macosx_12_0_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.11macOS 12.0+ x86-64

deepsearch_glm-0.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

deepsearch_glm-0.6.3-cp310-cp310-macosx_12_0_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.10macOS 12.0+ x86-64

deepsearch_glm-0.6.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

deepsearch_glm-0.6.3-cp39-cp39-macosx_12_0_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.9macOS 12.0+ x86-64

deepsearch_glm-0.6.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

deepsearch_glm-0.6.3-cp38-cp38-macosx_12_0_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.8macOS 12.0+ x86-64

File details

Details for the file deepsearch_glm-0.6.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0d5c3823ad0e588c0b9b9c8090f60f3571649f5d80c83d241aa9562f9bcd717e
MD5 c298c1b8e54fe1fe0744a737b8dd36b8
BLAKE2b-256 bd5d9d469a521e79100cc5745e8edca3450bb76e9bcc743760ada50b76031c89

See more details on using hashes here.

File details

Details for the file deepsearch_glm-0.6.3-cp311-cp311-macosx_12_0_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp311-cp311-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 555c56d4a59ee95456960528ab7125c61d17cc1915af41d6439c0378e9b15b27
MD5 13d70564ded19630eceeedf80f5aa2fb
BLAKE2b-256 080461fc4acab14b066c3bba5dc9e61d5913e27236e5b8454d3edf94237f924d

See more details on using hashes here.

File details

Details for the file deepsearch_glm-0.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0af9ca3dbb37f11a15cf0002eb4119eb710a2fc63f41554c28ad19573f03851a
MD5 573c415faa3b20f13e4a70c8e2ac7e7f
BLAKE2b-256 6671e21dd09a7639df8e4738565019c8373eb1c74c2497e5896f33578416f849

See more details on using hashes here.

File details

Details for the file deepsearch_glm-0.6.3-cp310-cp310-macosx_12_0_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp310-cp310-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 57b0255ae3ed12d09a927cfaca65543c6988144f295850a848946f4b56c34cac
MD5 cf38f9482b55c719427dacf3d762c651
BLAKE2b-256 d5a42cea6f865cf18e4e05ee9f3dc79f5d349ecc91fbaa58dd6ecd46739ce0d2

See more details on using hashes here.

File details

Details for the file deepsearch_glm-0.6.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2f429be5ea1ebfdfb4b0eb0c8aacd1e0d8d402ee90b0314482fdce2a396d4f0c
MD5 2d2d8504282180fc3828ee9b5617adfe
BLAKE2b-256 9851ad2b67f942f713a495575681088be6cedec50bc57c94a9aa14dbc2a720dd

See more details on using hashes here.

File details

Details for the file deepsearch_glm-0.6.3-cp39-cp39-macosx_12_0_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp39-cp39-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 4399a28dac8b82450695ebee1ab7615d352ef9953b7a0c3e3c2ef0a437fb9e0a
MD5 fd207cf92bb34e9bbdf8addbdb8bd4a1
BLAKE2b-256 8c9bf0458114d6eb349b5badae4df79c8af3c8ecd36312f8771c25b5c98e4d7a

See more details on using hashes here.

File details

Details for the file deepsearch_glm-0.6.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6693c82f7db20b9d92aa23bc0589c0905cf10358d732d06d07d39bed04c8c4c6
MD5 800152e0c47355b7a224d321a50225b5
BLAKE2b-256 30b1d605bebea09ce76d4b706689bd824899b66ee318260ad74eb8ca5219b0ed

See more details on using hashes here.

File details

Details for the file deepsearch_glm-0.6.3-cp38-cp38-macosx_12_0_x86_64.whl.

File metadata

File hashes

Hashes for deepsearch_glm-0.6.3-cp38-cp38-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 cc6e2abcef617c8f4a4027d378ccb25493094acd36e41e078882caec9f00eb33
MD5 bfbeceb9d59954ab9a085e84edf2627e
BLAKE2b-256 4ade12242fd8318ffde2cc223ed28618f0ccc5bd4b01496689a671823bb75ba7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page