Graph Language Models

License MIT Code style: black

Getting Started

Finding entities and relations via NLP on text and documents

To get started quickly, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).
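
To confirm that the installation succeeded, one option is to ask Python for the installed version. The sketch below uses only the standard library; the helper name installed_version is hypothetical and not part of deepsearch-glm.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("deepsearch-glm")
    print(found if found else "deepsearch-glm is not installed")
```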

The following snippet processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the resulting pandas dataframes on the shell and produces the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
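
The char_i and char_j columns make it possible to relate instances to each other. As a rough sketch, the code below copies a few rows by hand from the table above (it does not call deepsearch-glm), assumes each span is half-open, [char_i, char_j), and lists the terms whose spans overlap. The Instance tuple and the overlapping_terms helper are hypothetical names used for illustration only.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Instance(NamedTuple):
    type: str
    subtype: str
    char_i: int
    char_j: int
    original: str


# Rows copied by hand from the second sentence of the output above.
ROWS = [
    Instance("term", "single-term", 198, 214, "overseas regions"),
    Instance("term", "enum-term-mark-3", 207, 230, "regions and territories"),
    Instance("term", "single-term", 219, 230, "territories"),
    Instance("term", "single-term", 238, 246, "Americas"),
]


def overlapping_terms(rows: List[Instance]) -> List[Tuple[str, str]]:
    """Return pairs of term texts whose [char_i, char_j) spans intersect."""
    terms = [r for r in rows if r.type == "term"]
    return [
        (a.original, b.original)
        for a, b in combinations(terms, 2)
        if a.char_i < b.char_j and b.char_i < a.char_j
    ]


if __name__ == "__main__":
    for left, right in overlapping_terms(ROWS):
        print(f"{left!r} overlaps {right!r}")
```

With these four rows, the enumeration "regions and territories" overlaps both of the single terms it was built from, while "Americas" stands alone.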

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example:

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

# load the pretrained NLP models and initialise the model
load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# read the JSON file produced by the Deep Search conversion
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
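
When several converted documents need processing, the JSON files can be loaded in a loop before each one is handed to mdl.apply_on_doc. The sketch below covers only the loading step and uses the standard library alone; load_converted_docs is a hypothetical helper, not part of deepsearch-glm.

```python
import glob
import json
from typing import Any, Dict, Iterator, Tuple


def load_converted_docs(pattern: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (path, document) for every JSON file matching `pattern`."""
    # Sort the matches so the processing order is reproducible.
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as fr:
            yield path, json.load(fr)
```

Each yielded document could then be passed to the model, for example by calling mdl.apply_on_doc(doc) inside a loop of the form for path, doc in load_converted_docs("<dir-with-converted-docs>/*.json").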

Creating Graphs from NLP entities and relations in document collections

To create a graph, you need two ingredients:

  1. a collection of texts or documents
  2. a set of NLP models that provide entities and relations

The following snippet creates the graph from these ingredients:

odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)
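
The model_names placeholder suggests a semicolon-separated string, which matches the --models 'language;term' argument used by the command-line scripts in this document. Assuming that format, a small helper can assemble the string from a list; join_model_names is a hypothetical name, not part of the package.

```python
from typing import Iterable, List


def join_model_names(names: Iterable[str]) -> str:
    """Join model names with ';', skipping blanks and duplicates."""
    seen: List[str] = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return ";".join(seen)


model_names = join_model_names(["language", "term", "verb", "abbreviation"])
print(model_names)  # language;term;verb;abbreviation
```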

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. The project uses poetry for dependency management. To install the required Python packages and build the Python bindings, execute:

poetry install

CXX compilation

To compile from scratch, first run the following command in the deepsearch-glm root folder to configure the project and create the build directory:

cmake -B ./build

Next, compile the code:

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are:

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
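
The same scripts can be launched from Python. The sketch below only builds the argument list and does not run anything by itself; build_nlp_command is a hypothetical helper, while the script path and the --pdf and --models flags are taken from the first example above. Because the arguments are passed as a list, no shell is involved and the glob pattern reaches the script unexpanded, which is what the single quotes achieve on the command line.

```python
import shlex
from typing import List


def build_nlp_command(pdf_pattern: str, models: str) -> List[str]:
    """Build the argv list for applying NLP models on PDF document(s)."""
    return [
        "poetry", "run", "python",
        "./deepsearch_glm/nlp_apply_on_docs.py",
        "--pdf", pdf_pattern,
        "--models", models,
    ]


cmd = build_nlp_command("./data/documents/articles/2305.*.pdf", "language;term")
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

To execute the command, cmd can be passed to subprocess.run(cmd, check=True) from the repository root.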

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones setup, you can also use the NLP and GLM executables directly. In general, they follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed with the -h or --help flag,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated with the create-configs mode,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests after installation, execute:

poetry run pytest ./tests -vvv -s
