Graph Language Models

License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

The following snippet processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the resulting pandas dataframes to the shell and produces the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
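To work with such a table programmatically, one option is to group the rows by their type. The sketch below is illustrative only: it assumes a few rows from the output above have been copied by hand into plain Python dicts, and both the rows list and the group_by_type helper are hypothetical names that are not part of deepsearch-glm. It does not show how to extract the rows from res.

```python
from collections import defaultdict

# A few rows copied by hand from the "instances" table above.
rows = [
    {"type": "term", "subtype": "single-term", "original": "French Republic"},
    {"type": "term", "subtype": "single-term", "original": "Western Europe"},
    {"type": "numval", "subtype": "ival", "original": "14"},
    {"type": "parenthesis", "subtype": "reference", "original": "[14]"},
    {"type": "term", "subtype": "enum-term-mark-3", "original": "regions and territories"},
]


def group_by_type(rows):
    """Map each instance type to the list of its original strings."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row["original"])
    return dict(groups)


groups = group_by_type(rows)
print(groups["term"])
```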

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below:

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
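When several converted documents need to be processed, the same call can be wrapped in a loop. The following is a minimal sketch under stated assumptions: enrich_files is a hypothetical helper (not part of deepsearch-glm), and apply_fn stands for any callable that takes a loaded document, such as mdl.apply_on_doc from the snippet above.

```python
import json
from pathlib import Path


def enrich_files(paths, apply_fn):
    """Load each JSON document and pass it through apply_fn.

    Returns a dict mapping each file path (as a string) to the result.
    """
    results = {}
    for path in paths:
        with Path(path).open("r", encoding="utf-8") as fr:
            doc = json.load(fr)
        results[str(path)] = apply_fn(doc)
    return results


# Usage (assumes mdl was created with init_nlp_model() as shown above):
# json_paths = sorted(Path("<dir-with-converted-docs>").glob("*.json"))
# enriched = enrich_files(json_paths, mdl.apply_on_doc)
```

Passing the model call in as a function keeps the file handling separate from the NLP step, so the loop can be tried out with a dummy function first.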

Creating Graphs from NLP entities and relations in document collections

Creating a graph requires two ingredients:

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

The following snippet creates the graph from these basic ingredients:

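# NOTE: this import path is an assumption; adjust it if your installed version differs
from deepsearch_glm.glm_utils import create_glm_from_docs

odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)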
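The model_names placeholder above suggests a single semicolon-separated string of model names. Assuming that format, a small helper can build the string from a Python list and reject names that would break the separator. The join_model_names function is hypothetical and not part of deepsearch-glm.

```python
def join_model_names(names):
    """Join NLP model names into one semicolon-separated string."""
    cleaned = [name.strip() for name in names]
    for name in cleaned:
        # An empty name or an embedded ';' would corrupt the joined string.
        if not name or ";" in name:
            raise ValueError(f"invalid model name: {name!r}")
    return ";".join(cleaned)


model_names = join_model_names(["language", "term", "verb", "abbreviation"])
print(model_names)  # language;term;verb;abbreviation
```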

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all dependent Python packages and build the Python bindings, execute:

poetry install

CXX compilation

To compile from scratch, first run the following command in the deepsearch-glm root folder to create the build directory:

cmake -B ./build

Next, compile the code:

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. For example:

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
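These commands can also be assembled from Python, for instance to drive several runs from one script. The sketch below is an illustration rather than part of the package: script_command is a hypothetical helper that only builds the argument list, using the script path and options from the first example above.

```python
import shlex


def script_command(script, **options):
    """Build the argument list for: poetry run python <script> --key value ..."""
    cmd = ["poetry", "run", "python", script]
    for key, value in options.items():
        cmd += [f"--{key}", str(value)]
    return cmd


cmd = script_command(
    "./deepsearch_glm/nlp_apply_on_docs.py",
    pdf="./data/documents/articles/2305.*.pdf",
    models="language;term",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Because the arguments are passed as a list, no shell is involved and the glob pattern reaches the script unexpanded, just as it does with the quoted pattern on the command line.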

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones setup, you can also use the NLP and GLM executables directly. Both follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed with the -h or --help option,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated with the create-configs mode,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

After installation, run the tests with:

poetry run pytest ./tests -vvv -s


