
Graph Language Models


Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).
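To confirm that the installation succeeded, one option is to query the installed version from Python. The sketch below uses only the standard library; the helper name installed_version is ours and is not part of deepsearch-glm.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("deepsearch-glm"))
```

Returning None instead of raising keeps the check usable in setup scripts that want to decide whether to install the package.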

The following snippet processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the results as pandas dataframes on the shell and produces the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
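The instances table can be post-processed like any other tabular data. As a minimal sketch, the rows below are copied by hand from the output above (with a subset of the columns) rather than read from res, since the exact layout of res is not described here; group_by_type is a hypothetical helper.

```python
from collections import defaultdict

# A few rows copied from the table above: (type, subtype, char_i, char_j, original)
rows = [
    ("term", "single-term", 53, 68, "French Republic"),
    ("parenthesis", "reference", 126, 130, "[14]"),
    ("numval", "ival", 127, 129, "14"),
    ("term", "single-term", 136, 143, "country"),
    ("term", "single-term", 165, 179, "Western Europe"),
    ("term", "enum-term-mark-4", 255, 290, "Atlantic, Pacific and Indian Oceans"),
]


def group_by_type(rows):
    """Collect the original text of each instance under its type."""
    grouped = defaultdict(list)
    for type_, _subtype, _char_i, _char_j, original in rows:
        grouped[type_].append(original)
    return dict(grouped)


grouped = group_by_type(rows)
for type_, originals in grouped.items():
    print(type_, originals)
```

The same grouping could be done on the printed dataframe with a pandas groupby; plain Python is used here to keep the sketch dependency-free.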

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below:

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

# download the pretrained models if needed and initialise the default NLP model
load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# load the JSON representation of the converted PDF document
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
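When several converted documents need to be enriched, the same call can be wrapped in a loop. The sketch below is one possible way to do that; enrich_json_files is a hypothetical helper that takes the processing function as an argument, so it makes no assumption about the model beyond the apply_on_doc call shown above.

```python
import json
from pathlib import Path


def enrich_json_files(paths, apply_on_doc):
    """Load each JSON document, apply the callable, and return {path: result}."""
    results = {}
    for path in paths:
        with open(path, "r", encoding="utf-8") as fr:
            doc = json.load(fr)
        results[str(path)] = apply_on_doc(doc)
    return results


# Usage with the model initialised above (the directory is a placeholder):
# paths = sorted(Path("<dir-with-json-files>").glob("*.json"))
# enriched = enrich_json_files(paths, mdl.apply_on_doc)
```

Passing the callable in keeps the helper easy to test with a stub before running the real models.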

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients, namely,

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

Below is a code snippet to create the graph using these basic ingredients,

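# import path assumed from the package layout; adjust it if your installed version differs
from deepsearch_glm.glm_utils import create_glm_from_docs

odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)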
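The two inputs can be assembled programmatically. As a rough sketch, assuming the model list is passed as a single semicolon-separated string (as the placeholder above and the --models option further down suggest), the hypothetical helpers below build that string and collect the JSON files of a directory.

```python
from pathlib import Path


def join_model_names(names):
    """Join NLP model names into one semicolon-separated string."""
    return ";".join(names)


def collect_json_files(directory):
    """Return the sorted paths of all .json files directly inside a directory."""
    return sorted(str(p) for p in Path(directory).glob("*.json"))


model_names = join_model_names(["language", "term", "verb", "abbreviation"])
print(model_names)  # language;term;verb;abbreviation

# json_files = collect_json_files("<dir-with-converted-documents>")
```

Sorting the paths makes the input order reproducible between runs.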

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all dependent Python packages and get the Python bindings, execute,

poetry install

CXX compilation

To compile from scratch, first run the following command in the deepsearch-glm root folder to create the build directory,

cmake -B ./build

Next, compile the code,

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are,

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
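The file patterns in these commands are quoted, so the shell passes them to the script unexpanded. Assuming the scripts then expand them with shell-style wildcard matching (we have not verified this), the sketch below shows which names such a pattern would select. The file names are made up for illustration and select_matching is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select_matching(names, pattern):
    """Return the names that match a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical file names in ./data/documents/articles/
names = ["2305.01234.pdf", "2305.99999.pdf", "2306.00001.pdf", "2305.01234.nlp.json"]

print(select_matching(names, "2305.*.pdf"))
print(select_matching(names, "2305.*.nlp.json"))
```

Checking a pattern this way before a long run can catch a typo that would otherwise select no documents.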

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer to work bare-bones, you can also use the NLP and GLM executables directly. In general, we follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed via the -h or --help flag,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json
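Before running either command, it can help to look at what a configuration file contains. Since the configurations are JSON files, a minimal sketch for listing their top-level keys could look as follows; top_level_keys is a hypothetical helper, and nothing is assumed here about which keys the generated files actually hold.

```python
import json


def top_level_keys(config_path):
    """Return the sorted top-level keys of a JSON configuration file."""
    with open(config_path, "r", encoding="utf-8") as fr:
        config = json.load(fr)
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} does not contain a JSON object")
    return sorted(config)


# Example call on the training configuration named above:
# print(top_level_keys("nlp_train_config.json"))
```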

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests, simply execute (after installation),

poetry run pytest ./tests -vvv -s
