Graph Language Models

License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI. This can be done with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

Below is a code snippet that processes a piece of text,

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the resulting pandas dataframes on the shell and produces the following output,

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
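
The printed tables can also be filtered programmatically. The sketch below assumes that the result stores each table as a dictionary with a headers list and a data list of rows; this layout of res is an assumption and is not confirmed by the output above. select_instances is a hypothetical helper, and the sample rows are copied by hand from the table above rather than produced by the model.

```python
def select_instances(res, type_name):
    """Return instance rows of the given type as dictionaries.

    Assumes res["instances"] looks like {"headers": [...], "data": [[...], ...]}.
    """
    table = res["instances"]
    headers = table["headers"]
    rows = (dict(zip(headers, row)) for row in table["data"])
    return [row for row in rows if row["type"] == type_name]


# Hand-built sample using rows from the table above (assumed layout).
sample = {
    "instances": {
        "headers": ["type", "subtype", "subj_path", "char_i", "char_j", "original"],
        "data": [
            ["term", "single-term", "#", 53, 68, "French Republic"],
            ["parenthesis", "reference", "#", 126, 130, "[14]"],
            ["numval", "ival", "#", 127, 129, "14"],
            ["term", "single-term", "#", 165, 179, "Western Europe"],
        ],
    }
}

terms = select_instances(sample, "term")
print([row["original"] for row in terms])  # ['French Republic', 'Western Europe']
```

If the assumed layout holds, the same helper could be called on the value returned by mdl.apply_on_text(text).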

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below,

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
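
To process several converted documents in one go, the same call can be wrapped in a loop. This is a minimal sketch: enrich_documents is a hypothetical helper, and apply_fn stands in for mdl.apply_on_doc so that the loop itself does not depend on the package.

```python
import json
from pathlib import Path


def enrich_documents(json_paths, apply_fn):
    """Load each JSON document, pass it to apply_fn, and collect the results by path."""
    enriched = {}
    for path in map(Path, json_paths):
        with path.open("r", encoding="utf-8") as fr:
            doc = json.load(fr)
        enriched[str(path)] = apply_fn(doc)
    return enriched
```

With a loaded model, this would be called as enrich_documents(paths, mdl.apply_on_doc).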

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients, namely,

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

Below is a code snippet to create the graph using these basic ingredients,

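# Import path assumed from the package layout; adjust if your version differs.
from deepsearch_glm.glm_utils import create_glm_from_docs

odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
# Semicolon-separated list of NLP models, e.g. language;term;verb;abbreviation
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)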
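
The inputs for this call can be assembled from a directory of converted documents. The sketch below assumes that the model list is a single semicolon-separated string, as in the --models 'language;term' command-line examples in this README. build_glm_inputs is a hypothetical helper and not part of the package.

```python
from pathlib import Path


def build_glm_inputs(doc_dir, models):
    """Return (sorted list of JSON file paths, ';'-joined model string)."""
    json_files = sorted(str(p) for p in Path(doc_dir).glob("*.json"))
    model_names = ";".join(models)
    return json_files, model_names


json_files, model_names = build_glm_inputs(".", ["language", "term", "verb", "abbreviation"])
print(model_names)  # language;term;verb;abbreviation
```

Sorting the file list keeps the input order stable between runs, which makes repeated graph builds easier to compare.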

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all dependent Python packages and get the Python bindings, simply execute,

poetry install

CXX compilation

To compile from scratch, first run the following command in the deepsearch-glm root folder to configure the project and create the build directory,

cmake -B ./build

Next, compile the code,

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are,

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
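
These commands can also be driven from Python, for example in a batch job. The sketch below only builds the argument list. nlp_command is a hypothetical helper, and the script path and flags are taken from the first example above. Because subprocess.run with a list does not go through a shell, the glob pattern reaches the script unexpanded, just as it does with the quoted form on the command line.

```python
import subprocess


def nlp_command(pdf_pattern, models):
    """Build the argument list for the nlp_apply_on_docs.py example."""
    return [
        "poetry", "run", "python",
        "./deepsearch_glm/nlp_apply_on_docs.py",
        "--pdf", pdf_pattern,
        "--models", ";".join(models),
    ]


cmd = nlp_command("./data/documents/articles/2305.*.pdf", ["language", "term"])
print(" ".join(cmd))

# To execute it from the repository root (requires poetry and the project checkout):
# subprocess.run(cmd, check=True)
```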

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones setup, you can also use the NLP and GLM executables directly. In general, we follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed via the -h or --help flag,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json

Testing

To run the tests, simply execute (after installation),

poetry run pytest ./tests -vvv -s
