
Graph Language Models


License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with Poetry (poetry add deepsearch-glm).
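To confirm that the package landed in the active environment, you can query its installed version with the standard library. This is a minimal sketch, not part of deepsearch-glm: installed_version is a hypothetical helper, and the distribution name is assumed to be deepsearch-glm, as used with pip above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "deepsearch-glm") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"deepsearch-glm {v}" if v else "deepsearch-glm is not installed")
```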

The following snippet processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the resulting pandas dataframes to the shell, producing the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
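The char_i and char_j columns locate each instance in the input, so nested instances can be related by their spans. As an illustration only, the sketch below copies a few rows from the instances table above and finds which single terms fall inside each enumeration term by span containment. The Instance tuple and the members helper are hypothetical and not part of the library; since the unit of the offsets (characters or bytes) is not stated, the sketch only compares offsets with each other.

```python
from typing import List, NamedTuple


class Instance(NamedTuple):
    type: str
    subtype: str
    char_i: int
    char_j: int
    original: str


# A few rows copied from the "instances" table above (second sentence).
ROWS: List[Instance] = [
    Instance("term", "single-term", 198, 214, "overseas regions"),
    Instance("term", "enum-term-mark-3", 207, 230, "regions and territories"),
    Instance("term", "single-term", 219, 230, "territories"),
    Instance("term", "single-term", 238, 246, "Americas"),
    Instance("term", "enum-term-mark-4", 255, 290, "Atlantic, Pacific and Indian Oceans"),
    Instance("term", "single-term", 255, 263, "Atlantic"),
    Instance("term", "single-term", 265, 272, "Pacific"),
    Instance("term", "single-term", 277, 290, "Indian Oceans"),
]


def members(enum: Instance, rows: List[Instance]) -> List[str]:
    """Return the single-term instances whose span lies inside the span of `enum`."""
    return [
        r.original
        for r in rows
        if r.subtype == "single-term"
        and enum.char_i <= r.char_i
        and r.char_j <= enum.char_j
    ]


for row in ROWS:
    if row.subtype.startswith("enum-term"):
        print(f"{row.original!r} -> {members(row, ROWS)}")
# 'regions and territories' -> ['territories']
# 'Atlantic, Pacific and Indian Oceans' -> ['Atlantic', 'Pacific', 'Indian Oceans']
```

Note that "overseas regions" is not listed as a member of the first enumeration because its span starts before the enumeration's span does.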

The NLP models can also be applied to entire documents that were converted with Deep Search. A simple example is shown below:

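import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model

# make sure the pretrained NLP models are available locally, then initialise the model
load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# load a document that was converted to JSON with Deep Search
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

# annotate the document with the NLP results
enriched_doc = mdl.apply_on_doc(doc)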
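To enrich a whole folder of converted documents, the same call can be applied in a loop. This is a minimal sketch under stated assumptions: enrich_directory is a hypothetical helper, apply_on_doc is assumed to return a JSON-serializable dict, and the <stem>.nlp.json output naming is borrowed from the example file patterns further below rather than prescribed by the library.

```python
import json
from pathlib import Path
from typing import Any, List


def enrich_directory(mdl: Any, input_dir: str, output_dir: str) -> List[Path]:
    """Apply `mdl.apply_on_doc` to every *.json file in input_dir.

    Each enriched document is written to output_dir as <stem>.nlp.json
    (a naming convention assumed here, not prescribed by the library).
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for path in sorted(Path(input_dir).glob("*.json")):
        with path.open("r", encoding="utf-8") as fr:
            doc = json.load(fr)
        enriched = mdl.apply_on_doc(doc)  # assumed to return a JSON-serializable dict
        target = out / f"{path.stem}.nlp.json"
        with target.open("w", encoding="utf-8") as fw:
            json.dump(enriched, fw)
        written.append(target)
    return written
```

Usage, with mdl created by init_nlp_model() as above: enrich_directory(mdl, "<dir-of-converted-docs>", "<output-dir>"). Keeping the output directory separate from the input directory prevents a second run from picking up already enriched files.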

Creating Graphs from NLP entities and relations in document collections

To create a graph, you need two ingredients:

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

The snippet below creates the graph from these two ingredients:

from deepsearch_glm.glm_utils import create_glm_from_docs  # assumed import location

odir = "<output-dir-to-save-graph>"
json_files = ["<json-file-of-converted-pdf-doc>"]
model_names = "language;term;verb;abbreviation"  # semicolon-separated NLP models

glm = create_glm_from_docs(odir, json_files, model_names)
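When the converted documents live in one folder, the two inputs can be assembled programmatically. This is a minimal sketch: glm_inputs is a hypothetical helper, and the semicolon-separated model string is inferred from the model list above and the --models 'language;term' example further below, on the assumption that create_glm_from_docs accepts the same format.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def glm_inputs(doc_dir: str, models: Sequence[str], pattern: str = "*.json") -> Tuple[List[str], str]:
    """Collect converted-document JSON files and join the model names with ';'."""
    json_files = sorted(str(p) for p in Path(doc_dir).glob(pattern))
    model_names = ";".join(models)
    return json_files, model_names
```

Usage: json_files, model_names = glm_inputs("<dir-of-converted-docs>", ["language", "term", "verb", "abbreviation"]), followed by create_glm_from_docs(odir, json_files, model_names) as above.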

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed; we use Poetry for this. To install all dependent Python packages and build the Python bindings, execute:

poetry install

CXX compilation

To compile from scratch, run the following command in the deepsearch-glm root folder to create the build directory:

cmake -B ./build

Next, compile the code:

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. For example:

  1. Apply NLP on document(s):
     poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. Analyse NLP on document(s):
     poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. Create a GLM from document(s):
     poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf

Deep Search utilities

  1. Query and download document(s):
     poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON:
     poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones setup, you can also use the NLP and GLM executables directly. Both follow the same simple invocation scheme:

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed with -h or --help:

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated with:

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. Train simple NLP models:
     ./nlp.exe -m train -c nlp_train_config.json
  2. Leverage pre-trained models:
     ./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. Create a GLM:
     ./glm.exe -m create -c glm_config_create.json
  2. Explore the GLM interactively:
     ./glm.exe -m explore -c glm_config_explore.json
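If you want to script these runs, the -m <mode> -c <config> scheme maps directly onto subprocess calls. The sketch below is an illustration only: exe_command and run_exe are hypothetical helpers, and it assumes the executables sit in the current directory, as in the commands above.

```python
import subprocess
from typing import List


def exe_command(exe: str, mode: str, config: str = "") -> List[str]:
    """Build the argv for ./nlp.exe or ./glm.exe following the `-m <mode> -c <config>` scheme."""
    cmd = [exe, "-m", mode]
    if config:
        cmd += ["-c", config]
    return cmd


def run_exe(exe: str, mode: str, config: str = "") -> None:
    """Run the executable and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(exe_command(exe, mode, config), check=True)
```

Usage: run_exe("./glm.exe", "create-configs") and then run_exe("./glm.exe", "create", "glm_config_create.json"). Passing the arguments as a list rather than a shell string avoids quoting problems with configuration paths.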

Testing

To run the tests (after installation), execute:

poetry run pytest ./tests -vvv -s

Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions

deepsearch_glm-0.16.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.5 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
deepsearch_glm-0.16.0-cp311-cp311-macosx_12_0_x86_64.whl (6.4 MB): CPython 3.11, macOS 12.0+, x86-64
deepsearch_glm-0.16.0-cp311-cp311-macosx_12_0_arm64.whl (6.1 MB): CPython 3.11, macOS 12.0+, ARM64
deepsearch_glm-0.16.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.5 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
deepsearch_glm-0.16.0-cp310-cp310-macosx_12_0_x86_64.whl (6.4 MB): CPython 3.10, macOS 12.0+, x86-64
deepsearch_glm-0.16.0-cp310-cp310-macosx_12_0_arm64.whl (6.1 MB): CPython 3.10, macOS 12.0+, ARM64
deepsearch_glm-0.16.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.5 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
deepsearch_glm-0.16.0-cp39-cp39-macosx_12_0_x86_64.whl (6.4 MB): CPython 3.9, macOS 12.0+, x86-64
deepsearch_glm-0.16.0-cp39-cp39-macosx_12_0_arm64.whl (6.1 MB): CPython 3.9, macOS 12.0+, ARM64
deepsearch_glm-0.16.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.5 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
deepsearch_glm-0.16.0-cp38-cp38-macosx_12_0_x86_64.whl (6.4 MB): CPython 3.8, macOS 12.0+, x86-64
deepsearch_glm-0.16.0-cp38-cp38-macosx_12_0_arm64.whl (6.1 MB): CPython 3.8, macOS 12.0+, ARM64

File hashes

deepsearch_glm-0.16.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
SHA256       5892c49af3132f2fc768b321938bce8a0d8d1be21098a89ac04b0a76f1669c25
MD5          9b75199b859230a40e489ce87d5cfbad
BLAKE2b-256  df9173d1bf135d490e68a5ee97bf8ce0cce6f091b2f68cb8995baa5625454bc6

deepsearch_glm-0.16.0-cp311-cp311-macosx_12_0_x86_64.whl
SHA256       a6a807443119e526813d9c09ac2fcf380afec340033db6d75543f22db77d0172
MD5          d4d133a7d51a925f7180a88e2d8bc69f
BLAKE2b-256  07cb1bc43fc750a02990a3e0035cf8cc730fd67b1dffba825bf786c81a2f3d2d

deepsearch_glm-0.16.0-cp311-cp311-macosx_12_0_arm64.whl
SHA256       57899789d7a913b7729a16491b9bd1ce0c00481d99a5eb15c1a382f2b867c180
MD5          bff1aebc5f009e66a2279b1a24472199
BLAKE2b-256  643b150872e6d1cb89633a8a2570e22c869041b7ab62a54886194401fe96c5d8

deepsearch_glm-0.16.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
SHA256       2fa1032cac331aff7428bfdb3e01f78551de6896bdd4fb230454d0342037dc7b
MD5          ae58d45bac2066a53a5b666ebd407ccc
BLAKE2b-256  181d7851895147cc87cb7cc456212c64b4392f4764a08b933fd6f588fff69299

deepsearch_glm-0.16.0-cp310-cp310-macosx_12_0_x86_64.whl
SHA256       90e79cefe2a99f7eea82c6d3684dc19ada599a4c8f014e5381f98bcb862ab781
MD5          787ad0d21c1a22dad82d7435a50a5588
BLAKE2b-256  3700169e4def80004e0a47221397f5c76e871e2ed346fcc835a74a0991cfb2aa

deepsearch_glm-0.16.0-cp310-cp310-macosx_12_0_arm64.whl
SHA256       55a26941418029931e5466425ab0ecea5a07eb165292fcd4912ea43a1c4d6c3a
MD5          2827b9d73945ff7c825cf4e68a2d6e91
BLAKE2b-256  9ae16daaa9b8964c6960fc77dcdd3a396632c49ed5114d0f5c0621c5efda7052

deepsearch_glm-0.16.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
SHA256       186cc2decd3d92d6a29e1f256beee6720a867c870018508e3e7f9a502e95ec61
MD5          0956ca2db749d93b08c171d245f08c01
BLAKE2b-256  a2115a15aa25884e717889187150e1ad86a823f2401b7a940d5d742d49bfd547

deepsearch_glm-0.16.0-cp39-cp39-macosx_12_0_x86_64.whl
SHA256       5b2771451220567ada95e8ad7c8584e901e3e6f33c02e85b2bc31e8a81912867
MD5          75b3a653ec1926ebd95cf8390d6e6673
BLAKE2b-256  72a2d2a825c30f1658b6e9d1958637c6b2ae54439c881dc7b793a4f6fb15ee6a

deepsearch_glm-0.16.0-cp39-cp39-macosx_12_0_arm64.whl
SHA256       dcea4bcda37f5144843fd1c9ef5c4b9e073c3edc0b46a75066d4510255dd4002
MD5          ceb3fa194181da40d2563dc5d27bf6ea
BLAKE2b-256  d8b80c2d5af573a95be93fce8a59d82f34eff0f4c8cdd7177d9ca6fe3224b44c

deepsearch_glm-0.16.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
SHA256       99abc53837e0e38d0fc3c95bc01ca5af749d6be794e3df7462a899383dbe71c1
MD5          5cab2c4004ff545ffeb74ecf80e160f2
BLAKE2b-256  6b3b270a947d95d21ed2c01f60412bf3cd882c5830197982b153f1b807435cbb

deepsearch_glm-0.16.0-cp38-cp38-macosx_12_0_x86_64.whl
SHA256       2f3c88aef52339c5778c5a90926e4f26ee98f8c049d66fb070fa3d9878eb7686
MD5          f06713cb72498c8d0bf85fc4a3a299a6
BLAKE2b-256  872de106cd5e711b9b3946d33769250cc0a59bafa33c76c0cfe8d834a08e7edb

deepsearch_glm-0.16.0-cp38-cp38-macosx_12_0_arm64.whl
SHA256       4bbee92c9997179016e9166026b2e5256b3a4c984f5e1d5a40bb48d4292d0304
MD5          5acfd1c6feac07763996d82fc557b361
BLAKE2b-256  dac60e7daba9369e1501908c5d1ca7651f33db443049ccb2bea5e0e924844231
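To check a downloaded wheel against the published SHA256 digests, you can hash it locally. This is a minimal sketch using Python's standard hashlib module; sha256_of and verify_wheel are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_wheel(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Usage: verify_wheel("deepsearch_glm-0.16.0-cp311-cp311-macosx_12_0_arm64.whl", "57899789d7a913b7729a16491b9bd1ce0c00481d99a5eb15c1a382f2b867c180"). Reading in chunks keeps memory use flat regardless of wheel size. Alternatively, pip's hash-checking mode (--require-hashes with --hash=sha256:<digest> entries in a requirements file) performs the same check at install time.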
