Graph Language Models

License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

The snippet below processes a piece of text,

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the results as pandas DataFrames on the shell and produces the following output,

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
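
The instances table can be post-processed like any other tabular data. The sketch below assumes the result exposes the table as a dictionary with `headers` and `data` keys (an assumption about the result layout, which the output above does not confirm) and uses a few rows copied from the table. It selects the single-term entries and checks that, for these rows, `char_j - char_i` equals the UTF-8 byte length of `original`, which suggests the offsets count bytes rather than characters. The helper names `to_rows` and `select` are hypothetical.

```python
# Hypothetical stand-in for the "instances" part of `res`; the headers/data
# layout is an assumption, the rows are copied from the table above.
instances = {
    "headers": ["type", "subtype", "subj_path", "char_i", "char_j", "original"],
    "data": [
        ["term", "single-term", "#", 53, 68, "French Republic"],
        # "République française", written with escapes to pin the encoding
        ["term", "single-term", "#", 78, 100, "R\u00e9publique fran\u00e7aise"],
        ["parenthesis", "reference", "#", 126, 130, "[14]"],
        ["numval", "ival", "#", 127, 129, "14"],
        ["term", "single-term", "#", 165, 179, "Western Europe"],
    ],
}


def to_rows(table):
    """Turn the headers/data layout into a list of dictionaries."""
    return [dict(zip(table["headers"], row)) for row in table["data"]]


def select(rows, type_, subtype=None):
    """Keep rows of the given type and, optionally, subtype."""
    return [
        row for row in rows
        if row["type"] == type_ and (subtype is None or row["subtype"] == subtype)
    ]


rows = to_rows(instances)
single_terms = [row["original"] for row in select(rows, "term", "single-term")]
print(single_terms)

# For these rows the span width matches the UTF-8 byte length of the text.
spans_match_bytes = all(
    row["char_j"] - row["char_i"] == len(row["original"].encode("utf-8"))
    for row in rows
)
print(spans_match_bytes)
```

If the offsets are indeed byte positions, slicing a Python string with them directly would be off by a few characters wherever the text contains non-ASCII symbols.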

The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below,

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
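
To process several converted documents, the load-and-apply step can be wrapped in a loop. The helper below is a sketch and not part of deepsearch-glm: `enrich_json_files` and its parameters are hypothetical names, the `.nlp.json` output suffix is borrowed from the file names used in the command-line examples further down, and it assumes that the value returned by `apply_on_doc` can be serialized with `json.dump`. A stand-in function replaces `mdl.apply_on_doc` so the snippet runs without the pretrained models.

```python
import json
import tempfile
from pathlib import Path


def enrich_json_files(paths, apply_on_doc, suffix=".nlp.json"):
    """Load each JSON document, enrich it, and save the result next to it."""
    written = []
    for path in map(Path, paths):
        with open(path, "r", encoding="utf-8") as fr:
            doc = json.load(fr)
        enriched = apply_on_doc(doc)  # e.g. mdl.apply_on_doc
        out_path = path.with_suffix(suffix)  # report.json -> report.nlp.json
        with open(out_path, "w", encoding="utf-8") as fw:
            json.dump(enriched, fw)
        written.append(out_path)
    return written


def fake_apply_on_doc(doc):
    """Stand-in for mdl.apply_on_doc: marks the document as processed."""
    return {**doc, "processed": True}


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "report.json"
    src.write_text(json.dumps({"name": "report"}), encoding="utf-8")
    outputs = enrich_json_files([src], fake_apply_on_doc)
    output_name = outputs[0].name
    result = json.loads(outputs[0].read_text(encoding="utf-8"))

print(output_name, result)
```

Passing the function as a parameter keeps the loop independent of the model, so `mdl.apply_on_doc` can be swapped in once the models are loaded.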

Creating Graphs from NLP entities and relations in document collections

To create graphs, you need two ingredients, namely,

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

Below is a code snippet to create the graph using these basic ingredients,

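# note: create_glm_from_docs has to be imported from the deepsearch_glm package first;
# the import is not shown in this snippet
odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"

glm = create_glm_from_docs(odir, json_files, model_names)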
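
The placeholders above can be filled in programmatically. The sketch below assumes that `model_names` is a single semicolon-separated string, as the placeholder suggests, and that the JSON files are collected with a glob pattern; `build_glm_inputs` is a hypothetical helper and not part of deepsearch-glm.

```python
import glob


def build_glm_inputs(json_pattern, models):
    """Return (json_files, model_names) in the shape used by the snippet above."""
    json_files = sorted(glob.glob(json_pattern))  # empty list if nothing matches
    model_names = ";".join(models)
    return json_files, model_names


json_files, model_names = build_glm_inputs(
    "<dir-with-converted-docs>/*.json",  # placeholder path, replace with your own
    ["language", "term", "verb", "abbreviation"],
)
print(model_names)
```

The two values could then be passed on as `create_glm_from_docs(odir, json_files, model_names)`.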

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all dependent Python packages and get the Python bindings, execute,

poetry install

CXX compilation

To compile from scratch, simply run the following command in the deepsearch-glm root folder to create the build directory,

cmake -B ./build

Next, compile the code from scratch,

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts as poetry run python <script> <input>. Examples are,

  1. apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
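
The wildcard patterns above are quoted so that the shell passes them to the script unexpanded. As a rough illustration of which files such a pattern selects, the sketch below matches a few hypothetical file names with Python's `fnmatch`; whether the scripts expand the patterns in exactly this way is an assumption.

```python
from fnmatch import fnmatch

# Hypothetical file names, used only to illustrate the patterns.
names = [
    "2305.02334.pdf",
    "2305.02334.nlp.json",
    "2306.00001.pdf",
]

pdf_matches = [n for n in names if fnmatch(n, "2305.*.pdf")]
json_matches = [n for n in names if fnmatch(n, "2305.*.nlp.json")]
print(pdf_matches)
print(json_matches)
```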

Deep Search utilities

  1. Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'

Run using CXX executables

If you prefer a bare-bones setup, you can also use the NLP and GLM executables directly. In general, they follow a simple scheme of the form

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed via the -h or --help flag,

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated,

./nlp.exe -m create-configs
./glm.exe -m create-configs

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM
./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json
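
When the executables are driven from a script, the `-m <mode> -c <JSON-config file>` scheme maps onto an argument list. The helper below is a hypothetical sketch that only assembles the command; the executable paths and config file names are taken from the examples above, and actually running the command (for instance with `subprocess.run`) requires the compiled binaries.

```python
import shlex


def build_command(executable, mode, config=None):
    """Assemble `<executable> -m <mode> [-c <config>]` as an argument list."""
    command = [executable, "-m", mode]
    if config is not None:
        command += ["-c", config]
    return command


create_cmd = build_command("./glm.exe", "create", "glm_config_create.json")
configs_cmd = build_command("./nlp.exe", "create-configs")
print(shlex.join(create_cmd))
print(shlex.join(configs_cmd))
```

Keeping the command as a list avoids shell quoting issues if a config path contains spaces.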

Testing

To run the tests, simply execute (after installation),

poetry run pytest ./tests -vvv -s
