Graph Language Models

License: MIT. Code style: black.

Getting Started

Finding entities and relations via NLP on text and documents

To get started, install the deepsearch-glm package from PyPI, either with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).

The following code snippet processes a piece of text:

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""

res = mdl.apply_on_text(text)
print_on_shell(text, res)

The last command prints the resulting pandas DataFrames to the shell, producing the following output:

text:

   #France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.

properties:

         type label  confidence
0  language    en    0.897559

instances:

  type         subtype               subj_path      char_i    char_j  original
-----------  --------------------  -----------  --------  --------  ---------------------------------------------------------------------
sentence                           #                   1       180  France (French: [fʁɑ̃s] Listen), officially the French Republic
                                                                    (French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
                                                                    located primarily in Western Europe.
term         single-term           #                   1         8  #France
expression   wtoken-concatenation  #                   1         8  #France
parenthesis  round brackets        #                   9        36  (French: [fʁɑ̃s] Listen)
expression   wtoken-concatenation  #                  18        28  [fʁɑ̃s]
term         single-term           #                  29        35  Listen
term         single-term           #                  53        68  French Republic
parenthesis  round brackets        #                  69       125  (French: République française [ʁepyblik fʁɑ̃sɛz])
term         single-term           #                  78       100  République française
term         single-term           #                 112       124  fʁɑ̃sɛz]
parenthesis  reference             #                 126       130  [14]
numval       ival                  #                 127       129  14
term         single-term           #                 136       143  country
term         single-term           #                 165       179  Western Europe
sentence                           #                 181       373  It also includes overseas regions and territories in the Americas and
                                                                    the Atlantic, Pacific and Indian Oceans, giving it one of the largest
                                                                    discontiguous exclusive economic zones in the world.
term         single-term           #                 198       214  overseas regions
term         enum-term-mark-3      #                 207       230  regions and territories
term         single-term           #                 219       230  territories
term         single-term           #                 238       246  Americas
term         enum-term-mark-4      #                 255       290  Atlantic, Pacific and Indian Oceans
term         single-term           #                 255       263  Atlantic
term         single-term           #                 265       272  Pacific
term         single-term           #                 277       290  Indian Oceans
term         single-term           #                 313       359  largest discontiguous exclusive economic zones
term         single-term           #                 367       372  world
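The char_i and char_j columns appear to be UTF-8 byte offsets into the processed text rather than Python string indices. This is an inference from the table above, not a documented guarantee: "[fʁɑ̃s]" spans 18 to 28, i.e. 10 units, although it is only 7 Python characters long, because ʁ, ɑ and the combining tilde each take two bytes in UTF-8. The offsets also only line up if the processed text starts with a newline followed by the "#" visible in the text: printout. Under these assumptions, a span can be recovered as sketched below; span_text is a hypothetical helper, not part of deepsearch-glm.

```python
def span_text(text: str, char_i: int, char_j: int) -> str:
    """Return the substring between two offsets, assuming they count UTF-8 bytes."""
    return text.encode("utf-8")[char_i:char_j].decode("utf-8")


# Start of the processed text as printed above; "\u0281\u0251\u0303" is "ʁɑ̃".
sample = "\n#France (French: [f\u0281\u0251\u0303s] Listen)"

print(span_text(sample, 18, 28))  # [fʁɑ̃s]
print(span_text(sample, 29, 35))  # Listen
# Plain str slicing is off by three characters here and prints "ten)".
print(sample[29:35])
```

Slicing the encoded bytes keeps multi-byte characters intact as long as both offsets fall on character boundaries, which they do for every span in the table.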

The NLP models can also be applied to entire documents that were converted with Deep Search, as in the following example:

import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model

load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# load the JSON file that Deep Search produced when converting the PDF
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

# apply the NLP models on the whole document
enriched_doc = mdl.apply_on_doc(doc)
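To process a whole folder of converted documents, the loading step can be wrapped in a loop. The sketch below rests on a few assumptions: enrich_documents is a hypothetical helper, the *.json pattern assumes one Deep Search JSON file per document, and the enrichment function is passed in (for example mdl.apply_on_doc) so that the loop itself runs without the models.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def enrich_documents(
    folder: str,
    apply_on_doc: Callable[[Dict[str, Any]], Dict[str, Any]],
    pattern: str = "*.json",
) -> Dict[str, Dict[str, Any]]:
    """Load every JSON file matching `pattern` in `folder` and enrich it.

    Returns a mapping from file name to the enriched document.
    """
    results: Dict[str, Dict[str, Any]] = {}
    # sorted() makes the processing order deterministic
    for path in sorted(Path(folder).glob(pattern)):
        with path.open("r", encoding="utf-8") as fr:
            doc = json.load(fr)
        results[path.name] = apply_on_doc(doc)
    return results
```

With the model from the snippet above, this would be called as enrich_documents("<folder-with-converted-json-files>", mdl.apply_on_doc). If enriched documents are written back into the same folder, narrow the pattern so they are not picked up again on the next run.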

Creating Graphs from NLP entities and relations in document collections

To create a graph, you need two ingredients:

  1. a collection of text or documents
  2. a set of NLP models that provide entities and relations

The snippet below creates the graph from these two ingredients:

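# assumed import location; check it against your installed version
from deepsearch_glm.glm_utils import create_glm_from_docs

odir = "<output-dir-to-save-graph>"
json_files = ["<json-file-of-converted-pdf-doc>"]
model_names = "language;term;verb;abbreviation"  # semicolon-separated list of NLP models

glm = create_glm_from_docs(odir, json_files, model_names)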
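To build some intuition for what a graph of entities and relations looks like, here is a deliberately simplified sketch that links terms found in the same sentence and counts how often each pair co-occurs. It is not the algorithm implemented by create_glm_from_docs; the (sentence_id, term) input format and the build_cooccurrence_graph helper are hypothetical, and the sample terms are adapted from the instances table of the France example.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_cooccurrence_graph(instances: Iterable[Tuple[int, str]]) -> Counter:
    """Map each unordered term pair to the number of sentences it shares.

    Toy illustration only; this is not how deepsearch-glm builds its graphs.
    """
    # group the terms by the sentence they were found in
    terms_per_sentence: Dict[int, Set[str]] = defaultdict(set)
    for sentence_id, term in instances:
        terms_per_sentence[sentence_id].add(term)

    edges: Counter = Counter()
    for terms in terms_per_sentence.values():
        # sorting gives every undirected edge a single canonical key (a, b)
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges


# (sentence_id, term) pairs from the two sentences of the France example
instances = [
    (0, "France"), (0, "French Republic"), (0, "Western Europe"),
    (1, "Americas"), (1, "Pacific"), (1, "Indian Oceans"),
]
graph = build_cooccurrence_graph(instances)
print(graph[("France", "Western Europe")])  # 1
print(len(graph))  # 6
```

Terms become nodes and co-occurrence counts become edge weights; a real GLM would additionally use relations from the NLP models (for example verbs or abbreviations) rather than sentence co-occurrence alone.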

Querying Graphs

TBD

Install for development

Python installation

To use the Python interface, first install all dependencies. We use poetry for this; to install the dependent Python packages and build the Python bindings, execute:

poetry install

CXX compilation

To compile from scratch, run the following command in the deepsearch-glm root folder to create the build directory:

cmake -B ./build

Next, compile the code:

cmake --build ./build -j

Run using the Python Interface

NLP and GLM examples

To run the examples, execute the scripts with poetry run python <script> <input>, for example:

  1. Apply NLP on document(s):
     poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
  2. Analyse NLP on document(s):
     poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
  3. Create a GLM from document(s):
     poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
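The single quotes around patterns such as './data/documents/articles/2305.*.pdf' stop the shell from expanding the wildcard, so the pattern itself reaches the script. A script can then expand it with Python's glob module, as in this minimal sketch. The --pdf option mirrors the commands above, but parse_pdf_pattern is a hypothetical helper and not taken from the deepsearch-glm scripts.

```python
import argparse
import glob
from typing import List, Optional


def parse_pdf_pattern(argv: Optional[List[str]] = None) -> List[str]:
    """Parse --pdf and expand its (quoted) wildcard pattern into sorted paths."""
    parser = argparse.ArgumentParser(description="Expand a quoted --pdf pattern")
    parser.add_argument(
        "--pdf", required=True, help="file path or glob pattern, e.g. './dir/2305.*.pdf'"
    )
    args = parser.parse_args(argv)
    # glob.glob returns an empty list when nothing matches the pattern
    return sorted(glob.glob(args.pdf))
```

Passing argv explicitly, as in parse_pdf_pattern(["--pdf", "./data/documents/articles/2305.*.pdf"]), makes the function easy to test; with argv=None, argparse reads sys.argv as usual.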

Deep Search utilities

  1. Query and download document(s):
     poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
  2. Convert PDF document(s) into JSON:
     poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'
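The backslash-escaped double quotes in the ds_query.py command are needed because the query itself contains quoted phrases. When the utility is launched from Python instead, passing an argument list to subprocess sidesteps shell quoting altogether, and shlex.join shows the equivalent shell command line. The helper build_query_command is hypothetical; the script path and options are copied from the command above.

```python
import shlex
from typing import List


def build_query_command(index: str, query: str) -> List[str]:
    """Build the argument list for ds_query.py; no shell escaping is needed."""
    return [
        "poetry", "run", "python", "./deepsearch_glm/utils/ds_query.py",
        "--index", index,
        "--query", query,
    ]


cmd = build_query_command("patent-uspto", '"global warming potential" AND "etching"')
# shlex.join quotes each argument so the line can be pasted into a POSIX shell
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would execute it without going through a shell
```

shlex.join requires Python 3.8 or later, which matches the oldest Python version the package is built for.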

Run using CXX executables

If you prefer a bare-bones setup, you can also use the NLP and GLM executables directly. Both follow the same simple scheme:

./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>

In both cases, the available modes can be listed with -h or --help:

./nlp.exe -h
./glm.exe -h

and the configuration files can be generated with:

./nlp.exe -m create-configs
./glm.exe -m create-configs
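Because both executables share the -m <mode> -c <config> scheme, they are easy to drive from a script. A minimal sketch, assuming the executables sit in the current working directory; run_exe is a hypothetical wrapper, and the mode and config names in the usage lines are taken from the commands in this section.

```python
import subprocess
from typing import List, Optional


def run_exe(
    exe: str, mode: str, config: Optional[str] = None, dry_run: bool = False
) -> List[str]:
    """Build '<exe> -m <mode> [-c <config>]' and run it unless dry_run is set.

    Returns the argument list that was (or would have been) executed.
    """
    cmd = [exe, "-m", mode]
    if config is not None:
        cmd += ["-c", config]
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, check=True)
    return cmd


print(run_exe("./nlp.exe", "create-configs", dry_run=True))
print(run_exe("./nlp.exe", "train", "nlp_train_config.json", dry_run=True))
```

The dry_run flag lets you check the generated command lines before anything is executed.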

Natural Language Processing (NLP)

After you have generated the configuration files (see above), you can

  1. train simple NLP models:
     ./nlp.exe -m train -c nlp_train_config.json
  2. leverage pre-trained models:
     ./nlp.exe -m predict -c nlp_predict.example.json

Graph Language Models (GLM)

  1. create a GLM:
     ./glm.exe -m create -c glm_config_create.json
  2. explore the GLM interactively:
     ./glm.exe -m explore -c glm_config_explore.json

Testing

After installation, run the tests with:

poetry run pytest ./tests -vvv -s

Download files

Release 0.13.0 provides no source distribution. Built distributions (wheels) are available for CPython 3.8, 3.9, 3.10 and 3.11 on each of the following platforms:

  1. Linux, manylinux (glibc 2.17+), x86-64: 7.5 MB
  2. macOS 12.0+, x86-64: 6.4 MB
  3. macOS 12.0+, ARM64: 6.0 MB

The wheel names follow the pattern deepsearch_glm-0.13.0-<python-tag>-<abi-tag>-<platform-tag>.whl, for example deepsearch_glm-0.13.0-cp311-cp311-macosx_12_0_arm64.whl.
