Graph Language Models
Getting Started
Finding entities and relations via NLP on text and documents
To get started, install the deepsearch-glm package from PyPI. This can be done with pip (pip install deepsearch-glm) or with poetry (poetry add deepsearch-glm).
The code snippet below processes a piece of text,
from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell
load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()
# from Wikipedia (https://en.wikipedia.org/wiki/France)
text = """
France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans,[XII] giving it one of the largest discontiguous exclusive
economic zones in the world.
"""
res = mdl.apply_on_text(text)
print_on_shell(text, res)
The last command prints the results as pandas dataframes on the shell and produces the following output,
text:
#France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe. It also includes overseas regions
and territories in the Americas and the Atlantic, Pacific and Indian
Oceans, giving it one of the largest discontiguous exclusive economic
zones in the world.
properties:
type label confidence
0 language en 0.897559
instances:
type subtype subj_path char_i char_j original
----------- -------------------- ----------- -------- -------- ---------------------------------------------------------------------
sentence # 1 180 France (French: [fʁɑ̃s] Listen), officially the French Republic
(French: République française [ʁepyblik fʁɑ̃sɛz]),[14] is a country
located primarily in Western Europe.
term single-term # 1 8 #France
expression wtoken-concatenation # 1 8 #France
parenthesis round brackets # 9 36 (French: [fʁɑ̃s] Listen)
expression wtoken-concatenation # 18 28 [fʁɑ̃s]
term single-term # 29 35 Listen
term single-term # 53 68 French Republic
parenthesis round brackets # 69 125 (French: République française [ʁepyblik fʁɑ̃sɛz])
term single-term # 78 100 République française
term single-term # 112 124 fʁɑ̃sɛz]
parenthesis reference # 126 130 [14]
numval ival # 127 129 14
term single-term # 136 143 country
term single-term # 165 179 Western Europe
sentence # 181 373 It also includes overseas regions and territories in the Americas and
the Atlantic, Pacific and Indian Oceans, giving it one of the largest
discontiguous exclusive economic zones in the world.
term single-term # 198 214 overseas regions
term enum-term-mark-3 # 207 230 regions and territories
term single-term # 219 230 territories
term single-term # 238 246 Americas
term enum-term-mark-4 # 255 290 Atlantic, Pacific and Indian Oceans
term single-term # 255 263 Atlantic
term single-term # 265 272 Pacific
term single-term # 277 290 Indian Oceans
term single-term # 313 359 largest discontiguous exclusive economic zones
term single-term # 367 372 world
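The instances table above can be filtered by type and subtype to pull out only the entities of interest. The following is a minimal sketch under an assumption: the `instances` dictionary is a hand-built stand-in (a list of column names plus a list of rows copied from the output above), and `select_instances` is a hypothetical helper, not part of deepsearch-glm. How `res` exposes this table is not described here.

```python
# Hand-built stand-in for the "instances" table printed above.
# Assumption: column names plus rows; this is NOT read from the library.
instances = {
    "headers": ["type", "subtype", "subj_path", "char_i", "char_j", "original"],
    "data": [
        ["term", "single-term", "#", 53, 68, "French Republic"],
        ["parenthesis", "reference", "#", 126, 130, "[14]"],
        ["numval", "ival", "#", 127, 129, "14"],
        ["term", "single-term", "#", 165, 179, "Western Europe"],
        ["term", "enum-term-mark-4", "#", 255, 290,
         "Atlantic, Pacific and Indian Oceans"],
    ],
}


def select_instances(table, type_, subtype=None):
    """Return the rows (as dicts) whose type, and optionally subtype, match."""
    headers = table["headers"]
    rows = (dict(zip(headers, row)) for row in table["data"])
    return [
        row
        for row in rows
        if row["type"] == type_ and (subtype is None or row["subtype"] == subtype)
    ]


single_terms = [
    row["original"] for row in select_instances(instances, "term", "single-term")
]
print(single_terms)  # ['French Republic', 'Western Europe']
```

Converting each row to a dictionary keeps the filter independent of the column order.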
The NLP models can also be applied to entire documents that were converted using Deep Search. A simple example is shown below,
import json

from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
from deepsearch_glm.nlp_utils import init_nlp_model, print_on_shell

# download the pre-trained models if they are not yet available locally
load_pretrained_nlp_models(force=False, verbose=False)
mdl = init_nlp_model()

# load a document that was converted to JSON with Deep Search
with open("<path-to-json-file-of-converted-pdf-doc>", "r") as fr:
    doc = json.load(fr)

enriched_doc = mdl.apply_on_doc(doc)
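When several converted documents need to be enriched, the same call can be wrapped in a small loop. The sketch below is one possible way to do that. `enrich_documents` is a hypothetical helper, and it takes the enrichment function as an argument so that it does not depend on any particular model object.

```python
import glob
import json


def enrich_documents(pattern, apply_on_doc):
    """Load every JSON file matching `pattern` and pass it to `apply_on_doc`.

    `apply_on_doc` is any callable that takes a parsed document (a dict) and
    returns the enriched document, for example `mdl.apply_on_doc`.
    Returns a dict mapping each file path to its enriched document.
    """
    enriched = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as fr:
            doc = json.load(fr)
        enriched[path] = apply_on_doc(doc)
    return enriched
```

With the model from the snippet above, a call could look like enrich_documents("<dir-with-converted-docs>/*.json", mdl.apply_on_doc), where the directory is a placeholder.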
Creating Graphs from NLP entities and relations in document collections
To create graphs, you need two ingredients, namely,
- a collection of text or documents
- a set of NLP models that provide entities and relations
Below is a code snippet to create the graph using these basic ingredients,
odir = "<output-dir-to-save-graph>"
json_files = ["json-file of converted PDF document"]
model_names = "<list of NLP models:language;term;verb;abbreviation>"
glm = create_glm_from_docs(odir, json_files, model_names)
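The two inputs can be assembled programmatically. The sketch below assumes that `model_names` is a single semicolon-separated string, as the placeholder above and the --models 'language;term' option suggest. `collect_glm_inputs` is a hypothetical helper and the directory is a placeholder.

```python
import glob


def collect_glm_inputs(json_pattern, models):
    """Return (json_files, model_names) in the shape used by the snippet above.

    Assumption: model names are joined with ';' into one string.
    """
    json_files = sorted(glob.glob(json_pattern))
    model_names = ";".join(models)
    return json_files, model_names


json_files, model_names = collect_glm_inputs(
    "<dir-with-converted-docs>/*.json",  # placeholder path
    ["language", "term", "verb", "abbreviation"],
)
print(model_names)  # language;term;verb;abbreviation
```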
Querying Graphs
TBD
Install for development
Python installation
To use the Python interface, first make sure all dependencies are installed. We use poetry for that. To install all dependent Python packages and get the Python bindings, execute,
poetry install
CXX compilation
To compile from scratch, run the following command in the deepsearch-glm root folder to create the build directory,
cmake -B ./build;
Next, compile the code from scratch,
cmake --build ./build -j
Run using the Python Interface
NLP and GLM examples
To run the examples, execute the scripts as poetry run python <script> <input>. Examples are,
- apply NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --pdf './data/documents/articles/2305.*.pdf' --models 'language;term'
- analyse NLP on document(s)
poetry run python ./deepsearch_glm/nlp_apply_on_docs.py --json './data/documents/articles/2305.*.nlp.json'
- create GLM from document(s)
poetry run python ./deepsearch_glm/glm_create_from_docs.py --pdf ./data/documents/reports/2022-ibm-annual-report.pdf
Deep Search utilities
- Query and download document(s)
poetry run python ./deepsearch_glm/utils/ds_query.py --index patent-uspto --query "\"global warming potential\" AND \"etching\""
- Converting PDF document(s) into JSON
poetry run python ./deepsearch_glm/utils/ds_convert.py --pdf './data/documents/articles/2305.*.pdf'
Run using CXX executables
If you prefer a bare-bones approach, you can also use the NLP and GLM executables directly. In general, we follow a simple scheme of the form
./nlp.exe -m <mode> -c <JSON-config file>
./glm.exe -m <mode> -c <JSON-config file>
In both cases, the modes can be queried directly via the -h or --help option,
./nlp.exe -h
./glm.exe -h
and the configuration files can be generated,
./nlp.exe -m create-configs
./glm.exe -m create-configs
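The command scheme above can also be driven from a script. The following sketch builds the argument list for either executable and runs it with the standard library. `build_command` and `run_executable` are hypothetical helper names, and only the -m and -c options shown above are used.

```python
import subprocess


def build_command(executable, mode, config=None):
    """Build the argument list `<executable> -m <mode> [-c <config>]`."""
    cmd = [executable, "-m", mode]
    if config is not None:
        cmd += ["-c", config]
    return cmd


def run_executable(executable, mode, config=None):
    """Run the executable and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(executable, mode, config), check=True)


print(build_command("./nlp.exe", "create-configs"))
# ['./nlp.exe', '-m', 'create-configs']
print(build_command("./glm.exe", "create", "glm_config_create.json"))
# ['./glm.exe', '-m', 'create', '-c', 'glm_config_create.json']
```

Passing the arguments as a list avoids shell quoting issues with config file paths.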
Natural Language Processing (NLP)
After you have generated the configuration files (see above), you can
- train simple NLP models
./nlp.exe -m train -c nlp_train_config.json
- leverage pre-trained models
./nlp.exe -m predict -c nlp_predict.example.json
Graph Language Models (GLM)
- create a GLM
./glm.exe -m create -c glm_config_create.json
- explore the GLM interactively
./glm.exe -m explore -c glm_config_explore.json
Testing
To run the tests, simply execute (after installation),
poetry run pytest ./tests -vvv -s