GECK (Garden of Eden Creation Kit) is a toolkit for setting up and maintaining STC
GECK (Garden of Eden Creation Kit)
GECK is a Python library and Bash tool for deploying and accessing STC, the large corpus of scholarly texts. GECK includes the embedded search engine Summa, helps feed it with a prepared IPFS-based database of scholarly texts, runs search queries over the database, and lets you iterate over all documents if needed.
Install
First, you need to have IPFS installed.
Pre-built wheels of libstc-geck are available for Python 3.8, 3.9, 3.10 and 3.11
pip install libstc-geck
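Because GECK talks to IPFS over HTTP, it can help to confirm that your IPFS daemon is answering before starting GECK. The sketch below is a hypothetical helper (`is_ipfs_gateway_reachable` is not part of libstc-geck) and assumes the gateway address used in the Python example further down, `http://127.0.0.1:8080`. It only checks that some HTTP server responds; it does not verify that the STC data is available.

```python
import urllib.error
import urllib.request


def is_ipfs_gateway_reachable(base_url: str = "http://127.0.0.1:8080", timeout: float = 5.0) -> bool:
    """Hypothetical helper: return True if an HTTP server answers at `base_url`."""
    # Bypass any HTTP(S)_PROXY environment settings: we want the local daemon itself.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (e.g. 404 on "/"): it is up.
        return True
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts are OSError subclasses.
        return False
```

For example, `is_ipfs_gateway_reachable()` returning `False` suggests the IPFS daemon is not running or its gateway listens on a different address.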
Usage
Attention! STC does not contain every book or publication in the world. We are constantly increasing coverage, but there is still a lot to do.
STC contains metadata for most items, but links or content fields may be absent.
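Since links and content may be missing, code that consumes STC documents should read those fields defensively. A minimal sketch, assuming documents are plain dicts like the JSON printed by the CLI below; `title`, `type` and `content` appear in this README, while the `links` key name is an assumption, and `describe_document` is a hypothetical helper.

```python
from typing import Any, Mapping


def describe_document(document: Mapping[str, Any]) -> str:
    """Hypothetical helper: one-line summary that tolerates missing fields."""
    title = document.get("title") or "<untitled>"
    doc_type = document.get("type") or "unknown type"
    # Treat absent, None or empty values the same way.
    has_content = bool(document.get("content"))
    has_links = bool(document.get("links"))  # assumed field name
    return (
        f"{title} [{doc_type}] "
        f"content={'yes' if has_content else 'no'} "
        f"links={'yes' if has_links else 'no'}"
    )
```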
CLI
```
# (Optional) Launch a standalone Summa search engine so that you do not have to wait for bootstrapping every time.
# This takes a while! Wait until the text `Serving on ...` appears.
# If you launched it, switch to another terminal window.
ultranymous@nevermore:~ geck - serve
INFO: Serving on 127.0.0.1:10082
# Iterate over all stored documents
ultranymous@nevermore:~ geck - documents
INFO: Setting up indices...
# Do a match search by field
ultranymous@nevermore:~ geck - search uris:"doi:10.3384/ecp1392a41"
INFO: Setting up indices...
INFO: Searching uris:"doi:10.3384/ecp1392a41"...
{"abstract": "In recent years, water hydraulics has been getting more <...> "type": "proceedings-article", "updated_at": 1687530737}
# Do a match search by word. In the example below, documents are truncated for display
ultranymous@nevermore:~ geck - search hemoglobin --limit 3
INFO: Setting up indices...
INFO: Searching hemoglobin...
{"abstract": "Abstract\nWe exa <...>
{"abstract": "Abstract\nUsing a <...>
{"abstract": "Regional cerebral <...>
```
You can add the `--debug` flag after `geck` to enable debugging output.
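The CLI prints each matching document as one JSON object per line, so its output can be consumed from scripts. The sketch below assumes exactly the invocation shape shown above (`geck - search <query> [--limit N]`) and that documents appear on stdout one per line; it skips anything that is not a JSON object, such as `INFO:` log lines, in case they share the stream. `build_search_argv`, `iter_json_documents` and `search` are hypothetical helpers, not part of libstc-geck.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, Iterator, List, Optional


def build_search_argv(query: str, limit: Optional[int] = None) -> List[str]:
    """Mirror the `geck - search <query> [--limit N]` invocation shown above."""
    argv = ["geck", "-", "search", query]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def iter_json_documents(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per JSON line, skipping log lines such as 'INFO: ...'."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated or partial line


def search(query: str, limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Run the geck CLI (must be on PATH) and collect the documents it prints."""
    result = subprocess.run(
        build_search_argv(query, limit), capture_output=True, text=True, check=True
    )
    return list(iter_json_documents(result.stdout.splitlines()))
```

For example, `search("hemoglobin", limit=3)` should correspond to the second search above. Passing an argument list instead of a shell string means no shell quoting is applied to the query.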
Python
```python
import argparse
import asyncio

from stc_geck.advices import format_document
from stc_geck.client import StcGeck

DEFAULT_LIMIT = 5


async def main(limit: int):
    geck = StcGeck(
        ipfs_http_base_url='http://127.0.0.1:8080',
        timeout=300,
    )
    # Connects to IPFS and instantiates the configured indices for searching.
    # This may take a while, depending on your IPFS performance.
    await geck.start()
    try:
        # GECK encapsulates a Python client to Summa.
        # Summa can be either an external stand-alone server or an embedded server,
        # but these details are hidden behind the `SummaClient` interface.
        summa_client = geck.get_summa_client()

        # Match search returns the top `limit` documents (5 by default) that contain
        # `additive manufacturing` in their title, abstract or content.
        documents = await summa_client.search_documents({
            "index_alias": "stc",
            "query": {
                "match": {
                    "value": "additive manufacturing",
                    "query_parser_config": {"default_fields": ["abstract", "title", "content"]},
                }
            },
            "collectors": [{"top_docs": {"limit": limit}}],
            "is_fieldnorms_scoring_enabled": False,
        })
        for document in documents:
            print(format_document(document) + '\n')
    finally:
        # Release IPFS/Summa resources even if the search fails.
        await geck.stop()


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('--limit', type=int, default=DEFAULT_LIMIT)
    args = argparser.parse_args()
    asyncio.run(main(args.limit))
```
More Python examples can be found in the `examples` directory.
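The request passed to `search_documents` in the example above is a plain dict, so varying the search text, fields or limit is a matter of building that dict. A minimal sketch: `build_match_request` is a hypothetical helper that reproduces only the keys shown in the example; whether Summa accepts other fields or options is not covered here.

```python
from typing import Any, Dict, Sequence


def build_match_request(
    text: str,
    default_fields: Sequence[str] = ("abstract", "title", "content"),
    limit: int = 5,
    index_alias: str = "stc",
) -> Dict[str, Any]:
    """Hypothetical helper: same request shape as the example above."""
    return {
        "index_alias": index_alias,
        "query": {
            "match": {
                "value": text,
                # Fields searched when the query text names no field explicitly.
                "query_parser_config": {"default_fields": list(default_fields)},
            }
        },
        # Keep only the `limit` best-scoring documents.
        "collectors": [{"top_docs": {"limit": limit}}],
        "is_fieldnorms_scoring_enabled": False,
    }
```

Inside `main`, this could be used as `await summa_client.search_documents(build_match_request("hemoglobin", default_fields=["title"], limit=3))` to search titles only.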
Download files
File details
Details for the file libstc_geck-2.1.3.tar.gz.
File metadata
- Download URL: libstc_geck-2.1.3.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19b7b7b8e03c5bd699c433d3893368808a38f6d631dd2fc8678d85300199534b |
| MD5 | c5ed3affdacfa46dc00722e7100b69d5 |
| BLAKE2b-256 | a326ae764c27643a6d463ef713bd9bde24610c0a6e90e695c7b802ce311e2f1a |
File details
Details for the file libstc_geck-2.1.3-py2.py3-none-any.whl.
File metadata
- Download URL: libstc_geck-2.1.3-py2.py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29fc00faeee978322d3099e7ec36e90097a980d5c7a7f79511116a58b403070c |
| MD5 | b2143ffa453ecfae75077d8d5ff61a96 |
| BLAKE2b-256 | e73aecc6842d9583d20cc658d63fd9dfc5307be559e7bd0660b9caf67ea23ea1 |
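To check a manually downloaded release file against the published hashes, you can compute its digest locally. A minimal sketch using only the standard library: the SHA256 values are copied from the tables above, BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest, and `file_digest` and `verify_download` are hypothetical helpers.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above (release 2.1.3).
EXPECTED_SHA256 = {
    "libstc_geck-2.1.3.tar.gz": "19b7b7b8e03c5bd699c433d3893368808a38f6d631dd2fc8678d85300199534b",
    "libstc_geck-2.1.3-py2.py3-none-any.whl": "29fc00faeee978322d3099e7ec36e90097a980d5c7a7f79511116a58b403070c",
}


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks; supports 'sha256', 'md5' and 'blake2b-256'."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)  # 256-bit BLAKE2b
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_download(path: Path) -> bool:
    """Compare a downloaded release file against its published SHA256 digest."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return file_digest(path) == expected
```

For example, `verify_download(Path("libstc_geck-2.1.3.tar.gz"))` returns `True` only if the local file matches the published SHA256 digest byte for byte.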