Elasticsearch Extensions for Hebrew and Multi-language Text Processing
TRelasticExt
A Python package for enhanced Elasticsearch (8.x.x) operations with specialized support for Hebrew text processing.
Installation
pip install trelasticext
Features
- Elasticsearch document management (create, update, delete)
- Hebrew text tokenization and analysis
- Text similarity search with customizable parameters
- Bulk operations for efficient data handling
- Support for cross-index and cross-cluster operations
Quick Start
import trelasticext as ee
# Connection parameters: the Elasticsearch host URL and the index to operate on
es_params = {"ehost": "http://localhost:9200", "index": "my_index"}
# Search for documents
results = ee.get_es_records_by_field("location", "document_location", es_params)
# Tokenize Hebrew text (the sample string means "hello world")
tokens = ee.ftokens("שלום עולם")
# Build a custom query
query = ee.query_builder(
    text="מילים לחיפוש",
    fields=["content", "title"],
    fuzziness=1,
    query_type="multimatch"
)
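The structure returned by `query_builder` is not documented here. For orientation, the sketch below builds the standard Elasticsearch `multi_match` query body that the parameters above (`text`, `fields`, `fuzziness`) would naturally map to. The helper name `build_multi_match` is hypothetical and is not part of trelasticext; treat this as an illustration of the query DSL, not of the package's internals.

```python
import json


def build_multi_match(text, fields, fuzziness=0):
    """Hypothetical helper: return a standard Elasticsearch multi_match query body."""
    clause = {"query": text, "fields": list(fields)}
    if fuzziness:
        # In Elasticsearch, fuzziness is the maximum Levenshtein edit distance
        # allowed between a query term and an indexed term.
        clause["fuzziness"] = fuzziness
    return {"query": {"multi_match": clause}}


body = build_multi_match("מילים לחיפוש", ["content", "title"], fuzziness=1)

# json.dumps escapes non-ASCII characters by default, so this prints safely
# on any terminal encoding.
print(json.dumps(body, indent=2))
```

A body of this shape can be passed to any client that accepts the Elasticsearch query DSL. Omitting `fuzziness` (or passing 0) leaves the key out, which gives exact term matching.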
Advanced Usage
Document Operations
# Add a document
doc = {"title": "Sample Document", "content": "This is a test"}
ee.post_es_record(doc, es_params)
# Update a document by ID
update_doc = {"doc": {"title": "Updated Title"}}
ee.update_by_id("doc_id", update_doc, es_params["index"], es_params)
# Delete a document
ee.delete_es_record("doc_id", es_params)
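The `{"doc": {...}}` wrapper used above follows the Elasticsearch partial-update convention: fields under `doc` are merged into the stored document, nested objects are merged recursively, and all other values are replaced. The sketch below simulates that merge locally so the effect of an update body can be previewed without a cluster. `apply_partial_update` is a hypothetical name used only for illustration; it is not a trelasticext function.

```python
import copy


def apply_partial_update(existing, body):
    """Simulate how a partial update merges body["doc"] into a stored document."""

    def merge(target, patch):
        for key, value in patch.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                # Nested objects are merged field by field.
                merge(target[key], value)
            else:
                # Scalars and arrays replace the stored value.
                target[key] = value
        return target

    # Work on a copy so the "stored" document is left untouched.
    return merge(copy.deepcopy(existing), body["doc"])


stored = {
    "title": "Sample Document",
    "content": "This is a test",
    "meta": {"lang": "en", "rev": 1},
}
update_doc = {"doc": {"title": "Updated Title", "meta": {"rev": 2}}}

updated = apply_partial_update(stored, update_doc)
print(updated)
```

Fields that the update body does not mention (`content`, `meta.lang`) are preserved, which is why a partial update needs only the changed fields.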
Bulk Operations
# Copy data between Elasticsearch clusters
source_params = {"ehost": "http://source-es:9200", "index": "source_index"}
target_params = {"ehost": "http://target-es:9200", "index": "target_index"}
ee.copy_records_between_hosts(source_params, target_params, clear_data=True)
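How `copy_records_between_hosts` moves data internally is not described here. As a rough sketch of the general technique, copying between indices usually means turning search hits from the source into bulk actions for the target and sending them in batches. The example below prepares actions in the dictionary shape accepted by `elasticsearch.helpers.bulk`; the helper names `to_bulk_actions` and `chunked` and the sample hits are assumptions made for illustration.

```python
def to_bulk_actions(hits, target_index):
    """Yield one bulk 'index' action per search hit, keeping the original document ID."""
    for hit in hits:
        yield {
            "_op_type": "index",
            "_index": target_index,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }


def chunked(iterable, size):
    """Group items into lists of at most `size` elements."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Sample hits in the shape returned by an Elasticsearch search response.
sample_hits = [
    {"_id": str(i), "_index": "source_index", "_source": {"title": f"Document {i}"}}
    for i in range(5)
]

batches = list(chunked(to_bulk_actions(sample_hits, "target_index"), size=2))
for batch in batches:
    print(len(batch), [action["_id"] for action in batch])
```

Reusing the source `_id` makes the copy idempotent: running it twice overwrites documents in the target instead of duplicating them.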
Text Analysis
# Analyze text using Elasticsearch analyzers
analysis = ee.get_es_analyze_text(
    index="my_index",
    analyzer="hebrew",
    text="טקסט לניתוח",
    es_params=es_params
)
# Get tokens with language filtering
hebrew_tokens = ee.ftokens("מילים באנגלית and Hebrew", lang=["HEB"])
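The exact rules `ftokens` applies for `lang=["HEB"]` are not documented here. One simple way to filter tokens by language in mixed text is to test each token against the Unicode Hebrew block (U+0590 to U+05FF). The sketch below shows that idea with whitespace tokenization; `split_by_script` is a hypothetical helper and the package may work differently.

```python
import re

# Any character in the Unicode Hebrew block.
HEBREW_CHARS = re.compile(r"[\u0590-\u05FF]")


def split_by_script(text):
    """Split on whitespace and separate tokens containing Hebrew letters from the rest."""
    tokens = text.split()
    hebrew = [t for t in tokens if HEBREW_CHARS.search(t)]
    other = [t for t in tokens if not HEBREW_CHARS.search(t)]
    return hebrew, other


hebrew, other = split_by_script("מילים באנגלית and Hebrew")
print(len(hebrew), other)
```

A token counts as Hebrew here if it contains at least one Hebrew character, so mixed tokens land in the Hebrew list. Punctuation attached to words is not stripped in this sketch.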
API Reference
Main Functions
- get_es_records_by_field(field, value, es_params, les=None, default_result=False, all_records=False)
- get_es_source_by_field(field, value, es_params, les=None)
- post_es_record(doc, es_params, les=None)
- update_by_id(doc_id, doc, index, es_params, les=None)
- delete_es_record(doc_id, es_params, les=None)
- ftokens(text, spliter=None, lang=None)
- query_builder(text, fields=["sentence"], fuzziness=0, ...)
Requirements
- Python 3.6+
- elasticsearch
- pandas
- numpy
- fasttext
- Levenshtein
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
File details
Details for the file trelasticext-0.2.0.tar.gz.
File metadata
- Download URL: trelasticext-0.2.0.tar.gz
- Upload date:
- Size: 17.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0326b96c84986d93d2b981df7ae0ae757722315d428b286a3fd854bf017b9c9 |
| MD5 | 5bb9618c4ca971497b0e80d45031de89 |
| BLAKE2b-256 | 8e641f1cbe18645b03047c0ec8812b64275ef7e96a3cb4a668c2759445ac7cea |
File details
Details for the file trelasticext-0.2.0-py3-none-any.whl.
File metadata
- Download URL: trelasticext-0.2.0-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 949006b87e261ed52b719b661907a34f6c8cbb63bee16063528ed2d8c5b75de1 |
| MD5 | adac0874d0d0745c8ba432452a2f30c5 |
| BLAKE2b-256 | 23f88d65779a8d7dc423935216719153d34bb8b6fbe70646bccd8848daa8f372 |