Elasticsearch Extensions for Hebrew and Multi-language Text Processing
TRelasticExt
A Python package for enhanced Elasticsearch (8.x.x) operations with specialized support for Hebrew text processing.
Installation
pip install trelasticext
Features
- Elasticsearch document management (create, update, delete)
- Hebrew text tokenization and analysis
- Text similarity search with customizable parameters
- Bulk operations for efficient data handling
- Support for cross-index and cross-cluster operations
Quick Start
import trelasticext as ee
# Connection parameters: Elasticsearch host and target index
es_params = {"ehost": "http://localhost:9200", "index": "my_index"}
# Search for documents
results = ee.get_es_records_by_field("location", "document_location", es_params)
# Tokenize Hebrew text
tokens = ee.ftokens("שלום עולם")
# Build a custom query
query = ee.query_builder(
    text="מילים לחיפוש",
    fields=["content", "title"],
    fuzziness=1,
    query_type="multimatch"
)
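For orientation, the `multimatch` query type with a `fuzziness` value presumably maps onto Elasticsearch's `multi_match` query. The sketch below builds such a request body as a plain dictionary. The helper name `build_multi_match` is hypothetical, and this README does not document what `query_builder` actually returns, so read it as an illustration of the Query DSL rather than the package's output.

```python
def build_multi_match(text, fields, fuzziness=0):
    """Build an Elasticsearch multi_match request body (hypothetical helper)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                # Maximum edit distance tolerated when matching each term
                "fuzziness": fuzziness,
            }
        }
    }


body = build_multi_match("מילים לחיפוש", ["content", "title"], fuzziness=1)
```

A dictionary of this shape can be passed as the search body to any Elasticsearch client.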
Advanced Usage
Document Operations
# Add a document
doc = {"title": "Sample Document", "content": "This is a test"}
ee.post_es_record(doc, es_params)
# Update a document by ID
update_doc = {"doc": {"title": "Updated Title"}}
ee.update_by_id("doc_id", update_doc, es_params["index"], es_params)
# Delete a document
ee.delete_es_record("doc_id", es_params)
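The update payload above wraps the changed fields in a `doc` key, which matches the partial-update body of the Elasticsearch Update API: only the listed fields are merged into the stored document. A small helper can keep that envelope consistent. The name `as_partial_update` is an assumption for illustration and is not part of trelasticext.

```python
def as_partial_update(fields):
    """Wrap changed fields in the {"doc": ...} envelope used for partial updates.

    Hypothetical helper, not part of trelasticext.
    """
    if not fields:
        raise ValueError("no fields to update")
    # Copy the mapping so later changes to the caller's dict do not leak in
    return {"doc": dict(fields)}


update_doc = as_partial_update({"title": "Updated Title"})
```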
Bulk Operations
# Copy data between Elasticsearch clusters
source_params = {"ehost": "http://source-es:9200", "index": "source_index"}
target_params = {"ehost": "http://target-es:9200", "index": "target_index"}
ee.copy_records_between_hosts(source_params, target_params, clear_data=True)
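Copying between clusters generally amounts to reading every hit from the source index and re-indexing it into the target. The sketch below shows only the transformation step, turning search hits into bulk actions in the format accepted by `elasticsearch.helpers.bulk`. How `copy_records_between_hosts` works internally is not stated in this README, and `hits_to_actions` is a hypothetical name.

```python
def hits_to_actions(hits, target_index):
    """Yield one bulk "index" action per search hit (hypothetical helper)."""
    for hit in hits:
        yield {
            "_op_type": "index",
            "_index": target_index,
            # Reuse the source ID so re-running the copy overwrites, not duplicates
            "_id": hit["_id"],
            "_source": hit["_source"],
        }


# Stand-in for hits read from the source cluster
sample_hits = [
    {"_index": "source_index", "_id": "1", "_source": {"title": "A"}},
    {"_index": "source_index", "_id": "2", "_source": {"title": "B"}},
]
actions = list(hits_to_actions(sample_hits, "target_index"))
```

With live clusters, the hits would typically come from `elasticsearch.helpers.scan` on the source client, and the generated actions would go to `elasticsearch.helpers.bulk` on the target client.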
Text Analysis
# Analyze text using Elasticsearch analyzers
analysis = ee.get_es_analyze_text(
    index="my_index",
    analyzer="hebrew",
    text="טקסט לניתוח",
    es_params=es_params
)
# Get tokens with language filtering
hebrew_tokens = ee.ftokens("מילים באנגלית and Hebrew", lang=["HEB"])
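One plausible way to filter tokens by language is to check which Unicode script each token uses. The sketch below keeps whitespace-separated tokens that contain at least one character from the requested script. It is not the implementation of `ftokens`, whose behavior is not described here. The function name `filter_tokens` and the `"ENG"` code are assumptions, and only `"HEB"` appears in the example above.

```python
import re

# Hebrew characters occupy the Unicode block U+0590..U+05FF
SCRIPTS = {
    "HEB": re.compile(r"[\u0590-\u05FF]"),
    "ENG": re.compile(r"[A-Za-z]"),  # hypothetical code, not taken from the package
}


def filter_tokens(text, lang=None):
    """Split on whitespace and keep tokens written in one of the given scripts."""
    tokens = text.split()
    if not lang:
        return tokens
    patterns = [SCRIPTS[code] for code in lang]
    return [t for t in tokens if any(p.search(t) for p in patterns)]


hebrew_only = filter_tokens("מילים באנגלית and Hebrew", lang=["HEB"])
```

A script check of this kind cannot tell apart languages that share an alphabet, so it suits mixed Hebrew and Latin text better than, say, English versus French.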
API Reference
Main Functions
- get_es_records_by_field(field, value, es_params, les=None, default_result=False, all_records=False)
- get_es_source_by_field(field, value, es_params, les=None)
- post_es_record(doc, es_params, les=None)
- update_by_id(doc_id, doc, index, es_params, les=None)
- delete_es_record(doc_id, es_params, les=None)
- ftokens(text, spliter=None, lang=None)
- query_builder(text, fields=["sentence"], fuzziness=0, ...)
Requirements
- Python 3.6+
- elasticsearch
- pandas
- numpy
- fasttext
- Levenshtein
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Download files
Source Distribution
trelasticext-0.2.1.tar.gz (19.3 kB)
Built Distribution
trelasticext-0.2.1-py3-none-any.whl (15.7 kB)
File details
Details for the file trelasticext-0.2.1.tar.gz.
File metadata
- Download URL: trelasticext-0.2.1.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45843346323fd4dd62c25bd19181982d65ef080d27b92bacd20bed3af5436507 |
| MD5 | d7d791274524879edc70040236e8a405 |
| BLAKE2b-256 | 8de563f9abc6be1de57ff3801c7adfb7ad7a67937fdbb1be9217ff4f73e98b4d |
File details
Details for the file trelasticext-0.2.1-py3-none-any.whl.
File metadata
- Download URL: trelasticext-0.2.1-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f21ccdaa267c57ce85022fda0fabd81c4557357ca4a1b5dc5699c05f46c55b8 |
| MD5 | b80ef10b7ca455a9f5b1a714838fea21 |
| BLAKE2b-256 | a80a4330f9ab65fa5619230e2d04f3ca4e0a4d2bebdfa777c595e251266a9769 |