Serialization & bulk indexing package for Elasticsearch; based on elasticsearch-dsl-py, supports multi-processing and Django
Project description
This is a modern replacement for django-simple-elasticsearch (DSE). Both Django and Elasticsearch have changed substantially over the years, and this project is an effort to keep up.
Why not just update django-simple-elasticsearch?
- DSE is Django-specific; I wanted to build a solution that could be used in a broader scope of applications
- To start fresh and avoid assumptions made in the DSE project
- Dropped support for Python 2
Details
- Flexible and modular; e.g. Django support is available via a 'contrib' module
- Supports multi-process indexing and asynchronous IO via gevent
- Depends on elasticsearch-dsl-py rather than the low level elasticsearch-py package
  - You get a lot of functionality for free!
- Python 3 only
- esdocs >= 0.6 supports Elasticsearch 7.x
Installation
pip install esdocs
For multi-process indexing, install esdocs together with the necessary gevent dependencies:
pip install esdocs[gevent]
Command Line Usage
$ esdocs -h
usage: esdocs [-h] [-v] [--version] [--no_input] [--indexes INDEXES]
              [--using USING] [--multi [MULTI]]
              {list,init,update,rebuild,cleanup} ...

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  --version             show program's version number and exit
  --no_input, --noinput
                        Do not prompt for user input (assumes 'Yes' for
                        actions)
  --indexes INDEXES     Comma-separate list of index names to target
  --using USING         Elasticsearch named connection to use
  --multi [MULTI]       Enable multiple processes and optionally set number of
                        CPU cores to use (defaults to all cores)

commands:
  {list,init,update,rebuild,cleanup}
    list                List indexes
    init                Initialize indexes
    update              Update indexes
    rebuild             Rebuild indexes
    cleanup             Delete unaliased indexes
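The `--multi [MULTI]` entry in the help output takes an optional value. As a rough illustration of how such an option can behave, here is a generic argparse sketch. It is not taken from the esdocs source; the parser, the helper names, and the choice of `const` and `default` are assumptions made for the example.

```python
import argparse
import multiprocessing


def build_parser():
    """Hypothetical parser that mimics only a --multi [MULTI] style option."""
    parser = argparse.ArgumentParser(prog="multi-demo")
    parser.add_argument(
        "--multi",
        nargs="?",                          # the value after --multi is optional
        type=int,
        const=multiprocessing.cpu_count(),  # used when --multi is given with no value
        default=None,                       # used when --multi is absent
        help="enable multiple processes, optionally with a process count",
    )
    return parser


def resolve_processes(argv):
    """Return the number of worker processes implied by argv (1 = single process)."""
    args = build_parser().parse_args(argv)
    return 1 if args.multi is None else args.multi


if __name__ == "__main__":
    print(resolve_processes([]))                # flag absent
    print(resolve_processes(["--multi"]))       # flag without a value
    print(resolve_processes(["--multi", "4"]))  # flag with an explicit value
```

With this pattern, leaving the flag out keeps a single process, passing it bare falls back to the detected core count, and passing a number overrides that count.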
To rebuild indexes specified by document serializers in ESDOCS_SERIALIZER_MODULES:
export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"
esdocs rebuild
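ESDOCS_SERIALIZER_MODULES holds a comma-separated list of dotted module paths. The sketch below shows one way such a value could be split and imported with the standard library, which can also serve as a quick check that every listed module is importable before running a rebuild. The helper names are hypothetical and are not part of esdocs.

```python
import importlib
import os


def parse_module_list(raw):
    """Split a comma-separated string into clean dotted module paths."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def load_modules(names):
    """Import each named module and return a dict of name -> module object."""
    return {name: importlib.import_module(name) for name in names}


if __name__ == "__main__":
    # Read the same variable the commands above export.
    raw = os.environ.get("ESDOCS_SERIALIZER_MODULES", "")
    names = parse_module_list(raw)
    print(names)
    # Raises ImportError if any listed module cannot be imported.
    load_modules(names)
```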
Multi-process indexing:
export ESDOCS_GEVENT=y
export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"
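# the help output above names this option --multi [MULTI];
# run esdocs -h to confirm the flags for your installed version
# auto detect number of CPU cores to use
esdocs rebuild --multiproc
# specify the number of cores to use
esdocs rebuild --multiproc --numprocs=4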
Django
You must specify ESDOCS_SERIALIZER_MODULES in your Django settings and add esdocs.contrib.esdjango to your
INSTALLED_APPS. You can optionally set ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS as well:
INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    ...,
    'esdocs.contrib.esdjango'
]
ESDOCS_SERIALIZER_MODULES = [
    'mypackage.module1',
    'myotherpackage.module2'
]
# these are the current defaults for this setting
ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS = [
    'esdocs.contrib.esdjango.compatibility.manager',
    'esdocs.contrib.esdjango.compatibility.geosgeometry',
    'esdocs.contrib.postgresql.compatibility.range_field'
]
Serializing Data
For esdocs to work, you need to define Document and Serializer (or DjangoSerializer) subclasses to index
your data. Document comes from the excellent elasticsearch-dsl-py, while Serializer/DjangoSerializer are
a part of esdocs.
- Document defines the Elasticsearch field mappings
- Serializer is associated with a Document
- Serializer defines how to retrieve the dataset
- For each record in your dataset, the Serializer will attempt to retrieve a value for each field defined on the associated Document
  - There are a number of methods you can implement on a Serializer to retrieve (or construct/munge) each value
Examples
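The relationship between a Document and its Serializer can be pictured with a small, self-contained sketch. It uses plain Python stand-ins rather than the real elasticsearch-dsl-py or esdocs classes. The class names, the `fields` tuple, and the `get_<field>` method convention are all assumptions made for illustration, not the esdocs API.

```python
class ToyDocument:
    """Hypothetical stand-in for a Document: just a tuple of field names."""
    fields = ("title", "author", "word_count")


class ToySerializer:
    """Hypothetical stand-in for a Serializer tied to one document class."""
    document = ToyDocument

    def __init__(self, records):
        # The dataset to index; here a list of plain dicts.
        self.records = records

    def get_word_count(self, record):
        # A per-field method can construct a value the record does not store.
        return len(record["body"].split())

    def serialize(self, record):
        """Build one output dict by visiting each field on the document."""
        data = {}
        for name in self.document.fields:
            getter = getattr(self, "get_" + name, None)
            if getter is not None:
                data[name] = getter(record)  # a custom method takes priority
            elif name in record:
                data[name] = record[name]    # otherwise read from the record
            # fields with no method and no matching key are skipped
        return data

    def serialize_all(self):
        return [self.serialize(record) for record in self.records]


if __name__ == "__main__":
    records = [
        {"title": "Hello", "author": "Ann", "body": "one two three"},
        {"title": "No author", "body": ""},
    ]
    for doc in ToySerializer(records).serialize_all():
        print(doc)
```

The point of the sketch is the lookup order: the document decides which fields exist, and the serializer decides where each value comes from.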