Serialization & bulk indexing package for Elasticsearch; based on elasticsearch-dsl.py, supports multi-processing, Django
This is a modern replacement for django-simple-elasticsearch (DSE). Both Django and Elasticsearch have seen major changes over the years; this project is an effort to keep up.
Why not just update django-simple-elasticsearch?
- DSE is Django-specific; I wanted to build a solution that could be used in a broader scope of applications
- To start fresh and avoid assumptions made in the DSE project
- Dropped support for Python 2
Details
- Flexible and modular; e.g. Django support is available via a 'contrib' module
- Supports multi-process indexing and asynchronous IO via gevent
- Depends on elasticsearch-dsl-py rather than the low-level elasticsearch-py package
  - You get a lot of functionality for free!
- Python 3 only
- esdocs >= 0.6 supports Elasticsearch 7.x
Installation
pip install esdocs
If multi-process indexing is desired, you will want to install it along with the necessary gevent dependencies:
pip install esdocs[gevent]
Command Line Usage
$ esdocs -h
usage: esdocs [-h] [-v] [--version] [--no_input] [--indexes INDEXES]
              [--using USING] [--multi [MULTI]]
              {list,init,update,rebuild,cleanup} ...

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  --version             show program's version number and exit
  --no_input, --noinput
                        Do not prompt for user input (assumes 'Yes' for
                        actions)
  --indexes INDEXES     Comma-separate list of index names to target
  --using USING         Elasticsearch named connection to use
  --multi [MULTI]       Enable multiple processes and optionally set number of
                        CPU cores to use (defaults to all cores)

commands:
  {list,init,update,rebuild,cleanup}
    list                List indexes
    init                Initialize indexes
    update              Update indexes
    rebuild             Rebuild indexes
    cleanup             Delete unaliased indexes
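The bracketed MULTI in the usage line indicates that the flag's value is optional. As a rough illustration only, a flag with that shape can be declared in argparse as sketched below. The parser name esdocs-sketch, the build_parser helper, and the fallback to os.cpu_count() are assumptions made for this example and are not taken from the esdocs source.

```python
import argparse
import os


def build_parser():
    """Build a tiny parser with one flag whose value is optional."""
    parser = argparse.ArgumentParser(prog="esdocs-sketch")  # hypothetical name
    parser.add_argument(
        "--multi",
        nargs="?",                    # the value may be omitted
        type=int,
        const=os.cpu_count() or 1,    # used when the flag has no value
        default=None,                 # used when the flag is absent
        help="enable multiple processes, optionally with a core count",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]).multi)
    print(parser.parse_args(["--multi"]).multi)
    print(parser.parse_args(["--multi", "4"]).multi)
```

With this kind of declaration, omitting the flag leaves multi-processing off, a bare flag selects every available core, and an explicit number caps the core count.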
To rebuild indexes specified by document serializers in ESDOCS_SERIALIZER_MODULES:
export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"
esdocs rebuild
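The export lines above suggest that ESDOCS_SERIALIZER_MODULES holds a comma-separated list of importable module paths. Under that assumption, the following sketch shows one way such a value could be turned into imported modules. The load_serializer_modules helper is hypothetical and is not part of esdocs; the standard-library modules in the demo merely stand in for real serializer modules.

```python
import importlib
import os


def load_serializer_modules(env_var="ESDOCS_SERIALIZER_MODULES"):
    """Import every dotted module path listed in a comma-separated env var.

    Hypothetical helper for illustration; esdocs may resolve modules differently.
    """
    raw = os.environ.get(env_var, "")
    # Tolerate stray whitespace and empty entries such as a trailing comma.
    paths = [part.strip() for part in raw.split(",") if part.strip()]
    return [importlib.import_module(path) for path in paths]


if __name__ == "__main__":
    # Standard-library modules stand in for real serializer modules here.
    os.environ["ESDOCS_SERIALIZER_MODULES"] = "json, collections.abc"
    for module in load_serializer_modules():
        print(module.__name__)
```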
Multi-process indexing:
export ESDOCS_GEVENT=y
export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"
# auto detect number of CPU cores to use
esdocs rebuild --multiproc
# specify the number of cores to use
esdocs rebuild --multiproc --numprocs=4
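To give a feel for what spreading indexing work across CPU cores involves, here is a generic sketch that splits a dataset into per-worker chunks and hands them to a process pool. It uses only the Python standard library rather than gevent, and the names chunk, index_chunk, and index_all are hypothetical; it does not describe how esdocs is implemented internally.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunk(items, n):
    """Split items into at most n contiguous, nearly equal chunks."""
    n = max(1, min(n, len(items)))
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def index_chunk(records):
    """Placeholder for a bulk-index call; here it only counts the records."""
    return len(records)


def index_all(records, workers=None):
    """Process every chunk in its own worker and return the total count."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_chunk, chunk(records, workers)))
```

When calling index_all from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can re-import the module safely.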
Django
You must specify ESDOCS_SERIALIZER_MODULES in your Django settings and add esdocs.contrib.esdjango to your INSTALLED_APPS. You can optionally set ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS as well:
INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    ...,
    'esdocs.contrib.esdjango'
]

ESDOCS_SERIALIZER_MODULES = [
    'mypackage.module1',
    'myotherpackage.module2'
]

# these are the current defaults for this setting
ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS = [
    'esdocs.contrib.esdjango.compatibility.manager',
    'esdocs.contrib.esdjango.compatibility.geosgeometry',
    'esdocs.contrib.postgresql.compatibility.range_field'
]
Serializing Data
For esdocs to work, you need to define Document and Serializer (or DjangoSerializer) subclasses to index your data. Document comes from the excellent elasticsearch-dsl-py, while Serializer/DjangoSerializer are a part of esdocs.
- Document defines the Elasticsearch field mappings
- Serializer is associated with a Document
- Serializer defines how to retrieve the dataset
  - For each record in your dataset, the Serializer will attempt to retrieve a value for each field defined on the associated Document
  - There are a number of methods you can implement on a Serializer to retrieve (or construct/munge) each value
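The relationship described above (a document declares fields, and a serializer fills each one per record) can be sketched in plain Python. This is a toy model only: ArticleDocument is a stand-in and not an elasticsearch-dsl Document, and the get_<field> naming convention and the serialize method are hypothetical, chosen for illustration rather than taken from the esdocs API.

```python
from dataclasses import dataclass


class ArticleDocument:
    """Stand-in for a Document: it only lists the field names to populate."""
    fields = ("title", "author_name", "word_count")


@dataclass
class Article:
    """One record in the dataset."""
    title: str
    author: str
    body: str


class ArticleSerializer:
    """Toy serializer: resolves a value for every field on its document."""
    document = ArticleDocument

    def __init__(self, records):
        self.records = records

    # Hypothetical per-field hooks that construct or munge a value.
    def get_author_name(self, record):
        return record.author.title()

    def get_word_count(self, record):
        return len(record.body.split())

    def serialize(self, record):
        data = {}
        for name in self.document.fields:
            getter = getattr(self, f"get_{name}", None)
            if getter is not None:
                data[name] = getter(record)          # custom hook wins
            elif hasattr(record, name):
                data[name] = getattr(record, name)   # fall back to the attribute
        return data

    def __iter__(self):
        return (self.serialize(record) for record in self.records)


if __name__ == "__main__":
    articles = [Article("Hello", "jane doe", "one two three")]
    for doc in ArticleSerializer(articles):
        print(doc)
```

In this sketch, title is copied straight from the record, while author_name and word_count are computed by hook methods, which mirrors the retrieve-or-construct idea in the list above.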
Examples