Serialization and bulk indexing package for Elasticsearch. Based on elasticsearch-dsl-py; supports multi-processing and Django.

Project description

This is a modern replacement for django-simple-elasticsearch (DSE). Both Django and Elasticsearch have changed substantially over the years; this project is a move to keep up.

Why not just update django-simple-elasticsearch?
  • DSE is Django-specific; I wanted a solution that could be used in a broader range of applications
  • Starting fresh avoids assumptions made in the DSE project
  • A new project could drop support for Python 2
Details
  • Flexible and modular; e.g., Django support is available via a 'contrib' module
  • Supports multi-process indexing and asynchronous IO via gevent
  • Depends on elasticsearch-dsl-py rather than the lower-level elasticsearch-py package
    • You get a lot of functionality for free!
  • Python 3 only
  • esdocs >= 0.6 supports Elasticsearch 7.x
Installation
pip install esdocs

For multi-process indexing, install esdocs together with the necessary gevent dependencies:

pip install esdocs[gevent]
Command Line Usage
$ esdocs -h
usage: esdocs [-h] [-v] [--version] [--no_input] [--indexes INDEXES]
              [--using USING] [--multi [MULTI]]
              {list,init,update,rebuild,cleanup} ...

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  --version             show program's version number and exit
  --no_input, --noinput
                        Do not prompt for user input (assumes 'Yes' for
                        actions)
  --indexes INDEXES     Comma-separate list of index names to target
  --using USING         Elasticsearch named connection to use
  --multi [MULTI]       Enable multiple processes and optionally set number of
                        CPU cores to use (defaults to all cores)

commands:
  {list,init,update,rebuild,cleanup}
    list                List indexes
    init                Initialize indexes
    update              Update indexes
    rebuild             Rebuild indexes
    cleanup             Delete unaliased indexes

To rebuild indexes specified by document serializers in ESDOCS_SERIALIZER_MODULES:

export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"

esdocs rebuild
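
As a rough illustration of the comma-separated format used by ESDOCS_SERIALIZER_MODULES, the sketch below splits such a value and imports each named module. The helper name `load_serializer_modules` is hypothetical and this is not the esdocs implementation; it only assumes that each entry is an importable dotted module path.

```python
import importlib
import os


def load_serializer_modules(env_var="ESDOCS_SERIALIZER_MODULES"):
    """Import every module named in a comma-separated environment variable.

    Hypothetical helper for illustration; not part of esdocs.
    """
    raw = os.environ.get(env_var, "")
    # Split on commas, dropping empty entries and stray whitespace.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return [importlib.import_module(name) for name in names]


if __name__ == "__main__":
    # Standard-library modules stand in for real serializer modules here.
    os.environ["ESDOCS_SERIALIZER_MODULES"] = "json, collections.abc"
    for module in load_serializer_modules():
        print(module.__name__)
```

A misspelled entry raises ImportError at import time, which is usually preferable to silently skipping a serializer module.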

Multi-process indexing:

export ESDOCS_GEVENT=y
export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"

# auto-detect the number of CPU cores to use
esdocs --multi rebuild

# specify the number of cores to use
esdocs --multi=4 rebuild
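
The `--multi [MULTI]` option in the help output above takes an optional value and falls back to all cores when the value is omitted. One way to express that with argparse is sketched below. This is an assumption about how such an option can be built, not the esdocs source; `build_parser` and `worker_count` are hypothetical names.

```python
import argparse
import os


def build_parser():
    """Build a parser with an option whose value is optional."""
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument(
        "--multi",
        nargs="?",                  # the number after --multi may be omitted
        type=int,
        const=os.cpu_count() or 1,  # used when --multi is given without a number
        default=None,               # used when --multi is absent
    )
    return parser


def worker_count(argv):
    """Return how many worker processes the given arguments ask for."""
    args = build_parser().parse_args(argv)
    if args.multi is None:
        return 1  # single-process mode
    return max(1, args.multi)


if __name__ == "__main__":
    print(worker_count([]))
    print(worker_count(["--multi"]))
    print(worker_count(["--multi=4"]))
```

Writing the value as `--multi=4` keeps it attached to the option, so it cannot be mistaken for a following positional argument.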
Django

You must specify ESDOCS_SERIALIZER_MODULES in your Django settings and add esdocs.contrib.esdjango to your INSTALLED_APPS. You can optionally set ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS as well:


INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    ...,
    'esdocs.contrib.esdjango'
]


ESDOCS_SERIALIZER_MODULES = [
    'mypackage.module1',
    'myotherpackage.module2'
]

# these are the current defaults for this setting
ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS = [
    'esdocs.contrib.esdjango.compatibility.manager',
    'esdocs.contrib.esdjango.compatibility.geosgeometry',
    'esdocs.contrib.postgresql.compatibility.range_field'
]
Serializing Data

For esdocs to work, you need to define Document and Serializer (or DjangoSerializer) subclasses to index your data. Document comes from the excellent elasticsearch-dsl-py, while Serializer/DjangoSerializer are a part of esdocs.

  • Document defines the Elasticsearch field mappings
  • Serializer is associated with a Document
  • Serializer defines how to retrieve the dataset
  • For each record in your dataset, the Serializer will attempt to retrieve a value for each field defined on the associated Document
    • There are a number of methods you can implement on a Serializer to retrieve (or construct/munge) each value
Examples
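
The flow described above (a serializer walks a dataset and looks up a value for every field on its document) can be sketched in plain Python. Everything here is hypothetical: `ArticleDocument` is a stand-in for an elasticsearch-dsl-py Document, and `SketchSerializer`, `get_records`, and the `get_<field>` naming are illustrative assumptions, not the esdocs Serializer API.

```python
class ArticleDocument:
    """Stand-in for a Document: it only lists the field names to fill."""
    fields = ("title", "author_name", "word_count")


class SketchSerializer:
    """Hypothetical serializer; not the esdocs Serializer API."""
    document = ArticleDocument

    def get_records(self):
        # The dataset; a real serializer might query a database here.
        return [
            {"title": "Hello", "author": {"name": "Ada"}, "body": "one two three"},
            {"title": "Bye", "author": {"name": "Bob"}, "body": "four five"},
        ]

    # Custom getters construct values that are not stored directly.
    def get_author_name(self, record):
        return record["author"]["name"]

    def get_word_count(self, record):
        return len(record["body"].split())

    def serialize(self, record):
        """Build one output dict with a value for each document field."""
        out = {}
        for field in self.document.fields:
            getter = getattr(self, "get_" + field, None)
            if getter is not None:
                out[field] = getter(record)      # constructed value
            else:
                out[field] = record.get(field)   # plain lookup on the record
        return out

    def serialize_all(self):
        return [self.serialize(record) for record in self.get_records()]


if __name__ == "__main__":
    for doc in SketchSerializer().serialize_all():
        print(doc)
```

The per-field getter keeps value construction next to the field it serves, while fields without a getter fall back to a direct lookup on the record.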
