An opinionated Elasticsearch bulk indexer for Python.

Project description

py-es-bulk

A simple wrapper around the Python elasticsearch client put_template(), streaming_bulk(), and parallel_bulk() helper APIs with robust error handling.

This library is designed to work across various versions of the elasticsearch Python module and of the Elasticsearch server, by dynamically identifying the module used to create the Elasticsearch object.

These names are available for import:

  • put_template

    Push a document template to the server using a specified Elasticsearch object. The function checks whether a template document of the same name and version already exists on the server, and PUTs the new template only if it does not.

    Args:

    • es: An instance of the Elasticsearch class.
    • name: The name of the template.
    • mapping_name: The name of the mapping used in the template.
    • body: The payload body of the template.

    Returns: A tuple (start_time, end_time, retry_count, error_keys).

  • streaming_bulk

    Push multiple source documents to Elasticsearch indices, using proper error handling and retry logic.

    Args:

    • es: An instance of the Elasticsearch class.
    • actions: An iterable of Elasticsearch action records (passed directly to Elasticsearch).
    • errorsfp: A file pointer where HTTP 400 errors are logged.
    • logger: A Logger object where messages can be logged.

    Returns: A tuple (start_time, end_time, successfully_indexed, duplicate, failed, retry_count).

  • parallel_bulk

    Push multiple source documents to Elasticsearch indices in parallel across multiple threads, using proper error handling and retry logic.

    Args:

    • es: An instance of the Elasticsearch class.
    • actions: An iterable of Elasticsearch action records (passed directly to Elasticsearch).
    • errorsfp: A file pointer where HTTP 400 errors are logged.
    • logger: A Logger object where messages can be logged.
    • chunk_size=10000000: Number of docs sent in one chunk to Elasticsearch.
    • max_chunk_bytes=104857600: The maximum size of a request.
    • thread_count=8: The size of the thread pool to use.
    • queue_size=4: The size of the task queue between the controller and processing threads.

    Returns: A tuple (start_time, end_time, successfully_indexed, duplicate, failed, retry_count).

  • TemplateException

    Raised by put_template when a template document does not contain the required version metadata ({"_meta": {"version": <integer>}}), or when several template documents are passed in a single call to put_template and their versions are not all identical.
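
A minimal usage sketch of the bulk calls listed above, assuming pyesbulk is installed and that es is an already-configured Elasticsearch client. The index name demo-index, the helper names make_actions and index_documents, and the action record layout (the _index / _id / _source dictionaries conventionally accepted by the elasticsearch bulk helpers) are illustrative assumptions, not requirements stated in this description.

```python
import logging
import sys


def make_actions(docs, index_name):
    """Yield one bulk action record per (doc_id, source) pair.

    The dict layout follows the usual elasticsearch bulk-helper convention;
    treat the exact fields as an assumption for your client/server version.
    """
    for doc_id, source in docs:
        yield {
            # "create" asks the server to reject an _id that already exists
            # instead of overwriting it.
            "_op_type": "create",
            "_index": index_name,
            "_id": doc_id,
            "_source": source,
        }


def index_documents(es, docs, index_name="demo-index"):
    """Index docs through pyesbulk.streaming_bulk and return the counts."""
    # Imported lazily so this sketch can be loaded without pyesbulk installed.
    from pyesbulk import streaming_bulk

    logger = logging.getLogger("indexer")
    # Arguments are passed in the order documented above:
    # es, actions, errorsfp, logger.
    start, end, indexed, duplicate, failed, retries = streaming_bulk(
        es, make_actions(docs, index_name), sys.stderr, logger
    )
    logger.info(
        "start=%s end=%s indexed=%s duplicate=%s failed=%s retries=%s",
        start, end, indexed, duplicate, failed, retries,
    )
    return indexed, duplicate, failed
```

parallel_bulk is documented with the same four leading arguments plus chunk_size, max_chunk_bytes, thread_count, and queue_size, so the same wrapper shape should carry over with those tuning values added as keyword arguments.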

Unit testing support

The pyesbulk package attempts to dynamically determine the Python module used to produce the Elasticsearch object that's passed in to pyesbulk methods. This is necessary in order to properly resolve exception classes for the error handling and retry logic.
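
One way such dynamic resolution could work is sketched below. This is not pyesbulk's actual algorithm; the helper names top_level_module_name and resolve_exception are hypothetical, and the standard-library json package stands in for an elasticsearch client module so the sketch runs without any third-party install.

```python
import importlib
import json


def top_level_module_name(obj):
    """Return the top-level package name of the class that created obj."""
    return type(obj).__module__.split(".", 1)[0]


def resolve_exception(obj, exception_name, default_module="json"):
    """Look up exception_name in the package that produced obj.

    Falls back to default_module when the detected package cannot be
    imported or does not define the requested name.
    """
    for name in (top_level_module_name(obj), default_module):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        if hasattr(module, exception_name):
            return getattr(module, exception_name)
    raise LookupError(f"{exception_name} not found")


# The decoder's class is defined in the "json.decoder" submodule, but the
# exception is resolved from the top-level "json" package.
decoder = json.JSONDecoder()
error_cls = resolve_exception(decoder, "JSONDecodeError")
```

Resolving exception classes from the package that built the client object is what would let the same retry code catch the right exceptions whichever client package is installed.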

However, unit tests often work with mocked objects which won't have "real" Python package structure, and the dynamic module recognition algorithm may fail. When this happens, pyesbulk will attempt to import elasticsearch. If that's not correct (e.g., if you're using elasticsearch1 or elasticsearch5), you can override the automatic search by including a force_elastic_search_module property on your mocked Elasticsearch object.

For example,

    class MockElasticsearch:
        def __init__(self):
            self.force_elastic_search_module = "elasticsearch5"

or

    es = MockElasticsearch()
    es.force_elastic_search_module = "elasticsearch1"
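
When the mock comes from unittest.mock rather than a hand-written class, the same override can be set on the mock instance. The sketch below uses only the standard library; how pyesbulk reacts to a non-string value for this property is not stated here, so the comment about setting a real string is a precaution rather than documented behavior.

```python
from unittest.mock import MagicMock

es = MagicMock(name="Elasticsearch")

# The mock's class does not come from an elasticsearch package, so inspecting
# its module cannot reveal which client is being imitated.
detected = type(es).__module__.split(".", 1)[0]

# MagicMock fabricates any attribute on first access, so assign the override
# as a real string explicitly instead of relying on the auto-created attribute.
es.force_elastic_search_module = "elasticsearch5"
```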

See also https://pypi.org/project/pyesbulk/.
