Flexible, high-scale API to elasticsearch

===============
pyelasticsearch
===============

.. image:: https://travis-ci.org/pyelasticsearch/pyelasticsearch.png
   :alt: Build Status
   :align: right
   :target: https://travis-ci.org/pyelasticsearch/pyelasticsearch

pyelasticsearch is a clean, future-proof, high-scale API to elasticsearch. It
provides...

* Transparent conversion of Python data types to and from JSON, including
  datetimes and the arbitrary-precision Decimal type
* Translation of HTTP failure status codes into exceptions
* Connection pooling
* HTTP basic auth and HTTPS support
* Load balancing across nodes in a cluster
* Failed-node marking to avoid downed nodes for a period
* Optional automatic retrying of failed requests
* Thread safety
* Loosely coupled design, letting you customize things like JSON encoding and
  bulk indexing

For more on our philosophy and history, see `Comparison with elasticsearch-py, the “Official Client” <https://pyelasticsearch.readthedocs.org/en/latest/elasticsearch-py/>`_.
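
As a rough illustration of what the data-type conversion in the first bullet
involves, here is a minimal, standard-library-only sketch of a JSON encoder
that handles datetimes and ``Decimal``. The class name ``SketchEncoder`` and
the output formats it picks are assumptions made for this example; this is not
pyelasticsearch's own encoder, which may format these values differently.

```python
import json
from datetime import date, datetime
from decimal import Decimal


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder for illustration; not the one the library ships."""

    def default(self, obj):
        # datetime is a subclass of date, so one check covers both.
        if isinstance(obj, date):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            # Assumption: emit a string so that no precision is lost.
            return str(obj)
        # Defer to the base class, which raises TypeError.
        return super().default(obj)


doc = {'name': 'Joe Tester',
       'hired': datetime(2012, 8, 30, 9, 15),
       'salary': Decimal('1234.50')}
encoded = json.dumps(doc, cls=SketchEncoder, sort_keys=True)
print(encoded)
```

The changelog below mentions ``ElasticSearch.json_encoder`` as the place to
plug in a custom encoder class; see the API documentation for the details.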


A Taste of the API
==================

Make a pooling, balancing, all-singing, all-dancing connection object::

    >>> from pyelasticsearch import ElasticSearch
    >>> es = ElasticSearch('http://localhost:9200/')

Index a document::

    >>> es.index('contacts',
    ...          'person',
    ...          {'name': 'Joe Tester', 'age': 25, 'title': 'QA Master'},
    ...          id=1)
    {u'_type': u'person', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'contacts'}

Index a couple more documents, this time in a single request using the
bulk-indexing API::

    >>> docs = [{'id': 2, 'name': 'Jessica Coder', 'age': 32, 'title': 'Programmer'},
    ...         {'id': 3, 'name': 'Freddy Tester', 'age': 29, 'title': 'Office Assistant'}]
    >>> es.bulk((es.index_op(doc, id=doc.pop('id')) for doc in docs),
    ...         index='contacts',
    ...         doc_type='person')

If we had many documents and wanted to chunk them for performance,
`bulk_chunks() <https://pyelasticsearch.readthedocs.org/en/latest/api/#pyelasticsearch.bulk_chunks>`_ would easily rise to the task,
dividing either at a certain number of documents per batch or, for curated
platforms like Google App Engine, at a certain number of bytes. Thanks to
the decoupled design, you can even substitute your own batching function if
you have unusual needs. Bulk indexing is the most demanding ES task in most
applications, so we provide very thorough tools for representing operations,
optimizing wire traffic, and dealing with errors. See
`bulk() <https://pyelasticsearch.readthedocs.org/en/latest/api/#pyelasticsearch.ElasticSearch.bulk>`_ for more.
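
To make the chunking idea concrete, here is a minimal sketch of the kind of
custom batching function you could substitute. It is not the library's
``bulk_chunks()`` implementation: the name ``simple_chunks`` and its parameters
are hypothetical, and it assumes each action has already been serialized to a
string. Like the behavior described in the changelog, it exceeds the byte
limit only when a single action is over the limit on its own.

```python
def simple_chunks(actions, docs_per_chunk=300, bytes_per_chunk=None):
    """Yield lists of serialized actions, split by count and/or size.

    Hypothetical helper for illustration. A single action larger than
    bytes_per_chunk is still yielded, alone in its own chunk.
    """
    chunk, chunk_bytes = [], 0
    for action in actions:
        size = len(action.encode('utf-8')) + 1  # +1 for a trailing newline
        too_many = docs_per_chunk is not None and len(chunk) >= docs_per_chunk
        too_big = (bytes_per_chunk is not None and
                   chunk_bytes + size > bytes_per_chunk)
        # Close the current chunk before it would break either limit.
        if chunk and (too_many or too_big):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(action)
        chunk_bytes += size
    if chunk:
        yield chunk


actions = ['{"n": %d}' % i for i in range(5)]
by_count = list(simple_chunks(actions, docs_per_chunk=2))
by_bytes = list(simple_chunks(actions, docs_per_chunk=None,
                              bytes_per_chunk=20))
```

Each yielded chunk would then be sent as one bulk request; check the
``bulk()`` documentation for the exact form of actions it accepts.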

Refresh the index to pick up the latest::

    >>> es.refresh('contacts')
    {u'ok': True, u'_shards': {u'successful': 5, u'failed': 0, u'total': 10}}

Get just Jessica's document::

    >>> es.get('contacts', 'person', 2)
    {u'_id': u'2',
     u'_index': u'contacts',
     u'_source': {u'age': 32, u'name': u'Jessica Coder', u'title': u'Programmer'},
     u'_type': u'person',
     u'_version': 1,
     u'exists': True}

Perform a simple search::

    >>> es.search('name:joe OR name:freddy', index='contacts')
    {u'_shards': {u'failed': 0, u'successful': 42, u'total': 42},
     u'hits': {u'hits': [{u'_id': u'1',
                          u'_index': u'contacts',
                          u'_score': 0.028130024999999999,
                          u'_source': {u'age': 25,
                                       u'name': u'Joe Tester',
                                       u'title': u'QA Master'},
                          u'_type': u'person'},
                         {u'_id': u'3',
                          u'_index': u'contacts',
                          u'_score': 0.028130024999999999,
                          u'_source': {u'age': 29,
                                       u'name': u'Freddy Tester',
                                       u'title': u'Office Assistant'},
                          u'_type': u'person'}],
               u'max_score': 0.028130024999999999,
               u'total': 2},
     u'timed_out': False,
     u'took': 4}

Perform a search using the `elasticsearch query DSL`_:

.. _`elasticsearch query DSL`: http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

::

    >>> query = {
    ...     'query': {
    ...         'filtered': {
    ...             'query': {
    ...                 'query_string': {'query': 'name:tester'}
    ...             },
    ...             'filter': {
    ...                 'range': {
    ...                     'age': {
    ...                         'from': 27,
    ...                         'to': 37,
    ...                     },
    ...                 },
    ...             },
    ...         },
    ...     },
    ... }
    >>> es.search(query, index='contacts')
    {u'_shards': {u'failed': 0, u'successful': 42, u'total': 42},
     u'hits': {u'hits': [{u'_id': u'3',
                          u'_index': u'contacts',
                          u'_score': 0.19178301,
                          u'_source': {u'age': 29,
                                       u'name': u'Freddy Tester',
                                       u'title': u'Office Assistant'},
                          u'_type': u'person'}],
               u'max_score': 0.19178301,
               u'total': 1},
     u'timed_out': False,
     u'took': 2}
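
Because a DSL query is just a nested dict, it can be assembled by ordinary
Python code. The sketch below wraps the query above in a small helper; the
function name ``filtered_range_query`` is hypothetical and not part of
pyelasticsearch. It only builds the dict and never contacts a server. It keeps
the ``filtered`` query used in this README, which belongs to the older
Elasticsearch versions the README targets.

```python
def filtered_range_query(query_string, field, low, high):
    """Build a query_string query restricted by a range filter.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        'query': {
            'filtered': {
                'query': {'query_string': {'query': query_string}},
                'filter': {'range': {field: {'from': low, 'to': high}}},
            },
        },
    }


# Reproduces the hand-written query shown above.
query = filtered_range_query('name:tester', 'age', 27, 37)
```

The resulting ``query`` could be passed to ``es.search(query, index='contacts')``
exactly as in the example above.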

Delete the index::

    >>> es.delete_index('contacts')
    {u'acknowledged': True, u'ok': True}

For more, see the full `API Documentation <https://pyelasticsearch.readthedocs.org/en/latest/api/>`_.


Changelog
=========

v1.4.1 (2018-04-02)
-------------------
* Recognize new "index already exists" spelling so we raise the right
  exceptions. Close #195.
* Fix CI setup.
* Drop Python 2.6 support.
* Drop nose for testing.

v1.4
----
* Add support for custom certificate authorities via the ``ca_certs`` arg to
  the ``ElasticSearch`` constructor.
* Add support for client certificates via the ``client_cert`` arg.

v1.3
----
* Add support for HTTPS.
* Add username, password, and port kwargs to the constructor so you don't have
  to repeat their values if they're the same across many servers.


v1.2.4 (2015-05-21)
-------------------
* Don't crash when the ``query_params`` kwarg is omitted from calls to
  ``send_request()``.


v1.2.3 (2015-04-17)
-------------------
* Make ``delete_all_indexes()`` work.
* Fix a bug in which specifying ``_all`` as an index name sometimes caused
  doctype names to be treated as index names.


v1.2.2 (2015-04-10)
-------------------
* Correct a typo in the ``bulk()`` docs.


v1.2.1 (2015-04-09)
-------------------
* Update ES doc links, now that Elastic has changed domains and reorganized
  its docs.
* Require elasticsearch lib 1.3 or greater, as that's when it started exposing
  ``ConnectionTimeout``.


v1.2 (2015-03-06)
-----------------
* Make sure the Content-Length header gets set when calling ``create_index()``
  with no explicit ``settings`` arg. This solves 411s when using nginx as a
  proxy.
* Add ``doc_as_upsert`` arg to ``update()``.
* Make ``bulk_chunks()`` compute perfectly optimal results, no longer ever
  exceeding the byte limit unless a single document is over the limit on its
  own.


v1.1 (2015-02-12)
-----------------
* Introduce new bulk API, supporting all types of bulk operations (index,
  update, create, and delete), providing chunking via ``bulk_chunks()``, and
  introducing per-action error-handling. All errors raise exceptions--even
  individual failed operations--and the exceptions expose enough data to
  identify operations for retrying or reporting. The design is decoupled in
  case you want to create your own chunkers or operation builders.
* Deprecate ``bulk_index()`` in favor of the more capable ``bulk()``.
* Make one last update to ``bulk_index()``. It now catches individual
  operation failures, raising ``BulkError``. Also add the ``index_field`` and
  ``type_field`` args, allowing you to index across different indices and doc
  types within one request.
* ``ElasticSearch`` object now defaults to http://localhost:9200/ if you don't
  provide any node URLs.
* Improve docs: give a better overview on the front page, and document how to
  customize JSON encoding.


v1.0 (2015-01-23)
-----------------

* Switch to elasticsearch-py's transport and downtime-pooling machinery,
  much of which was borrowed from us anyway.
* Make bulk indexing (and likely other network things) 15 times faster.
* Add a comparison with the official client to the docs.
* Fix ``delete_by_query()`` to work with ES 1.0 and later.
* Bring ``percolate()`` es_kwargs up to date.
* Fix all tests that were failing on modern versions of ES.
* Tolerate errors that are non-strings and create exceptions for them properly.

.. note::

  Backward incompatible:

  * Drop compatibility with elasticsearch < 1.0.
  * Redo ``cluster_state()`` to work with ES 1.0 and later. Arguments have
    changed.
  * InvalidJsonResponseError no longer provides access to the HTTP response
    (in the ``response`` property): just the bad data (the ``input`` property).
  * Change from the logger "pyelasticsearch" to "elasticsearch.trace".
  * Remove ``revival_delay`` param from ElasticSearch object.
  * Remove ``encode_body`` param from ``send_request()``. Now all dicts are
    JSON-encoded, and all strings are left alone.


v0.7.1 (2014-08-12)
-------------------

* Bring tests up to date with ``update_aliases()`` API change.


v0.7 (2014-08-12)
-----------------

* When an ``id_field`` is specified for ``bulk_index()``, don't index it under
  its original name as well; use it only as the ``_id``.
* Rename ``aliases()`` to ``get_aliases()`` for consistency with other
  methods. Original name still works but is deprecated. Add an ``alias`` kwarg
  to the method so you can fetch specific aliases.

.. note::

  Backward incompatible:

  * ``update_aliases()`` no longer requires a dict with an ``actions`` key;
    that much is implied. Just pass the value of that key.


v0.6.1 (2013-11-01)
-------------------

* Update package requirements to allow requests 2.0, which is in fact
  compatible. (Natim)
* Properly raise ``IndexAlreadyExistsException`` even if the error is reported
  by a node other than the one to which the client is directly connected.
  (Jannis Leidel)


v0.6 (2013-07-23)
-----------------

.. note::

  Note the change in behavior of ``bulk_index()`` in this release. This change
  probably brings it more in line with your expectations. But double check,
  since it now overwrites existing docs in situations where it didn't before.

  Also, we made a backward-incompatible spelling change to a little-used
  ``index()`` kwarg.

* ``bulk_index()`` now overwrites any existing doc of the same ID and doctype.
  Before, in certain versions of ES (like 0.90RC2), it did nothing at all if a
  document already existed, probably much to your surprise. (We removed the
  ``'op_type': 'create'`` pair, whose intentions were always mysterious.)
  (Gavin Carothers)
* Rename the ``force_insert`` kwarg of ``index()`` to ``overwrite_existing``.
  The old name implied the opposite of what it actually did. (Gavin Carothers)


v0.5 (2013-04-20)
-----------------

* Support multiple indices and doctypes in ``delete_by_query()``. Accept both
  string and JSON queries in the ``query`` arg, just as ``search()`` does.
  Passing the ``q`` arg explicitly is now deprecated.
* Add ``multi_get``.
* Add ``percolate``. Thanks, Adam Georgiou and Joseph Rose!
* Add ability to specify the parent document in ``bulk_index()``. Thanks, Gavin
  Carothers!
* Remove the internal, undocumented ``from_python`` method. django-haystack
  users will need to upgrade to a newer version that avoids using it.
* Refactor JSON encoding machinery. Now it's clearer how to customize it: just
  plug your custom JSON encoder class into ``ElasticSearch.json_encoder``.
* Don't crash under ``python -OO``.
* Support non-ASCII URL path components (like Unicode document IDs) and query
  string param values.
* Switch to the nose testrunner.


v0.4.1 (2013-03-25)
-------------------

* Fix a bug introduced in 0.4 wherein "None" was accidentally sent to ES when
  an ID wasn't passed to ``index()``.


v0.4 (2013-03-19)
-----------------

* Support Python 3.
* Support more APIs:

  * ``cluster_state``
  * ``get_settings``
  * ``update_aliases`` and ``aliases``
  * ``update`` (existed but didn't work before)

* Support the ``size`` param of the ``search`` method. (You can now change
  ``es_size`` to ``size`` in your code if you like.)
* Support the ``fields`` param on ``index`` and ``update`` methods, new since
  ES 0.20.
* Maintain better precision of floats when passed to ES.
* Change endpoint of bulk indexing so it works on ES < 0.18.
* Support documents whose ID is 0.
* URL-escape path components, so doc IDs containing funny chars work.
* Add a dedicated ``IndexAlreadyExistsError`` exception for when you try to
  create an index that already exists. This helps you trap this situation
  unambiguously.
* Add docs about upgrading from pyes.
* Remove the undocumented and unused ``to_python`` method.


v0.3 (2013-01-10)
-----------------

* Correct the ``requests`` requirement to require a version that has everything
  we need. In fact, require requests 1.x, which has a stable API.
* Add ``update()`` method.
* Make ``send_request`` method public so you can use ES APIs we don't yet
  explicitly support.
* Handle JSON translation of Decimal class and sets.
* Make ``more_like_this()`` take an arbitrary request body so you can filter
  the returned docs.
* Replace the ``fields`` arg of ``more_like_this`` with ``mlt_fields``. This
  makes it actually work, as it's the param name ES expects.
* Make explicit our undeclared dependency on simplejson.


v0.2 (2012-10-06)
-----------------

Many thanks to Erik Rose for almost completely rewriting the API to follow
best practices, improve the API user experience, and make pyelasticsearch
future-proof.

.. note::

  This release is **backward-incompatible** in numerous ways; please
  read the following section carefully. If in doubt, you can easily stick
  with pyelasticsearch 0.1.

Backward-incompatible changes:

* Simplify ``search()`` and ``count()`` calling conventions. Each now supports
  either a textual or a dict-based query as its first argument. There's no
  longer a need to, for example, pass an empty string as the first arg in order
  to use a JSON query (a common case).

* Standardize on the singular for the names of the ``index`` and ``doc_type``
  kwargs. It's not always obvious whether an ES API allows for multiple
  indexes. This was leading me to have to look aside to the docs to determine
  whether the kwarg was called ``index`` or ``indexes``. Using the singular
  everywhere will result in fewer doc lookups, especially for the common case
  of a single index.

* Rename ``morelikethis`` to ``more_like_this`` for consistency with other
  methods.

* ``index()`` now takes ``(index, doc_type, doc)`` rather than ``(doc, index,
  doc_type)``, for consistency with ``bulk_index()`` and other methods.

* Similarly, ``put_mapping()`` now takes ``(index, doc_type, mapping)``
  rather than ``(doc_type, mapping, index)``.

* To prevent callers from accidentally destroying large amounts of data...

  * ``delete()`` no longer deletes all documents of a doctype when no ID is
    specified; use ``delete_all()`` instead.
  * ``delete_index()`` no longer deletes all indexes when none are given; use
    ``delete_all_indexes()`` instead.
  * ``update_settings()`` no longer updates the settings of all indexes when
    none are specified; use ``update_all_settings()`` instead.

* ``setup_logging()`` is gone. If you want to configure logging, use the
  logging module's usual facilities. We still log to the "pyelasticsearch"
  named logger.

* Rethink error handling:

  * Raise a more specific exception for HTTP error codes so callers can catch
    it without examining a string.
  * Catch non-JSON responses properly, and raise the more specific
    ``NonJsonResponseError`` instead of the generic ``ElasticSearchError``.
  * Remove mentions of nonexistent exception types that would cause crashes
    in their ``except`` clauses.
  * Crash harder if JSON encoding fails: that always indicates a bug in
    pyelasticsearch.
  * Remove the ill-defined ``ElasticSearchError``.
  * Raise ``ConnectionError`` rather than ``ElasticSearchError`` if we can't
    connect to a node (and we're out of auto-retries).
  * Raise ``ValueError`` rather than ``ElasticSearchError`` if no documents
    are passed to ``bulk_index``.
  * All exceptions are now more introspectable, because they don't
    immediately mash all the context down into a string. For example, you can
    recover the unmolested response object from ``ElasticHttpError``.
  * Remove the ``quiet`` kwarg, meaning we always expose errors.

Other changes:

* Add Sphinx documentation.
* Add load-balancing across multiple nodes.
* Add failover in the case where a node doesn't respond.
* Add ``close_index``, ``open_index``, ``update_settings``, ``health``.
* Support passing arbitrary kwargs through to the ES query string. Known ones
  are taken verbatim; unanticipated ones need an ``es_`` prefix to guarantee
  forward compatibility.
* Automatically convert ``datetime`` objects when encoding JSON.
* Recognize and convert datetimes and dates in pass-through kwargs. This is
  useful for ``timeout``.
* In routines that can take either one or many indexes, don't require the
  caller to wrap a single index name in a list.
* Many other internal improvements.
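
The pass-through kwarg convention in the list above can be pictured with a
short sketch. It is an illustration only, not the library's code: the function
name ``split_query_params`` and the contents of ``KNOWN_PARAMS`` are
hypothetical, and it assumes the prefix is stripped before the parameter
reaches the query string.

```python
# Assumed whitelist of parameter names the method already knows about.
KNOWN_PARAMS = {'routing', 'timeout', 'size'}


def split_query_params(kwargs, known=KNOWN_PARAMS):
    """Separate query-string params from ordinary keyword arguments.

    Hypothetical helper for illustration only.
    """
    params, rest = {}, {}
    for name, value in kwargs.items():
        if name in known:
            # Known params are taken verbatim.
            params[name] = value
        elif name.startswith('es_'):
            # Unanticipated params opt in via the prefix, dropped here.
            params[name[len('es_'):]] = value
        else:
            rest[name] = value
    return params, rest


params, rest = split_query_params(
    {'size': 10, 'es_preference': '_local', 'doc_type': 'person'})
```

Requiring the prefix for unknown names means a kwarg added to a method's
signature later cannot silently collide with a query-string parameter a caller
was already passing.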


v0.1 (2012-08-30)
-----------------

Initial release based on the work of Robert Eanes and other authors.
