eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch


About

Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Where possible, the package uses existing Python APIs and data structures, making it easy to switch from numpy, pandas, or scikit-learn to their Elasticsearch-powered equivalents. In general, the data resides in Elasticsearch rather than in memory, which allows Eland to work with large datasets stored in Elasticsearch.

Eland also provides tools to upload trained machine learning models from common libraries like scikit-learn, XGBoost, and LightGBM into Elasticsearch.

Getting Started

Eland can be installed from PyPI with pip:

$ python -m pip install eland

If you are using Eland to upload NLP models to Elasticsearch, install the PyTorch extras:

$ python -m pip install 'eland[pytorch]'

Eland can also be installed from Conda Forge with Conda:

$ conda install -c conda-forge eland

Compatibility

- Supports Python 3.10, 3.11, 3.12, and 3.13.
- Supports Pandas 1.5 and 2.
- Supports Elasticsearch 9+ clusters. If you are using the NLP with PyTorch feature, make sure your Eland minor version matches the minor version of your Elasticsearch cluster. For all other features, it is sufficient for the major version to match. Use Eland 8.x for Elasticsearch 8.x support.
- You need to install the appropriate version of PyTorch to import an NLP model. Run python -m pip install 'eland[pytorch]' to install that version.
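The version rule above can be expressed as a small check. The helpers below are hypothetical (they are not part of Eland's API) and simply encode the stated policy: the major versions must always match, and the minor versions must also match when the NLP with PyTorch feature is used.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as "9.2.0" or "8.0.0b1"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(eland_version: str, es_version: str, uses_pytorch_nlp: bool) -> bool:
    """Hypothetical check of the compatibility policy described above."""
    eland_major, eland_minor = parse_major_minor(eland_version)
    es_major, es_minor = parse_major_minor(es_version)
    # Every feature needs matching major versions (Eland 8.x for Elasticsearch 8.x).
    if eland_major != es_major:
        return False
    # NLP with PyTorch additionally needs matching minor versions.
    if uses_pytorch_nlp:
        return eland_minor == es_minor
    return True
```

For example, Eland 9.2.0 against a 9.1.x cluster passes the check for DataFrame work but fails it for PyTorch model imports.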

Prerequisites

Users installing Eland on Debian-based distributions may need to install prerequisite packages for the transitive dependencies of Eland:

$ sudo apt-get install -y \
  build-essential pkg-config cmake \
  python3-dev libzip-dev libjpeg-dev

Note that other distributions such as CentOS, RedHat, Arch, etc. may require using a different package manager and specifying different package names.

Docker

If you only want to run the bundled scripts without installing Eland, use the Docker image. It can be used interactively:

$ docker run -it --rm --network host docker.elastic.co/eland/eland

Running installed scripts is also possible without an interactive shell, e.g.:

$ docker run -it --rm --network host \
    docker.elastic.co/eland/eland \
    eland_import_hub_model \
      --url http://host.docker.internal:9200/ \
      --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
      --task-type ner

Connecting to Elasticsearch

Eland uses the Elasticsearch low-level client to connect to Elasticsearch. This client supports a range of connection and authentication options.

You can pass either an instance of elasticsearch.Elasticsearch to Eland APIs or a string containing the host to connect to:

import eland as ed

# Connecting to an Elasticsearch instance running on 'http://localhost:9200'
df = ed.DataFrame("http://localhost:9200", es_index_pattern="flights")

# Connecting to an Elastic Cloud instance
from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id="cluster-name:...",
    basic_auth=("elastic", "<password>")
)
df = ed.DataFrame(es, es_index_pattern="flights")
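Which connection options you pass usually depends on the deployment. As a sketch, assuming a set of hypothetical environment variables (ES_CLOUD_ID, ES_URL, ES_API_KEY, ES_USERNAME, ES_PASSWORD; these names are illustrative, not something Eland reads), the helper below chooses between a host URL and an Elastic Cloud ID and between API-key and basic authentication. The resulting keys (hosts, cloud_id, api_key, basic_auth) are keyword arguments accepted by elasticsearch.Elasticsearch in recent client versions.

```python
from collections.abc import Mapping
from typing import Any


def build_client_kwargs(env: Mapping[str, str]) -> dict[str, Any]:
    """Translate hypothetical environment variables into Elasticsearch(...) kwargs."""
    kwargs: dict[str, Any] = {}
    if env.get("ES_CLOUD_ID"):
        kwargs["cloud_id"] = env["ES_CLOUD_ID"]  # Elastic Cloud deployment
    else:
        # Self-managed node; fall back to the local default used above.
        kwargs["hosts"] = env.get("ES_URL", "http://localhost:9200")
    if env.get("ES_API_KEY"):
        kwargs["api_key"] = env["ES_API_KEY"]
    elif env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        kwargs["basic_auth"] = (env["ES_USERNAME"], env["ES_PASSWORD"])
    return kwargs


# Usage with the real client (requires the elasticsearch package and a cluster):
#   import os
#   es = Elasticsearch(**build_client_kwargs(os.environ))
#   df = ed.DataFrame(es, es_index_pattern="flights")
```

Keeping credentials in the environment rather than in notebook cells avoids committing passwords such as the one in the example above.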

DataFrames in Eland

eland.DataFrame wraps an Elasticsearch index in a Pandas-like API and defers all processing and filtering of data to Elasticsearch instead of your local machine. This means you can process large amounts of data within Elasticsearch from a Jupyter Notebook without overloading your machine.

➤ Eland DataFrame API documentation

➤ Advanced examples in a Jupyter Notebook

>>> import eland as ed

>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('http://localhost:9200', 'flights')

# eland.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]

>>> df.info()
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   AvgTicketPrice      13059 non-null  float64       
 1   Cancelled           13059 non-null  bool          
 2   Carrier             13059 non-null  object        
...      
 24  OriginWeather       13059 non-null  object        
 25  dayOfWeek           13059 non-null  int64         
 26  timestamp           13059 non-null  datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
Elasticsearch storage usage: 5.043 MB

# Filtering of rows using comparisons
>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
     AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
8        960.869736       True  ...         0 2018-01-01 12:09:35
26       975.812632       True  ...         0 2018-01-01 15:38:32
311      946.358410       True  ...         0 2018-01-01 11:51:12
651      975.383864       True  ...         2 2018-01-03 21:13:17
950      907.836523       True  ...         2 2018-01-03 05:14:51

[5 rows x 27 columns]

# Running aggregations across an index
>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
     DistanceKilometers  AvgTicketPrice
sum        9.261629e+07    8.204365e+06
min        0.000000e+00    1.000205e+02
std        4.578263e+03    2.663867e+02
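Because the filtering and aggregation run inside Elasticsearch, they have to be expressed as Elasticsearch requests. The sketch below builds, by hand, request bodies that are roughly equivalent to the filter and aggregation examples above. It is an illustration under assumptions (for instance, that Carrier is a keyword field and that std can be read from an extended_stats aggregation), not the exact query Eland generates.

```python
from typing import Any


def filter_query(carrier: str, min_price: float, cancelled: bool) -> dict[str, Any]:
    """Bool query roughly matching
    df[(df.Carrier == carrier) & (df.AvgTicketPrice > min_price) & (df.Cancelled == cancelled)]."""
    return {
        "bool": {
            "filter": [
                {"term": {"Carrier": carrier}},
                {"range": {"AvgTicketPrice": {"gt": min_price}}},
                {"term": {"Cancelled": cancelled}},
            ]
        }
    }


def metric_aggs(fields: list[str]) -> dict[str, Any]:
    """sum, min and (via extended_stats) standard deviation for each field."""
    aggs: dict[str, Any] = {}
    for field in fields:
        aggs[f"sum_{field}"] = {"sum": {"field": field}}
        aggs[f"min_{field}"] = {"min": {"field": field}}
        aggs[f"stats_{field}"] = {"extended_stats": {"field": field}}
    return aggs


# Roughly what .head() on the filtered DataFrame needs: the first 5 matching documents.
filter_body = {"size": 5, "query": filter_query("Kibana Airlines", 900.0, True)}

# Aggregations over the whole index: no documents are returned, only metrics.
aggs_body = {"size": 0, "aggs": metric_aggs(["DistanceKilometers", "AvgTicketPrice"])}
```

Only the small result sets travel back to the client, which is why the DataFrame's local memory usage stays tiny compared with the index's storage size.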

Machine Learning in Eland

Regression and classification

Eland can serialize trained regression and classification models from the scikit-learn, XGBoost, and LightGBM libraries so that they can be used as inference models in Elasticsearch.

➤ Eland Machine Learning API documentation

>>> from sklearn import datasets
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel

# Train and exercise an XGBoost ML model locally
>>> training_data = datasets.make_classification(n_features=5)
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])

>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
    es_client="http://localhost:9200",
    model_id="xgb-classifier",
    model=xgb_model,
    feature_names=["f0", "f1", "f2", "f3", "f4"],
)

# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
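The feature_names argument names the model's inputs, one per training column. When XGBoost is trained on a plain NumPy array, as in the example, it typically refers to the columns as f0, f1, and so on, which is why those names are passed. The hypothetical helpers below (not part of Eland) build that list from the column count and guard against a length mismatch before importing.

```python
from collections.abc import Sequence


def default_feature_names(n_features: int) -> list[str]:
    """Names of the form f0, f1, ... for unnamed input columns."""
    return [f"f{i}" for i in range(n_features)]


def check_feature_names(rows: Sequence[Sequence[float]], feature_names: Sequence[str]) -> None:
    """Raise ValueError if the number of names differs from the number of columns."""
    width = len(rows[0]) if rows else 0
    if width != len(feature_names):
        raise ValueError(
            f"data has {width} columns but {len(feature_names)} feature names were given"
        )


# With the NumPy data above you would use training_data[0].shape[1] instead of 5.
names = default_feature_names(5)
check_feature_names([[0.1, 0.2, 0.3, 0.4, 0.5]], names)
```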

NLP with PyTorch

[!WARNING]
PyTorch models can execute code on your Elasticsearch server, exposing your cluster to potential security vulnerabilities. Only use models from trusted sources and never use models from unverified or unknown providers.

For NLP tasks, Eland allows importing PyTorch-trained BERT models into Elasticsearch. Models can be either plain PyTorch models or supported transformers models from the Hugging Face model hub.

$ eland_import_hub_model \
  --url http://localhost:9200/ \
  --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
  --task-type ner \
  --start

The example above automatically starts a model deployment. This is a useful shortcut for initial experimentation, but for anything that needs good throughput you should omit the --start argument from the Eland command line and start the model from the ML UI in Kibana instead. The --start argument deploys the model with one allocation and one thread per allocation, which does not offer good performance. When you start the deployment from the ML UI in Kibana or through the Elasticsearch API, you can set the threading options to make the best use of your hardware.
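As a hedged sketch of starting the deployment through the Elasticsearch API, the helper below builds the request for the start trained model deployment endpoint with explicit threading options. The model ID assumes the name Eland derives from the Hub ID used above, and the counts (2 allocations, 4 threads each) are placeholders to tune for your hardware. The official Python client exposes the same API as es.ml.start_trained_model_deployment.

```python
from urllib.parse import quote, urlencode


def start_deployment_request(
    model_id: str, number_of_allocations: int, threads_per_allocation: int
) -> tuple[str, str]:
    """Return (method, path) for POST /_ml/trained_models/<id>/deployment/_start."""
    if number_of_allocations < 1:
        raise ValueError("number_of_allocations must be at least 1")
    # The Elasticsearch docs require a power of 2 here, with a maximum of 32.
    if (
        threads_per_allocation < 1
        or threads_per_allocation > 32
        or threads_per_allocation & (threads_per_allocation - 1)
    ):
        raise ValueError("threads_per_allocation must be a power of 2 between 1 and 32")
    params = urlencode(
        {
            "number_of_allocations": number_of_allocations,
            "threads_per_allocation": threads_per_allocation,
        }
    )
    return "POST", f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start?{params}"


method, path = start_deployment_request(
    "elastic__distilbert-base-cased-finetuned-conll03-english", 2, 4
)
# Total inference threads = allocations x threads per allocation.
total_threads = 2 * 4
```

More allocations mainly raise throughput (more requests in parallel), while more threads per allocation mainly lower the latency of each request.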

>>> import elasticsearch
>>> from pathlib import Path
>>> from eland.common import es_version
>>> from eland.ml.pytorch import PyTorchModel
>>> from eland.ml.pytorch.transformers import TransformerModel

>>> es = elasticsearch.Elasticsearch("http://elastic:mlqa_admin@localhost:9200")
>>> es_cluster_version = es_version(es)

# Load a Hugging Face transformers model directly from the model hub
>>> tm = TransformerModel(model_id="elastic/distilbert-base-cased-finetuned-conll03-english", task_type="ner", es_version=es_cluster_version)
Downloading: 100%|██████████| 257/257 [00:00<00:00, 108kB/s]
Downloading: 100%|██████████| 954/954 [00:00<00:00, 372kB/s]
Downloading: 100%|██████████| 208k/208k [00:00<00:00, 668kB/s] 
Downloading: 100%|██████████| 112/112 [00:00<00:00, 43.9kB/s]
Downloading: 100%|██████████| 249M/249M [00:23<00:00, 11.2MB/s]

# Export the model in the TorchScript representation that Elasticsearch uses
>>> tmp_path = "models"
>>> Path(tmp_path).mkdir(parents=True, exist_ok=True)
>>> model_path, config, vocab_path = tm.save(tmp_path)

# Import model into Elasticsearch
>>> ptm = PyTorchModel(es, tm.elasticsearch_model_id())
>>> ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
100%|██████████| 63/63 [00:12<00:00,  5.02it/s]
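tm.elasticsearch_model_id() converts the Hugging Face Hub ID into an ID that Elasticsearch accepts. The function below is an approximation of that conversion as we understand it (slashes and whitespace become double underscores, the result is lowercased, and at most the last 64 characters are kept); it is an assumption, not Eland's code, so check the Eland source before relying on the exact rule.

```python
import re


def approx_elasticsearch_model_id(hub_model_id: str) -> str:
    """Assumed approximation of how a Hub ID becomes an Elasticsearch model ID."""
    model_id = re.sub(r"[\s\\/]", "__", hub_model_id).lower()
    model_id = model_id[-64:]  # keep at most the last 64 characters
    return model_id.removeprefix("__")  # drop a separator left over by truncation
```

Under this assumption, elastic/distilbert-base-cased-finetuned-conll03-english becomes elastic__distilbert-base-cased-finetuned-conll03-english, which is the ID you would use when starting the deployment from Kibana or the API.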

Release history

- 9.2.0 (this version): Oct 30, 2025
- 9.0.1: Apr 30, 2025
- 9.0.0: Apr 16, 2025
- 8.18.2: Apr 30, 2025
- 8.18.1: Apr 16, 2025
- 8.18.0 (yanked): Apr 15, 2025. Reason this release was yanked: this release contains the code for Eland 9.0.0 instead.
- 8.17.0: Jan 7, 2025
- 8.16.0: Nov 14, 2024
- 8.15.4: Oct 18, 2024
- 8.15.3: Oct 9, 2024
- 8.15.2: Oct 2, 2024
- 8.15.1: Oct 1, 2024
- 8.15.0: Aug 13, 2024
- 8.14.0: Jun 10, 2024
- 8.13.1: May 3, 2024
- 8.13.0: Mar 27, 2024
- 8.12.1: Feb 1, 2024
- 8.12.0: Jan 19, 2024
- 8.11.1: Nov 22, 2023
- 8.11.0: Nov 8, 2023
- 8.10.1: Oct 11, 2023
- 8.10.0: Oct 9, 2023
- 8.9.0: Aug 24, 2023
- 8.7.0: Mar 30, 2023
- 8.3.0: Jul 11, 2022
- 8.2.0: May 11, 2022
- 8.1.0: Mar 31, 2022
- 8.0.0: Feb 10, 2022
- 8.0.0b1 (pre-release): Dec 16, 2021
- 7.14.1b1 (pre-release): Aug 30, 2021
- 7.14.0b1 (pre-release): Aug 9, 2021
- 7.13.0b1 (pre-release): Jun 22, 2021
- 7.10.1b1 (pre-release): Jan 12, 2021
- 7.10.0b1 (pre-release): Oct 29, 2020
- 7.9.1a1 (pre-release): Sep 30, 2020
- 7.9.0a1 (pre-release): Aug 18, 2020
- 7.7.0a1 (pre-release): May 20, 2020
- 7.6.0a5 (pre-release): Apr 14, 2020
- 7.6.0a4 (pre-release): Mar 23, 2020
- 7.6.0a3 (pre-release): Feb 15, 2020
- 7.6.0a2 (pre-release): Feb 15, 2020
- 7.6.0a1 (pre-release): Feb 15, 2020
- 7.5.1a4 (pre-release): Feb 5, 2020
- 7.5.1a3 (pre-release): Jan 16, 2020
- 7.5.1a2 (pre-release): Jan 10, 2020

Download files


Source Distribution

eland-9.2.0.tar.gz (132.1 kB)

Uploaded Oct 30, 2025 Source

Built Distribution


eland-9.2.0-py3-none-any.whl (158.4 kB)

Uploaded Oct 30, 2025 Python 3

File details

Details for the file eland-9.2.0.tar.gz.

File metadata

Download URL: eland-9.2.0.tar.gz
Upload date: Oct 30, 2025
Size: 132.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.8

File hashes

Hashes for eland-9.2.0.tar.gz
Algorithm	Hash digest
SHA256	`9874ec6c5ed01920195e58c702ae2f4de59efc3a8b1a1d49baafcb0b3bb91e1b`
MD5	`d0b238c796b01e78221237d964ceb1b2`
BLAKE2b-256	`c71bcea102b2a269d76a54b24745dcb13e6f1da066a8a21cfa7e8524789a9182`


File details

Details for the file eland-9.2.0-py3-none-any.whl.

File metadata

Download URL: eland-9.2.0-py3-none-any.whl
Upload date: Oct 30, 2025
Size: 158.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.8

File hashes

Hashes for eland-9.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ff84f803e77793a000d2a69070f894039e929896280b81898496247f88226446`
MD5	`a35765fa5a8be90babc765eef050c9c5`
BLAKE2b-256	`f96f64e502626af3eeee0533056dbce34b05b64f5ed46d7c088f6497f6854cd4`
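The published digests can be used to verify a downloaded file before installing it. A minimal sketch using only the standard library is shown below; the expected value is the SHA256 digest listed above for the 9.2.0 wheel, and the helper names are illustrative. pip also offers a hash-checking mode (a requirements line such as eland==9.2.0 --hash=sha256:<digest> installed with pip install --require-hashes -r requirements.txt) that performs the same comparison automatically.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for eland-9.2.0-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "ff84f803e77793a000d2a69070f894039e929896280b81898496247f88226446"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_WHEEL_SHA256) -> bool:
    """True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected.lower()


# Example: matches_published_hash(Path("eland-9.2.0-py3-none-any.whl"))
```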

