Skip to main content

Python elasticsearch client to analyse, explore and manipulate data that resides in elasticsearch.

Project description

What is it?

eland is a Elasticsearch client Python package to analyse, explore and manipulate data that resides in Elasticsearch. Where possible the package uses existing Python APIs and data structures to make it easy to switch between numpy, pandas, scikit-learn to their Elasticsearch powered equivalents. In general, the data resides in Elasticsearch and not in memory, which allows eland to access large datasets stored in Elasticsearch.

For example, to explore data in a large Elasticsearch index, simply create an eland DataFrame from an Elasticsearch index pattern, and explore using an API that mirrors a subset of the pandas.DataFrame API:

>>> import eland as ed

>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('localhost:9200', 'flights') 

>>> df.head()
   AvgTicketPrice  Cancelled           Carrier  ...        OriginWeather dayOfWeek           timestamp
0      841.265642      False   Kibana Airlines  ...                Sunny         0 2018-01-01 00:00:00
1      882.982662      False  Logstash Airways  ...                Clear         0 2018-01-01 18:27:00
2      190.636904      False  Logstash Airways  ...                 Rain         0 2018-01-01 17:11:14
3      181.694216       True   Kibana Airlines  ...  Thunder & Lightning         0 2018-01-01 10:33:28
4      730.041778      False   Kibana Airlines  ...        Damaging Wind         0 2018-01-01 05:13:00

[5 rows x 27 columns]

>>> df.describe()
       AvgTicketPrice  DistanceKilometers  DistanceMiles  FlightDelayMin  FlightTimeHour  FlightTimeMin     dayOfWeek
count    13059.000000        13059.000000   13059.000000    13059.000000    13059.000000   13059.000000  13059.000000
mean       628.253689         7092.142457    4406.853010       47.335171        8.518797     511.127842      2.835975
std        266.386661         4578.263193    2844.800855       96.743006        5.579019     334.741135      1.939365
min        100.020531            0.000000       0.000000        0.000000        0.000000       0.000000      0.000000
25%        410.008918         2470.545974    1535.126118        0.000000        4.194976     251.738513      1.000000
50%        640.362667         7612.072403    4729.922470        0.000000        8.385816     503.148975      3.000000
75%        842.254990         9735.082407    6049.459005       15.000000       12.009396     720.534532      4.141221
max       1199.729004        19881.482422   12353.780273      360.000000       31.715034    1902.901978      6.000000

>>> df[['Carrier', 'AvgTicketPrice', 'Cancelled']]
                Carrier  AvgTicketPrice  Cancelled
0       Kibana Airlines      841.265642      False
1      Logstash Airways      882.982662      False
2      Logstash Airways      190.636904      False
3       Kibana Airlines      181.694216       True
4       Kibana Airlines      730.041778      False
...                 ...             ...        ...
13054  Logstash Airways     1080.446279      False
13055  Logstash Airways      646.612941      False
13056  Logstash Airways      997.751876      False
13057          JetBeats     1102.814465      False
13058          JetBeats      858.144337      False

[13059 rows x 3 columns]

>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
     AvgTicketPrice  Cancelled          Carrier  ...        OriginWeather dayOfWeek           timestamp
8        960.869736       True  Kibana Airlines  ...            Heavy Fog         0 2018-01-01 12:09:35
26       975.812632       True  Kibana Airlines  ...                 Rain         0 2018-01-01 15:38:32
311      946.358410       True  Kibana Airlines  ...            Heavy Fog         0 2018-01-01 11:51:12
651      975.383864       True  Kibana Airlines  ...                 Rain         2 2018-01-03 21:13:17
950      907.836523       True  Kibana Airlines  ...  Thunder & Lightning         2 2018-01-03 05:14:51

[5 rows x 27 columns]

>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
     DistanceKilometers  AvgTicketPrice
sum        9.261629e+07    8.204365e+06
min        0.000000e+00    1.000205e+02
std        4.578263e+03    2.663867e+02

>>> df[['Carrier', 'Origin', 'Dest']].nunique()
Carrier      4
Origin     156
Dest       156
dtype: int64

>>> s = df.AvgTicketPrice * 2 + df.DistanceKilometers - df.FlightDelayMin
>>> s
0        18174.857422
1        10589.365723
2          381.273804
3          739.126221
4        14818.327637
             ...     
13054    10219.474121
13055     8381.823975
13056    12661.157104
13057    20819.488281
13058    18315.431274
Length: 13059, dtype: float64

>>> print(s.info_es())
index_pattern: flights
Index:
 index_field: _id
 is_source_field: False
Mappings:
 capabilities:
         es_field_name  is_source es_dtype es_date_format pd_dtype  is_searchable  is_aggregatable  is_scripted aggregatable_es_field_name
NaN  script_field_None      False   double           None  float64           True             True         True          script_field_None
Operations:
 tasks: []
 size: None
 sort_params: None
 _source: ['script_field_None']
 body: {'script_fields': {'script_field_None': {'script': {'source': "(((doc['AvgTicketPrice'].value * 2) + doc['DistanceKilometers'].value) - doc['FlightDelayMin'].value)"}}}}
 post_processing: []

>>> pd_df = ed.eland_to_pandas(df)
>>> pd_df.head()
   AvgTicketPrice  Cancelled           Carrier  ...        OriginWeather dayOfWeek           timestamp
0      841.265642      False   Kibana Airlines  ...                Sunny         0 2018-01-01 00:00:00
1      882.982662      False  Logstash Airways  ...                Clear         0 2018-01-01 18:27:00
2      190.636904      False  Logstash Airways  ...                 Rain         0 2018-01-01 17:11:14
3      181.694216       True   Kibana Airlines  ...  Thunder & Lightning         0 2018-01-01 10:33:28
4      730.041778      False   Kibana Airlines  ...        Damaging Wind         0 2018-01-01 05:13:00

[5 rows x 27 columns]

See docs and demo_notebook.ipynb for more examples.

Where to get it

The source code is currently hosted on GitHub at: https://github.com/elastic/eland

Binary installers for the latest released version are available at the Python package index.

pip install eland

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eland-7.6.0a1.tar.gz (91.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

eland-7.6.0a1-py3-none-any.whl (145.2 kB view details)

Uploaded Python 3

File details

Details for the file eland-7.6.0a1.tar.gz.

File metadata

  • Download URL: eland-7.6.0a1.tar.gz
  • Upload date:
  • Size: 91.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191203 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.5

File hashes

Hashes for eland-7.6.0a1.tar.gz
Algorithm Hash digest
SHA256 2426bfabbdc824f51fc4ef138cb185eec370e7e18287ca5744441d26bf8f7a29
MD5 52624d5f835b3e1ecf1e6ce56017e606
BLAKE2b-256 ad9919d26e131f44231c1984197ea71d101b7e6c7a25cd71654509bf3cd7010b

See more details on using hashes here.

File details

Details for the file eland-7.6.0a1-py3-none-any.whl.

File metadata

  • Download URL: eland-7.6.0a1-py3-none-any.whl
  • Upload date:
  • Size: 145.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191203 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.5

File hashes

Hashes for eland-7.6.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 350c3c56f944d9cc92469a6f1deb76593a796ee92e8216cca03ecf4d76e03732
MD5 154fdc036fbb13415cae94ab4fc41214
BLAKE2b-256 e3b525dcb3cdf12ee2829a9fac3a0157ec7415c13b09bd615a0acdae6a479764

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page