es_pandas

Read, write and update large scale pandas DataFrame with ElasticSearch.

Requirements

This package should work on Python 3 (>=3.4). ElasticSearch should be version 6.8 or later (6.x or 7.x).

Installation

The package is hosted on PyPI and can be installed with pip:

pip install es_pandas

Usage

import time

import pandas as pd

from es_pandas import es_pandas


# Information about the es cluster
es_host = 'localhost:9200'
index = 'demo'

# create es_pandas instance
ep = es_pandas(es_host)

# Example data frame
df = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 128)], 
                    'Num': [x for x in range(31)], 
                    'Date': pd.date_range(start='2019/01/01', end='2019/01/31')})

# optionally initialize an index template from the DataFrame
doc_type = 'demo'
ep.init_es_tmpl(df, doc_type)

# Example of writing data to es, using the template created above
ep.to_es(df, index, doc_type=doc_type)
# set use_index=True if you want to use DataFrame index as records' _id
ep.to_es(df, index, doc_type=doc_type, use_index=True)

# give es time to index the new documents before reading them back
time.sleep(10)

# Example of reading data from es
df = ep.to_pandas(index)
print(df.head())

# return only certain fields from es
heads = ['Num', 'Date']
df = ep.to_pandas(index, heads=heads)
print(df.head())

# set the dtype of certain columns
dtype = {'Num': 'float', 'Alpha': object}
df = ep.to_pandas(index, dtype=dtype)
print(df.dtypes)

# delete records from es
ep.delete_es(df.iloc[0:10, :], index)


df2 = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 129)],
                    'Num': [x for x in range(32)],
                    'Date': pd.date_range(start='2019/01/01', end='2019/02/01')})

df2.loc[df2['Num']==10, ['Alpha']] = 'change'

# Example of updating data in es, matching records on the 'Num' column
ep.to_es_dev(df2, index, 'Num', doc_type=doc_type)

More about update

The to_es_dev(df, index, key_col, ignore_cols=[]) function is available if you want to write or update data in ElasticSearch.

If the index does not exist, to_es_dev writes df to ElasticSearch. Otherwise it reads the data from ElasticSearch in batches and compares it with df by merging the two on key_col. To ignore some columns when comparing, list them in the ignore_cols parameter. New records in df are also written to ElasticSearch.
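The merge-and-compare step described above can be sketched with plain pandas. This is only an illustration under stated assumptions, not the es_pandas implementation: diff_on_key is a hypothetical helper, the remote data is assumed to be already loaded into a DataFrame that contains every compared column, and key_col is assumed to be unique on both sides.

```python
import pandas as pd


def diff_on_key(local, remote, key_col, ignore_cols=()):
    """Split `local` into rows that are new and rows that changed relative to `remote`.

    Hypothetical helper for illustration only; it is not part of es_pandas.
    """
    # Columns that take part in the comparison.
    compare_cols = [c for c in local.columns
                    if c != key_col and c not in ignore_cols]

    # Left merge on the key; the `_merge` column records whether the key
    # was also found in the remote data.
    merged = local.merge(remote[[key_col] + compare_cols], on=key_col,
                         how="left", suffixes=("", "_remote"), indicator=True)

    is_new = merged["_merge"] == "left_only"

    # A row counts as changed if any compared column differs from its remote value.
    # Note: missing values compare as unequal, so they are reported as changes.
    differs = pd.Series(False, index=merged.index)
    for col in compare_cols:
        differs |= merged[col] != merged[col + "_remote"]
    is_changed = ~is_new & differs

    new_rows = merged.loc[is_new, list(local.columns)]
    changed_rows = merged.loc[is_changed, list(local.columns)]
    return new_rows, changed_rows


# Toy data standing in for what was read from ElasticSearch and for the local df.
remote = pd.DataFrame({"Num": [0, 1, 2], "Alpha": ["a", "b", "c"]})
local = pd.DataFrame({"Num": [0, 1, 2, 3], "Alpha": ["a", "change", "c", "d"]})

new_rows, changed_rows = diff_on_key(local, remote, "Num")
# Ignoring 'Alpha' leaves nothing to compare, so no row is reported as changed.
_, changed_ignoring_alpha = diff_on_key(local, remote, "Num", ignore_cols=["Alpha"])
```

Here the row with Num 3 is reported as new and the row with Num 1 as changed. A left merge with an indicator column is one simple way to separate the two cases in a single pass; the library may well do this differently.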

License

(c) 2019 Frank

Download files

Files for es-pandas, version 0.0.9
Filename                          Size    File type  Python version
es_pandas-0.0.9-py3-none-any.whl  7.1 kB  Wheel      py3
es_pandas-0.0.9.tar.gz            5.9 kB  Source     None