Read, write and update large scale pandas DataFrame with ElasticSearch
es_pandas
Read, write and update large scale pandas DataFrame with ElasticSearch.
Requirements
This package should work on Python 3 (>=3.4), and ElasticSearch should be version 6.x or 7.x (>=6.8).
Installation
The package is hosted on PyPI and can be installed with pip:
pip install es_pandas
Usage
import time
import pandas as pd
from es_pandas import es_pandas
# Connection details for the es cluster
es_host = 'localhost:9200'
index = 'demo'
# create es_pandas instance
ep = es_pandas(es_host)
# Example data frame
df = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 128)],
                   'Num': [x for x in range(31)],
                   'Date': pd.date_range(start='2019/01/01', end='2019/01/31')})
# init template if you want
doc_type = 'demo'
ep.init_es_tmpl(df, doc_type)
# Example of writing data to es, using the template created above
ep.to_es(df, index, doc_type=doc_type)
# set use_index=True if you want to use DataFrame index as records' _id
ep.to_es(df, index, doc_type=doc_type, use_index=True)
# wait a moment so the written records can be read back
time.sleep(10)
# Example of read data from es
df = ep.to_pandas(index)
print(df.head())
# return certain fields in es
heads = ['Num', 'Date']
df = ep.to_pandas(index, heads=heads)
print(df.head())
# set certain columns dtype
dtype = {'Num': 'float', 'Alpha': object}
df = ep.to_pandas(index, dtype=dtype)
print(df.dtypes)
# delete records from es
ep.delete_es(df.iloc[0:10, :], index)
df2 = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 129)],
                    'Num': [x for x in range(32)],
                    'Date': pd.date_range(start='2019/01/01', end='2019/02/01')})
df2.loc[df2['Num']==10, ['Alpha']] = 'change'
# Example of update data in es
ep.to_es_dev(df2, index, 'Num', doc_type=doc_type)
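To illustrate what use_index=True means in the usage example, the sketch below builds bulk-style action dictionaries from a DataFrame and takes each record's _id from the DataFrame index when requested. This is only a guess at the general idea, not es_pandas's actual code: build_actions is a hypothetical helper, and the _index/_id/_source keys follow the action format accepted by elasticsearch.helpers.bulk. No ElasticSearch connection is needed to run it.

```python
import pandas as pd


def build_actions(df, index, use_index=False):
    """Hypothetical helper: yield one bulk-style action dict per DataFrame row."""
    for idx, row in zip(df.index, df.to_dict(orient='records')):
        action = {'_index': index, '_source': row}
        if use_index:
            # Reuse the DataFrame index label as the document _id
            action['_id'] = idx
        yield action


# Small example frame with an explicit index
demo = pd.DataFrame({'Alpha': ['a', 'b'], 'Num': [0, 1]}, index=['x', 'y'])
actions = list(build_actions(demo, 'demo', use_index=True))
```

Without use_index, the action carries no _id, which leaves ElasticSearch free to generate one.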
More about update
The to_es_dev(df, index, key_col, ignore_cols=[]) function is available if you want to write or update data with ElasticSearch. to_es_dev writes df to es if index does not exist; otherwise it reads data from ElasticSearch in batches and compares it with df by merging them on key_col. If you want to ignore some columns when comparing, set them with the ignore_cols parameter. Moreover, new records in df will be written to ElasticSearch.
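The comparison step described above can be sketched with plain pandas. This is an illustration under stated assumptions, not es_pandas's actual implementation: split_changes is a hypothetical helper, existing stands in for one batch read back from ElasticSearch, both frames are assumed to share the same columns, and key_col is assumed to be unique in existing.

```python
import pandas as pd


def split_changes(df, existing, key_col, ignore_cols=()):
    """Hypothetical helper: return (new_rows, changed_rows) of df relative to existing."""
    compare_cols = [c for c in df.columns if c != key_col and c not in ignore_cols]
    # Left merge on the key; the indicator column marks rows missing from existing
    merged = df.merge(existing, on=key_col, how='left',
                      suffixes=('', '_es'), indicator=True)
    is_new = merged['_merge'] == 'left_only'
    changed = pd.Series(False, index=merged.index)
    for col in compare_cols:
        differs = merged[col] != merged[col + '_es']
        # Treat two missing values as equal rather than as a change
        both_missing = merged[col].isna() & merged[col + '_es'].isna()
        changed |= ~is_new & differs & ~both_missing
    cols = list(df.columns)
    return merged.loc[is_new, cols], merged.loc[changed, cols]


existing = pd.DataFrame({'Num': [0, 1, 2], 'Alpha': ['a', 'b', 'c']})
incoming = pd.DataFrame({'Num': [1, 2, 3], 'Alpha': ['b', 'change', 'd']})
new_rows, changed_rows = split_changes(incoming, existing, 'Num')
```

Here the row with Num 3 is new and the row with Num 2 is changed; passing ignore_cols=['Alpha'] would leave nothing to compare, so no row would count as changed.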
License
(c) 2019 Frank
Hashes for es_pandas-0.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 85fcde6dd982c62f4c525720e7281ab65ebbc2baa10c5658d4501f05ef2dcbc7
MD5 | 7d5f033c41e9dea734e7c49fe87123f2
BLAKE2b-256 | 1c5bb1508ea55a9b953761e532448365e7a8c6188a3fc1a93a0942e42c5ab320