Read, write and update large scale pandas DataFrame with ElasticSearch
Project description
es_pandas
Read, write and update large scale pandas DataFrame with ElasticSearch.
Requirements
This package should work on Python 3 (>=3.4) with ElasticSearch 5.x, 6.x or 7.x.
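The version requirements can be expressed as a quick pre-flight check. The sketch below is illustrative only: `python_ok`, `es_ok` and `SUPPORTED_ES_MAJORS` are hypothetical names, not part of es_pandas, and the ES version string is assumed to come from wherever you normally look it up (for example, your cluster's info endpoint).

```python
import sys

# Major ES versions listed in the requirements (assumed to be exhaustive).
SUPPORTED_ES_MAJORS = {5, 6, 7}


def python_ok(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.4 or newer."""
    return tuple(version_info[:2]) >= (3, 4)


def es_ok(version_string):
    """Return True when a version string such as '7.10.2' has a supported major."""
    major = int(version_string.split(".")[0])
    return major in SUPPORTED_ES_MAJORS


print(python_ok((3, 6, 13)), es_ok("7.10.2"), es_ok("8.1.0"))
```

Passing the versions in as arguments keeps the check testable without a running cluster.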
Installation
The package is hosted on PyPI and can be installed with pip:
pip install es_pandas
Deprecation Notice
Support for ElasticSearch 5.x will be deprecated in a future version.
Usage
import time
import pandas as pd
from es_pandas import es_pandas
# Connection details for the ES cluster
es_host = 'localhost:9200'
index = 'demo'
# Create an es_pandas instance
ep = es_pandas(es_host)
# Example DataFrame
df = pd.DataFrame({'Num': list(range(100000))})
df['Alpha'] = 'Hello'
df['Date'] = pd.Timestamp.now()
# Optionally initialize an ES template from the DataFrame
doc_type = 'demo'
ep.init_es_tmpl(df, doc_type)
# Write data to ES, using the template created above
ep.to_es(df, index, doc_type=doc_type, thread_count=2, chunk_size=10000)
# Set use_index=True to use the DataFrame index as each record's _id
ep.to_es(df, index, doc_type=doc_type, use_index=True, thread_count=2, chunk_size=10000)
# Delete records from ES
ep.to_es(df.iloc[5000:], index, doc_type=doc_type, _op_type='delete', thread_count=2, chunk_size=10000)
# Update documents by their _id
df.iloc[:1000, 1] = 'Bye'
df.iloc[:1000, 2] = pd.Timestamp.now()
ep.to_es(df.iloc[:1000, 1:], index, doc_type=doc_type, _op_type='update')
# Read data from ES
df = ep.to_pandas(index)
print(df.head())
# Return only certain fields from ES
heads = ['Num', 'Date']
df = ep.to_pandas(index, heads=heads)
print(df.head())
# Set the dtype of certain columns
dtype = {'Num': 'float', 'Alpha': object}
df = ep.to_pandas(index, dtype=dtype)
print(df.dtypes)
# Infer dtypes from the ES template
df = ep.to_pandas(index, infer_dtype=True)
print(df.dtypes)
# Use the query_sql parameter if you want to query with SQL
# Write data to ES with pandas.io.json
ep.to_es(df, index, doc_type=doc_type, use_pandas_json=True, thread_count=2, chunk_size=10000)
print('write es doc with pandas.io.json finished')
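To make the `chunk_size` and `_op_type` parameters above more concrete, here is a minimal sketch of how rows might be turned into bulk-style action dicts and grouped into batches. It is not es_pandas's actual implementation: `build_actions` and `chunked` are hypothetical helpers, and the action layout (`_op_type`, `_index`, `_id`, `_source`, `doc`) follows the convention used by the bulk helpers in the official Python Elasticsearch client. The sketch uses plain Python only, so it runs without a cluster.

```python
from itertools import islice


def build_actions(records, index, op_type="index"):
    """Yield one bulk-style action dict per (doc_id, fields) pair.

    Hypothetical helper for illustration; not part of es_pandas.
    """
    for doc_id, fields in records:
        action = {"_op_type": op_type, "_index": index, "_id": doc_id}
        if op_type == "update":
            # Partial updates carry only the changed fields under "doc".
            action["doc"] = fields
        elif op_type != "delete":
            # index/create carry the full document body; delete needs only the _id.
            action["_source"] = fields
        yield action


def chunked(iterable, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# 25 small rows keyed by an integer id, split into batches of 10.
rows = [(i, {"Num": i, "Alpha": "Hello"}) for i in range(25)]
batches = list(chunked(build_actions(rows, "demo"), chunk_size=10))
print([len(b) for b in batches])  # [10, 10, 5]
```

With a real DataFrame, pairs like `rows` could be produced from `df.to_dict('index').items()`, which mirrors the idea behind `use_index=True`: the DataFrame index supplies each document's `_id`.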
Download files
Source Distribution
es_pandas-0.0.22.tar.gz (6.3 kB)
Built Distribution
es_pandas-0.0.22-py3-none-any.whl (6.4 kB)
File details
Details for the file es_pandas-0.0.22.tar.gz.
File metadata
- Download URL: es_pandas-0.0.22.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.12 tqdm/4.62.3 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5686d095eb6997d45db57ff8b3fcc8e4d3ef5a512a0e76eca910a4b457e6c4e9
MD5 | e77d232e1c59257f9295049a55dcecde
BLAKE2b-256 | 8dfbd0323116dd231002763468a3f7fb03482a47a8d1f8dd359225b0c9e3b02a
File details
Details for the file es_pandas-0.0.22-py3-none-any.whl.
File metadata
- Download URL: es_pandas-0.0.22-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.12 tqdm/4.62.3 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1dd7d0e0f3be908073912d3f13c6f394771966752324cb11f024b518dad45c24
MD5 | c9e252bb277b47730007900268933867
BLAKE2b-256 | 259bbf1b60cfbcd519e080f8783b0a3e4ee683c1f3e403b8a8c204ec81862f60