A Module to add read and write capabilities to pandas for several nosql databases
Project description
Pandas-NoSQL
Pandas-NoSQL adds read and write capabilities to pandas for several nosql databases
Pandas read and write methods for:
- MongoDB
- Elasticsearch
- Redis
- Apache Cassandra
The dependencies used for the functions are:
- pymongo by MongoDB
- elasticsearch by Elastic
- redis-py by Redis
- cassandra-driver by DataStax
Installation
$ pip install pandas-nosql
Documentation
MongoDB
# Read a MongoDB collection into a DataFrame
# Defaults: host='localhost' port=27017
# For nested collections use normalize=True
import pandas as pd
import pandas_nosql
df = pd.read_mongo(
database = 'test_db',
collection = 'test_col',
normalize = True,
host = 'localhost',
port = 27017
)
# Write DataFrame to MongoDB collection
# modify_colletion parameter is to help prevent accidental overwrites
df.to_mongo(
database = 'test_db',
collection = 'test_col2',
mode = 'a',
host = 'localhost',
port = 27017
)
Elasticsearch
import pandas as pd
import pandas_nosql
# Read an Elastic index into a DataFrame
# To Access localhost:
# * hosts='https://localhost:9200'
# * verify_certs=False
# If "xpack.security.enabled: false" use http instead
# To split out _source use split_source=True
elastic_cols = ('make', 'model', 'purchase_date', 'miles')
df = pd.read_elastic(
hosts = 'https://localhost:9200',
username = 'elastic',
password = 'strong_password',
index = 'test_index',
fields = elastic_cols,
verify_certs = False
split_source = True
)
# Write DataFrame to Elastic Index
df.to_elastic(
hosts = 'https://localhost:9200',
username = 'elastic',
password = 'strong_password',
index = 'test_index',
mode = 'w'
)
Redis
import pandas as pd
import pandas_nosql
# Read a DataFrame that was sent to Redis using the to_redis method
# A DataFrame not sent to Redis using the to_redis method is not guaranteed to be read properly
# To Access localhost:
# * host='localhost'
# To persist the DataFrame in Redis use expire_seconds=None
# To set an expiration for the DataFrame pass an integer to expire_seconds
df = pd.read_redis(
host = 'localhost',
port = 6379,
redis_key = 'test_key'
)
# Write a DataFrame to Redis
df.to_redis(
host = 'localhost',
port = 6379,
redis_key = 'test_key',
expire_seconds = None
)
Apache Cassandra
import pandas as pd
import pandas_nosql
# Read an Apache Cassandra table into a Panda DataFrame
# To Access localhost:
# * contact_points=['localhost']
# * contact_points must be a list
df = pd.read_cassandra(
contact_points = ['localhost'],
port = 9042,
keyspace = 'test_keyspace',
table = 'test_table'
)
# Append a DataFrame to Apache Cassandra
# DataFrame Columns must match table Columns
replication = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
}
df.to_cassandra(
contact_points = ['localhost'],
port = 9042,
keyspace = 'test_keyspace',
table = 'test_table',
mode = 'w',
replication = replication
)
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pandas-nosql-1.2.0.tar.gz
(9.2 kB
view hashes)
Built Distribution
Close
Hashes for pandas_nosql-1.2.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2870a4289335d3e56093237dad591f5653e721de430a1324f404aaeb064ff3fa |
|
MD5 | 84d77bc65f2b41d6391fa6d5bf82f8e6 |
|
BLAKE2b-256 | 067269a9f839aa11e29e3bb926dba688b276a78daf80d0407aceb2c0f5598197 |