A connector for saving dataframes to a database, plus other utilities, built with SQLAlchemy

DataframeToDB


This is a personal, non-profit project. Donations help buy more coffee and carry out more open source projects!

If you want to donate: https://ko-fi.com/juanretamales

Introduction

Note: at first I created a single function to test the library, but I changed how it works so it can manage a whole database, not just a single table.

When to use:

  • When you need to save dataframes, for example when scraping a table
  • When you need to share a database structure across projects

DataframeToDB is an improved way to upload pandas dataframes to Microsoft SQL Server, MySQL, and PostgreSQL, with support for other database types.

DataframeToDB takes advantage of SQLAlchemy, which allows for a much lighter-weight import when writing pandas dataframes to a database server.
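
Because the library builds on SQLAlchemy, any SQLAlchemy-compatible connection should work. A minimal sketch of creating one (the dialect, driver, server, and credentials below are placeholders, not part of the library):

from sqlalchemy import create_engine

# Placeholder connection URL; adjust the dialect, driver, and credentials for your server
engine = create_engine(
    "mssql+pyodbc://my_user:my_pass@localhost/my_database"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
conn = engine.connect()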

Installation


For now, copy the file 'DataframeToDB.py' into your project and import it.

Requirements


  • Written for Python 3.8+
  • Pandas library
  • SQLAlchemy library
  • A driver for the database connection

Example


import pandas as pd
from datetime import datetime

import pyodbc

from DataframeToDB import dataframetodb as toDB

# Test dataframe for insertion
df = pd.DataFrame({
    "Col1": [1, 2, 3],
    "Col2": ["A", "B", "C"],
    "Col3": [True, False, True],
    "Col4": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)]
})

# Create a pyodbc connection
conn = pyodbc.connect(
    """
    Driver={ODBC Driver 17 for SQL Server};
    Server=localhost;
    Database=my_database;
    UID=my_user;
    PWD=my_pass;
    """
)

# Optional: infer the best data types for the dataframe so the library can make full use of them
df = toDB.refactor(df)

# If a table is created, the generated SQL is returned
create_statement = toDB.save(df, "my_great_table", conn, if_exists="replace", custom={"Col1": "INT PRIMARY KEY"}, temp=False)

# Commit the upload and close the connection
conn.commit()
conn.close()

How it works

When you use it, the library creates a folder with the following structure:

# root of project
==================
|- .dataframeToDb
|   + Databasename
|      + TablesFiles
|
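
The table files are serialized with pickle (see the FAQ below). A minimal sketch of how that cache could be inspected with the standard library (the folder name is taken from the diagram above; the contents of each file are an assumption):

import pickle
from pathlib import Path

# Walk the cache folder the library creates at the project root
for table_file in Path(".dataframeToDb").rglob("*"):
    if table_file.is_file():
        with open(table_file, "rb") as f:
            structure = pickle.load(f)  # the stored table structure (assumed)
        print(table_file, structure)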

USAGE


SAVE

toDB.save(df, name, conn, if_exists="append", custom=None, temp=False)
  • df: pandas DataFrame to upload
  • name: String of desired name for the table
  • conn: A valid connection object
  • if_exists: What to do if the specified table name already exists in the database. If the table does not exist, a new one will be created. By default this option is set to 'append' (see the examples after this list).
    • 'append': Appends the dataframe to the table if it already exists.
    • 'fail': Purposely raises a FailError if the table already exists.
    • 'replace': Drops the old table with the specified name and creates a new one. Be careful with this option: it will completely delete the table with the specified name.
    • 'upsert': Cleans the existing table, then inserts the dataframe.
  • custom: A dictionary object with one or more of the column names being uploaded as the key, and a valid SQL column definition as the value. The value must contain a type (INT, FLOAT, VARCHAR(500), etc.), and can optionally also include constraints (NOT NULL, PRIMARY KEY, etc.)
    • Examples: {'ColumnName':'varchar(1000)'} {'ColumnName2':'int primary key'}
  • temp: Either True to create a local SQL Server temporary table for the connection, or False (default) if not.
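
A few illustrative calls combining these options (the table names are placeholders; df and conn are from the example above):

# Append to an existing table (the default)
toDB.save(df, "my_great_table", conn)

# Raise an error instead of touching an existing table
toDB.save(df, "my_great_table", conn, if_exists="fail")

# Drop and recreate the table, overriding two column definitions
toDB.save(df, "my_great_table", conn, if_exists="replace",
          custom={"Col1": "INT PRIMARY KEY", "Col2": "VARCHAR(100) NOT NULL"})

# Create a local SQL Server temporary table for this connection
toDB.save(df, "my_temp_table", conn, temp=True)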

Inspired by fast-to-sql.

Thanks to Joseph Astrahan for his answer on Stack Overflow.

FAQ

What databases does DataframeToDB support?

The same as SQLAlchemy: currently SQLite, PostgreSQL, MySQL, Oracle, MS-SQL, Firebird, Sybase, and others. Check the SQLAlchemy dialects documentation for the full list.
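
For illustration, these are the standard SQLAlchemy connection URL formats for some of those backends (hosts and credentials are placeholders):

# SQLite (file-based, no server needed)
"sqlite:///my_database.db"
# PostgreSQL via psycopg2
"postgresql://my_user:my_pass@localhost/my_database"
# MySQL via PyMySQL
"mysql+pymysql://my_user:my_pass@localhost/my_database"
# MS-SQL via pyodbc
"mssql+pyodbc://my_user:my_pass@localhost/my_database?driver=ODBC+Driver+17+for+SQL+Server"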

Why use pickle and not JSON?

It was a hard decision: JSON would give more transparency, but I finally decided on pickle (cPickle) for its speed, and because it could have a greater impact on other projects.
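
As a rough sketch of that speed argument (a toy measurement with the standard library, not a benchmark of DataframeToDB itself):

import json
import pickle
import timeit

# A toy payload resembling a stored table structure
data = {"name": "my_table", "columns": {f"Col{i}": "VARCHAR(100)" for i in range(50)}}

print("pickle:", timeit.timeit(lambda: pickle.dumps(data), number=10_000))
print("json:  ", timeit.timeit(lambda: json.dumps(data), number=10_000))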

Why did you decide to create this library?

For scraping projects it is very tedious to keep configuring tables; I wanted something more automatic. I found one library, but it was only compatible with MS-SQL, so for projects that could not afford that database I decided to create this one and add the things I felt were missing.
