A library for turning strings into dataframes

DataFrame Literal

A library for turning strings into dataframes, with support for both Pandas and PySpark.

The first row must be a header that gives each column's name and type, written as name (type). Every following row is treated as a data row.
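
To make the header convention concrete, here is a minimal sketch of how a header row could be split into column names and type names. It uses only the standard library. The helper parse_header is hypothetical and written for illustration; it is not part of dataframe_literal, and the library's own parsing may differ.

```python
import re

# Matches a header cell such as "a (str)": a column name, then a type in parentheses.
HEADER_CELL = re.compile(r"^(?P<name>[^()]+?)\s*\((?P<type>\w+)\)$")


def parse_header(line):
    """Split a pipe-delimited header row into (column name, type name) pairs.

    Hypothetical helper for illustration, not the library's implementation.
    """
    # Drop the outer pipes, then split on the inner ones and trim padding.
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    columns = []
    for cell in cells:
        match = HEADER_CELL.match(cell)
        if match is None:
            raise ValueError(f"header cell {cell!r} is not of the form 'name (type)'")
        columns.append((match.group("name"), match.group("type")))
    return columns


header = "| a (str) | b (int) | c (bool) | d (date)   | e (timestamp)       |"
print(parse_header(header))
```

A cell without a type in parentheses raises ValueError in this sketch, which mirrors the requirement that every column declares its type.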

Install

Basic

pip install dataframe_literal

Extras:

Extras install additional libraries. Pick the one that matches the backend you use (PySpark or Pandas), or all.

pip install dataframe_literal[all]
pip install dataframe_literal[pyspark]
pip install dataframe_literal[pandas]

PySpark

Getting Started

The simplest way to create a DataFrame is:

from dataframe_literal.spark import dataframe

df = dataframe(
    """
    | a (str) | b (int) | c (bool) | d (date)   | e (timestamp)       |
    | aaa     | 123     | True     | 2019-10-10 | 2019-10-20 10:11:12 |
    | aaa     | 123     | False    | 2019-10-10 | 2019-10-20 10:11:12 |
    """
)
df.printSchema()
df.show()

This uses the existing SparkSession, or creates one if none exists, and constructs a DataFrame with the following schema:

root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = true)
 |-- c: boolean (nullable = true)
 |-- d: date (nullable = true)
 |-- e: timestamp (nullable = true)

Supported datatypes:

  • int: T.IntegerType
  • integer: T.IntegerType
  • str: T.StringType
  • string: T.StringType
  • bool: T.BooleanType
  • boolean: T.BooleanType
  • date: T.DateType
  • timestamp: T.TimestampType
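
As an illustration of what each type name implies for the cell text, the sketch below maps the names above to plain-Python converters. This is an assumption for illustration only: CONVERTERS, parse_bool and convert_cell are hypothetical names, the accepted formats are inferred from the sample rows (True/False, 2019-10-10, 2019-10-20 10:11:12), and the library itself produces Spark types rather than Python values.

```python
from datetime import date, datetime


def parse_bool(text):
    """Accept only the literals used in the example table (an assumption)."""
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"not a boolean literal: {text!r}")


# Hypothetical mapping from header type names to plain-Python converters.
# Each alias pair (int/integer, str/string, bool/boolean) shares one converter.
CONVERTERS = {
    "int": int,
    "integer": int,
    "str": str,
    "string": str,
    "bool": parse_bool,
    "boolean": parse_bool,
    "date": date.fromisoformat,
    "timestamp": lambda text: datetime.strptime(text, "%Y-%m-%d %H:%M:%S"),
}


def convert_cell(type_name, text):
    """Convert one padded table cell to a Python value of the declared type."""
    return CONVERTERS[type_name](text.strip())


types = ["str", "int", "bool", "date", "timestamp"]
cells = [" aaa     ", " 123     ", " True     ", " 2019-10-10 ", " 2019-10-20 10:11:12 "]
print([convert_cell(t, c) for t, c in zip(types, cells)])
```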

Advanced Usage

You can also pass in your own SparkSession using:

from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe

spark = SparkSession.builder.getOrCreate()
dataframe(
    ...,
    spark=spark
)

Nested PySpark DataFrames are also supported. Use dotted column names in the header, such as a.col1, to place a column inside a struct:

from pyspark.sql import SparkSession
from dataframe_literal.spark import dataframe

spark = SparkSession.builder.getOrCreate()
dataframe(
    data="""
    | a.col1 (str) | a.col2 (str) | b.col1 (str) | c.col1 (str) | d (str) |
    | aaa          | bbb          | ccc          | ddd          | eee     |
    | aaa          | bbb          | ccc          | ddd          | eee     |
    """,
    spark=spark
)

This will construct a DataFrame with the following schema:

root
 |-- a: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |    |-- col2: string (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |-- c: struct (nullable = true)
 |    |-- col1: string (nullable = true)
 |-- d: string (nullable = true)
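
The grouping of dotted names into structs can be sketched in plain Python. The function nest_row below is a hypothetical helper, not the library's implementation; it only shows how columns sharing a prefix such as a. end up under one parent, using nested dicts in place of Spark structs.

```python
def nest_row(columns, values):
    """Turn flat dotted column names into a nested dict, one level per dot.

    Hypothetical helper for illustration; dicts stand in for Spark structs.
    """
    nested = {}
    for column, value in zip(columns, values):
        # "a.col1" -> parents ["a"], leaf "col1"; "d" -> parents [], leaf "d".
        *parents, leaf = column.split(".")
        target = nested
        for parent in parents:
            target = target.setdefault(parent, {})
        target[leaf] = value
    return nested


columns = ["a.col1", "a.col2", "b.col1", "c.col1", "d"]
values = ["aaa", "bbb", "ccc", "ddd", "eee"]
print(nest_row(columns, values))
# {'a': {'col1': 'aaa', 'col2': 'bbb'}, 'b': {'col1': 'ccc'}, 'c': {'col1': 'ddd'}, 'd': 'eee'}
```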

Pandas

Coming soon.

Download files

Files for dataframe-literal, version 0.1.3
dataframe_literal-0.1.3.tar.gz (3.2 kB), source distribution
