Converts a pandas DataFrame to a TensorFlow TFRecord

fastTF is an easy way to convert a pandas DataFrame into a TensorFlow TFRecord file. It can also return the matching example_spec, the feature description needed to parse the records back.

Why convert to a TFRecord?

  • With a TFRecord file you will be able to make your input pipeline faster
  • Binary data takes up less space on disk, takes less time to copy and can be read much more efficiently from disk.
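
For context, the conversion that fastTF automates can be written by hand with public TensorFlow and pandas calls. The sketch below is not fastTF's internal code: the helper names `dataframe_to_examples` and `write_tfrecord` are hypothetical, the sample data is made up, and it assumes every column is numeric (integer or float) with a string name.

```python
import os
import tempfile

import pandas as pd
import tensorflow as tf


def dataframe_to_examples(df):
    """Yield one tf.train.Example per DataFrame row (numeric columns only)."""
    float_cols = {c for c in df.columns if pd.api.types.is_float_dtype(df[c])}
    for row in df.to_dict(orient="records"):
        feature = {}
        for name, value in row.items():
            if name in float_cols:
                feature[name] = tf.train.Feature(
                    float_list=tf.train.FloatList(value=[float(value)]))
            else:
                feature[name] = tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(value)]))
        yield tf.train.Example(features=tf.train.Features(feature=feature))


def write_tfrecord(df, path):
    """Serialize every row of df to a TFRecord file and return the row count."""
    count = 0
    with tf.io.TFRecordWriter(path) as writer:
        for example in dataframe_to_examples(df):
            writer.write(example.SerializeToString())
            count += 1
    return count


# Illustrative data only; column names borrowed from the diabetes example.
df = pd.DataFrame({"Age": [50, 31], "BMI": [33.6, 26.6]})
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
n_written = write_tfrecord(df, path)
```

Each row becomes one serialized protocol buffer, so the file can later be streamed record by record with tf.data.TFRecordDataset.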

Tech

fastTF uses a number of open source projects to work properly:

  • TensorFlow - "An end-to-end open source machine learning platform"
  • Pandas - "pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."

Installation

fastTF requires Python 3.6 to run.

Install the dependencies, then the package itself:

$ pip3 install tensorflow
$ pip3 install pandas
$ pip3 install fastTF

Development

Want to contribute? Great!

fastTF uses TensorFlow + pandas for fast development.

Fork this repository and edit app.py.

Open your terminal and run these commands to edit the file:

$ cd fastTF
$ nano app.py

Example

import pickle

import pandas as pd
import tensorflow as tf
from fastTF import tfRecordWriter


def test_function():
    """
    Test the package.

    :return: True if the program was successful.

    >>> test_function()
    True
    """
    # Convert the CSV data to a TFRecord file.
    data = pd.read_csv('diabetes.csv')
    test = tfRecordWriter(data)
    test.write('new.tfrecords')

    # The generated spec should match the stored reference spec.
    with open('example_spec.pickle', 'rb') as f:
        example_spec = pickle.load(f)
    assert example_spec == test.get_example_spec()

    # Read the file back and check the first record.
    data = tf.data.TFRecordDataset('new.tfrecords')
    func = lambda x: tf.io.parse_single_example(x, example_spec)
    data = data.map(func)
    y = data.take(1)
    for x in y:
        assert x['Age'].numpy() == 50
    return True
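
Once records parse correctly, as in the test above, the dataset is usually batched and prefetched before training. A minimal sketch, assuming TensorFlow 2.4 or later (for `tf.data.AUTOTUNE`). The two-feature spec, the sample rows, and the helpers `make_example` and `build_pipeline` are made up for illustration, and the file is written with plain TensorFlow rather than fastTF so the block runs on its own.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical spec in the same form as the one returned by get_example_spec().
example_spec = {
    "Age": tf.io.FixedLenFeature(shape=(), dtype=tf.int64),
    "BMI": tf.io.FixedLenFeature(shape=(), dtype=tf.float32),
}


def make_example(age, bmi):
    """Build one tf.train.Example with an int64 and a float feature."""
    return tf.train.Example(features=tf.train.Features(feature={
        "Age": tf.train.Feature(int64_list=tf.train.Int64List(value=[age])),
        "BMI": tf.train.Feature(float_list=tf.train.FloatList(value=[bmi])),
    }))


# Write a tiny stand-in file so the pipeline has something to read.
path = os.path.join(tempfile.mkdtemp(), "pipeline_demo.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    for age, bmi in [(50, 33.6), (31, 26.6), (32, 23.3), (21, 28.1)]:
        writer.write(make_example(age, bmi).SerializeToString())


def build_pipeline(path, spec, batch_size):
    """Read, batch, parse, and prefetch records from a TFRecord file."""
    dataset = tf.data.TFRecordDataset(path)
    dataset = dataset.batch(batch_size)
    # Parsing a whole batch at once avoids one parse call per record.
    dataset = dataset.map(
        lambda serialized: tf.io.parse_example(serialized, spec),
        num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.prefetch(tf.data.AUTOTUNE)


first_batch = next(iter(build_pipeline(path, example_spec, batch_size=2)))
```

Batching before parsing is a design choice: `tf.io.parse_example` works on a vector of serialized records, while `tf.io.parse_single_example` handles one record at a time.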

Metrics

Memory Test

Line #    Mem usage    Increment   Line Contents
================================================
     1                             import pandas as pd
     2                             from fastTF import tfRecordWriter
     3                             import tensorflow as tf
     4                             import pickle
     5                             import doctest
     6                             import pytest
     7                             
     8    300.7 MiB    300.7 MiB   
     9    301.0 MiB      0.2 MiB   def test_function():
    10    301.0 MiB      0.0 MiB       """
    11    301.0 MiB      0.0 MiB           Test the package
    12                                     >>> test_function()
    13    301.0 MiB      0.0 MiB           True
    14    301.0 MiB      0.0 MiB       
    15    301.0 MiB      0.0 MiB       """
    16                                 data = pd.read_csv('diabetes.csv')
    17    301.0 MiB      0.0 MiB       test = tfRecordWriter(data)
    18    301.0 MiB      0.0 MiB       test.write('new.tfrecords')
    19    301.0 MiB      0.0 MiB   
    20    301.0 MiB      0.0 MiB       with open('example_spec.pickle','rb') as f:
    21    301.3 MiB      0.2 MiB           example_spec = pickle.load(f)
    22    301.3 MiB      0.0 MiB       assert example_spec == test.get_example_spec()
    23                             
    24                                 data = tf.data.TFRecordDataset('new.tfrecords')
    25                                 func = lambda x: tf.io.parse_single_example(x,example_spec)
    26                                 data = data.map(func)
    27                                 y = data.take(1)
    28                                 for x in y:
    29                                   assert x['Age'].numpy() == 50
    30                                 return True

Speed Test

Timer unit: 1e-06 s

Total time: 0.644076 s
File: /notebooks/package/tests/test_sample.py
Function: test_function at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def test_function():
     9         1       6395.0   6395.0      1.0      data = pd.read_csv('diabetes.csv')
    10         1        602.0    602.0      0.1      test = tfRecordWriter(data)
    11         1     589870.0 589870.0     91.6      test.write('new.tfrecords')
    12                                           
    13         1         57.0     57.0      0.0      with open('example_spec.pickle','rb') as f:
    14         1         79.0     79.0      0.0          example_spec = pickle.load(f)
    15         1         28.0     28.0      0.0      assert example_spec == test.get_example_spec()
    16                                           
    17         1       8591.0   8591.0      1.3      data = tf.data.TFRecordDataset('new.tfrecords')
    18         1          3.0      3.0      0.0      func = lambda x: tf.io.parse_single_example(x,example_spec)
    19         1      25952.0  25952.0      4.0      data = data.map(func)
    20         1        245.0    245.0      0.0      y = data.take(1)
    21         2      12227.0   6113.5      1.9      for x in y:
    22         1         27.0     27.0      0.0        assert x['Age'].numpy() == 50
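
The listing above looks like line_profiler output, and it shows the write step taking 91.6% of the run (589870 of 644076 microseconds). If installing a profiler is not an option, a rough per-stage breakdown can be collected with the standard library alone. The sketch below is an assumption about how one might do that, not part of fastTF; `timed` and `percent_of_total` are hypothetical helpers and the `time.sleep` calls stand in for the real read and write steps.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Add the wall-clock time spent inside the with-block to results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = results.get(label, 0.0) + time.perf_counter() - start


def percent_of_total(results):
    """Convert absolute timings into percentages of the summed time."""
    total = sum(results.values())
    return {label: 100.0 * seconds / total for label, seconds in results.items()}


results = {}
with timed("read", results):
    time.sleep(0.01)  # stand-in for pd.read_csv('diabetes.csv')
with timed("write", results):
    time.sleep(0.05)  # stand-in for test.write('new.tfrecords')

shares = percent_of_total(results)
for label, share in shares.items():
    print(f"{label}: {share:.1f}% of measured time")
```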

Another Example

>>> import pandas as pd
>>> data = pd.read_csv('diabetes.csv')
>>> from fastTF import tfRecordWriter
>>> demo = tfRecordWriter(data)
>>> demo.write("name.tfrecord")
>>> demo.get_example_spec()
{'Pregnancies': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Glucose': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'BloodPressure': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'SkinThickness': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Insulin': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Age': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Outcome': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'BMI': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None), 'DiabetesPedigreeFunction': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None)}
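
In the spec above, integer columns map to tf.int64 and float columns to tf.float32, each as a scalar FixedLenFeature. One way such a spec could be derived from DataFrame dtypes is sketched below. This is an assumption about the approach, not fastTF's actual implementation, and `infer_example_spec` is a hypothetical helper that only handles integer and float columns.

```python
import pandas as pd
import tensorflow as tf


def infer_example_spec(df):
    """Map each numeric column of df to a scalar tf.io.FixedLenFeature."""
    spec = {}
    for name, dtype in df.dtypes.items():
        if pd.api.types.is_integer_dtype(dtype):
            tf_dtype = tf.int64
        elif pd.api.types.is_float_dtype(dtype):
            tf_dtype = tf.float32
        else:
            # Strings, dates, and other dtypes are out of scope for this sketch.
            raise TypeError(f"unsupported dtype for column {name!r}: {dtype}")
        spec[name] = tf.io.FixedLenFeature(shape=(), dtype=tf_dtype)
    return spec


# Illustrative data only; column names borrowed from the diabetes example.
sample = pd.DataFrame({"Age": [50, 31], "BMI": [33.6, 26.6]})
spec = infer_example_spec(sample)
```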

Todos

  • Write more tests
  • Make the app faster

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastTF-1.0.3.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

fastTF-1.0.3-py3-none-any.whl (5.3 kB)

Uploaded Python 3
