
Converts a pandas DataFrame to a TensorFlow TFRecord file

Project description


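fastTF is an easy way to convert a pandas DataFrame into a TensorFlow TFRecord file. fastTF can also return the matching example_spec, the dictionary of feature descriptions needed to parse the records back with tf.io.parse_single_example.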
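fastTF's internals are not documented here. As a rough illustration of the general technique only, the sketch below writes each DataFrame row as one serialized tf.train.Example using TensorFlow's own tf.io.TFRecordWriter. The helper names _to_feature and dataframe_to_tfrecord are hypothetical, not part of fastTF's API, and the dtype-to-feature mapping (booleans and integers to Int64List, floats to FloatList, everything else to UTF-8 BytesList) is an assumption. Column names are assumed to be strings.

```python
import pandas as pd
import tensorflow as tf


def _to_feature(value, kind: str) -> tf.train.Feature:
    """Wrap one cell in a tf.train.Feature chosen by its column's dtype kind."""
    if kind in "biu":  # bool, signed and unsigned integers
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))
    if kind == "f":  # floating point
        return tf.train.Feature(float_list=tf.train.FloatList(value=[float(value)]))
    # Assumption: anything else (strings, objects) is stored as UTF-8 bytes.
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[str(value).encode("utf-8")]))


def dataframe_to_tfrecord(df: pd.DataFrame, path: str) -> None:
    """Write every row of df to path as one serialized tf.train.Example."""
    kinds = {name: df[name].dtype.kind for name in df.columns}
    with tf.io.TFRecordWriter(path) as writer:
        for row in df.to_dict("records"):  # one {column: value} dict per row
            feature = {name: _to_feature(value, kinds[name])
                       for name, value in row.items()}
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
```

Reading such a file back requires a matching spec of tf.io.FixedLenFeature entries, which is what example_spec provides.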

Why use TFRecords?

  • Reading training data from a TFRecord file can make your input pipeline faster.
  • Binary data takes up less space on disk, takes less time to copy, and can be read much more efficiently from disk.
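The TFRecord container itself is a simple sequential binary format. Following the layout described in TensorFlow's TFRecord guide, each record is an 8-byte little-endian length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The standard-library sketch below frames and unframes records to show that layout; in real files the payloads are serialized tf.train.Example messages, whereas here they are arbitrary bytes, and all function names are hypothetical. Use tf.io.TFRecordWriter and tf.data.TFRecordDataset in practice.

```python
import struct
from typing import Iterator, List

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def _make_crc32c_table() -> List[int]:
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C (Castagnoli) checksum."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """CRC-32C rotated right by 15 bits plus a constant, as TFRecord stores it."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(payload: bytes) -> bytes:
    """Frame one payload: length, masked CRC of length, payload, masked CRC of payload."""
    header = struct.pack("<Q", len(payload))
    return (header + struct.pack("<I", masked_crc(header))
            + payload + struct.pack("<I", masked_crc(payload)))


def iter_records(buf: bytes) -> Iterator[bytes]:
    """Yield payloads from concatenated records, verifying both checksums."""
    offset = 0
    while offset < len(buf):
        header = buf[offset:offset + 8]
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack_from("<I", buf, offset + 8)
        if length_crc != masked_crc(header):
            raise ValueError(f"corrupt length at offset {offset}")
        start = offset + 12
        payload = buf[start:start + length]
        (payload_crc,) = struct.unpack_from("<I", buf, start + length)
        if payload_crc != masked_crc(payload):
            raise ValueError(f"corrupt payload at offset {offset}")
        yield payload
        offset = start + length + 4  # skip the trailing payload CRC
```

Because every record carries its own length, a reader can stream records one after another without parsing text or scanning for delimiters.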

Tech

fastTF builds on two open source projects:

  • TensorFlow - "An end-to-end open source machine learning platform"
  • pandas - "pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."

Installation

fastTF requires Python 3.6 to run.

Install fastTF and its dependencies:

$ pip3 install fastTF
$ pip3 install tensorflow
$ pip3 install pandas

Development

Want to contribute? Great!

fastTF uses TensorFlow and pandas for fast development.

Fork this repository and edit app.py.

To open the file from a terminal, run:

$ cd fastTF
$ nano app.py

Example

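This test expects diabetes.csv and a previously saved example_spec.pickle in the working directory. It checks that the spec fastTF generates matches the saved one and that the first record reads back with Age == 50.

import pickle

import pandas as pd
import tensorflow as tf
from fastTF import tfRecordWriter


def test_function():
    """
    Test the package.

    :return: True if the round trip succeeded.

    >>> test_function()
    True
    """
    # Convert the CSV into a TFRecord file.
    data = pd.read_csv('diabetes.csv')
    test = tfRecordWriter(data)
    test.write('new.tfrecords')

    # Compare the generated spec with a previously saved reference spec.
    with open('example_spec.pickle', 'rb') as f:
        example_spec = pickle.load(f)
    assert example_spec == test.get_example_spec()

    # Read the file back and parse each serialized record with the spec.
    dataset = tf.data.TFRecordDataset('new.tfrecords')
    dataset = dataset.map(lambda x: tf.io.parse_single_example(x, example_spec))
    for record in dataset.take(1):
        assert record['Age'].numpy() == 50
    return True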
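The test above parses one record at a time with tf.io.parse_single_example. For training, a common tf.data idiom is to batch the serialized records first and parse each whole batch with tf.io.parse_example, which vectorizes the parsing. The minimal sketch below assumes a hypothetical two-column spec in the same form test.get_example_spec() returns, and a hypothetical make_example helper that stands in for records read from a file.

```python
import tensorflow as tf

# Hypothetical two-column spec in the same form get_example_spec() returns.
EXAMPLE_SPEC = {
    "Age": tf.io.FixedLenFeature(shape=(), dtype=tf.int64),
    "BMI": tf.io.FixedLenFeature(shape=(), dtype=tf.float32),
}


def make_example(age: int, bmi: float) -> bytes:
    """Hypothetical helper: serialize one record as a tf.train.Example."""
    features = tf.train.Features(feature={
        "Age": tf.train.Feature(int64_list=tf.train.Int64List(value=[age])),
        "BMI": tf.train.Feature(float_list=tf.train.FloatList(value=[bmi])),
    })
    return tf.train.Example(features=features).SerializeToString()


def batched_dataset(serialized, batch_size: int = 2) -> tf.data.Dataset:
    """Batch serialized records first, then parse each whole batch in one call."""
    ds = tf.data.Dataset.from_tensor_slices(serialized)
    ds = ds.batch(batch_size)
    ds = ds.map(lambda batch: tf.io.parse_example(batch, EXAMPLE_SPEC),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)
```

With a real file, tf.data.TFRecordDataset('new.tfrecords') would take the place of from_tensor_slices, and the loaded example_spec the place of EXAMPLE_SPEC.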

Metrics

Memory Test

Line-by-line memory usage of test_function(), as reported by memory_profiler:

Line #    Mem usage    Increment   Line Contents
================================================
     1                             import pandas as pd
     2                             from fastTF import tfRecordWriter
     3                             import tensorflow as tf
     4                             import pickle
     5                             import doctest
     6                             import pytest
     7                             
     8    300.7 MiB    300.7 MiB   
     9    301.0 MiB      0.2 MiB   def test_function():
    10    301.0 MiB      0.0 MiB       """
    11    301.0 MiB      0.0 MiB           Test the package
    12                                     >>> test_function()
    13    301.0 MiB      0.0 MiB           True
    14    301.0 MiB      0.0 MiB       
    15    301.0 MiB      0.0 MiB       """
    16                                 data = pd.read_csv('diabetes.csv')
    17    301.0 MiB      0.0 MiB       test = tfRecordWriter(data)
    18    301.0 MiB      0.0 MiB       test.write('new.tfrecords')
    19    301.0 MiB      0.0 MiB   
    20    301.0 MiB      0.0 MiB       with open('example_spec.pickle','rb') as f:
    21    301.3 MiB      0.2 MiB           example_spec = pickle.load(f)
    22    301.3 MiB      0.0 MiB       assert example_spec == test.get_example_spec()
    23                             
    24                                 data = tf.data.TFRecordDataset('new.tfrecords')
    25                                 func = lambda x: tf.io.parse_single_example(x,example_spec)
    26                                 data = data.map(func)
    27                                 y = data.take(1)
    28                                 for x in y:
    29                                   assert x['Age'].numpy() == 50
    30                                 return True

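Speed Test

Line-by-line timing of test_function(), as reported by line_profiler. Writing the TFRecord file (test.write) accounts for 91.6% of the roughly 0.64 s total.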

Timer unit: 1e-06 s

Total time: 0.644076 s
File: /notebooks/package/tests/test_sample.py
Function: test_function at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def test_function():
     9         1       6395.0   6395.0      1.0      data = pd.read_csv('diabetes.csv')
    10         1        602.0    602.0      0.1      test = tfRecordWriter(data)
    11         1     589870.0 589870.0     91.6      test.write('new.tfrecords')
    12                                           
    13         1         57.0     57.0      0.0      with open('example_spec.pickle','rb') as f:
    14         1         79.0     79.0      0.0          example_spec = pickle.load(f)
    15         1         28.0     28.0      0.0      assert example_spec == test.get_example_spec()
    16                                           
    17         1       8591.0   8591.0      1.3      data = tf.data.TFRecordDataset('new.tfrecords')
    18         1          3.0      3.0      0.0      func = lambda x: tf.io.parse_single_example(x,example_spec)
    19         1      25952.0  25952.0      4.0      data = data.map(func)
    20         1        245.0    245.0      0.0      y = data.take(1)
    21         2      12227.0   6113.5      1.9      for x in y:
    22         1         27.0     27.0      0.0        assert x['Age'].numpy() == 50

Another Example

>>> import pandas as pd
>>> data = pd.read_csv('diabetes.csv')
>>> from fastTF import tfRecordWriter
>>> demo = tfRecordWriter(data)
>>> demo.write("name.tfrecord")
>>> demo.get_example_spec()
{'Pregnancies': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Glucose': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'BloodPressure': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'SkinThickness': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Insulin': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Age': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Outcome': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'BMI': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None), 'DiabetesPedigreeFunction': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None)}
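In the output above, integer columns map to tf.int64 and floating-point columns (BMI, DiabetesPedigreeFunction) map to tf.float32, each as a scalar FixedLenFeature. The minimal sketch below shows how such a spec could be derived from a DataFrame's dtypes. infer_example_spec is a hypothetical helper, not fastTF's implementation, and the tf.string fallback for other column types is an assumption.

```python
import pandas as pd
import tensorflow as tf


def infer_example_spec(df: pd.DataFrame) -> dict:
    """Build a {column: FixedLenFeature} dict from the DataFrame's dtypes."""
    spec = {}
    for name, dtype in df.dtypes.items():
        if dtype.kind in "biu":  # bool and integer columns
            tf_dtype = tf.int64
        elif dtype.kind == "f":  # float columns, stored as 32-bit floats
            tf_dtype = tf.float32
        else:  # assumption: everything else is parsed as bytes/strings
            tf_dtype = tf.string
        # shape=() means exactly one scalar value per record.
        spec[name] = tf.io.FixedLenFeature(shape=(), dtype=tf_dtype)
    return spec
```

A fixed shape of () matches a flat table where every cell holds one value; list-valued or missing cells would need a different feature type.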

Todos

  • Write more tests
  • Make the app faster



Download files


Source Distribution

fastTF-1.0.3.tar.gz (6.3 kB)

Built Distribution

fastTF-1.0.3-py3-none-any.whl (5.3 kB)

File details

Details for the file fastTF-1.0.3.tar.gz.

File metadata

  • Download URL: fastTF-1.0.3.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/2.7.16

File hashes

Hashes for fastTF-1.0.3.tar.gz
Algorithm Hash digest
SHA256 07468555de017adc7c0d5fb92f7254320491d83b67f62816e7cadbde707546ea
MD5 9a7f501603d72bf80639043d1ca71889
BLAKE2b-256 591ddbb4497be81dce5d25a3d19e2e23c7d84bf58e667e0c919ba6fbf172c15b


File details

Details for the file fastTF-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: fastTF-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/2.7.16

File hashes

Hashes for fastTF-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 91142174de04eedd8ac6bf649ac3321885ec524dff490f5042e7284e53e34b29
MD5 bd0f1f7aae2bfc82c5cab8d5f59a58d2
BLAKE2b-256 f4caeca836fb411b63bc16c1c7ce76697fa035fbbdca47717b22ed48dfb48c24

