Converts a Pandas DataFrame to a TensorFlow TFRecord
Project description
fastTF is an easy way to convert a Pandas DataFrame into a TensorFlow TFRecord file. fastTF can also give you the example_spec, the feature description needed to parse the records back.
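To show what such a conversion involves, here is a minimal sketch that uses only TensorFlow's public tf.train.Example and tf.io.TFRecordWriter APIs. This is not fastTF's implementation. The helper names `to_feature` and `dataframe_to_tfrecord`, the one-Example-per-row layout, and the dtype mapping (integers to Int64List, floats to FloatList, everything else to UTF-8 bytes) are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tensorflow as tf


def to_feature(value):
    """Wrap one scalar cell in a tf.train.Feature (hypothetical helper)."""
    # Python bools are ints, so boolean cells are stored as 0/1 integers.
    if isinstance(value, (int, np.integer)):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))
    if isinstance(value, (float, np.floating)):
        return tf.train.Feature(float_list=tf.train.FloatList(value=[float(value)]))
    # Anything else (for example strings) is stored as UTF-8 bytes.
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[str(value).encode("utf-8")]))


def dataframe_to_tfrecord(df: pd.DataFrame, path: str) -> int:
    """Write one tf.train.Example per row and return the row count (hypothetical helper)."""
    count = 0
    with tf.io.TFRecordWriter(path) as writer:
        for row in df.itertuples(index=False):
            feature = {str(col): to_feature(val) for col, val in zip(df.columns, row)}
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
            count += 1
    return count


# Usage on a small made-up DataFrame (column names borrowed from diabetes.csv).
demo = pd.DataFrame({"Age": [50, 31], "BMI": [33.6, 26.6]})
out_path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
rows_written = dataframe_to_tfrecord(demo, out_path)
```

Reading such a file back requires a matching feature spec, which is what fastTF's example_spec provides.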
Why would you do this?
- Storing your data in a TFRecord file can make your input pipeline faster.
- Binary data takes up less space on disk, takes less time to copy and can be read much more efficiently from disk.
Tech
fastTF relies on the following open-source projects:
- TensorFlow - "An end-to-end open source machine learning platform"
- Pandas - "pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."
Installation
fastTF requires Python 3.6 to run.
Install the dependencies and fastTF itself:
$ pip3 install tensorflow
$ pip3 install pandas
$ pip3 install fastTF
Development
Want to contribute? Great!
fastTF is built on TensorFlow and Pandas.
Fork this repository and edit app.py.
Open a terminal and run these commands to edit the file:
$ cd fastTF
$ nano app.py
Example
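import pickle

import pandas as pd
import tensorflow as tf
from fastTF import tfRecordWriter


def test_function():
    """
    Test the package: write a TFRecord file, then read it back.

    :return: True if the round trip succeeded.

    >>> test_function()
    True
    """
    # Convert the CSV data into a TFRecord file.
    data = pd.read_csv('diabetes.csv')
    test = tfRecordWriter(data)
    test.write('new.tfrecords')

    # The pickled example_spec must match the one the writer reports.
    with open('example_spec.pickle', 'rb') as f:
        example_spec = pickle.load(f)
    assert example_spec == test.get_example_spec()

    # Read the file back and parse the first record with that spec.
    dataset = tf.data.TFRecordDataset('new.tfrecords')
    dataset = dataset.map(lambda x: tf.io.parse_single_example(x, example_spec))
    for x in dataset.take(1):
        assert x['Age'].numpy() == 50
    return True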
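The example above parses one record at a time with tf.io.parse_single_example. In a training pipeline it is common to follow the tf.data performance guide's advice to vectorize the map function: batch the serialized records first, then parse each batch with a single tf.io.parse_example call. The sketch below assumes TensorFlow 2.4 or later (for tf.data.AUTOTUNE). The two-column spec, the batch size of 2 and the temporary file are illustrative and not part of fastTF.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical two-column spec, in the same form as the example_spec fastTF returns.
example_spec = {
    "Age": tf.io.FixedLenFeature(shape=(), dtype=tf.int64),
    "BMI": tf.io.FixedLenFeature(shape=(), dtype=tf.float32),
}

# Write a tiny throwaway file so the sketch runs without fastTF or diabetes.csv.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    for age, bmi in [(50, 33.6), (31, 26.6), (32, 23.3)]:
        example = tf.train.Example(features=tf.train.Features(feature={
            "Age": tf.train.Feature(int64_list=tf.train.Int64List(value=[age])),
            "BMI": tf.train.Feature(float_list=tf.train.FloatList(value=[bmi])),
        }))
        writer.write(example.SerializeToString())

# Batch the serialized records first, then parse a whole batch with one
# vectorized tf.io.parse_example call instead of one call per record.
dataset = (
    tf.data.TFRecordDataset(path)
    .batch(2)
    .map(lambda serialized: tf.io.parse_example(serialized, example_spec),
         num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)

batches = list(dataset)  # two batches: 2 records, then the remaining 1
```

Each element of the resulting dataset is a dict of tensors with a leading batch dimension, ready to feed to a model.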
Metrics
Memory Test
Line #    Mem usage    Increment   Line Contents
================================================
     1                             import pandas as pd
     2                             from fastTF import tfRecordWriter
     3                             import tensorflow as tf
     4                             import pickle
     5                             import doctest
     6                             import pytest
     7
     8    300.7 MiB    300.7 MiB
     9    301.0 MiB      0.2 MiB   def test_function():
    10    301.0 MiB      0.0 MiB       """
    11    301.0 MiB      0.0 MiB       Test the package
    12                                 >>> test_function()
    13    301.0 MiB      0.0 MiB       True
    14    301.0 MiB      0.0 MiB
    15    301.0 MiB      0.0 MiB       """
    16                                 data = pd.read_csv('diabetes.csv')
    17    301.0 MiB      0.0 MiB       test = tfRecordWriter(data)
    18    301.0 MiB      0.0 MiB       test.write('new.tfrecords')
    19    301.0 MiB      0.0 MiB
    20    301.0 MiB      0.0 MiB       with open('example_spec.pickle','rb') as f:
    21    301.3 MiB      0.2 MiB           example_spec = pickle.load(f)
    22    301.3 MiB      0.0 MiB       assert example_spec == test.get_example_spec()
    23
    24                                 data = tf.data.TFRecordDataset('new.tfrecords')
    25                                 func = lambda x: tf.io.parse_single_example(x,example_spec)
    26                                 data = data.map(func)
    27                                 y = data.take(1)
    28                                 for x in y:
    29                                     assert x['Age'].numpy() == 50
    30                                 return True
Speed Test
Timer unit: 1e-06 s

Total time: 0.644076 s
File: /notebooks/package/tests/test_sample.py
Function: test_function at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def test_function():
     9         1       6395.0   6395.0      1.0      data = pd.read_csv('diabetes.csv')
    10         1        602.0    602.0      0.1      test = tfRecordWriter(data)
    11         1     589870.0 589870.0     91.6      test.write('new.tfrecords')
    12
    13         1         57.0     57.0      0.0      with open('example_spec.pickle','rb') as f:
    14         1         79.0     79.0      0.0          example_spec = pickle.load(f)
    15         1         28.0     28.0      0.0      assert example_spec == test.get_example_spec()
    16
    17         1       8591.0   8591.0      1.3      data = tf.data.TFRecordDataset('new.tfrecords')
    18         1          3.0      3.0      0.0      func = lambda x: tf.io.parse_single_example(x,example_spec)
    19         1      25952.0  25952.0      4.0      data = data.map(func)
    20         1        245.0    245.0      0.0      y = data.take(1)
    21         2      12227.0   6113.5      1.9      for x in y:
    22         1         27.0     27.0      0.0          assert x['Age'].numpy() == 50
Another Example
>>> import pandas as pd
>>> data = pd.read_csv('diabetes.csv')
>>> from fastTF import tfRecordWriter
>>> demo = tfRecordWriter(data)
>>> demo.write("name.tfrecord")
>>> demo.get_example_spec()
{'Pregnancies': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Glucose': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'BloodPressure': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'SkinThickness': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Insulin': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Age': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'Outcome': FixedLenFeature(shape=(), dtype=tf.int64, default_value=None), 'BMI': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None), 'DiabetesPedigreeFunction': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None)}
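The spec printed above maps every integer column to tf.int64 and every float column to tf.float32, each as a scalar FixedLenFeature. Here is a minimal sketch of deriving such a spec from a DataFrame's dtypes, assuming that mapping. The helper name `infer_example_spec` and the fallback of non-numeric columns to tf.string are hypothetical and may not match fastTF's behavior.

```python
import pandas as pd
import tensorflow as tf


def infer_example_spec(df: pd.DataFrame) -> dict:
    """Build a scalar FixedLenFeature per column from its pandas dtype (hypothetical helper)."""
    spec = {}
    for col, dtype in df.dtypes.items():
        if pd.api.types.is_bool_dtype(dtype) or pd.api.types.is_integer_dtype(dtype):
            tf_dtype = tf.int64      # integer (and boolean) columns
        elif pd.api.types.is_float_dtype(dtype):
            tf_dtype = tf.float32    # float columns
        else:
            tf_dtype = tf.string     # assumption: everything else is stored as bytes
        spec[col] = tf.io.FixedLenFeature(shape=(), dtype=tf_dtype)
    return spec


# Made-up three-column frame; "Name" exercises the fallback branch.
demo = pd.DataFrame({"Age": [50, 31], "BMI": [33.6, 26.6], "Name": ["a", "b"]})
spec = infer_example_spec(demo)
```

The sketch keeps the DataFrame's column order, whereas the output above lists the integer columns before the float ones. Because dict equality ignores order, that difference would not affect a comparison such as `assert example_spec == test.get_example_spec()`.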
Todos
- Write more Tests
- Make the app faster
Download files
Source Distribution
fastTF-1.0.3.tar.gz (6.3 kB)
Built Distribution
fastTF-1.0.3-py3-none-any.whl (5.3 kB)
File details
Details for the file fastTF-1.0.3.tar.gz
File metadata
- Download URL: fastTF-1.0.3.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/2.7.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 07468555de017adc7c0d5fb92f7254320491d83b67f62816e7cadbde707546ea
MD5 | 9a7f501603d72bf80639043d1ca71889
BLAKE2b-256 | 591ddbb4497be81dce5d25a3d19e2e23c7d84bf58e667e0c919ba6fbf172c15b
File details
Details for the file fastTF-1.0.3-py3-none-any.whl
File metadata
- Download URL: fastTF-1.0.3-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/2.7.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 91142174de04eedd8ac6bf649ac3321885ec524dff490f5042e7284e53e34b29
MD5 | bd0f1f7aae2bfc82c5cab8d5f59a58d2
BLAKE2b-256 | f4caeca836fb411b63bc16c1c7ce76697fa035fbbdca47717b22ed48dfb48c24