convert json data to Pandas DataFrame
Project description
Json2df
-------
|PyPI version|
json2df is a library batch processing a lists of json data (multiple
instances of same structured json data) into Pandas DataFrame
installation
~~~~~~~~~~~~
``$ pip install json2df``
usage example
~~~~~~~~~~~~~
e.g. when you scrape some users info data from a website, usually some
fields contains json data format
.. code:: python
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/shichaoji/json2df/master/sample.csv')
df.shape
::
(100, 5)
.. code:: python
df.head(3)
.. raw:: html
<div>
.. raw:: html
<table border="1" class="dataframe">
.. raw:: html
<thead>
::
<tr style="text-align: right;">
<th></th>
<th>mock_id</th>
<th>location</th>
<th>primary_currency</th>
<th>status</th>
<th>timezone</th>
</tr>
.. raw:: html
</thead>
.. raw:: html
<tbody>
::
<tr>
<th>0</th>
<td>0</td>
<td>{u'administrative_area': None, u'city': u'Sing...</td>
<td>{u'code': u'USD', u'name': u'US Dollar', u'cou...</td>
<td>{u'payment_verified': False, u'identity_verifi...</td>
<td>{u'country': u'SG', u'offset': 8, u'id': 210, ...</td>
</tr>
<tr>
<th>1</th>
<td>1</td>
<td>{u'administrative_area': None, u'city': u'Bris...</td>
<td>{u'code': u'USD', u'name': u'US Dollar', u'cou...</td>
<td>{u'payment_verified': False, u'identity_verifi...</td>
<td>{u'country': u'UK', u'offset': 1, u'id': 262, ...</td>
</tr>
<tr>
<th>2</th>
<td>2</td>
<td>{u'administrative_area': None, u'city': u'Ambo...</td>
<td>{u'code': u'USD', u'name': u'US Dollar', u'cou...</td>
<td>{u'payment_verified': False, u'identity_verifi...</td>
<td>{u'country': u'MG', u'offset': 3, u'id': 295, ...</td>
</tr>
.. raw:: html
</tbody>
.. raw:: html
</table>
.. raw:: html
</div>
e.g. we want to extract the location field and convert into a dataframe
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
select location
.. code:: python
df['location'].head()
::
0 {u'administrative_area': None, u'city': u'Sing...
1 {u'administrative_area': None, u'city': u'Bris...
2 {u'administrative_area': None, u'city': u'Ambo...
3 {u'administrative_area': None, u'city': u'Drob...
4 {u'administrative_area': None, u'city': u'Torr...
Name: location, dtype: object
view first row
.. code:: python
first_row = df['location'].head()[0]
print type(first_row)
::
<type 'str'>
.. code:: python
first_row
::
"{u'administrative_area': None, u'city': u'Singapore', u'country': {u'highres_flag_url': u'/img/flags/highres_png/singapore.png', u'code': u'sg', u'name': u'Singapore', u'seo_url': None, u'flag_url_cdn': u'//cdn2.f-cdn.com/img/flags/png/sg.png', u'highres_flag_url_cdn': u'//cdn6.f-cdn.com/img/flags/highres_png/singapore.png', u'phone_code': None, u'language_code': None, u'demonym': None, u'language_id': None, u'person': None, u'iso3': None, u'sanction': None, u'flag_url': u'/img/flags/png/sg.png', u'flag_class': u'singapore', u'region_id': None}, u'vicinity': None, u'longitude': None, u'full_address': None, u'latitude': None}"
convert the string representation into a python dictionary
as you can see the json data has inner loop
.. code:: python
import ast
ast.literal_eval(first_row)
::
{u'administrative_area': None,
u'city': u'Singapore',
u'country': {u'code': u'sg',
u'demonym': None,
u'flag_class': u'singapore',
u'flag_url': u'/img/flags/png/sg.png',
u'flag_url_cdn': u'//cdn2.f-cdn.com/img/flags/png/sg.png',
u'highres_flag_url': u'/img/flags/highres_png/singapore.png',
u'highres_flag_url_cdn': u'//cdn6.f-cdn.com/img/flags/highres_png/singapore.png',
u'iso3': None,
u'language_code': None,
u'language_id': None,
u'name': u'Singapore',
u'person': None,
u'phone_code': None,
u'region_id': None,
u'sanction': None,
u'seo_url': None},
u'full_address': None,
u'latitude': None,
u'longitude': None,
u'vicinity': None}
user json2df to convert the entire location field (Series) into a DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
from json2df import series2df
extract_df = series2df(df['location'])
print (extract_df.shape)
extract_df.head(5)
::
(100, 22)
.. raw:: html
<div>
.. raw:: html
<table border="1" class="dataframe">
.. raw:: html
<thead>
::
<tr style="text-align: right;">
<th></th>
<th>administrative_area</th>
<th>city</th>
<th>country_code</th>
<th>country_demonym</th>
<th>country_flag_class</th>
<th>country_flag_url</th>
<th>country_flag_url_cdn</th>
<th>country_highres_flag_url</th>
<th>country_highres_flag_url_cdn</th>
<th>country_iso3</th>
<th>...</th>
<th>country_name</th>
<th>country_person</th>
<th>country_phone_code</th>
<th>country_region_id</th>
<th>country_sanction</th>
<th>country_seo_url</th>
<th>full_address</th>
<th>latitude</th>
<th>longitude</th>
<th>vicinity</th>
</tr>
.. raw:: html
</thead>
.. raw:: html
<tbody>
::
<tr>
<th>0</th>
<td>None</td>
<td>Singapore</td>
<td>sg</td>
<td>None</td>
<td>singapore</td>
<td>/img/flags/png/sg.png</td>
<td>//cdn2.f-cdn.com/img/flags/png/sg.png</td>
<td>/img/flags/highres_png/singapore.png</td>
<td>//cdn6.f-cdn.com/img/flags/highres_png/singapo...</td>
<td>None</td>
<td>...</td>
<td>Singapore</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>1</th>
<td>None</td>
<td>Bristol</td>
<td>gb</td>
<td>None</td>
<td>united-kingdom</td>
<td>/img/flags/png/gb.png</td>
<td>//cdn6.f-cdn.com/img/flags/png/gb.png</td>
<td>/img/flags/highres_png/united-kingdom.png</td>
<td>//cdn5.f-cdn.com/img/flags/highres_png/united-...</td>
<td>None</td>
<td>...</td>
<td>United Kingdom</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>2</th>
<td>None</td>
<td>Ambohidratrimo</td>
<td>mg</td>
<td>None</td>
<td>madagascar</td>
<td>/img/flags/png/mg.png</td>
<td>//cdn3.f-cdn.com/img/flags/png/mg.png</td>
<td>/img/flags/highres_png/madagascar.png</td>
<td>//cdn3.f-cdn.com/img/flags/highres_png/madagas...</td>
<td>None</td>
<td>...</td>
<td>Madagascar</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>3</th>
<td>None</td>
<td>Drobak</td>
<td>no</td>
<td>None</td>
<td>norway</td>
<td>/img/flags/png/no.png</td>
<td>//cdn2.f-cdn.com/img/flags/png/no.png</td>
<td>/img/flags/highres_png/norway.png</td>
<td>//cdn3.f-cdn.com/img/flags/highres_png/norway.png</td>
<td>None</td>
<td>...</td>
<td>Norway</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>4</th>
<td>None</td>
<td>Torronto</td>
<td>ca</td>
<td>None</td>
<td>canada</td>
<td>/img/flags/png/ca.png</td>
<td>//cdn6.f-cdn.com/img/flags/png/ca.png</td>
<td>/img/flags/highres_png/canada.png</td>
<td>//cdn6.f-cdn.com/img/flags/highres_png/canada.png</td>
<td>None</td>
<td>...</td>
<td>Canada</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
.. raw:: html
</tbody>
.. raw:: html
</table>
.. raw:: html
<p>
5 rows × 22 columns
.. raw:: html
</p>
.. raw:: html
</div>
.. |PyPI version| image:: https://badge.fury.io/py/json2df.svg
:target: https://badge.fury.io/py/json2df
-------
|PyPI version|
json2df is a library batch processing a lists of json data (multiple
instances of same structured json data) into Pandas DataFrame
installation
~~~~~~~~~~~~
``$ pip install json2df``
usage example
~~~~~~~~~~~~~
e.g. when you scrape some users info data from a website, usually some
fields contains json data format
.. code:: python
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/shichaoji/json2df/master/sample.csv')
df.shape
::
(100, 5)
.. code:: python
df.head(3)
.. raw:: html
<div>
.. raw:: html
<table border="1" class="dataframe">
.. raw:: html
<thead>
::
<tr style="text-align: right;">
<th></th>
<th>mock_id</th>
<th>location</th>
<th>primary_currency</th>
<th>status</th>
<th>timezone</th>
</tr>
.. raw:: html
</thead>
.. raw:: html
<tbody>
::
<tr>
<th>0</th>
<td>0</td>
<td>{u'administrative_area': None, u'city': u'Sing...</td>
<td>{u'code': u'USD', u'name': u'US Dollar', u'cou...</td>
<td>{u'payment_verified': False, u'identity_verifi...</td>
<td>{u'country': u'SG', u'offset': 8, u'id': 210, ...</td>
</tr>
<tr>
<th>1</th>
<td>1</td>
<td>{u'administrative_area': None, u'city': u'Bris...</td>
<td>{u'code': u'USD', u'name': u'US Dollar', u'cou...</td>
<td>{u'payment_verified': False, u'identity_verifi...</td>
<td>{u'country': u'UK', u'offset': 1, u'id': 262, ...</td>
</tr>
<tr>
<th>2</th>
<td>2</td>
<td>{u'administrative_area': None, u'city': u'Ambo...</td>
<td>{u'code': u'USD', u'name': u'US Dollar', u'cou...</td>
<td>{u'payment_verified': False, u'identity_verifi...</td>
<td>{u'country': u'MG', u'offset': 3, u'id': 295, ...</td>
</tr>
.. raw:: html
</tbody>
.. raw:: html
</table>
.. raw:: html
</div>
e.g. we want to extract the location field and convert into a dataframe
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
select location
.. code:: python
df['location'].head()
::
0 {u'administrative_area': None, u'city': u'Sing...
1 {u'administrative_area': None, u'city': u'Bris...
2 {u'administrative_area': None, u'city': u'Ambo...
3 {u'administrative_area': None, u'city': u'Drob...
4 {u'administrative_area': None, u'city': u'Torr...
Name: location, dtype: object
view first row
.. code:: python
first_row = df['location'].head()[0]
print type(first_row)
::
<type 'str'>
.. code:: python
first_row
::
"{u'administrative_area': None, u'city': u'Singapore', u'country': {u'highres_flag_url': u'/img/flags/highres_png/singapore.png', u'code': u'sg', u'name': u'Singapore', u'seo_url': None, u'flag_url_cdn': u'//cdn2.f-cdn.com/img/flags/png/sg.png', u'highres_flag_url_cdn': u'//cdn6.f-cdn.com/img/flags/highres_png/singapore.png', u'phone_code': None, u'language_code': None, u'demonym': None, u'language_id': None, u'person': None, u'iso3': None, u'sanction': None, u'flag_url': u'/img/flags/png/sg.png', u'flag_class': u'singapore', u'region_id': None}, u'vicinity': None, u'longitude': None, u'full_address': None, u'latitude': None}"
convert the string representation into a python dictionary
as you can see the json data has inner loop
.. code:: python
import ast
ast.literal_eval(first_row)
::
{u'administrative_area': None,
u'city': u'Singapore',
u'country': {u'code': u'sg',
u'demonym': None,
u'flag_class': u'singapore',
u'flag_url': u'/img/flags/png/sg.png',
u'flag_url_cdn': u'//cdn2.f-cdn.com/img/flags/png/sg.png',
u'highres_flag_url': u'/img/flags/highres_png/singapore.png',
u'highres_flag_url_cdn': u'//cdn6.f-cdn.com/img/flags/highres_png/singapore.png',
u'iso3': None,
u'language_code': None,
u'language_id': None,
u'name': u'Singapore',
u'person': None,
u'phone_code': None,
u'region_id': None,
u'sanction': None,
u'seo_url': None},
u'full_address': None,
u'latitude': None,
u'longitude': None,
u'vicinity': None}
user json2df to convert the entire location field (Series) into a DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
from json2df import series2df
extract_df = series2df(df['location'])
print (extract_df.shape)
extract_df.head(5)
::
(100, 22)
.. raw:: html
<div>
.. raw:: html
<table border="1" class="dataframe">
.. raw:: html
<thead>
::
<tr style="text-align: right;">
<th></th>
<th>administrative_area</th>
<th>city</th>
<th>country_code</th>
<th>country_demonym</th>
<th>country_flag_class</th>
<th>country_flag_url</th>
<th>country_flag_url_cdn</th>
<th>country_highres_flag_url</th>
<th>country_highres_flag_url_cdn</th>
<th>country_iso3</th>
<th>...</th>
<th>country_name</th>
<th>country_person</th>
<th>country_phone_code</th>
<th>country_region_id</th>
<th>country_sanction</th>
<th>country_seo_url</th>
<th>full_address</th>
<th>latitude</th>
<th>longitude</th>
<th>vicinity</th>
</tr>
.. raw:: html
</thead>
.. raw:: html
<tbody>
::
<tr>
<th>0</th>
<td>None</td>
<td>Singapore</td>
<td>sg</td>
<td>None</td>
<td>singapore</td>
<td>/img/flags/png/sg.png</td>
<td>//cdn2.f-cdn.com/img/flags/png/sg.png</td>
<td>/img/flags/highres_png/singapore.png</td>
<td>//cdn6.f-cdn.com/img/flags/highres_png/singapo...</td>
<td>None</td>
<td>...</td>
<td>Singapore</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>1</th>
<td>None</td>
<td>Bristol</td>
<td>gb</td>
<td>None</td>
<td>united-kingdom</td>
<td>/img/flags/png/gb.png</td>
<td>//cdn6.f-cdn.com/img/flags/png/gb.png</td>
<td>/img/flags/highres_png/united-kingdom.png</td>
<td>//cdn5.f-cdn.com/img/flags/highres_png/united-...</td>
<td>None</td>
<td>...</td>
<td>United Kingdom</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>2</th>
<td>None</td>
<td>Ambohidratrimo</td>
<td>mg</td>
<td>None</td>
<td>madagascar</td>
<td>/img/flags/png/mg.png</td>
<td>//cdn3.f-cdn.com/img/flags/png/mg.png</td>
<td>/img/flags/highres_png/madagascar.png</td>
<td>//cdn3.f-cdn.com/img/flags/highres_png/madagas...</td>
<td>None</td>
<td>...</td>
<td>Madagascar</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>3</th>
<td>None</td>
<td>Drobak</td>
<td>no</td>
<td>None</td>
<td>norway</td>
<td>/img/flags/png/no.png</td>
<td>//cdn2.f-cdn.com/img/flags/png/no.png</td>
<td>/img/flags/highres_png/norway.png</td>
<td>//cdn3.f-cdn.com/img/flags/highres_png/norway.png</td>
<td>None</td>
<td>...</td>
<td>Norway</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<th>4</th>
<td>None</td>
<td>Torronto</td>
<td>ca</td>
<td>None</td>
<td>canada</td>
<td>/img/flags/png/ca.png</td>
<td>//cdn6.f-cdn.com/img/flags/png/ca.png</td>
<td>/img/flags/highres_png/canada.png</td>
<td>//cdn6.f-cdn.com/img/flags/highres_png/canada.png</td>
<td>None</td>
<td>...</td>
<td>Canada</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
.. raw:: html
</tbody>
.. raw:: html
</table>
.. raw:: html
<p>
5 rows × 22 columns
.. raw:: html
</p>
.. raw:: html
</div>
.. |PyPI version| image:: https://badge.fury.io/py/json2df.svg
:target: https://badge.fury.io/py/json2df
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
json2df-0.1.3.5.tar.gz
(5.3 kB
view hashes)
Built Distributions
json2df-0.1.3.5-py2.7.egg
(7.2 kB
view hashes)
Close
Hashes for json2df-0.1.3.5-py2-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 37182765a3e3c9cba819aa3719df500c357c94d678bb036102a5369c8c0a3ca8 |
|
MD5 | 132487fcab86a63ef5d48ac4a66780ad |
|
BLAKE2b-256 | 847a5da4a2bfe903e84a335526d2ce7da0a3551031e6835149a5b0a6fbd1ea4d |