A Python package for working with datasets from the UK Data Service (UKDS)
ukds
Any problems? Please raise an issue on GitHub.
To install:
pip install ukds
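To confirm that the installation worked, the installed version can be queried from Python. This is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is made up for illustration and is not part of ukds.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not part of the ukds package.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if ukds is installed, otherwise None.
print(installed_version("ukds"))
```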
Quick Demo
(This demonstration uses the following dataset: Gershuny, J., Sullivan, O. (2017). United Kingdom Time Use Survey, 2014-2015. Centre for Time Use Research, University of Oxford. [data collection]. UK Data Service. SN: 8128, http://doi.org/10.5255/UKDA-SN-8128-1)
The following code reads a UK Data Service .tab data file and its associated .rtf data dictionary file, and converts them to a Pandas DataFrame:
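import ukds
# Pass both the .tab data file and its .rtf data dictionary file
dt=ukds.DataTable(fp_tab=r'.../uktus15_household.tab',
                  fp_dd=r'.../uktus15_household_ukda_data_dictionary.rtf')
# Combine the two sources into a single pandas DataFrame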
df=dt.get_dataframe()
User Guide
The ukds package provides two classes: DataTable and DataDictionary.
The DataTable class
The DataTable class converts a UKDS .tab data file and .rtf data dictionary file into a single Pandas DataFrame ready for further analysis.
Importing the DataTable class
from ukds import DataTable
Creating an instance of DataTable and reading in the data file and the data dictionary file
Either:
dt=DataTable()
dt.read_tab(r'.../uktus15_household.tab')
dt.read_datadictionary(r'.../uktus15_household_ukda_data_dictionary.rtf')
or:
dt=DataTable(fp_tab=r'.../uktus15_household.tab',
fp_dd=r'.../uktus15_household_ukda_data_dictionary.rtf')
Attributes
As the files are read in, a number of attributes are populated. These are:
dt.tab # a pandas.DataFrame object
dt.datadictionary # a ukds.DataDictionary object
get_dataframe method
The get_dataframe method converts the information in the tab and datadictionary attributes into a new pandas DataFrame.
df=dt.get_dataframe()
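This README does not spell out how the data and the data dictionary are combined. As a rough illustration of the general idea only, the sketch below uses plain pandas to read tab-delimited text and then rename the columns with labels taken from dictionary-style entries. It is not the ukds implementation: the sample data, the hh_size column and its label are invented, and the entries only imitate the shape of the variable dictionaries shown for the DataDictionary class.

```python
import io

import pandas as pd

# Invented sample standing in for a small .tab file (tab-delimited text).
# Only 'serial' appears in this README; 'hh_size' is a made-up column.
tab_text = "serial\thh_size\n1001\t2\n1002\t4\n"

# Invented stand-in for data dictionary entries (hypothetical values).
variables = [
    {"variable": "serial", "variable_label": "Household number"},
    {"variable": "hh_size", "variable_label": "Household size (hypothetical)"},
]

# Read the tab-delimited text into a DataFrame.
raw = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Map each variable name to its label and rename the columns accordingly.
labels = {v["variable"]: v["variable_label"] for v in variables}
labelled = raw.rename(columns=labels)

print(labelled)
```

The DataFrame returned by get_dataframe may be structured differently; the sketch only shows why having both files available is useful.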
See the datatable_demo.ipynb Jupyter Notebook in the 'demo' section for more information.
The DataDictionary class
The DataDictionary class provides access to UKDS .rtf data dictionary files.
Importing the DataDictionary class
from ukds import DataDictionary
Creating an instance of DataDictionary and reading in the data dictionary file
Either:
dd=DataDictionary()
dd.read_rtf(r'.../uktus15_household_ukda_data_dictionary.rtf')
or:
dd=DataDictionary(fp_dd=r'.../uktus15_household_ukda_data_dictionary.rtf')
Attributes
As the file is read in, a number of attributes are populated. These are:
dd.rtf # a string of the raw contents of the rtf file
dd.variablelist # a list of dictionaries with the variable information
get_variable_dict method
Returns a dictionary with the information for a single variable. For example:
serial=dd.get_variable_dict('serial')
returns:
{'pos': '1',
'variable': 'serial',
'variable_label': 'Household number',
'variable_type': 'numeric',
'SPSS_measurement_level': 'SCALE',
'SPSS_user_missing_values': '',
'value_labels': ''}
get_variable_names method
Returns a list of the variable names:
dd.get_variable_names()
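To make the data structure concrete, the two methods above can be imitated on a plain list of dictionaries shaped like variablelist. This is a sketch under assumptions, not the package's code: the helper functions are hypothetical, the first entry copies the 'serial' example above, and the second entry is invented.

```python
# Stand-in for dd.variablelist: a list of dictionaries, one per variable.
variablelist = [
    {
        "pos": "1",
        "variable": "serial",
        "variable_label": "Household number",
        "variable_type": "numeric",
        "SPSS_measurement_level": "SCALE",
        "SPSS_user_missing_values": "",
        "value_labels": "",
    },
    {
        # Invented entry, for illustration only.
        "pos": "2",
        "variable": "example_var",
        "variable_label": "Example variable (hypothetical)",
        "variable_type": "numeric",
        "SPSS_measurement_level": "NOMINAL",
        "SPSS_user_missing_values": "",
        "value_labels": "",
    },
]


def variable_names(variables):
    """Return the variable names in their original order (hypothetical helper)."""
    return [v["variable"] for v in variables]


def variable_dict(variables, name):
    """Return the dictionary for one variable, or raise KeyError (hypothetical helper)."""
    for v in variables:
        if v["variable"] == name:
            return v
    raise KeyError(name)


print(variable_names(variablelist))
print(variable_dict(variablelist, "serial")["variable_label"])
```

Note that in the example output above every value, including the position, is a string.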
See the datadictionary_demo.ipynb Jupyter Notebook in the 'demo' section for more examples based on this class.
File details
Details for the file ukds-0.0.1.tar.gz.
File metadata
- Download URL: ukds-0.0.1.tar.gz
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1744c44cbe221e65252c1badbff80cad9010d1841611dbc157bb949c891263a4
MD5 | e83f1addd0878406e8a20fe1549bf960
BLAKE2b-256 | e5d31c2b5b57b0a0f5dbbecf060dd486fa9d20226a486bad6cf5cda27b7dbacb
File details
Details for the file ukds-0.0.1-py3-none-any.whl.
File metadata
- Download URL: ukds-0.0.1-py3-none-any.whl
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1138a585fba5f0a00b43235b14ff72357d982c245da8ce638a4706ec84ea134d
MD5 | 8d5eadd9f0eff4034c7804680e44e523
BLAKE2b-256 | 5207cfad686abcbf1830b3c301726e8067be18c3ac7712167f277ef64d618bfe