Use pandas with clinicedc/edc projects
edc-pdutils
Use pandas with the Edc
Using the management command to export to CSV and STATA
The export_models management command requires you to log in with an account that has export permissions.
The basic command requires an app_label (-a) and a path to the export folder (-p).
By default, the export format is CSV delimited with the pipe character, |, rather than a comma.
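Because the default delimiter is a pipe, anything that reads the exported files has to be told so explicitly. The following is a minimal sketch using only the standard library; the sample rows and column names are made up for illustration and are not taken from a real export. With pandas, the equivalent is to pass sep="|" to pandas.read_csv.

```python
import csv
import io

# Made-up sample standing in for the contents of an exported file;
# real exports will have different columns and values.
SAMPLE_EXPORT = (
    "subject_identifier|visit_code|report_datetime\n"
    "092-0001|1000|2019-02-03\n"
    "092-0002|1000|2019-02-04\n"
)


def read_pipe_delimited(text):
    """Parse pipe-delimited CSV text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return list(reader)


rows = read_pipe_delimited(SAMPLE_EXPORT)
print(len(rows), rows[0]["subject_identifier"])
```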
Export one or more modules
python manage.py export_models -a ambition_subject -p /ambition/export
The -a option accepts more than one app_label, separated by commas:
python manage.py export_models -a ambition_subject,ambition_prn,ambition_ae -p /ambition/export
Export data in CSV format or STATA format
To export as CSV with | as the delimiter (the default):
python manage.py export_models -a ambition_subject -p /ambition/export
To export as a STATA dta file, use option -f stata:
python manage.py export_models -a ambition_subject -p /ambition/export -f stata
Export encrypted data
To export encrypted fields, include option --decrypt:
python manage.py export_models -a ambition_subject -p /ambition/export --decrypt
Note: the --decrypt option requires a user account with PII_EXPORT permissions.
Export with a simple file name
To export using a simpler filename that drops the tablename app_label prefix and does not include a datestamp suffix, add option --use_simple_filename:
python manage.py export_models -a ambition_subject -p /ambition/export --use_simple_filename
Export for a country only
To limit the export to a single country, add option --country:
python manage.py export_models -a ambition_subject -p /ambition/export --country="uganda"
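When the same export is run repeatedly with different option combinations, it can help to assemble the command line in one place. The sketch below is a hypothetical helper, not part of edc-pdutils; it only emits the flags documented above (-a, -p, -f, --decrypt, --use_simple_filename, --country) and assumes manage.py is in the current working directory.

```python
def build_export_command(app_labels, export_path, fmt=None, decrypt=False,
                         use_simple_filename=False, country=None):
    """Return the argv list for one export_models invocation.

    Hypothetical helper: it only knows the options described in this
    document and does not validate them against the real command.
    """
    if not app_labels:
        raise ValueError("at least one app_label is required")
    cmd = [
        "python", "manage.py", "export_models",
        "-a", ",".join(app_labels),  # several app_labels are comma-separated
        "-p", export_path,
    ]
    if fmt:
        cmd += ["-f", fmt]  # e.g. "stata"; omit for pipe-delimited CSV
    if decrypt:
        cmd.append("--decrypt")  # the account needs PII_EXPORT permissions
    if use_simple_filename:
        cmd.append("--use_simple_filename")
    if country:
        cmd.append(f"--country={country}")
    return cmd


cmd = build_export_command(
    ["ambition_subject", "ambition_prn"],
    "/ambition/export",
    fmt="stata",
    country="uganda",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Since the command asks for a login, expect it to prompt for credentials when run this way.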
Export manually
To export Crf data, for example:
import sys

from edc_pdutils.df_exporters import CsvCrfTablesExporter
from edc_pdutils.df_handlers import CrfDfHandler

app_label = 'ambition_subject'
csv_path = '/Users/erikvw/Documents/ambition/export/'
date_format = '%Y-%m-%d'
sep = '|'
exclude_history_tables = True


class MyDfHandler(CrfDfHandler):
    # table holding the visit records for this app_label
    visit_tbl = f'{app_label}_subjectvisit'
    # columns to drop from every exported dataframe
    exclude_columns = ['form_as_json', 'survival_status', 'last_alive_date',
                       'screening_age_in_years', 'registration_datetime',
                       'subject_type']


class MyCsvCrfTablesExporter(CsvCrfTablesExporter):
    visit_column = 'subject_visit_id'
    datetime_fields = ['randomization_datetime']
    df_handler_cls = MyDfHandler
    app_label = app_label
    export_folder = csv_path


sys.stdout.write('\n')
exporter = MyCsvCrfTablesExporter(
    export_folder=csv_path,
    exclude_history_tables=exclude_history_tables
)
exporter.to_csv(date_format=date_format, delimiter=sep)
To export inline data for any CRF configured with an inline, for example:
import sys

# Import path assumed to match that of CsvCrfTablesExporter.
from edc_pdutils.df_exporters import CsvCrfInlineTablesExporter
from edc_pdutils.df_handlers import CrfDfHandler

# Reuses csv_path, date_format and sep from the previous example.


class MyDfHandler(CrfDfHandler):
    visit_tbl = 'ambition_subject_subjectvisit'
    exclude_columns = ['form_as_json', 'survival_status', 'last_alive_date',
                       'screening_age_in_years', 'registration_datetime',
                       'subject_type']


class MyCsvCrfInlineTablesExporter(CsvCrfInlineTablesExporter):
    visit_columns = ['subject_visit_id']
    df_handler_cls = MyDfHandler
    app_label = 'ambition_subject'
    export_folder = csv_path
    # inline tables to skip
    exclude_inline_tables = [
        'ambition_subject_radiology_abnormal_results_reason',
        'ambition_subject_radiology_cxr_type']


sys.stdout.write('\n')
exporter = MyCsvCrfInlineTablesExporter()
exporter.to_csv(date_format=date_format, delimiter=sep)
Using model_to_dataframe
from edc_pdutils.model_to_dataframe import ModelToDataframe
from edc_pdutils.utils import get_model_names
from edc_pdutils.df_exporters.csv_exporter import CsvExporter

app_label = 'ambition_subject'
csv_path = '/Users/erikvw/Documents/ambition/export/'
date_format = '%Y-%m-%d'
sep = '|'

for model_name in get_model_names(
    app_label=app_label,
    # with_columns=with_columns,
    # without_columns=without_columns,
):
    m = ModelToDataframe(model=model_name)
    exporter = CsvExporter(
        data_label=model_name,
        date_format=date_format,
        delimiter=sep,
        export_folder=csv_path,
    )
    exported = exporter.to_csv(dataframe=m.dataframe)
Settings
EXPORT_FILENAME_TIMESTAMP_FORMAT: a strftime format string or an empty string (Default: %Y%m%d%H%M%S)
By default, a timestamp of the current date and time in the format %Y%m%d%H%M%S is added as a suffix to CSV export filenames.
EXPORT_FILENAME_TIMESTAMP_FORMAT may be set to any valid format for strftime to change the suffix.
If EXPORT_FILENAME_TIMESTAMP_FORMAT is set to an empty string, "", a suffix is not added.
For example:
# default
registered_subject_20190203112555.csv
# EXPORT_FILENAME_TIMESTAMP_FORMAT = "%Y%m%d"
registered_subject_20190203.csv
# EXPORT_FILENAME_TIMESTAMP_FORMAT = ""
registered_subject.csv
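The mapping from format string to filename can be reproduced with datetime.strftime. The sketch below is illustrative only: export_filename and DEFAULT_TIMESTAMP_FORMAT are hypothetical names, not the package's own code, and the underscore-plus-suffix pattern is inferred from the example filenames above.

```python
from datetime import datetime

# Default format as described in this section.
DEFAULT_TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def export_filename(label, timestamp_format=DEFAULT_TIMESTAMP_FORMAT, now=None):
    """Build a CSV filename with an optional timestamp suffix.

    Hypothetical helper mirroring the documented behaviour: an empty
    format string means no suffix is added.
    """
    now = now or datetime.now()
    suffix = now.strftime(timestamp_format) if timestamp_format else ""
    stem = f"{label}_{suffix}" if suffix else label
    return f"{stem}.csv"


when = datetime(2019, 2, 3, 11, 25, 55)
print(export_filename("registered_subject", now=when))
print(export_filename("registered_subject", "%Y%m%d", now=when))
print(export_filename("registered_subject", "", now=when))
```

In a Django project the setting itself would go in settings.py, for example EXPORT_FILENAME_TIMESTAMP_FORMAT = "%Y%m%d".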