Amphibious new data transformer to prepare various sources for CGP DSS Data Loader
Metadata transformer that converts Gen3 data into a format readable by cgp-dss-data-loader
(optional) We recommend using a Python 3 virtual environment.
pip3 install newt-transformer
Setup for Development
Clone the repo:
git clone https://github.com/jessebrennan/newt-transformer.git
Go to the root directory of the cloned project:
Run (ideally in a new virtual environment):
Make sure you have followed the steps in Setup for Development.
Transforming data from sheepdog-exporter
The first step is to extract the Gen3 data you want using the sheepdog exporter. The TopMed public data extracted from sheepdog is available on the release page under Assets. Assuming you use this data, you will now have a file called topmed-public.json.
Make sure you are running the virtual environment you set up in the Setup instructions.
Now we need to transform the data. From the root of the project run:
newt new /path/to/topmed-public.json --output-json transformed-topmed-public.json
This will generate a transformed output file called transformed-topmed-public.json. The new argument specifies that we want the most recent version of the transformer output format. It can be replaced with a gen3 argument, but this older format will soon be deprecated.
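If you want to drive the transformation from a script rather than the shell, the command above can be wrapped as in the following minimal sketch. The helper names build_newt_command and run_newt are hypothetical and not part of newt-transformer. The sketch also assumes that the gen3 argument takes the same position and options as new, which the text above implies but does not show.

```python
import shutil
import subprocess
from pathlib import Path

# Formats named in this README; "gen3" is the older format that will be deprecated.
KNOWN_FORMATS = ("new", "gen3")


def build_newt_command(input_json, output_json, output_format="new"):
    """Build the argument list for the newt invocation shown in this README.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    if output_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    return [
        "newt",
        output_format,
        str(input_json),
        "--output-json",
        str(output_json),
    ]


def run_newt(input_json, output_json, output_format="new"):
    """Run newt and return the output path.

    Raises RuntimeError if newt is not on PATH (for example, when the
    virtual environment is not active) and CalledProcessError if newt fails.
    """
    if shutil.which("newt") is None:
        raise RuntimeError("newt not found on PATH; is the virtual environment active?")
    command = build_newt_command(input_json, output_json, output_format)
    subprocess.run(command, check=True)
    return Path(output_json)
```

For example, run_newt("/path/to/topmed-public.json", "transformed-topmed-public.json") would issue the same command as above.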
You will likely want to upload this data to the DSS. Instructions for this can be found in the DSS data loader repo.
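Before uploading, it may be worth confirming that the transformed file parses as JSON. The sketch below uses only the Python standard library and makes no assumption about the transformer's output schema. The function name summarize_json is hypothetical and not part of newt-transformer or the data loader.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Parse a JSON file and return (top-level type name, top-level item count).

    Raises json.JSONDecodeError if the file is not valid JSON.
    Scalars at the top level are counted as a single item.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size
```

Calling summarize_json("transformed-topmed-public.json") from the project root reports whether the top level is a dict or a list and how many entries it holds, which is a quick way to catch an empty or truncated file.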