UKBsearch
UKBsearch is a search tool that retrieves one or more terms from UK Biobank HTML files and tab files downloaded to a local drive.
Installation
- from pypi
pip install ukbsearch
- from github
pip install https://github.com/danielmsk/ukbsearch/raw/main/dist/ukbsearch-0.2.1-py3-none-any.whl
Dependency
UKBsearch requires the following packages:
- rich
- pyreadr
- prettytable
- pandas
- pytabix
Options
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-s, --searchterm search terms (ex: age smoking)
-s age
-s age smoking
-s 'smok*'
-s '*age' 'smok*'
-l, --logic logical operator for multiple terms [or(default), and]
-s '*age' 'smok*' -l and
-s age 'smok*' -l or
-o, --out title of output file
-o searchresult_20220322
-t, --outtype output type [console(default), csv, udi]
-t csv
-t console csv
-t udi
-t console udi
-p, --path directory path for data files (.html, .Rdata) (default: /data2/UKbiobank/ukb_phenotype)
-p /other/path/for/ukb/html/.
-u, --udilist FileID and UDI list for saving data from tcf files
-u ukb39003 3536-0.0 3536-1.0 3536-2.0
-d, --savedata save data from .Rdata [csv, rdata]
-d csv
-d rdata
-d csv rdata
-i, --index index tab file and make tcf file (ex. ukb39003.tab)
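For readers who want to script around these options, the help text above could be mirrored with `argparse` roughly as follows. This is only a sketch inferred from the option list, not the actual UKBsearch source: the function name `build_parser`, the `nargs` settings, and the empty-list defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real source)."""
    parser = argparse.ArgumentParser(prog="ukbsearch")
    # One or more search terms; wildcards such as 'smok*' are passed through as-is.
    parser.add_argument("-s", "--searchterm", nargs="+", default=[])
    # Logical operator for multiple terms; 'or' is the documented default.
    parser.add_argument("-l", "--logic", choices=["or", "and"], default="or")
    parser.add_argument("-o", "--out", default="")
    # Several output types may be combined, e.g. -t console csv.
    parser.add_argument("-t", "--outtype", nargs="+", default=["console"])
    parser.add_argument("-p", "--path", default="/data2/UKbiobank/ukb_phenotype")
    # FileID followed by its UDIs, e.g. -u ukb39003 3536-0.0 3536-1.0.
    parser.add_argument("-u", "--udilist", nargs="+", default=[])
    parser.add_argument("-d", "--savedata", nargs="+", default=[])
    parser.add_argument("-i", "--index", default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-s", "*age", "smok*", "-l", "and", "-t", "console", "csv"])
    print(args.searchterm, args.logic, args.outtype)
```

Using `nargs="+"` is what lets a single flag accept several values, which matches the `-s age smoking` and `-t console csv` examples above.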
Usage
Search result
ukbsearch -s 'ag*' 'smok*' -l and
Search for single term
ukbsearch -s age
ukbsearch --searchterm age
ukbsearch -s 'ag*'
ukbsearch -s '*ge'
Search for multiple terms
- Two logical operators are supported: and, or.
ukbsearch -s age smoking
ukbsearch -s age smoking -l or
ukbsearch -s age smoking -l and
ukbsearch -s 'ag*' 'smok*' -l and
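The way wildcard terms combine under the two operators can be illustrated with a small Python sketch. It assumes shell-style globbing applied case-insensitively to each word of a field description; the real matching rules inside UKBsearch may differ, and the function names and sample descriptions here are made up for illustration.

```python
from fnmatch import fnmatchcase


def term_matches(term: str, description: str) -> bool:
    """True if any whitespace-separated word of the description matches the glob term."""
    pattern = term.lower()
    return any(fnmatchcase(word, pattern) for word in description.lower().split())


def matches(terms: list[str], description: str, logic: str = "or") -> bool:
    """Combine per-term results: 'and' needs every term to match, 'or' needs at least one."""
    combine = all if logic == "and" else any
    return combine(term_matches(term, description) for term in terms)


if __name__ == "__main__":
    # Illustrative descriptions only, not taken from a real UK Biobank file.
    samples = ["Age started smoking", "Age at recruitment", "Smoking status"]
    for text in samples:
        print(text, matches(["ag*", "smok*"], text, "and"), matches(["ag*", "smok*"], text, "or"))
```

With `-l and`, only descriptions matched by every term are kept, so the result set is never larger than the one returned by `-l or`.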
Print only html and UDI
ukbsearch -s 'ag*' 'smok*' -l and -t udi
Save the search result as csv file
ukbsearch -s 'ag*' 'rep*' -l and -o test1 -t csv
(= ukbsearch --searchterm 'ag*' 'rep*' --logic and --out test1 --outtype csv)
ukbsearch -s 'ag*' 'rep*' -l and -o test1 -t console csv
ukbsearch -s 'ag*' 'rep*' -l and -o test1 -t console udi csv
Set a particular directory
- The default path is /data2/UKbiobank/ukb_phenotype.
ukbsearch -s age -p /other/path/for/ukb/html/.
Index tab file
ukbsearch -i ukb26086.tab
This step generates .tab.tcf.gz, .tab.tcf.gz.tbi, and .tab.tcf.gz.idx. After the tcf files are generated, the tab file is no longer required for searching.
Save data (.csv and .rdata) from .tcf.gz
ukbsearch -u ukb39003 3536-0.0 3536-1.0 3536-2.0 -d csv -o test3
(= ukbsearch --udilist ukb39003 3536-0.0 3536-1.0 3536-2.0 --savedata csv --out test3)
ukbsearch -u ukb39003 3536-0.0 3536-1.0 ukb26086 20161-0.0 21003-1.0 -d csv rdata -o test3
ukbsearch -s 'ag*' 'rep*' -l and -d csv -o test3
ukbsearch -s 'ag*' 'rep*' -l and -d rdata -o test3
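The `-u` argument mixes FileIDs and UDIs in one flat list, as in the two-file example above. A minimal sketch of how such a list could be grouped per file is shown below. It assumes that any token shaped like `field-instance.array` (for example `3536-0.0`) is a UDI and that every other token starts a new FileID group; the helper names `group_udis` and `parse_udi` are hypothetical and not part of UKBsearch.

```python
import re

# Assumed UDI shape: <field>-<instance>.<array>, e.g. 3536-0.0
UDI_RE = re.compile(r"^(\d+)-(\d+)\.(\d+)$")


def parse_udi(udi: str) -> tuple[int, int, int]:
    """Split a UDI into (field, instance, array) integers."""
    match = UDI_RE.match(udi)
    if match is None:
        raise ValueError(f"not a UDI: {udi!r}")
    field, instance, array = match.groups()
    return int(field), int(instance), int(array)


def group_udis(tokens: list[str]) -> dict[str, list[str]]:
    """Group UDIs under the FileID that precedes them in the token list."""
    groups: dict[str, list[str]] = {}
    current = None
    for token in tokens:
        if UDI_RE.match(token):
            if current is None:
                raise ValueError("a UDI appeared before any FileID")
            groups[current].append(token)
        else:
            # Any non-UDI token is treated as a FileID and opens a new group.
            current = token
            groups.setdefault(current, [])
    return groups


if __name__ == "__main__":
    tokens = "ukb39003 3536-0.0 3536-1.0 ukb26086 20161-0.0 21003-1.0".split()
    print(group_udis(tokens))
    print(parse_udi("3536-1.0"))
```

Grouping by the preceding FileID keeps the command line flat while still allowing columns from several indexed files to be requested in one call.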
Version History
- 0.2.1 (2022-03-25)
- added csvi (inverted form) option.
- fixed an issue where values were not saved.
- 0.2.0 (2022-03-24)
- implemented tab file indexing based on tabix.
- 0.1.1 (2022-03-23)
- changed default path to /data2/UKbiobank/ukb_phenotype
- 0.1.0 (2022-03-21)
- first release.