ncbi-api·PyPI

NCBI data download and parsing

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Installation

pip install ncbi_api

or, to name the package index explicitly:

pip install ncbi_api -i https://pypi.org/simple

Usage

GEO data download

Accession list

example

from ncbi_api.geo import AccessionDownloader

downloader = AccessionDownloader()
downloader.start(page_nums=299, display=500)

series_matrix

example1: pass a list of accessions

from ncbi_api.geo import GeoDownloader, GeoDataType

accessions = ['GSE113138', 'GSE171935', 'GSE164612', 'GSE166066']

geo = GeoDownloader(accession_list=accessions)
geo.run(data_type=GeoDataType.SeriesMatrix, workers=4)

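Output

The program prints its log messages in Chinese: 正在读取 means "reading", 共有 ... 数量 "total count", 成功下载 "downloaded", 下载失败 "download failed", 还剩 "remaining", 成功读取并完成过滤 "read and filtered successfully", and 正在下载 "downloading".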

-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
正在读取 Accession 列表
	--> 共有 Accession 数量：4
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
成功读取并完成过滤 -> takes 0.104 seconds
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE166nnn/GSE166066/matrix/
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE171nnn/GSE171935/matrix/
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE113nnn/GSE113138/matrix/
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE164nnn/GSE164612/matrix/
GSE113138 200
GSE164612 200
GSE171935 200
GSE166066 200
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
正在读取 Accession 列表
	--> 共有 Accession 数量：4, 成功下载：4, 下载失败(404)：0,  还剩：0
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
成功读取并完成过滤 -> takes 0.046 seconds
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
正在读取 url 列表
	--> 共有 url 数量：4
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
成功读取并完成过滤 -> takes 0.004 seconds
正在下载 -> GSE113138_series_matrix.txt.gz -> 3.18 KB: 4KB [00:00, 1999.91KB/s]                   
正在下载 -> GSE166066_series_matrix.txt.gz -> 3.01 KB: 4KB [00:00, 1333.64KB/s]                   
正在下载 -> GSE164612_series_matrix.txt.gz -> 2.21 MB:  21%|█    | 465/2258 [00:31<02:02, 14.60KB/s]
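
The URLs in the log follow a regular pattern: the series directory sits under a bucket named after the accession with its last three digits replaced by `nnn`. The sketch below rebuilds that pattern by hand. It is not the package's own code; the function names are hypothetical, and the handling of accessions with three or fewer digits (placed in a plain `GSEnnn` bucket) is an assumption about the GEO FTP layout that does not appear in the examples here.

```python
import re

GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_dir_url(accession: str) -> str:
    """Return the matrix directory URL for a series accession such as 'GSE166066'."""
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    # Drop the last three digits to get the bucket prefix. Accessions with
    # three or fewer digits are assumed to live in the plain 'GSEnnn' bucket.
    bucket = accession[:-3] if len(accession) > 6 else "GSE"
    return f"{GEO_SERIES_ROOT}/{bucket}nnn/{accession}/matrix/"


def series_matrix_url(name: str) -> str:
    """Return the file URL for a matrix name such as 'GSE93247' or 'GSE53596-GPL13534'."""
    accession = name.split("-")[0]  # strip an optional platform suffix
    return f"{series_dir_url(accession)}{name}_series_matrix.txt.gz"
```

A series with several platforms has one matrix file per platform, which is why the file name can carry a suffix such as `-GPL13534` while the directory uses the bare accession.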

example2: pass a collection of series_matrix URLs

from ncbi_api.geo import GeoDownloader, GeoDataType

series_matrix_urls = {
    'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE93nnn/GSE93247/matrix/GSE93247_series_matrix.txt.gz',
    'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE64nnn/GSE64216/matrix/GSE64216_series_matrix.txt.gz',
    'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE53nnn/GSE53596/matrix/GSE53596-GPL13534_series_matrix.txt.gz'
}
geo = GeoDownloader()
geo.run(data_type=GeoDataType.SeriesMatrix, series_matrix_urls=series_matrix_urls)

example3: pass the path to a file of series_matrix URLs

history/series_matrix_urls.txt

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE41nnn/GSE41032/matrix/GSE41032_series_matrix.txt.gz
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE65nnn/GSE65908/matrix/GSE65908_series_matrix.txt.gz
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1183/matrix/GSE1183_series_matrix.txt.gz
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE130nnn/GSE130755/matrix/GSE130755_series_matrix.txt.gz
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE31nnn/GSE31020/matrix/GSE31020_series_matrix.txt.gz
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE23nnn/GSE23218/matrix/GSE23218_series_matrix.txt.gz
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34117/matrix/GSE34117_series_matrix.txt.gz
...

Example code

from ncbi_api.geo import GeoDownloader, GeoDataType

geo = GeoDownloader()
geo.run(data_type=GeoDataType.SeriesMatrix, workers=4, series_matrix_url_filepath='history/series_matrix_urls.txt')

Output (log messages are printed in Chinese)

-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
正在读取 url 列表
	--> 共有 url 数量：4, 成功下载：3, 下载失败(404)：0,  还剩：1
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
成功读取并完成过滤 -> takes 0.002 seconds

正在下载 -> GSE171935_series_matrix.txt.gz -> 1.86 KB: 2KB [00:00, 666.93KB/s]                     
正在下载 -> GSE113138_series_matrix.txt.gz -> 3.18 KB: 4KB [00:00, 1335.45KB/s]                   
正在下载 -> GSE166066_series_matrix.txt.gz -> 3.02 KB: 4KB [00:00, 14.50KB/s]                     
正在下载 -> GSE164612_series_matrix.txt.gz -> 2.21 MB:  49%|█▉  | 1105/2258 [00:46<00:28, 41.07KB/s]
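
The URL file in example3 holds one URL per line. If you want to inspect or deduplicate such a file before handing it to the downloader, a small reader like the one below may help. This is a sketch with a hypothetical helper name, not part of ncbi_api, and it assumes the file is UTF-8 text.

```python
from pathlib import Path


def read_url_file(filepath: str) -> set:
    """Collect the unique URLs listed one per line in a text file."""
    urls = set()
    for line in Path(filepath).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Keep only http(s) URLs; blank lines and stray markers are skipped.
        if line.startswith(("http://", "https://")):
            urls.add(line)
    return urls
```

The resulting set has the same shape as the `series_matrix_urls` argument used in example2.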

GEO data decompression

example1

from zyf.file import scan_directory_contents

from ncbi_api.compress import Gzip

file_list = scan_directory_contents('download')

gzip = Gzip()
gzip.batch_file_unzip(file_list, unzip_dir='unzip')
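
For reference, decompressing a single `.gz` archive with only the Python standard library looks roughly like this. It is a stand-alone sketch, not the implementation behind `ncbi_api.compress.Gzip`, and the function name `gunzip` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: str, dest_dir: str) -> Path:
    """Decompress src (a .gz file) into dest_dir and return the output path."""
    src_path = Path(src)
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # 'GSE1_series_matrix.txt.gz' -> 'GSE1_series_matrix.txt'
    out_path = out_dir / src_path.with_suffix("").name
    with gzip.open(src_path, "rb") as fin, open(out_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copies in chunks, so large files are fine
    return out_path
```

A truncated or corrupt download typically surfaces here as `EOFError` or `OSError`, which is the kind of failure that makes a retry step useful.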

example2

from ncbi_api.geo import GeoDownloader, GeoDataType
from zyf.file import scan_directory_contents
import os
import re
from ncbi_api.compress import Gzip


def start_unzip():
    file_list = set(scan_directory_contents('download'))
    unzip_dir = 'unzip'
    history_dir = 'history'
    success_filepath, error_filepath = f'{history_dir}/unzip_success.txt', f'{history_dir}/unzip_error.txt'
    while True:
        gzip = Gzip(history_dir=history_dir, success_filepath=success_filepath, error_filepath=error_filepath)
        gzip.batch_file_unzip(file_list, unzip_dir=unzip_dir)

        # No error file means nothing failed to decompress, so stop looping.
        if not os.path.exists(error_filepath):
            return

        # Rebuild the download URL of every archive that failed to decompress.
        series_matrix_urls = set()
        with open(error_filepath, mode='r') as f:
            for line in f:
                # Accept both Windows and POSIX path separators.
                match = re.search(r'download[\\/](.*?)_series_matrix\.txt\.gz', line)
                if match is None:
                    continue
                accession = match.group(1)
                series = accession.split('-')[0]  # drop a platform suffix such as -GPL13534
                url = f'https://ftp.ncbi.nlm.nih.gov/geo/series/{series[:-3]}nnn/{series}/matrix/{accession}_series_matrix.txt.gz'
                series_matrix_urls.add(url)
        if not series_matrix_urls:
            return

        print('Re-downloading files that failed to decompress')
        geo = GeoDownloader(success_filepath='history/download_unzipfailed_success.txt', error_filepath='history/download_unzipfailed_error.txt')
        geo.run(data_type=GeoDataType.SeriesMatrix, series_matrix_urls=series_matrix_urls)


if __name__ == '__main__':
    start_unzip()
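
The retry loop above recovers the matrix name from a recorded file path. One way to do that step without depending on the path separator is sketched below. The helper name is hypothetical, and it assumes each line of the error file holds nothing but a path, which the source does not confirm.

```python
from pathlib import PureWindowsPath
from typing import Optional

SUFFIX = "_series_matrix.txt.gz"


def matrix_name_from_path(path: str) -> Optional[str]:
    """Return e.g. 'GSE53596-GPL13534' for a path to its series matrix archive."""
    # PureWindowsPath treats both '\\' and '/' as separators, so one parser
    # covers paths recorded on Windows and on POSIX systems.
    filename = PureWindowsPath(path.strip()).name
    if not filename.endswith(SUFFIX):
        return None  # not a series matrix archive
    return filename[: -len(SUFFIX)]
```

Returning `None` for unrelated lines lets the caller skip them instead of failing on the first unexpected entry.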

GEO data parsing

example

from zyf.file import scan_directory_contents

from ncbi_api.geo import GeoParser, GeoDataType

file_list = scan_directory_contents('unzip')
parser = GeoParser(file_list=file_list)
parser.run(GeoDataType.SeriesMatrix, workers=8)
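
To give an idea of what a series matrix file contains: it is tab-separated text in which metadata lines start with `!` and the expression table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. The reader below is a minimal sketch of that layout using only the standard library. It does not reproduce what `GeoParser` does internally, and the function name is hypothetical.

```python
import csv
from typing import Dict, List, Tuple


def read_series_matrix(filepath: str) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Split a decompressed series matrix file into metadata and table rows."""
    metadata: Dict[str, List[str]] = {}
    table: List[List[str]] = []
    in_table = False
    with open(filepath, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row or not row[0]:
                continue  # skip blank lines
            key = row[0]
            if key == "!series_matrix_table_begin":
                in_table = True
            elif key == "!series_matrix_table_end":
                in_table = False
            elif in_table:
                table.append(row)  # the first table row is the header
            elif key.startswith("!"):
                # Keys can repeat across lines, so accumulate their values.
                metadata.setdefault(key[1:], []).extend(row[1:])
    return metadata, table
```

All values come back as strings; converting the table body to numbers is left to the caller.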

Release history

- 1.0 (this version): May 10, 2021
- 0.9: May 10, 2021
- 0.8: May 10, 2021
- 0.7: May 9, 2021
- 0.6: May 6, 2021
- 0.5: May 6, 2021
- 0.4: Apr 19, 2021
- 0.3: Apr 18, 2021
- 0.2: Apr 17, 2021
- 0.1: Apr 17, 2021

