
A data dump tool


ddump

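ddump (Data Dump) is a data dumping tool. It mainly addresses the following problems:

  1. Incremental dumps from databases
  2. A general pattern for dumping API data
  3. A scheme for organizing data in local files

This tool is aimed at downloading data. Directory and file names are organized first to enable incremental downloads and reduce download volume; ease of reading is a secondary goal.
Users may need to convert the data to other formats to suit their own workflows, for example by importing it into a database.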
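
To illustrate how file naming can drive incremental downloads, here is a minimal sketch using only the standard library. It assumes a hypothetical layout of one file per day named `YYYYMMDD.parquet`; this layout and the `missing_dates` helper are illustrative assumptions, not ddump's actual scheme or API.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List
import tempfile


def missing_dates(root: Path, start: date, end: date) -> List[date]:
    """Return the dates in [start, end] with no YYYYMMDD.parquet file under root.

    Hypothetical helper: the one-file-per-day layout is an assumption.
    """
    # File stems such as "20240101" act as the record of what is already saved.
    existing = {p.stem for p in root.glob("*.parquet")}
    n_days = (end - start).days
    all_days = (start + timedelta(days=n) for n in range(n_days + 1))
    return [d for d in all_days if d.strftime("%Y%m%d") not in existing]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Pretend two days were downloaded on an earlier run.
    (root / "20240101.parquet").touch()
    (root / "20240102.parquet").touch()

    # Only the dates without a file need to be requested again.
    todo = missing_dates(root, date(2024, 1, 1), date(2024, 1, 4))
    print([d.isoformat() for d in todo])
```

Because the directory listing itself records progress, a rerun after an interruption requests only the days that are still missing.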

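Why store data in files rather than a database

  1. Without a predefined table schema, to_sql stores data in an inefficient format, and preparing the schema in advance is tedious
  2. Financial data is unusual in that it does not need random access; loading everything or loading by date is far more common
  3. Files are more convenient for backing up and sharing data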

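Why the Parquet file format

  1. csv: a text format; slow to read and write, and prone to losing precision
  2. pickle: usable only from Python
  3. HDF5: powerful, flexible, and cross-language
  4. parquet: columnar storage that supports reading a whole folder directly; cross-language and widely used in big-data processing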
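
As a small illustration of the csv point in the list above, the sketch below uses only the standard library. It assumes a hypothetical two-column table (a security code and a price) and an export step that rounds prices to four decimals; both are assumptions made for the example. A CSV file stores only text, so digits dropped by the writer and types guessed by the reader cannot be recovered.

```python
import csv
import io

# Hypothetical row: a security code with leading zeros and a high-precision price.
rows = [{"code": "000001", "price": 10.123456789}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["code", "price"])
writer.writeheader()
for r in rows:
    # Assumption: the export rounds floats to a fixed number of decimals.
    writer.writerow({"code": r["code"], "price": f"{r['price']:.4f}"})

buf.seek(0)
loaded = list(csv.DictReader(buf))

# Every value comes back as a string; the reader must guess the types.
code_as_int = int(loaded[0]["code"])  # naive numeric inference drops the leading zeros
price = float(loaded[0]["price"])     # the digits removed by rounding are gone

print(loaded[0], code_as_int, price)
```

A typed binary format such as Parquet stores the column types with the data, which avoids this kind of guessing on read.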

Installation

pip install ddump -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade # download from a mirror in mainland China

pip install ddump -i https://pypi.org/simple --upgrade # download from the official PyPI index

Development

pip install -e .

Database dumps

See the database dump documentation
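
As a generic illustration of what an incremental database dump involves, here is a minimal sketch using the standard-library `sqlite3` module. It assumes the source table has a monotonically increasing key column. The `quotes` table, the `fetch_new_rows` helper, and the `last_dumped_id` variable are hypothetical and do not reflect ddump's own implementation.

```python
import sqlite3


def fetch_new_rows(conn, table, key, last_key):
    """Fetch rows whose key is greater than the last dumped value, ordered by key.

    Hypothetical helper. `table` and `key` are interpolated into the SQL text,
    so they must come from trusted code; only the value is parameterized.
    """
    sql = f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key}"
    return conn.execute(sql, (last_key,)).fetchall()


# Build a small in-memory source table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, symbol TEXT, close REAL)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [(1, "A", 10.0), (2, "B", 20.0), (3, "A", 10.5)],
)

# Assume an earlier run already saved everything up to id 1.
last_dumped_id = 1
batch = fetch_new_rows(conn, "quotes", "id", last_dumped_id)
if batch:
    # Remember the highest key seen so the next run starts after it.
    last_dumped_id = batch[-1][0]

print(batch, last_dumped_id)
```

In practice the last key would be persisted between runs, for example by deriving it from the names of the files already written.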

API dumps

See the API dump documentation

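Database tool

While developing this project, we extracted a database ORM tool. It is a further wrapper around sqlalchemy, is simple to use, and can map existing tables directly. Its usage is modeled on the data API of JoinQuant (聚宽).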

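from ddump.db.tool import DbTool

# Connect with a SQLAlchemy-style URL; replace user and password with your own credentials
db = DbTool(url="mysql+pymysql://user:password@127.0.0.1:3306/tushare?charset=utf8")
db.show_tables()

# Inspect an existing table by name
db.describe('FDT_STK_AUDIT')

# Existing tables are exposed as attributes of db and can be queried directly
q = db.query(db.FDT_STK_AUDIT).limit(10)
df = db.run_query(q)
df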

Examples

See examples, which contains sample calls for several common libraries. Contributions of more examples are welcome.
