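ALcedo PDBC Project Introduction

I. Overview

ALcedo PDBC ® (Python DataBase Connectivity) is an efficient, flexible data interface (API) developed by the AI Lab 100 team at 数智教育发展(山东)有限公司. It is designed to load data from databases into Python as quickly and with as little memory as possible. It aims to simplify access to and processing of large data sets, provide a unified way to interact with many kinds of data sources, and speed up ML and ETL development.

Like JDBC and ODBC, ALcedo PDBC ® focuses on providing a programming interface for data access, in this case for Python applications and AI models. It supports mainstream RDBMS (relational databases), NoSQL (non-relational databases), data lakes, and data warehouses, including but not limited to MySQL, SQL Server, SQLite, Oracle, MariaDB, PostgreSQL, MongoDB, Redis, Elasticsearch, MinIO, Amazon S3, and Google Cloud Storage (GCS).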

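II. Technical Features and Use Cases

ALcedo PDBC has the following technical features:

Broad compatibility: supports many database systems and file storage services, including relational (RDBMS) and NoSQL databases, and is compatible with big-data components such as MinIO and Flink.
Simple, easy-to-use interface: complex technical details are encapsulated internally; for RDBMS sources the interface is unified and concise.
Performance optimization: ALcedo PDBC uses caching, multithreading, and related techniques to improve query speed and resource utilization. By cutting unnecessary network round trips and improving batch processing, it performs significantly better than traditional connectors.
Dynamic code generation: ALcedo PDBC can generate optimized query-execution code at run time, which preserves flexibility while keeping execution efficient.
Flexible configuration: users can customize connection parameters to meet the requirements of different environments and business needs.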
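The caching idea mentioned above can be illustrated with a small read-through cache. This is only a sketch under stated assumptions: the `CachedReader` class is hypothetical and is not part of ALcedo PDBC, and SQLite from the Python standard library stands in for a real database server.

```python
import sqlite3


class CachedReader:
    """Hypothetical read-through cache keyed by (sql, params); not the ALcedo PDBC API."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}
        self.db_hits = 0  # number of queries that actually reached the database

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self._cache:
            # Cache miss: run the query once and remember the rows.
            self.db_hits += 1
            self._cache[key] = self.conn.execute(sql, params).fetchall()
        return self._cache[key]


# In-memory SQLite database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

reader = CachedReader(conn)
first = reader.query("SELECT name FROM t WHERE id = ?", (1,))
second = reader.query("SELECT name FROM t WHERE id = ?", (1,))  # served from the cache
```

A cache like this trades freshness for speed, so a real implementation would also need an invalidation or expiry policy.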

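Use cases for ALcedo PDBC include, but are not limited to, the following:

Big data analysis: for large-scale data processing and analysis in Python, ALcedo PDBC can substantially improve query speed.
AI model development: when developing AI models in Python, ALcedo PDBC makes it quick to read data from multiple sources and feed models with real-time data streams.
Data integration: migrate data across databases, or integrate data from different sources into a single processing platform.
Cloud storage access: conveniently read and write files in cloud storage, such as Amazon S3 or Azure Blob Storage, improving the efficiency of data operations in cloud computing scenarios.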
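The data integration use case above can be sketched as a batched table copy between two connections. The `copy_table` helper is hypothetical and not part of ALcedo PDBC; two in-memory SQLite databases stand in for the source and target systems, and the table name is assumed to come from trusted code rather than user input.

```python
import sqlite3


def copy_table(src, dst, table, batch_size=1000):
    """Copy all rows of `table` from src to dst in batches; return the row count.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    cur = src.execute(f"SELECT * FROM {table}")
    placeholders = ", ".join("?" for _ in cur.description)
    copied = 0
    while True:
        rows = cur.fetchmany(batch_size)  # bounded memory per batch
        if not rows:
            break
        dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        copied += len(rows)
    dst.commit()
    return copied


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE rent (id INTEGER, price REAL)")
src.executemany("INSERT INTO rent VALUES (?, ?)", [(i, i * 10.0) for i in range(2500)])

copied = copy_table(src, dst, "rent")
```

Fetching and inserting in batches keeps memory use flat and reduces the number of round trips compared with row-by-row inserts.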

III. Data Sources and Outputs

ALcedo PDBC supports the different data source interfaces by wrapping them in the mysql, nosql, datalake, and datawarehouse classes.

Data sources:

MySQL
SQL Server
PostgreSQL
Oracle
MariaDB
SQLite
MongoDB
Elasticsearch
Redis
DynamoDB
MinIO
Amazon S3
Google Cloud Storage (GCS)
Microsoft Azure Blob Storage
Doris
Snowflake
BigQuery
Redshift
StarRocks
...

Output DataFrames:

Pandas
Polars
Dask

Output files:

CSV
Excel
JSON
HTML
HDF5
Feather
Parquet
Apache Avro

Type	Data source	Pandas DataFrame	Polars DataFrame	Dask DataFrame	File	Remarks
Structured (SQL)	MySQL	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
	SQL Server	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
	PostgreSQL	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
	Oracle	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
	MariaDB	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
	SQLite	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
Non-relational (NoSQL)	MongoDB	✅ read ✅ write	✅ read ✅ write	✖	✅ CSV ✅ Excel ✖ JSON ✅ HTML ✖ HDF5 ✖ Feather ✖ Parquet ✖ Apache Avro
	Elasticsearch	✅ read ✅ write	✅ read ✖ write	✖	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
	Redis	✅ read	✖	✖	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
	DynamoDB	✅ read ✅ write	✅ read ✅ write	✖	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro
Data lake	MinIO	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
	S3	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
	GCS	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
	AzureBlob	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
Data warehouse	Doris	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
	Snowflake	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
	BigQuery	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
	Redshift	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write
	StarRocks	✅ read ✅ write	✅ read ✅ write	✅ read ✖ write	✅ CSV ✅ Excel ✅ JSON ✅ HTML ✅ HDF5 ✅ Feather ✅ Parquet ✖ Apache Avro	✅ read ✖ write

Notes: exporting xlsx requires openpyxl; Parquet is a columnar storage file format; Feather is a compressed binary file format.
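To make the file outputs more concrete, here is a rough sketch of serializing query rows to two of the listed formats, CSV and JSON, using only the Python standard library. The helper names `rows_to_csv` and `rows_to_json` and the sample data are hypothetical; ALcedo PDBC's own export functions are not shown in this document and may work differently.

```python
import csv
import io
import json


def rows_to_csv(columns, rows):
    """Serialize a header plus rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def rows_to_json(columns, rows):
    """Serialize rows as a JSON array of objects keyed by column name."""
    return json.dumps([dict(zip(columns, row)) for row in rows], ensure_ascii=False)


# Hypothetical sample data standing in for a query result.
columns = ["id", "city"]
rows = [(1, "Jinan"), (2, "Qingdao")]

csv_text = rows_to_csv(columns, rows)
json_text = rows_to_json(columns, rows)
```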

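IV. Quick Start

ALcedo PDBC can be installed with pip. Because mirror indexes in mainland China may lag behind the official index, specify the official PyPI index when installing.

pip install --index-url https://pypi.org/simple/ alcedo-pdbc

Taking MySQL as an example, you only need a few lines of code: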

# Inside AI Lab 100, import through ailab100.pdbc instead:
# from ailab100.pdbc.sql import MySQL
from alcedo_pdbc.sql import MySQL

db_mysql = MySQL()
# Read the filtered table and return it as a Polars DataFrame
df = db_mysql.read_as_dataframe(
    table_name="public_rent_price_forecast_data",
    params={"houseFloor": "低", "totalFloor": 2},
    return_type="polars",
)
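The document does not say how the params dictionary is applied. One plausible reading is that each key/value pair becomes an equality filter and the filters are combined with AND. The sketch below illustrates that assumption only: `build_select` is a hypothetical helper, not the library's implementation, and SQLite stands in for MySQL so that the example runs without a server.

```python
import sqlite3


def build_select(table_name, params):
    """Turn an equality-filter dict into (sql, values) using ? placeholders.

    Identifiers are quoted but not validated, so they must come from trusted code.
    """
    sql = f'SELECT * FROM "{table_name}"'
    if params:
        conditions = " AND ".join(f'"{col}" = ?' for col in params)
        sql += f" WHERE {conditions}"
    return sql, list(params.values())


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE rent ("houseFloor" TEXT, "totalFloor" INTEGER, price REAL)')
conn.executemany(
    "INSERT INTO rent VALUES (?, ?, ?)",
    [("低", 2, 1500.0), ("高", 2, 1800.0), ("低", 6, 1200.0)],
)

sql, values = build_select("rent", {"houseFloor": "低", "totalFloor": 2})
rows = conn.execute(sql, values).fetchall()
```

Passing the values through placeholders rather than formatting them into the SQL string lets the driver handle quoting and avoids SQL injection through filter values.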

Alternatively, you can speed up data loading with multiple threads.

# Inside AI Lab 100, import through ailab100.pdbc instead:
# from ailab100.pdbc.datalake import MinIO
from alcedo_pdbc.datalake import MinIO

minio_client = MinIO(
    "127.0.0.1:9000",
    access_key="Q3AM3UQ867SPQQA43P2F",
    secret_key="zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG",
)
# Download the object using 4 worker threads
minio_client.download_file(
    minio_path="s3://datalake/datasets/3财务困境研究数据集/ST财务预警.csv",
    num_threads=4,
)

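The function partitions the work by splitting it evenly into as many partitions as the specified number of threads, then assigns one thread to each partition to load and write the data in parallel.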
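The partitioning scheme described above can be sketched with the standard library. This is an illustration under stated assumptions, not the actual download_file implementation: `split_ranges` and `parallel_copy` are hypothetical names, and an in-memory byte string stands in for the remote object.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total, num_threads):
    """Split range(total) into num_threads contiguous, near-equal (start, end) pairs."""
    base, extra = divmod(total, num_threads)
    ranges, start = [], 0
    for i in range(num_threads):
        # The first `extra` partitions take one additional item each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_copy(data, num_threads=4):
    """Copy `data` partition by partition, one worker thread per partition."""
    out = bytearray(len(data))

    def worker(bounds):
        start, end = bounds
        # Stand-in for a ranged download followed by a write at the same offset.
        out[start:end] = data[start:end]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, split_ranges(len(data), num_threads)))
    return bytes(out)


payload = bytes(range(256)) * 10  # 2560 bytes of sample data
ranges = split_ranges(10, 4)
result = parallel_copy(payload, num_threads=4)
```

Because each worker writes only to its own non-overlapping slice, the partitions need no locking between them.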

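V. Performance

The lab compared different Python solutions by loading a MySQL table of 10,981,106 rows (1,092,616,192 bytes, 1.02 GB) into a DataFrame using 4 threads in parallel. The results are as follows:

1. Response time (lower is better)

2. Memory consumption (lower is better)

In summary, ALcedo PDBC used about 1/3 less memory and cut response time nearly in half compared with Pandas (its response time was about the same as that of Polars).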
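As a quick sanity check on the benchmark figures, the arithmetic below derives the table size in binary gigabytes, the average row size, and the rows per thread. The even split across 4 threads is an assumption based on the partitioning description in the Quick Start section; the numbers themselves come from the text.

```python
ROWS = 10_981_106
SIZE_BYTES = 1_092_616_192
THREADS = 4

# The quoted "1.02 GB" matches binary gigabytes (GiB, 1024**3 bytes).
size_gib = SIZE_BYTES / 1024**3

# Average row size in bytes.
bytes_per_row = SIZE_BYTES / ROWS

# Rows per thread under an even split, rounded up (ceiling division).
rows_per_thread = -(-ROWS // THREADS)
```

The table therefore averages just under 100 bytes per row, and each of the 4 threads would handle roughly 2.75 million rows.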

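VI. Ecosystem

VII. Release Notes

0.1.2b0

Added Redis cache configuration to speed up data reads.
Connection information for each type of database can be defined through system environment variables.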
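Reading connection settings from environment variables typically looks like the sketch below. The variable names (MYSQL_HOST, MYSQL_PORT, and so on), the defaults, and the `load_mysql_config` helper are all hypothetical; the names ALcedo PDBC actually reads are not given in this document.

```python
import os


def load_mysql_config(env=os.environ):
    """Read connection settings from environment variables (hypothetical names)."""
    return {
        "host": env.get("MYSQL_HOST", "127.0.0.1"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", ""),
        "database": env.get("MYSQL_DATABASE", ""),
    }


# A plain dict stands in for the process environment in this example.
config = load_mysql_config({"MYSQL_HOST": "db.internal", "MYSQL_PORT": "3307"})
```

Keeping credentials in the environment rather than in source code makes it easier to use different settings per deployment.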

0.1.3b0

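Added a command-line interface (CLI) with integrated docs; run alcedo-cli docs to launch the documentation.
Added script files for automated compilation and packaging, with support for
Added docker-compose deployment templates for each type of database; see docker-compose-db.yml in the samples_docker directory.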

Release history

This version

0.1.3b0 pre-release

Aug 11, 2025

0.1.2a0 pre-release

Apr 7, 2025

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

alcedo_pdbc-0.1.3b0-py3-none-any.whl (60.9 MB)

Uploaded Aug 11, 2025 Python 3

File details

Details for the file alcedo_pdbc-0.1.3b0-py3-none-any.whl.

File metadata

Download URL: alcedo_pdbc-0.1.3b0-py3-none-any.whl
Upload date: Aug 11, 2025
Size: 60.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.0

File hashes

Hashes for alcedo_pdbc-0.1.3b0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca9fbbe89e3bbeb4940cc0c095135ae9b3598393cb94892ab045bea35f0f980e`
MD5	`11b7439a8d3f11a858f4940ab1efc1ee`
BLAKE2b-256	`e7edf57fd0e647d713d752446a040ee2641f412ab68162416d77fe86f7d08bd1`
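A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using hashlib; the helper names `sha256_of` and `matches` are illustrative, and the demo hashes a small temporary file so that it runs without the 60.9 MB wheel.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; for the real wheel, pass its path and the
# SHA256 value from the table above.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_ok = matches(demo_path, hashlib.sha256(b"hello").hexdigest())
demo_bad = matches(demo_path, "0" * 64)
os.remove(demo_path)
```

Reading in chunks keeps memory use constant regardless of file size.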
