Lightweight Python helpers for YTsaurus, YQL, CHYT, and pandas analytics workflows.
Project description
YTsaurus Python Client
A lightweight Python helper library for day-to-day work with YTsaurus, YQL, CHYT, and pandas DataFrames.
The project wraps common analytics workflows into a small, readable API:
- run YQL queries and return results as pandas.DataFrame
- start long-running YQL queries without blocking the notebook
- write YQL results directly into YTsaurus tables
- read large query outputs through temporary YTsaurus tables with progress reporting
- execute CHYT queries through HTTP or the YTsaurus CLI
- upload pandas DataFrames into YTsaurus tables
This repository is designed as a clean, portfolio-friendly version of the client: no company-specific hosts, pools, paths, tokens, or internal links are hardcoded.
Installation
pip install ytsaurus_python_client
For an editable install from a local checkout of the repository:
pip install -e .
For production packaging:
python -m build
pip install dist/ytsaurus_python_client-*.whl
Requirements
- Python 3.9+
- pandas
- requests
- numpy
- YTsaurus Python client with yt.wrapper
- Optional: YTsaurus CLI binary yt for CLI-based CHYT helpers
Configuration
The library is configured through environment variables or explicit constructor arguments.
| Variable | Purpose | Default |
|---|---|---|
| YT_PROXY | YTsaurus proxy host | empty |
| YT_TOKEN | OAuth/token value used by HTTP CHYT helpers | read from YT_TOKEN_PATH |
| YT_TOKEN_PATH | Path to a local token file | ~/.yt/token |
| YT_DEFAULT_TEMP_DIR | Temp folder for large YQL result materialization | //tmp/ytsaurus-python-client |
| YT_POOL | Optional YQL pool pragma | unset |
| YT_UI_BASE_URL | Optional web UI base URL used only for printed links | unset |
| CHYT_HOST | CHYT HTTP host | YT_PROXY |
| CHYT_PORT | CHYT HTTP port | 8123 |
| CHYT_CLIQUE_ALIAS | Default CHYT clique alias | ch_public |
| YT_BINARY | YTsaurus CLI binary name/path | yt |
Example:
export YT_PROXY="your-ytsaurus-proxy.example.com"
export YT_TOKEN_PATH="$HOME/.yt/token"
export YT_DEFAULT_TEMP_DIR="//home/your-login/tmp"
export CHYT_CLIQUE_ALIAS="ch_public"
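The lookup order described above can be sketched in plain Python. This is only an illustration, not the library's code: the helper names `resolve_setting` and `resolve_token` are hypothetical, and the assumption that an explicit argument takes precedence over an environment variable is not stated in this README.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env: Mapping[str, str],
    name: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the explicit argument if given, else the env value, else the default.

    Assumption: explicit arguments win over environment variables.
    """
    if explicit is not None:
        return explicit
    return env.get(name, default)


def resolve_token(env: Mapping[str, str]) -> Optional[str]:
    """Prefer YT_TOKEN; otherwise read the file named by YT_TOKEN_PATH."""
    token = env.get("YT_TOKEN")
    if token:
        return token
    token_path = Path(env.get("YT_TOKEN_PATH", "~/.yt/token")).expanduser()
    if token_path.is_file():
        return token_path.read_text(encoding="utf-8").strip()
    return None


# Usage with a plain dict standing in for os.environ.
fake_env = {"YT_PROXY": "your-ytsaurus-proxy.example.com"}
proxy = resolve_setting(None, fake_env, "YT_PROXY", default="")
port = resolve_setting(None, fake_env, "CHYT_PORT", default="8123")
print(proxy, port)
```

Passing the environment as a mapping instead of reading `os.environ` directly keeps the helpers easy to test; in real use `os.environ` can be passed in unchanged.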
Quick start
Run a YQL query
from ytsaurus_python_client import YTsaurusHook

hook = YTsaurusHook(
    yt_proxy="your-ytsaurus-proxy.example.com",
    yt_query_result_temp_dir="//home/your-login/tmp",
)
df = hook.yql("""
    SELECT
        1 AS id,
        "hello" AS value;
""")
print(df)
Start a long-running query and return the query ID
query_id = hook.yql(
    """
    INSERT INTO `//home/your-login/output_table`
    SELECT *
    FROM `//home/your-login/source_table`;
    """,
    wait=False,
)
print(query_id)
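Once a query has been started with `wait=False`, the caller usually needs some way to check on it later. This README does not document a status call, so the sketch below takes a hypothetical `get_state` callable and shows only the generic polling pattern; the state names are assumptions as well.

```python
import time
from typing import Callable

# Assumed terminal state names; the real ones may differ.
TERMINAL_STATES = {"completed", "failed", "aborted"}


def wait_for_query(
    get_state: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_state() until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query still in state {state!r} after {timeout} s")
        time.sleep(poll_interval)


# Usage with a fake status source standing in for a real status call.
fake_states = iter(["pending", "running", "completed"])
final_state = wait_for_query(lambda: next(fake_states), poll_interval=0.0)
print(final_state)
```

In a notebook, the same loop can run in a later cell, so the cell that started the query returns immediately.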
Execute a query and wait without reading the result
query_id = hook.yql_wait("""
    CREATE TABLE `//home/your-login/example_table` (
        id Int64,
        value String
    );
""")
Materialize a large YQL result into a temp table and read it in chunks
df = hook.yql_unlim(
    """
    SELECT *
    FROM `//home/your-login/large_table`;
    """,
    chunksize=500_000,
)
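The idea behind reading in chunks with progress reporting can be illustrated without any YTsaurus dependency. The sketch below is not how `yql_unlim` is implemented; `iter_chunks` and `read_all` are hypothetical helpers that show the general pattern of consuming a row stream in fixed-size batches.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


def iter_chunks(rows: Iterable[Row], chunksize: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `chunksize` rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def read_all(rows: Iterable[Row], chunksize: int, total: Optional[int] = None) -> List[Row]:
    """Collect all rows chunk by chunk, printing progress after each chunk."""
    collected: List[Row] = []
    for chunk in iter_chunks(rows, chunksize):
        collected.extend(chunk)
        if total:
            print(f"read {len(collected)}/{total} rows")
        else:
            print(f"read {len(collected)} rows")
    return collected


# Usage with in-memory rows standing in for a table reader.
sample_rows = [{"id": i} for i in range(5)]
result = read_all(sample_rows, chunksize=2, total=len(sample_rows))
```

A smaller `chunksize` lowers peak memory per batch at the cost of more round trips; the right value depends on row width and available memory.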
Upload a DataFrame to YTsaurus
import pandas as pd
from ytsaurus_python_client import YTsaurusHook

hook = YTsaurusHook(yt_proxy="your-ytsaurus-proxy.example.com")
df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
schema = hook.generate_yt_schema(df)
hook.upload_df_to_yt(
    df=df,
    yt_path="//home/your-login/users",
    schema=schema,
    overwrite=True,
)
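The README does not describe what `generate_yt_schema` returns. As a rough illustration of the kind of mapping such a helper might perform, the sketch below turns pandas dtype names into YTsaurus column types. The function name `sketch_yt_schema`, the mapping table, and the `any` fallback are all assumptions, not the library's behavior.

```python
from typing import Dict, List, Mapping

# Assumed mapping from pandas dtype names to YTsaurus column types.
DTYPE_TO_YT_TYPE: Dict[str, str] = {
    "int64": "int64",
    "float64": "double",
    "bool": "boolean",
    "object": "string",
}


def sketch_yt_schema(dtypes: Mapping[str, str]) -> List[Dict[str, str]]:
    """Build a list of {"name", "type"} column entries from dtype names.

    Unknown dtypes fall back to the untyped "any" column type.
    """
    schema: List[Dict[str, str]] = []
    for column, dtype in dtypes.items():
        yt_type = DTYPE_TO_YT_TYPE.get(dtype, "any")
        schema.append({"name": column, "type": yt_type})
    return schema


# Usage with dtype names written out by hand.
example_schema = sketch_yt_schema({"id": "int64", "name": "object"})
print(example_schema)
```

With a real DataFrame, the input mapping could be built as `{col: str(dtype) for col, dtype in df.dtypes.items()}`.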
Run a CHYT query over HTTP
from ytsaurus_python_client import chyt_df

df = chyt_df(
    """
    SELECT 1 AS ok
    """,
    host="your-chyt-host.example.com",
    clique_alias="ch_public",
)
Run a CHYT query through the YTsaurus CLI
from ytsaurus_python_client import chyt_df_cli

df = chyt_df_cli(
    "SELECT 1 AS ok",
    yt_proxy="your-ytsaurus-proxy.example.com",
    clique_alias="ch_public",
)
Public API
from ytsaurus_python_client import (
    YTsaurusHook,
    DOYTHook,  # backward-compatible alias
    chyt_df,
    chyt_raw,
    chyt_to_yt,
    chyt_df_cli,
    chyt_raw_cli,
    chyt_to_yt_cli,
    chyt_check_cli,
)
Design notes
- Defaults are intentionally generic and safe for public repositories.
- Secrets are never hardcoded. Use YT_TOKEN, YT_TOKEN_PATH, or explicit arguments.
- Printed YTsaurus UI links are optional and controlled by YT_UI_BASE_URL.
- YQL pragmas can be provided through query_pragma_config or environment variables such as YT_POOL.
- DOYTHook is kept as a backward-compatible alias; new code should prefer YTsaurusHook.
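To make the pool pragma concrete, the sketch below shows one way a pool name could be turned into a YQL pragma and prepended to a query. The helper `with_pool_pragma` is hypothetical, and the format that `query_pragma_config` expects is not documented here; only the `PRAGMA yt.Pool` statement itself is standard YQL.

```python
from typing import Optional


def with_pool_pragma(query: str, pool: Optional[str]) -> str:
    """Prepend a `PRAGMA yt.Pool` line when a pool name is set.

    An unset or empty pool leaves the query unchanged.
    """
    if not pool:
        return query
    return f'PRAGMA yt.Pool = "{pool}";\n{query}'


# Usage: the pool name here is a placeholder, e.g. the value of YT_POOL.
print(with_pool_pragma("SELECT 1 AS id;", "your-pool"))
print(with_pool_pragma("SELECT 1 AS id;", None))
```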
Repository hygiene
Before publishing, the following were removed from the project:
- macOS metadata files
- Python cache files
- internal company hosts and UI links
- internal pools and temp paths
- Russian comments and runtime messages
- local tokens or secret values
License
MIT © 2026 Alexey Voronko
Download files
Source Distribution
File details
Details for the file ytsaurus_python_client-0.1.0.tar.gz.
File metadata
- Download URL: ytsaurus_python_client-0.1.0.tar.gz
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18939730c12e191bf9ba571b831303229405c58c4cd2e7730f336678ad6a67d8 |
| MD5 | b9eb0c5ebab1ca1070aea2d627125d83 |
| BLAKE2b-256 | a78b58899c0c6f6584e53514da90d809606e540a1c002296ebad08235b12f1ce |