
Lightweight Python helpers for YTsaurus, YQL, CHYT, and pandas analytics workflows.

Project description

YTsaurus Python Client

A lightweight Python helper library for day-to-day work with YTsaurus, YQL, CHYT, and pandas DataFrames.

Python 3.9+ · MIT License · Notebook friendly

PyPI · GitHub · Example notebook · Issues

The project wraps common analytics workflows into a small, readable API:

  • run YQL queries and return results as pandas.DataFrame
  • start long-running YQL queries without blocking the notebook
  • write YQL results directly into YTsaurus tables
  • read large query outputs through temporary YTsaurus tables with progress reporting
  • execute CHYT queries through HTTP or the YTsaurus CLI
  • upload pandas DataFrames into YTsaurus tables

This repository is designed as a clean, portfolio-friendly version of the client: no company-specific hosts, pools, paths, tokens, or internal links are hardcoded.

Installation

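# Install the released package from PyPI
pip install ytsaurus_python_client

# Or, from a local checkout of the repository, install in editable mode
pip install -e .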

For production packaging:

python -m build
pip install dist/ytsaurus_python_client-*.whl

Requirements

  • Python 3.9+
  • pandas
  • requests
  • numpy
  • YTsaurus Python client with yt.wrapper
  • Optional: YTsaurus CLI binary yt for CLI-based CHYT helpers

Configuration

The library is configured through environment variables or explicit constructor arguments.

| Variable | Purpose | Default |
| --- | --- | --- |
| YT_PROXY | YTsaurus proxy host | empty |
| YT_TOKEN | OAuth/token value used by HTTP CHYT helpers | read from YT_TOKEN_PATH |
| YT_TOKEN_PATH | Path to a local token file | ~/.yt/token |
| YT_DEFAULT_TEMP_DIR | Temp folder for large YQL result materialization | //tmp/ytsaurus-python-client |
| YT_POOL | Optional YQL pool pragma | unset |
| YT_UI_BASE_URL | Optional web UI base URL used only for printed links | unset |
| CHYT_HOST | CHYT HTTP host | YT_PROXY |
| CHYT_PORT | CHYT HTTP port | 8123 |
| CHYT_CLIQUE_ALIAS | Default CHYT clique alias | ch_public |
| YT_BINARY | YTsaurus CLI binary name/path | yt |

Example:

export YT_PROXY="your-ytsaurus-proxy.example.com"
export YT_TOKEN_PATH="$HOME/.yt/token"
export YT_DEFAULT_TEMP_DIR="//home/your-login/tmp"
export CHYT_CLIQUE_ALIAS="ch_public"
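
The lookup described by the table can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the helper names `resolve_setting` and `resolve_token` are hypothetical, and the precedence order (explicit argument, then environment variable, then documented default) is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Defaults copied from the configuration table.
DEFAULTS = {
    "YT_PROXY": "",
    "YT_TOKEN_PATH": "~/.yt/token",
    "YT_DEFAULT_TEMP_DIR": "//tmp/ytsaurus-python-client",
    "CHYT_PORT": "8123",
    "CHYT_CLIQUE_ALIAS": "ch_public",
    "YT_BINARY": "yt",
}


def resolve_setting(
    name: str,
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: explicit argument > environment > default (assumed order)."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    value = env.get(name)
    if value:
        return value
    if name == "CHYT_HOST":
        # The table lists YT_PROXY as the default for CHYT_HOST.
        return resolve_setting("YT_PROXY", env=env)
    # Variables documented as "unset" (for example YT_POOL) resolve to None.
    return DEFAULTS.get(name)


def resolve_token(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: use YT_TOKEN if present, else read the YT_TOKEN_PATH file."""
    env = os.environ if env is None else env
    token = explicit or env.get("YT_TOKEN")
    if token:
        return token.strip()
    path = Path(resolve_setting("YT_TOKEN_PATH", env=env)).expanduser()
    return path.read_text().strip() if path.is_file() else None
```

Passing `env` explicitly keeps the sketch easy to test without touching the real process environment.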

Quick start

Run a YQL query

from ytsaurus_python_client import YTsaurusHook

hook = YTsaurusHook(
    yt_proxy="your-ytsaurus-proxy.example.com",
    yt_query_result_temp_dir="//home/your-login/tmp",
)

df = hook.yql("""
SELECT
    1 AS id,
    "hello" AS value;
""")

print(df)

Start a long-running query and return the query ID

query_id = hook.yql(
    """
    INSERT INTO `//home/your-login/output_table`
    SELECT *
    FROM `//home/your-login/source_table`;
    """,
    wait=False,
)

print(query_id)
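
Once a query has been started with wait=False, the caller usually needs some way to check on it later. The README does not document a status call, so the sketch below takes the status lookup as a caller-supplied function. The name `wait_for_query`, the `get_state` callable, and the state names are all assumptions made for illustration.

```python
import time
from typing import Callable

# Assumed terminal state names; the real values depend on the query service.
FINAL_STATES = {"completed", "failed", "aborted"}


def wait_for_query(
    get_state: Callable[[str], str],
    query_id: str,
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state(query_id) until a terminal state is seen or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state(query_id)
        if state in FINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still in state {state!r}")
        # Back off between checks so the notebook does not hammer the service.
        sleep(poll_interval)
```

The `sleep` parameter is injectable only so the loop can be exercised in tests without real delays.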

Execute a query and wait without reading the result

query_id = hook.yql_wait("""
CREATE TABLE `//home/your-login/example_table` (
    id Int64,
    value String
);
""")

Materialize a large YQL result into a temp table and read it in chunks

df = hook.yql_unlim(
    """
    SELECT *
    FROM `//home/your-login/large_table`;
    """,
    chunksize=500_000,
)
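
To give a feel for what reading in chunks means, here is a small sketch that splits a table of known size into row ranges and formats them with the YPath row-index syntax (`[#start:#end]`, upper bound exclusive). The helper names are hypothetical, and whether yql_unlim works this way internally is an assumption.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_rows: int, chunksize: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) row ranges covering total_rows; end is exclusive."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, total_rows, chunksize):
        yield start, min(start + chunksize, total_rows)


def ranged_path(table_path: str, start: int, end: int) -> str:
    """Format a table path restricted to rows [start, end)."""
    return f"{table_path}[#{start}:#{end}]"


# A 1.2M-row temp table read with chunksize=500_000 needs three reads.
for start, end in chunk_ranges(1_200_000, 500_000):
    print(ranged_path("//home/your-login/tmp/result", start, end))
```

Each range can be read and converted separately, which is also a natural place to report progress.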

Upload a DataFrame to YTsaurus

import pandas as pd

from ytsaurus_python_client import YTsaurusHook

hook = YTsaurusHook(yt_proxy="your-ytsaurus-proxy.example.com")

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
schema = hook.generate_yt_schema(df)

hook.upload_df_to_yt(
    df=df,
    yt_path="//home/your-login/users",
    schema=schema,
    overwrite=True,
)
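
The exact mapping used by generate_yt_schema is not documented here. As a rough illustration of the idea, the sketch below turns pandas dtype names into a YTsaurus-style schema (a list of name/type entries). The function `schema_from_dtypes` and the mapping table are assumptions, not the library's implementation.

```python
from typing import Dict, List, Mapping

# Assumed mapping from pandas dtype names to YTsaurus column types.
DTYPE_TO_YT = {
    "int64": "int64",
    "int32": "int32",
    "uint64": "uint64",
    "float64": "double",
    "float32": "double",
    "bool": "boolean",
    "object": "string",
}


def schema_from_dtypes(dtypes: Mapping[str, str]) -> List[Dict[str, str]]:
    """Build a list of {"name", "type"} column entries from dtype names."""
    schema = []
    for column, dtype in dtypes.items():
        if dtype not in DTYPE_TO_YT:
            # Fail loudly rather than guess a type for unknown dtypes.
            raise ValueError(f"no YT type known for column {column!r} ({dtype})")
        schema.append({"name": column, "type": DTYPE_TO_YT[dtype]})
    return schema
```

With a DataFrame at hand, the input mapping can be built as `{col: str(dtype) for col, dtype in df.dtypes.items()}`.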

Run a CHYT query over HTTP

from ytsaurus_python_client import chyt_df

df = chyt_df(
    """
    SELECT 1 AS ok
    """,
    host="your-chyt-host.example.com",
    clique_alias="ch_public",
)

Run a CHYT query through the YTsaurus CLI

from ytsaurus_python_client import chyt_df_cli

df = chyt_df_cli(
    "SELECT 1 AS ok",
    yt_proxy="your-ytsaurus-proxy.example.com",
    clique_alias="ch_public",
)

Public API

from ytsaurus_python_client import (
    YTsaurusHook,
    DOYTHook,          # backward-compatible alias
    chyt_df,
    chyt_raw,
    chyt_to_yt,
    chyt_df_cli,
    chyt_raw_cli,
    chyt_to_yt_cli,
    chyt_check_cli,
)

Design notes

  • Defaults are intentionally generic and safe for public repositories.
  • Secrets are never hardcoded. Use YT_TOKEN, YT_TOKEN_PATH, or explicit arguments.
  • Printed YTsaurus UI links are optional and controlled by YT_UI_BASE_URL.
  • YQL pragmas can be provided through query_pragma_config or environment variables such as YT_POOL.
  • DOYTHook is kept as a backward-compatible alias; new code should prefer YTsaurusHook.
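
As an illustration of the pragma note, a pool taken from YT_POOL could be applied by prefixing the query text with a `PRAGMA yt.Pool` statement. This is a sketch under assumptions: `with_pool_pragma` is a hypothetical helper, and the library may assemble pragmas differently (for example through query_pragma_config).

```python
import os
from typing import Mapping, Optional


def with_pool_pragma(
    query: str,
    pool: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefix the query with a pool pragma when a pool is configured."""
    env = os.environ if env is None else env
    pool = pool or env.get("YT_POOL")
    if not pool:
        # YT_POOL is unset by default, so the query is returned unchanged.
        return query
    if '"' in pool:
        raise ValueError("pool name must not contain double quotes")
    return f'PRAGMA yt.Pool = "{pool}";\n{query}'


print(with_pool_pragma("SELECT 1 AS id;", env={"YT_POOL": "analytics"}))
```

An explicit `pool` argument takes priority over the environment variable in this sketch.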

Repository hygiene

Before publishing, the following were removed from the project:

  • macOS metadata files
  • Python cache files
  • internal company hosts and UI links
  • internal pools and temp paths
  • Russian comments and runtime messages
  • local tokens or secret values

License

MIT © 2026 Alexey Voronko

