Skip to main content

Treasure Data API library for Python

Project description

Build Status on GitHub Actions PyPI version

Treasure Data API library for Python

Requirements

td-client supports the following versions of Python.

  • Python 3.10+

  • PyPy

Install

You can install the releases from PyPI.

$ pip install td-client

It’d be better to install certifi to enable SSL certificate verification.

$ pip install certifi

Examples

Please see also the examples at Treasure Data Documentation.

The td-client documentation is hosted at https://tdclient.readthedocs.io/, or you can go directly to the API documentation.

For information on the parameters that may be used when reading particular types of data, see File import parameters.

Listing jobs

Treasure Data API key will be read from environment variable TD_API_KEY, if none is given via apikey= argument passed to tdclient.Client.

Treasure Data API endpoint https://api.treasuredata.com is used by default. You can override this with environment variable TD_API_SERVER, which in turn can be overridden via endpoint= argument passed to tdclient.Client. List of available Treasure Data sites and corresponding API endpoints can be found here.

import tdclient

with tdclient.Client() as td:
    for job in td.jobs():
        print(job.job_id)

Running jobs

Running jobs on Treasure Data.

import tdclient

with tdclient.Client() as td:
    job = td.query("sample_datasets", "SELECT COUNT(1) FROM www_access", type="hive")
    job.wait()
    for row in job.result():
        print(repr(row))

Running jobs via DBAPI2

td-client-python implements PEP 0249 Python Database API v2.0. You can use td-client-python with external libraries which supports Database API such like pandas.

import pandas
import tdclient

def on_waiting(cursor):
    print(cursor.job_status())

with tdclient.connect(db="sample_datasets", type="presto", wait_callback=on_waiting) as td:
    data = pandas.read_sql("SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol", td)
    print(repr(data))

We offer another package for pandas named pytd with some advanced features. You may prefer it if you need to do complicated things, such like exporting result data to Treasure Data, printing job’s progress during long execution, etc.

Importing data

Importing data into Treasure Data in streaming manner, as similar as fluentd is doing.

import sys
import tdclient

with tdclient.Client() as td:
    for file_name in sys.argv[:1]:
        td.import_file("mydb", "mytbl", "csv", file_name)

Bulk import

Importing data into Treasure Data in batch manner.

import sys
import tdclient
import uuid
import warnings

if len(sys.argv) <= 1:
    sys.exit(0)

with tdclient.Client() as td:
    session_name = "session-{}".format(uuid.uuid1())
    bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
    try:
        for file_name in sys.argv[1:]:
            part_name = "part-{}".format(file_name)
            bulk_import.upload_file(part_name, "json", file_name)
        bulk_import.freeze()
    except:
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    if 0 < bulk_import.error_records:
        warnings.warn("detected {} error records.".format(bulk_import.error_records))
    if 0 < bulk_import.valid_records:
        print("imported {} records.".format(bulk_import.valid_records))
    else:
        raise(RuntimeError("no records have been imported: {}".format(bulk_import.name)))
    bulk_import.commit(wait=True)
    bulk_import.delete()

If you want to import data as msgpack format, you can write as follows:

import io
import time
import uuid
import warnings

import tdclient

t1 = int(time.time())
l1 = [{"a": 1, "b": 2, "time": t1}, {"a": 3, "b": 9, "time": t1}]

with tdclient.Client() as td:
    session_name = "session-{}".format(uuid.uuid1())
    bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
    try:
        _bytes = tdclient.util.create_msgpack(l1)
        bulk_import.upload_file("part", "msgpack", io.BytesIO(_bytes))
        bulk_import.freeze()
    except:
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    # same as the above example

Changing how CSV and TSV columns are read

The td-client package will generally make sensible choices on how to read the columns in CSV and TSV data, but sometimes the user needs to override the default mechanism. This can be done using the optional file import parameters dtypes and converters.

For instance, consider CSV data that starts with the following records:

time,col1,col2,col3
1575454204,a,0001,a;b;c
1575454204,b,0002,d;e;f

If that data is read using the defaults, it will produce values that look like:

1575454204, "a", 1, "a;b;c"
1575454204, "b", 2, "d;e;f"

that is, an integer, a string, an integer and another string.

If the user wants to keep the leading zeroes in col2, then they can specify the column datatype as string. For instance, using bulk_import.upload_file to read data from input_data:

bulk_import.upload_file(
    "part", "msgpack", input_data,
    dtypes={"col2": "str"},
)

which would produce:

1575454204, "a", "0001", "a;b;c"
1575454204, "b", "0002", "d;e;f"

If they also wanted to treat col3 as a sequence of strings, separated by semicolons, then they could specify a function to process col3:

bulk_import.upload_file(
    "part", "msgpack", input_data,
    dtypes={"col2": "str"},
    converters={"col3", lambda x: x.split(";")},
)

which would produce:

1575454204, "a", "0001", ["a", "b", "c"]
1575454204, "b", "0002", ["d", "e", "f"]

Type Hints

td-client-python includes comprehensive type hints (PEP 484) for improved development experience with static type checkers like mypy and pyright. Type hints are available for all public APIs.

Features:

  • Fully typed public API with precise type annotations

  • py.typed marker file for PEP 561 compliance

  • Type aliases in tdclient.types for common patterns

  • Support for type checking with mypy, pyright, and other tools

Example with type checking:

import tdclient

# Type checkers will understand the types
with tdclient.Client(apikey="your_api_key") as client:
    # client is inferred as tdclient.Client
    job = client.query("sample_db", "SELECT COUNT(1) FROM table", type="presto")
    # job is inferred as tdclient.models.Job
    job.wait()
    for row in job.result():
        # row is inferred as dict[str, Any]
        print(row)

Using type aliases:

from tdclient.types import QueryEngineType, Priority

def run_query(engine: QueryEngineType, priority: Priority) -> None:
    with tdclient.Client() as client:
        job = client.query("mydb", "SELECT 1", type=engine, priority=priority)
        job.wait()

Development

Running tests

Install the project dependencies with uv (runtime and test extras) and execute pytest via uv run.

$ uv sync --extra test
$ uv run pytest tdclient/test

To run the coverage suite locally, use:

$ uv run coverage run --source=tdclient -m pytest tdclient/test
$ uv run coverage report

Linting and type checking

Install the development extras and invoke ruff and pyright using uv run.

$ uv sync --dev
$ uv run ruff format tdclient --diff --exit-non-zero-on-fix
$ uv run ruff check tdclient
$ uv run pyright tdclient

Running tests (tox)

You can run tests against all supported Python versions with tox. I’d recommend you to install pyenv to manage additional interpreters.

$ pyenv shell system
$ for version in $(cat .python-version); do [ -d "$(pyenv root)/versions/${version}" ] || pyenv install "${version}"; done
$ pyenv shell --unset

Install the development extras (which include tox) with uv.

$ uv sync --dev

Then, run tox via uv.

$ uv run tox

Release

  1. Update version x.x.x in pyproject.toml.

  2. Create a PR with release-x.x.x branch. Request and merge the PR.

  3. Create and push a tag x.x.x on release-x.x.x merge commit.

  4. Create a Release on GitHub will publish new version to PyPI.

Manual release

If you want to release manually, you can upload by twine.

$ python -m build
$ twine upload dist/*

License

Apache Software License, Version 2.0

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

td_client-1.7.0.tar.gz (50.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

td_client-1.7.0-py3-none-any.whl (65.3 kB view details)

Uploaded Python 3

File details

Details for the file td_client-1.7.0.tar.gz.

File metadata

  • Download URL: td_client-1.7.0.tar.gz
  • Upload date:
  • Size: 50.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for td_client-1.7.0.tar.gz
Algorithm Hash digest
SHA256 bbf80beaf0253816751d5bb945448d2581314d0a651a8a630d5164f2dcaf8150
MD5 e56e07d1fcdf2573e4d45a0e9639f9e5
BLAKE2b-256 c424eff41d9889b5e0e0fbb7d278b549603c45f57954aefb9e2dcf6ad23b2418

See more details on using hashes here.

File details

Details for the file td_client-1.7.0-py3-none-any.whl.

File metadata

  • Download URL: td_client-1.7.0-py3-none-any.whl
  • Upload date:
  • Size: 65.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for td_client-1.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4de7c7c46fb91471b36f13feec416e87a82973135415be8e7f2e288cb7f0c37a
MD5 e30e897286bd5adba9b107a654b77d43
BLAKE2b-256 de2d9e3f24ef9fe13446fd7bd17433db17ec64feb1f971d21f3ff773f8f6eda7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page