Tapdata Python SDK

Install

Install Python 3.7 and pip yourself.
Run pip install tapdata_cli to install the SDK.
If you use Poetry, run poetry add tapdata_cli instead.

Initialization

server = "127.0.0.1:3000"
access_code = "3324cfdf-7d3e-4792-bd32-571638d4562f"
from tapdata_cli import cli
cli.init(server, access_code)

Multi-threaded concurrency is not supported

cli.init sends a request to the server to obtain identity information and stores it in global variables. Calling init again therefore overwrites the previous server and access_code values.

If you need to work with different servers and access codes concurrently, run each one in its own process using Python's multiprocessing module.

DataSource

Create DataSource

The SDK supports the following data source types:

Mongo
Mysql
Postgres
Oracle
Kafka

To create a MongoDB or MySQL datasource:

from tapdata_cli import cli

connector = "mongodb"  # datasource type: mongodb, mysql, or postgres
mongo = cli.DataSource(connector, name="mongo")
mongo.uri("mongodb://localhost:8080")  # datasource uri
mongo.save()

or:

from tapdata_cli import cli

mongo = cli.DataSource("mongodb", name="mongo")
mongo.host("localhost:27017").db("source").username("user").password("password").props("")
mongo.type("source")  # datasource role: "source" (source only), "target" (target only), or "source_and_target" (both, the default)
mongo.save()  # returns True on success, False on failure
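When using the uri form shown earlier, the connection string follows the standard MongoDB URI format, in which special characters in the username and password must be percent-escaped. Below is a minimal sketch for assembling such a string from its parts. `build_mongo_uri` is a hypothetical helper, not part of the SDK.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, db, username=None, password=None, props=""):
    """Assemble a mongodb:// URI, escaping credentials as the URI format requires."""
    auth = ""
    if username is not None:
        auth = quote_plus(username)
        if password is not None:
            auth += ":" + quote_plus(password)
        auth += "@"
    uri = f"mongodb://{auth}{host}/{db}"
    if props:
        # props is an already-formatted query string such as "authSource=admin"
        uri += "?" + props
    return uri


uri = build_mongo_uri("localhost:27017", "source", "user", "p@ss/word")
print(uri)
```

The resulting string could then be passed to the uri method of the datasource.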

To create an Oracle datasource:

from tapdata_cli import cli

datasource_name = "ds_name"  # datasource name
oracle = cli.Oracle(datasource_name)
oracle.thinType("SERVICE_NAME")  # connection type: SID or SERVICE_NAME (database name / service name)
oracle.host("106.55.169.3").password("Gotapd8!").port("3521").schema("TAPDATA").db("TAPDATA").username("tapdata")
oracle.save()

To create Kafka datasource:

from tapdata_cli import cli

database_name = "kafka_name"
kafka = cli.Kafka(database_name)
kafka.host("106.xx.xx.x").port("9092")
kafka.save()

To create Postgres datasource:

from tapdata_cli import cli

pg = cli.Postgres("jack_postgre") 
pg.host("106.55.169.3").port(5496).db("insurance").username("postgres").password("tapdata").type("source").schema("insurance")
pg.validate()
pg.save()

Kafka, Oracle, and Postgres datasources are currently created through their own classes rather than through cli.DataSource. A unified DataSource-based interface is planned; it will be backward compatible and will not affect existing versions.

DataSource List

from tapdata_cli import cli

cli.DataSource().list()

# return struct

{
    "total": 94,
    "items": [{
        "id": "",
        "lastUpdBy": "",
        "name": "",
        "config": {},
        "connection_type": "",
        "database_type": "",
        "definitionScope": "",
        "definitionVersion": "",
        "definitionGroup": "",
        "definitionPdkId": "",
        ...
    }]
}
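As a sketch of working with this structure, the snippet below picks out the names of datasources of one type. It assumes only the "items", "name", and "database_type" keys shown above. `datasources_of_type` is a hypothetical helper and the sample values are made up.

```python
def datasources_of_type(listing, database_type):
    """Return the names of listed datasources whose database_type matches."""
    return [
        item["name"]
        for item in listing.get("items", [])
        if item.get("database_type", "").lower() == database_type.lower()
    ]


# Sample data shaped like the documented return structure; values are made up.
sample = {
    "total": 3,
    "items": [
        {"id": "1", "name": "orders", "database_type": "MongoDB"},
        {"id": "2", "name": "users", "database_type": "Mysql"},
        {"id": "3", "name": "events", "database_type": "MongoDB"},
    ],
}
mongo_names = datasources_of_type(sample, "mongodb")
```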

Get a datasource by ID or name

from tapdata_cli import cli

cli.DataSource(id="")  # by id
cli.DataSource(name="")  # by name

Pipeline

A simple data migration job

from tapdata_cli import cli

# Create datasource first
source = cli.DataSource("mongodb", name="source").uri("").save()
target = cli.DataSource("mongodb", name="target").uri("").save()
# create Pipeline
p = cli.Pipeline(name="example_job")
p.readFrom("source").writeTo("target")
# start
p.start()
# stop
p.stop()
# delete
p.delete()
# status
p.status()
# get job list
cli.Job.list()

Job is the underlying implementation of Pipeline, so you can call job.start() just as you would call pipeline.start().

# init job (get job info) by id
from tapdata_cli import cli
job = cli.Job(id="some id string")
job.save() # success -> True, failure -> False
job.start() # success -> True, failure -> False

Data development job

Before running a data development task, change the job type to sync:

from tapdata_cli import cli

source = cli.DataSource("mongodb", name="source").uri("").save()
target = cli.DataSource("mongodb", name="target").uri("").save()
p = cli.Pipeline(name="")
p = p.readFrom("source.player") # source is db, player is table
p.dag.jobType = cli.JobType.sync

Then perform specific operations:

# filter cli.FilterType.keep (keep data) / cli.FilterType.delete (delete data)
p = p.filter("id > 2", cli.FilterType.keep)

# filterColumn cli.FilterType.keep (keep column) / cli.FilterType.delete (delete column)
p = p.filterColumn(["name"], cli.FilterType.delete)

# rename
p = p.rename("name", "player_name")

# valueMap
p = p.valueMap("position", 1) 

# js
p = p.js("return record;")

p.writeTo("target.player")  # target is db, player is table
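To make the intent of these operations concrete, here is a plain-Python illustration of the effect that the row filter, column filter, and rename steps are expected to have on sample records. The semantics are inferred from the comments above. The helper names are hypothetical and this is not how the SDK implements them.

```python
def keep_rows(records, predicate):
    """Row filter in 'keep' mode: retain records for which predicate is true."""
    return [r for r in records if predicate(r)]


def delete_columns(records, columns):
    """Column filter in 'delete' mode: drop the named fields from each record."""
    return [{k: v for k, v in r.items() if k not in columns} for r in records]


def rename_column(records, old, new):
    """Rename a field, leaving records without that field untouched."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in records]


players = [
    {"id": 1, "name": "Ann", "position": 4},
    {"id": 3, "name": "Bo", "position": 7},
]
kept = keep_rows(players, lambda r: r["id"] > 2)       # like filter("id > 2", keep)
renamed = rename_column(kept, "name", "player_name")   # like rename("name", "player_name")
without_position = delete_columns(renamed, ["position"])
```

Note that order matters: deleting a column before renaming it leaves nothing to rename.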

Master-slave merge:

# merge
p2 = cli.Pipeline(name="source_2")  # Create merged pipeline
p3 = p.merge(p2, [('id', 'id')]).writeTo("target")  # Merge pipeline

p3.writeTo("target.player")  # target is db, player is table

Create initial_sync/cdc job

By default, every job created through a pipeline is a "full + incremental" job.

You can create an initial_sync job as follows:

from tapdata_cli import cli

p = cli.Pipeline(name="")
p.readFrom("source").writeTo("target")
config = {"type": "initial_sync"}  # initial_sync
p1 = p.config(config=config)
p1.start()

In the same way, setting config to {"type": "cdc"} creates an incremental job.

All pipeline configuration changes are passed to the pipeline.config method through its config parameter, and the values are validated.

For more configuration items, see this file.
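A small helper can catch a mistyped job type before the config reaches the SDK. This is a sketch under assumptions: `make_pipeline_config` is hypothetical, "initial_sync" and "cdc" come from the text above, and the combined "initial_sync+cdc" value is a guess at how the default mode is spelled.

```python
# "initial_sync" and "cdc" appear in this document; the combined value is an
# assumption about how the default "full + incremental" mode is spelled.
KNOWN_TYPES = {"initial_sync", "cdc", "initial_sync+cdc"}


def make_pipeline_config(job_type, **extra):
    """Build a config dict for pipeline.config, rejecting unknown job types early."""
    if job_type not in KNOWN_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    # Put "type" last so it cannot be overridden by an extra item.
    return {**extra, "type": job_type}


cdc_config = make_pipeline_config("cdc")
```

The result would be passed as p.config(config=cdc_config), in the same way as the initial_sync example.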

API Operation

Update/Create ApiServer

from tapdata_cli import cli

# create
cli.ApiServer(name="test", uri="http://127.0.0.1:3000/").save()

# update
# 1. get ApiServer id
api_server_id = cli.ApiServer.list()["id"]
# 2. update ApiServer
cli.ApiServer(id=api_server_id, name="test_2", uri="http://127.0.0.1:3000/").save()

# delete
cli.ApiServer(id=api_server_id).delete()

Publish Api

from tapdata_cli import cli
cli.Api(name="test", table="source.player").publish() # source is db, player is table

Unpublish Api

from tapdata_cli import cli
cli.Api(name="test").unpublish()

Delete Api

from tapdata_cli import cli
cli.Api(name="test").delete()

Api Status

from tapdata_cli import cli
cli.Api().status("test")  # success -> "pending" or "active" / failure -> None
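Since status returns "pending", "active", or None, a caller may want to wait until a published Api becomes active. Below is a minimal polling sketch. `wait_until_active` is a hypothetical helper, and the status function is injected so the example runs without a server.

```python
import time


def wait_until_active(get_status, timeout=30.0, interval=1.0, sleep=time.sleep):
    """Poll get_status() until it returns "active".

    Returns True once active, False if the status is None (lookup failed)
    or the timeout elapses while the status is still "pending".
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "active":
            return True
        if status is None or time.monotonic() >= deadline:
            return False
        sleep(interval)


# Stand-in for `lambda: cli.Api().status("test")`, so the sketch runs offline.
_answers = iter(["pending", "pending", "active"])
became_active = wait_until_active(
    lambda: next(_answers), interval=0, sleep=lambda _: None
)
```

In real use, the first argument would be a function that calls cli.Api().status with the Api name.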
