
Records Mover

Records mover is a command-line tool and Python library you can use to move relational data from one place to another.

Relational data here means anything roughly "rectangular" - with columns and rows. For example, records mover supports reading from and writing to:

  • Databases, including using native high-speed methods of import/export of bulk data. Redshift, Vertica and PostgreSQL are well-supported, with some support for BigQuery and MySQL.
  • CSV files
  • Parquet files (initial support)
  • Google Sheets
  • Pandas DataFrames
  • Records directories - a structured directory of CSV/Parquet/etc. files plus JSON metadata about their format and origins (see the sketch after this list). Records directories are especially helpful for the ever-ambiguous CSV format, where they answer the questions 'this may be a CSV - but what's the schema? What's the format of the CSV itself? How is it escaped?'
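
For a concrete picture, here's a sketch of what a records directory might contain. The file names below are illustrative only; see the records directory format documentation in the project for the authoritative layout:

mydata/
    _manifest        <- JSON listing of the data files in the directory
    _schema.json     <- JSON description of the column names and types
    _format_csv      <- JSON description of the CSV dialect (delimiter, quoting, escaping)
    data_part_1.csv
    data_part_2.csv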

Records mover can be extended to handle additional databases and data file types. Databases are supported by building on top of their SQLAlchemy drivers. Records mover is able to auto-negotiate the most efficient way of moving data from one to the other.
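
As a sketch of what that extensibility looks like in practice: any database with a SQLAlchemy dialect can be reached through a standard SQLAlchemy engine URL, which records mover can then use as a source or target. The mysql+pymysql URL below is a generic SQLAlchemy example, not anything records-mover-specific:

import sqlalchemy

# A standard SQLAlchemy engine URL; records mover negotiates the most
# efficient load/unload path available for the database behind it.
mysql_engine = sqlalchemy.create_engine(
    "mysql+pymysql://username:password@hostname/database_name"
)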

CLI use example

Installing:

pip3 install 'records_mover[cli,postgres-binary,redshift-binary]'

Loading a CSV into a database:

mvrec file2table foo.csv redshiftdb1 myschema1 mytable1

To see all available commands:

mvrec --help

Copying a table from a PostgreSQL database to a Redshift database:

mvrec table2table postgresdb1 myschema1 mytable1 redshiftdb2 myschema2 mytable2

Note that records mover will automatically build an appropriate CREATE TABLE statement on the target end if the table doesn't already exist.

Note that the connection details for the database names here must be configured using db-facts.

For more installation notes, see INSTALL.md. To understand the security model here, see SECURITY.md.

[CLI use demo (table creation and loading): animated demo omitted.]

Python library use example

First, install records_mover. This example also uses Pandas and a Postgres driver, so we'll install those too:

pip3 install 'records_mover[pandas,postgres-source]'

Now we can run this code:

#!/usr/bin/env python3

# Pull in the records-mover library - be sure to run the pip install above first!
from records_mover import sources, targets, move
from pandas import DataFrame
import sqlalchemy
import os

sqlalchemy_url = f"postgresql+psycopg2://username:{os.environ['DB_PASSWORD']}@hostname/database_name"
db_engine = sqlalchemy.create_engine(sqlalchemy_url)

df = DataFrame.from_dict([{'a': 1}])  # or make your own!

source = sources.dataframe(df=df)
target = targets.table(schema_name='myschema',
                       table_name='mytable',
                       db_engine=db_engine)
results = move(source, target)
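
move() returns a results object describing the outcome. To independently sanity-check that the rows landed, plain SQLAlchemy is enough; the snippet below is not part of the records-mover API and assumes the myschema.mytable target from above:

# Verify the load with plain SQLAlchemy (independent of records mover);
# sqlalchemy is already imported in the script above.
with db_engine.connect() as conn:
    row_count = conn.execute(
        sqlalchemy.text("SELECT COUNT(*) FROM myschema.mytable")
    ).scalar()
print(f"myschema.mytable now has {row_count} row(s)")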

When moving data, the full set of supported sources and targets can be found in the API documentation (see below).

Advanced Python library use example

Here's another example, using some additional features:

  • Loading from an existing dataframe.
  • Secrets management using db-facts, which is a way to configure credentials in YAML files or even fetch them dynamically from your secrets store.
  • Logging configuration to show the internal processing steps (helpful in optimizing performance or debugging issues).

With those features in mind, you can use this:

#!/usr/bin/env python3

# Pull in the records-mover library - be sure to run the pip install above first!
from records_mover import Session
from pandas import DataFrame

session = Session()
session.set_stream_logging()
records = session.records

db_engine = session.get_default_db_engine()

df = DataFrame.from_dict([{'a': 1}])  # or make your own!

source = records.sources.dataframe(df=df)
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
results = records.move(source, target)

Python library API documentation

You can find more API documentation here.

Local Development

The included Dockerfile can be used to build a docker image that is suitable for local development.

docker build -f Dockerfile.dev -t records-mover:latest .

Or, using docker compose:

docker compose build

The following commands assume the image has been tagged records-mover, as above.

Mount the directory containing your local copy of the repository to have it override what's in the container.

docker run -it --mount src="$(pwd)/records_mover",target=/records-mover/records_mover,type=bind records-mover

This mounts your local source directory on top of the same directory in the container. Mount any additional directories you are working on with additional --mount entries.

Alternatively, you can launch the container using docker-compose. This will start the container with pre-defined mounts.

docker-compose up -d

This will build the container image as necessary, launch it, and mount the most relevant volumes for local development. The container hosts a bash shell and runs until you manually shut it down.

Note: if you add a dependency to requirements.txt and want to test it locally, add the line COPY requirements.txt . after the RUN git clone ... line in the Dockerfile and rebuild the container.

Working within the container

The container presumes you're still working with Python virtual environments, which could be considered a "hat on a hat" situation. The practical consequence is that whenever you open a shell in the container, you must activate the venv virtual environment; otherwise, you will not have all the dependencies you need.

After you've started a shell with the docker run... command above, you can activate the virtual environment with:

source venv/bin/activate

Running unit and type tests

Unit and type tests can be run within the container by running:

make citest
make cicoverage
make typecheck
make citypecoverage

The tests can also be run from outside the container using docker compose:

docker compose run test

It's theoretically possible to build enough additional containers to support running integration tests locally, but that has not been planned yet.

In-progress Airbyte integration

Integration of Airbyte as an alternative engine for executing source <-> destination transfers is currently in progress. This functionality is behind a feature flag. To enable all Airbyte features in your environment, set the following environment variable:

export RECORDS_MOVER_AIRBYTE_ENABLED=1
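
Equivalently, a Python script can set the flag itself. Setting it before importing records_mover is an assumption about when the flag is read, made here to be conservative:

import os

# Enable the in-progress Airbyte engine (feature-flagged).
# Set before importing records_mover - a conservative assumption
# about when the flag is read.
os.environ["RECORDS_MOVER_AIRBYTE_ENABLED"] = "1"

from records_mover import Session  # noqa: E402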
