Migration tooling from Google App Engine (webapp2, ndb) to python-cdd-supported frameworks (FastAPI, SQLAlchemy).
cdd-python-gae
The public SDK works with filenames, source code, and even in-memory constructs (e.g., as imported into your REPL). A CLI is also available.
Note: Parquet files are supported because running NDB queries to batch-acquire and batch-insert into SQL takes too long.
Install package
PyPI
pip install python-cdd-gae
Master
pip install -r https://raw.githubusercontent.com/offscale/cdd-python-gae/master/requirements.txt
pip install https://api.github.com/repos/offscale/cdd-python-gae/zipball#egg=cdd
Goal
Migrate from Google App Engine to a cloud-independent runtime (e.g., vanilla CPython 3.11 with SQLite).
Relation to other projects
This was created independently of the cdd-python project for two reasons:
- Unidirectional;
- Relevant to fewer people.
SDK
Approach
Traverse the AST for ndb and webapp2.
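As a rough illustration of this approach (not the project's actual implementation), the standard-library `ast` module can locate classes that derive from `ndb.Model` and record which NDB property type each attribute uses. The function name `find_ndb_models` and the sample `Greeting` model are hypothetical.

```python
import ast


def find_ndb_models(source):
    """Map each class deriving from `ndb.Model` to {attribute: property type name}."""
    models = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # A base class written as `ndb.Model` appears as an Attribute node.
        is_model = any(
            isinstance(base, ast.Attribute)
            and isinstance(base.value, ast.Name)
            and base.value.id == "ndb"
            and base.attr == "Model"
            for base in node.bases
        )
        if not is_model:
            continue
        props = {}
        for stmt in node.body:
            # Collect `name = <module>.SomeProperty(...)` assignments.
            if (
                isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Attribute)
            ):
                props[stmt.targets[0].id] = stmt.value.func.attr
        models[node.name] = props
    return models


# Hypothetical input; the source is only parsed, never imported.
SAMPLE = '''
from google.appengine.ext import ndb

class Greeting(ndb.Model):
    content = ndb.StringProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
'''

print(find_ndb_models(SAMPLE))
```

Because the source is parsed rather than imported, this works without the App Engine SDK installed. A real translator would go on to map each property type to a target column type.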
Advantages
Disadvantages
Alternatives
Minor other use-cases this facilitates
CLI for this project
$ python -m cdd_gae --help
usage: python -m cdd_gae [-h] [--version]
                         {ndb2sqlalchemy,ndb2sqlalchemy_migrator,parquet2sqlalchemy,webapp2_to_fastapi}
                         ...

Migration tooling from Google App Engine (webapp2, ndb) to python-cdd
supported (FastAPI, SQLalchemy).

positional arguments:
  {ndb2sqlalchemy,ndb2sqlalchemy_migrator,parquet2sqlalchemy,webapp2_to_fastapi}
    ndb2sqlalchemy      Parse NDB emit SQLalchemy
    ndb2sqlalchemy_migrator
                        Create migration scripts from NDB to SQLalchemy
    parquet2sqlalchemy  Parse Parquet emit SQLalchemy
    webapp2_to_fastapi  Parse WebApp2 emit FastAPI

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
ndb2sqlalchemy
(webapp2_to_fastapi takes same args)
$ python -m cdd_gae ndb2sqlalchemy --help
usage: python -m cdd_gae ndb2sqlalchemy [-h] -i INPUT_FILE -o OUTPUT_FILE
                                        [--dry-run]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        Python file to parse NDB `class`es out of
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Empty file to generate SQLalchemy classes to
  --dry-run             Show what would be created; don't actually write to
                        the filesystem.
parquet2sqlalchemy
(webapp2_to_fastapi takes same args)
$ python -m cdd_gae parquet2sqlalchemy -h
usage: python -m cdd_gae parquet2sqlalchemy [-h] -i INPUT_FILE -o OUTPUT_FILE
                                            [--dry-run]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        Parquet filepath
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Empty file to generate SQLalchemy classes to
  --dry-run             Show what would be created; don't actually write to
                        the filesystem.
webapp2_to_fastapi
(ndb2sqlalchemy takes same args)
$ python -m cdd_gae webapp2_to_fastapi --help
usage: python -m cdd_gae webapp2_to_fastapi [-h] -i INPUT_FILE -o OUTPUT_FILE
                                            [--dry-run]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        Python file to parse WebApp2 `class`es out of
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Empty file to generate FastAPI functions to
  --dry-run             Show what would be created; don't actually write to
                        the filesystem.
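Since `ndb2sqlalchemy`, `parquet2sqlalchemy`, and `webapp2_to_fastapi` share the same `-i`, `-o`, and `--dry-run` arguments, one small helper can drive all three from a script. This is a minimal sketch that uses only the flags shown in the help output above; the helper name `build_command` and the file names are hypothetical.

```python
import sys

# Subcommands that accept -i/--input-file, -o/--output-file and --dry-run.
SAME_ARG_COMMANDS = ("ndb2sqlalchemy", "parquet2sqlalchemy", "webapp2_to_fastapi")


def build_command(subcommand, input_file, output_file, dry_run=True):
    """Build the argv list for `python -m cdd_gae <subcommand> ...`."""
    if subcommand not in SAME_ARG_COMMANDS:
        raise ValueError("unsupported subcommand: {!r}".format(subcommand))
    cmd = [sys.executable, "-m", "cdd_gae", subcommand,
           "-i", input_file, "-o", output_file]
    if dry_run:
        # Preview what would be created without writing to the filesystem.
        cmd.append("--dry-run")
    return cmd


# Hypothetical file names, for illustration only.
print(build_command("ndb2sqlalchemy", "models.py", "sqlalchemy_models.py"))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it. Defaulting to `--dry-run` is a deliberately cautious choice for a first pass.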
ndb2sqlalchemy_migrator
$ python -m cdd_gae ndb2sqlalchemy_migrator --help
usage: python -m cdd_gae ndb2sqlalchemy_migrator [-h] --ndb-file NDB_FILE
                                                 --sqlalchemy-file
                                                 SQLALCHEMY_FILE
                                                 --ndb-mod-to-import
                                                 NDB_MOD_TO_IMPORT
                                                 --sqlalchemy-mod-to-import
                                                 SQLALCHEMY_MOD_TO_IMPORT -o
                                                 OUTPUT_FOLDER [--dry-run]

optional arguments:
  -h, --help            show this help message and exit
  --ndb-file NDB_FILE   Python file containing the NDB `class`es
  --sqlalchemy-file SQLALCHEMY_FILE
                        Python file containing the NDB `class`es
  --ndb-mod-to-import NDB_MOD_TO_IMPORT
                        NDB module name that the entity will be imported from
  --sqlalchemy-mod-to-import SQLALCHEMY_MOD_TO_IMPORT
                        SQLalchemy module name that the entity will be
                        imported from
  -o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
                        Empty folder to generate scripts that migrate from one
                        NDB class to one SQLalchemy class
  --dry-run             Show what would be created; don't actually write to
                        the filesystem.
Data migration
The most efficient way seems to be:
- Backup from NDB to Google Cloud Storage
- Import from Google Cloud Storage to Google BigQuery
- Export from Google BigQuery to Apache Parquet files in Google Cloud Storage
- Download and parse the Parquet files, then insert into SQL
(For the following scripts, set GOOGLE_PROJECT_ID, GOOGLE_BUCKET_NAME, NAMESPACE, and GOOGLE_LOCATION.)
Backup from NDB to Google Cloud Storage
for entity in kind0 kind1; do
gcloud datastore export 'gs://'"$GOOGLE_BUCKET_NAME" --project "$GOOGLE_PROJECT_ID" --kinds "$entity" --async &
done
Import from Google Cloud Storage to Google BigQuery
printf 'bq mk "%s"\n' "$NAMESPACE" > migrate.bash
gsutil ls 'gs://'"$GOOGLE_BUCKET_NAME"'/**/all_namespaces/kind_*' | python3 -c 'import sys, posixpath, fileinput; f=fileinput.input(encoding="utf-8"); d=dict(map(lambda e: (posixpath.basename(posixpath.dirname(e)), posixpath.dirname(e)), sorted(f))); f.close(); print("\n".join(map(lambda k: "( bq mk \"'"$NAMESPACE"'.{k}\" && bq --location='"$GOOGLE_LOCATION"' load --source_format=DATASTORE_BACKUP \"'"$NAMESPACE"'.{k}\" \"{v}/all_namespaces_{k}.export_metadata\" ) &".format(k=k, v=d[k]), sorted(d.keys()))),sep="");' >> migrate.bash
# Then run `bash migrate.bash`
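The inline Python in the pipeline above is dense, so here is an expanded sketch of the same transformation as a standalone function. It takes the lines printed by `gsutil ls`, keeps one directory per kind, and emits one backgrounded `bq mk && bq load` line per kind. The function name `bq_load_commands` and the sample paths are hypothetical.

```python
import posixpath


def bq_load_commands(listing_lines, namespace, location):
    """Return one backgrounded `bq mk && bq load` shell line per exported kind."""
    # Map kind directory name (e.g. "kind_foo") to its full gs:// directory.
    kind_to_dir = {}
    for line in sorted(listing_lines):
        path = line.strip()
        if not path:
            continue
        directory = posixpath.dirname(path)
        kind_to_dir[posixpath.basename(directory)] = directory
    template = (
        '( bq mk "{ns}.{k}" && bq --location={loc} load '
        '--source_format=DATASTORE_BACKUP "{ns}.{k}" '
        '"{v}/all_namespaces_{k}.export_metadata" ) &'
    )
    return [
        template.format(ns=namespace, loc=location, k=kind, v=kind_to_dir[kind])
        for kind in sorted(kind_to_dir)
    ]


# Hypothetical listing: two objects that belong to the same kind directory.
lines = [
    "gs://bucket/2023/all_namespaces/kind_foo/all_namespaces_kind_foo.export_metadata\n",
    "gs://bucket/2023/all_namespaces/kind_foo/output-0\n",
]
print("\n".join(bq_load_commands(lines, "my_dataset", "US")))
```

Keying the dictionary on the kind directory name yields one load command per kind, however many objects the listing returns for it.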
Export from Google BigQuery to Apache Parquet files in Google Cloud Storage
for entity in kind0 kind1; do
bq extract --location="$GOOGLE_LOCATION" --destination_format='PARQUET' "$NAMESPACE"'.kind_'"$entity" 'gs://'"$GOOGLE_BUCKET_NAME"'/'"$entity"'/*' &
done
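The last step in the list (parse the downloaded Parquet files, then insert into SQL) can be sketched as a batched insert. This is a minimal illustration using the standard-library `sqlite3` module, not the project's generated migration code. It assumes the rows have already been read into dictionaries, for example with a Parquet reader such as pyarrow. The `batch_insert` helper, table name, and columns are hypothetical.

```python
import itertools
import sqlite3


def batch_insert(conn, table, columns, rows, batch_size=1000):
    """Insert dict rows into `table` in batches; return the number inserted."""
    # Table and column names are interpolated, so they must come from trusted code.
    sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table,
        ", ".join('"{}"'.format(c) for c in columns),
        ", ".join("?" for _ in columns),
    )
    iterator = iter(rows)
    total = 0
    while True:
        # Missing keys become NULL via dict.get.
        batch = [
            tuple(row.get(c) for c in columns)
            for row in itertools.islice(iterator, batch_size)
        ]
        if not batch:
            break
        with conn:  # one transaction (and commit) per batch
            conn.executemany(sql, batch)
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "kind_greeting" (content TEXT, author TEXT)')
# Hypothetical rows standing in for parsed Parquet records.
rows = [{"content": "hi", "author": "a"}, {"content": "yo"}]
inserted = batch_insert(conn, "kind_greeting", ["content", "author"], rows, batch_size=1)
print(inserted)
```

Committing once per batch keeps memory use bounded and avoids a transaction per row, which is the main cost when loading large exports.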
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Hashes for python_cdd_gae-0.0.11-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9851e70ee198f35511538a4178d8c48d2cfa8d09d19d695de80c2fb36e4dac57
MD5 | 7446af40f6d23ba3902a33ba87fba11a
BLAKE2b-256 | 743334e9da3348c229b54b3bf858a94672bca3f970d3a690738657b0d0c9bc02