Skip to main content

Converts SQL dump to a JSON stream

Project description

sqldump2json

PyPI - Python Version PyPI - Version Total Downloads

Converts SQL dump to a JSON stream.

A tool for administrators, data scientists and hackers. With this tool you no longer need to import dumps into Databases. You can extract INSERT data as JSON and analyze them with jq or insert into MongoDB/Elastic/etc. The dump is not read entirely into RAM, so this utility can be used to process files of any size. And it can even parse corrupted dumps. No dependencies!

Supported DBMS: MySQL, SQL Server, PotsgreSQL and some other (not all formats).

RESTRICTIONS:

  • Syntax is checked only for INSERT INTO and CREATE TABLE.
  • The common SQL syntax is used which does not fully correspond to either MySQL or Postgres.
  • Function calls and subquieries in INSERT satetements are not supported.

Installation for normal Arch-based Linux ditros:

# install pipx
yay -S python-pipx

# install package from pypi
pipx install sqldump2json

# install lastet version from github
pipx install git+https://github.com/s3rgeym/sqldump2json.git

For other shit like Ubuntu you need to do more steps:

  • Install pyenv or asdf-vm.
  • Install latest python version and make it global via pyenv or asdf-vm.
  • Install sqldump2json.
  • Or use Docker.

CLI

Usage:

sqldump2json [ -h ] [ -i INPUT ] [ -o OUTPUT ] [ ... ]

Output format is JSONL:

echo "INSERT INTO db.data VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');" | sqldump2json
{"table": "data", "schema": "db", "values": [1, "foo"]}
{"table": "data", "schema": "db", "values": [2, "bar"]}
{"table": "data", "schema": "db", "values": [3, "baz"]}

Values are converted to dict only if the INSERT INTO contains a list of fields or the fields are declared in CREATE TABLE:

$ sqldump2json <<< "INSERT INTO data VALUES (NULL, 3.14159265, FALSE, 'Привет', 0xDEADBEEF);" | jq
{
  "table": "data",
  "values": [
    null,
    3.14159265,
    false,
    "Привет",
    "3q2+7w=="
  ]
}

$ sqldump2json <<< 'INSERT INTO `page` (title, contents) VALUES ("Title", "Text goes here");' | jq
{
  "table": "page",
  "values": {
    "title": "Title",
    "contents": "Text goes here"
  }
}

Using together with grep:

grep 'INSERT INTO `users`' /path/to/dump.sql | sqldump2json | jq -r '.values | [.username, .email, .password] | @tsv' > output.csv

Scripting

If you were looking for a way how to import data from SQL to NoSQL databases and etc:

#!/usr/bin/env python
from sqldump2json import DumpParser
...
if __name__ == '__main__':
    parse = DumpParser()
    for val in parse("/path/to/dump.sql"):
        do_something(val)

Development

Run tests:

poetry run python -m uniзщttest

TODO LIST

  • Add support mysql strings with charset (eg, _binary '\x00...'). + X'...'
  • Строки должны конкатенироваться, если идут подряд.

Notes

После создания этого пакета я случайно узнал про существование sqldump-to. Тот проект заброшен, и та утилита НЕ МОЖЕТ ПАРСИТЬ ДАМПЫ ПО 100500 ГИГАБАЙТ.

Я пробовал ускорить парсинг с помощью orjson (реализован на говнорасте и отвечает за парсинг JSON), но вопреки заявленному ускорению в 10 раз, получил замедление при парсинге 23-гигового дампа, содержащего 160 миллинов вставок, с 5 часов до 7.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqldump2json-1.0.3.tar.gz (13.8 kB view details)

Uploaded Source

Built Distribution

sqldump2json-1.0.3-py3-none-any.whl (12.7 kB view details)

Uploaded Python 3

File details

Details for the file sqldump2json-1.0.3.tar.gz.

File metadata

  • Download URL: sqldump2json-1.0.3.tar.gz
  • Upload date:
  • Size: 13.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.5 Linux/6.6.1-arch1-1

File hashes

Hashes for sqldump2json-1.0.3.tar.gz
Algorithm Hash digest
SHA256 6c50f42e0e4715dcf327736b0b33cf1fb3b324cc5546bcdac962b19c572ad7dd
MD5 2610eca8334e07552f55490672185807
BLAKE2b-256 fb148ae62c4ecd26f8b138d9728c5ec020aff7133f18776bcba071c685f6b0ac

See more details on using hashes here.

File details

Details for the file sqldump2json-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: sqldump2json-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.5 Linux/6.6.1-arch1-1

File hashes

Hashes for sqldump2json-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 3869ba19b40bea78141acc35f7a50ef0c08270c0dbab01f57a0407262cadb823
MD5 f007278151b52dc590a9f46facf1911c
BLAKE2b-256 26c598d4ba9949535246423f57d77c41147b93dbc2db5f2428010b1120429368

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page