Extract information from databases to files in multiple formats, from a variety of SQL-based servers

Project description

db-extractor

What is this repository for?

Extract information from databases (MySQL, MariaDB and SAP HANA to start with; others will be implemented later) using a combination of:

  • an extraction sequences file (JSON format) that is easy to create and maintain, yet allows very complex features to be configured;
  • a source system file (JSON format) that keeps a central list of servers and/or databases to connect to and can be shared between people;
  • a user settings file (JSON format) that keeps a central list of credentials and is not to be shared with anyone, or only with a small group of people.
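The separation into three files can be sketched as follows. This is a minimal illustration only, not the package's own loader: the function names and the dictionary keys `source_systems`, `credentials` and `extraction_sequences` are assumptions made for this example, and the real content of each file is whatever the sample files shipped with the project define.

```python
import json
from pathlib import Path


def load_json(file_name):
    """Read one JSON configuration file and return its parsed content."""
    with Path(file_name).open(encoding="utf-8") as json_file:
        return json.load(json_file)


def load_configuration(source_system_file, credentials_file, sequence_file):
    """Load the three files separately (hypothetical helper).

    Keeping them apart means the server list can be shared freely while the
    credentials file stays private to its owner.
    """
    return {
        "source_systems": load_json(source_system_file),
        "credentials": load_json(credentials_file),
        "extraction_sequences": load_json(sequence_file),
    }
```

Because each file is read on its own, a team could keep the source system file in a shared location while every user points to a personal credentials file.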

Features implemented

  • Ability to extract from a single source system or from multiple ones using 1 JSON extraction sequence file;
  • Ability to extract a single query or multiple queries for each source system using the same JSON extraction sequence file;
  • Ability to extract a single file or multiple files using sessions for each query, where parameters can be specified (currently only CSV and Excel file formats are supported, others will follow);
  • Multi-language (English, Italian, Romanian);
  • Enhanced behaviour choices: besides the existing 'skip-if-output-file-exists' and 'overwrite-if-output-file-exists', there is an option to overwrite only if the file is older than a given CalculatedDate expression. This is very useful when extracting a large amount of data over VPN in small pieces and the VPN drops: on a re-run, the pieces already extracted are skipped because they are not older than the imposed threshold;
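The three behaviour choices can be sketched as a single decision function. This is a hedged illustration, not the package's implementation: the function name `should_extract`, the behaviour name `overwrite-if-output-file-exists-and-older-than` and the use of a plain `datetime` in place of an evaluated CalculatedDate expression are all assumptions made for this example.

```python
import os
from datetime import datetime


def should_extract(output_file, behaviour, threshold=None):
    """Decide whether an extraction has to run for the given output file."""
    if not os.path.isfile(output_file):
        return True  # nothing on disk yet, so extract regardless of behaviour
    if behaviour == "skip-if-output-file-exists":
        return False
    if behaviour == "overwrite-if-output-file-exists":
        return True
    # Hypothetical name for the "overwrite only if older than" choice.
    if behaviour == "overwrite-if-output-file-exists-and-older-than":
        if threshold is None:
            raise ValueError("a threshold datetime is required for this behaviour")
        last_modified = datetime.fromtimestamp(os.path.getmtime(output_file))
        return last_modified < threshold
    raise ValueError(f"unknown behaviour: {behaviour}")
```

After a dropped VPN connection, re-running with a threshold set shortly before the first attempt would skip the pieces written during that attempt and extract only the missing ones.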

Supported File Types/Formats

  • Comma Separated Values (with ability to specify a custom separator of your preference)
  • Excel 2013+
  • JSON
  • Parquet (with compression algorithms: brotli, gzip, snappy and none to choose from)
  • Pickle (with compression algorithms: bz2, gzip, xz, zip and none to choose from, plus the special value "infer" to detect the correct one automatically from the provided file extension)
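Two of the options above, the custom CSV separator and the "infer" compression value, can be illustrated with the standard library alone. This is a sketch under assumptions: the helper names and the extension map are invented for this example and say nothing about which library the package uses internally.

```python
import csv
import io

# Assumed mapping from file extension to the compression names listed above.
EXTENSION_TO_COMPRESSION = {
    ".bz2": "bz2",
    ".gz": "gzip",
    ".xz": "xz",
    ".zip": "zip",
}


def infer_compression(file_name):
    """Return the compression matching the file extension, or None."""
    lowered = file_name.lower()
    for extension, algorithm in EXTENSION_TO_COMPRESSION.items():
        if lowered.endswith(extension):
            return algorithm
    return None


def rows_to_csv_text(header, rows, separator=","):
    """Render a header and rows as CSV text using a custom separator."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

For instance, `infer_compression("result.pkl.gz")` picks gzip, while a name without a known extension falls back to no compression.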

Who do I talk to?

Repository owner is: Daniel Popiniuc

Installation

Installation can be completed in a few steps as follows:

  • Ensure you have git available on your system:
    $ git --version

If you get an error, git is missing and needs to be installed in the way appropriate for your system.

On Windows you can do so from Git for Windows;

  • Download this project from GitHub:
    $ git clone https://github.com/danielgp/db-extractor <local_path_of_this_package>

Conventions used:

<content_within_html_tags> = variables to be replaced with the relevant user-specific values

  • Create a Python Virtual Environment using the following command, executed from the project root folder:
    $ python(.exe) -m venv <local_folder_on_your_computer_for_this_package>/virtual_environment/
  • Upgrade pip (a package manager for Python packages) and SetupTools using the following commands, executed from the Scripts sub-folder of the newly created virtual environment:
    $ <local_path_of_this_package>/virtual_environment/Scripts/python(.exe) -m pip install --upgrade pip
    $ <local_path_of_this_package>/virtual_environment/Scripts/pip(.exe) install --upgrade setuptools
  • Install project prerequisites using the following command, executed from the project root folder:
    $ <local_path_of_this_package>/virtual_environment/Scripts/python(.exe) <local_path_of_this_package>/setup.py install
  • Ensure all localization source files are compiled properly, as the package needs them in order to work:
    $ <local_path_of_this_package>/virtual_environment/Scripts/python(.exe) <local_path_of_this_package>/sources/localizations_compile.py
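The commands above use the Scripts sub-folder, which is where the venv module places executables on Windows; on Linux and macOS the same module uses a bin sub-folder instead. A small sketch for locating the interpreter on either platform follows; the helper name `venv_python` and its `os_name` parameter are assumptions made for this example.

```python
import os
from pathlib import Path


def venv_python(package_path, os_name=os.name):
    """Return the interpreter path inside <package_path>/virtual_environment."""
    venv_root = Path(package_path) / "virtual_environment"
    if os_name == "nt":
        # Windows layout created by "python -m venv"
        return venv_root / "Scripts" / "python.exe"
    # POSIX layout (Linux, macOS) created by "python -m venv"
    return venv_root / "bin" / "python"
```

The returned path can then replace the `<local_path_of_this_package>/virtual_environment/Scripts/python(.exe)` part of the commands on non-Windows systems.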

Maintaining local package up-to-date

Once the package is installed, it is quite important to keep up with the latest releases, as they bring important code improvements and address potential security issues. This can be achieved with the following command:

    $ git --work-tree=<local_path_of_this_package> --git-dir=<local_path_of_this_package>/.git/ --no-pager pull origin master
  • Conventions used:
    • <content_within_html_tags> = variables to be replaced with the relevant user-specific values

Usage

    $ python <local_path_of_this_package>/sources/extractor.py --input-source-system-file <input_source_system_file_name> --input-credentials-file <input_credentials_file_name> --input-extracting-sequence-file <input_extracting_sequence_file_name> (--output-log-file <full_path_and_file_name_to_log_running_details>)

Conventions used:

  • (content_within_round_parenthesis) = optional
  • <content_within_html_tags> = variables to be replaced with the relevant user-specific values
  • single vertical pipeline = separator for alternative options

Example of usage

    $ python sources/extractor.py --input-source-system-file samples/sample---server-config.json --input-credentials-file samples/sample---user-settings.json --input-extracting-sequence-file samples/sample---list-of-fields.json --output-log-file samples/sample---list-of-fields.log
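When the extractor has to be launched from another Python script (for example a scheduler), the same command line can be assembled as an argument list. The sketch below uses only the options shown above; the helper name `build_extractor_command` and its parameter names are assumptions made for this example.

```python
import sys


def build_extractor_command(package_path, source_system_file, credentials_file,
                            sequence_file, log_file=None):
    """Assemble the extractor command line as a list of arguments."""
    command = [
        sys.executable,  # the Python interpreter currently running
        f"{package_path}/sources/extractor.py",
        "--input-source-system-file", source_system_file,
        "--input-credentials-file", credentials_file,
        "--input-extracting-sequence-file", sequence_file,
    ]
    if log_file is not None:
        # The log file is the only optional argument in the usage line.
        command += ["--output-log-file", log_file]
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with paths that contain spaces.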

Code of conduct

Use CODE_OF_CONDUCT.md

Features already raised

  • Implement ability to store extracted result-set into HTML format file;

Feature request template

Use feature_request.md

Bug report template

Use bug_report.md

Required software/drivers/configurations

see readme_software.md

Used references

see readme_reference.md

Download files

Source Distribution

db-extractor-1.2.4.tar.gz (58.2 kB)

Uploaded Source

Built Distribution

db_extractor-1.2.4-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file db-extractor-1.2.4.tar.gz.

File metadata

  • Download URL: db-extractor-1.2.4.tar.gz
  • Size: 58.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for db-extractor-1.2.4.tar.gz
Algorithm Hash digest
SHA256 0618846032dbfdebf1d983ecfaaa349bc4985b98110263bcee21740dad19e854
MD5 964d317d313b67e4e2e9956b45ccbef0
BLAKE2b-256 cbdf0d48921abc7e7f0b130972044af6d9c74e66e79566440076192a6097a1ec

File details

Details for the file db_extractor-1.2.4-py3-none-any.whl.

File metadata

  • Download URL: db_extractor-1.2.4-py3-none-any.whl
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for db_extractor-1.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 bc2dd96f3e8d99c264956a7c1370600947d94f947b688b6bec892b7b4725eea4
MD5 cd256c0dd278851d6beb63c1dbbe163b
BLAKE2b-256 73a3c85f7e65d7f662257746249826a20138beb32f0d0a70f91b69d0577f0115
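
A downloaded file can be checked against the SHA256 digests listed above before installing it. The sketch below relies only on the standard hashlib module; the helper names `sha256_of_file` and `matches_published_digest` are assumptions made for this example.

```python
import hashlib


def sha256_of_file(file_name, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_name, "rb") as binary_file:
        for chunk in iter(lambda: binary_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(file_name, published_digest):
    """Compare the file's digest with the one published for it."""
    return sha256_of_file(file_name) == published_digest.strip().lower()
```

A mismatch means the file differs from the one that was uploaded and should not be installed.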
