Skip to main content

Ingest a set of Google Analytics dimensions and put the data into a SQL database; can be run regularly to sync updates incrementally to DB

Project description

Google Analytics ETL

A python base class to extract and continuously sync data from Google Analytics, and write it to a SQL database.

Uses Analytics Reporting API v4 and it works similarly to Goole Analytics Custom Reports where you define a list of dimensions to be displayed in a flat table. But doesn't have the limitations of GA's UI. In addition, data can be custom transformed and will be written to a SQL database.

Installation

Requirements

You should prefer Python 3 packages from your operating system.

So under Fedora Linux:

dnf install python3-sqlalchemy python3-pandas python3-mysql

Under Red Hat Enterprise Linux 8:

dnf install python3-sqlalchemy mariadb-connector-c-devel mariadb-connector-c gcc
pip3 install mysqlclient --user

This MariaDB connector is the one that works with SQLAlchemy (used by the module). Other connectors as PyMySQL failed our tests.

If you chose to installing mysqlclient with pip, you'll require compilers and MariaDB development framework pre-installed in the system, as shown in the Red Hat Enterprise Linux section. But avoid doing it like that and use your OS's pre-compiled packages as shown above.

Install the module

pip3 install GoogleAnalyticsETL --user

All unsatisfied dependencies (such as Pandas on RHEL) will be installed along.

Or, to upgrade:

pip3 install -U GoogleAnalyticsETL --no-deps --user

Remove the --no-deps if you wish to upgrade also dependent modules as pandas and numpy.

Usage

Extracting data from Google Analytics is not easy due to its variety, complexity and amount of data, specially if you have a busy site with custom dimensions (GA Admin ➔ Property ➔ Custom Definitions ➔ Custom Dimensions).

So use the example file examples/GABradescoSegurosToDB.py. Copy it to a new place and edit it to create your own concrete classes with desired dimensions.

Then copy examples/etl.py and edit it to call your custom classes. This will be the executable script that you'll run every time.

Get credentials and API keys from Google

Follow instruction on GA Reporting API v4 to enable your Analytics instance to be accessed by this classes and to get credentials.

Run regularly with CRON

Once configured, easiest way to use it is with a crontab. I have this on my crontab:

@daily    python3 /home/avi/bin/etl.py

Which will run a sync daily. Change it to @hourly to get more recent updates.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

GoogleAnalyticsETL-0.5.3.tar.gz (20.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

GoogleAnalyticsETL-0.5.3-py3-none-any.whl (24.4 kB view details)

Uploaded Python 3

File details

Details for the file GoogleAnalyticsETL-0.5.3.tar.gz.

File metadata

  • Download URL: GoogleAnalyticsETL-0.5.3.tar.gz
  • Upload date:
  • Size: 20.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.3

File hashes

Hashes for GoogleAnalyticsETL-0.5.3.tar.gz
Algorithm Hash digest
SHA256 e09692ba0b6f4194108c70ebcd59848809aaa0ff559746399395c9d1f8d8bc5a
MD5 2856a4fb3376839617a970207ab727d8
BLAKE2b-256 f54b2b70c1d5c2c89d4c713ee27afeeb75a278b33bb71ba9542ac9cbb1f15980

See more details on using hashes here.

File details

Details for the file GoogleAnalyticsETL-0.5.3-py3-none-any.whl.

File metadata

  • Download URL: GoogleAnalyticsETL-0.5.3-py3-none-any.whl
  • Upload date:
  • Size: 24.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.3

File hashes

Hashes for GoogleAnalyticsETL-0.5.3-py3-none-any.whl
Algorithm Hash digest
SHA256 3534e2c5dff5fde41ff7fe7e1a73c5ea68e4bed476978c479abc006ac2f4038b
MD5 37193be6206827a33913caa134b1a8f9
BLAKE2b-256 0c37e2ac9c7eb20a122b352801a491445117bc44dc34ed1caf2ee2c7cde5db4c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page