Singer.io tap for extracting data from the mixpanel API - PipelineWise compatible
pipelinewise-tap-mixpanel
Singer tap that extracts data from a Mixpanel API and produces JSON-formatted data following the Singer spec.
This is a PipelineWise compatible tap connector.
This tap:
- Pulls raw data from the Mixpanel Event Export API and the Mixpanel Query API.
- Extracts the following resources:
  - Export (Events)
  - Engage (People/Users)
  - Funnels
  - Annotations
  - Cohorts
  - Cohort Members
  - Revenue
- Outputs the schema for each resource
- Incrementally pulls data based on the input state
- Uses date-windowing to chunk/loop through export, revenue, funnels.
- Incorporates attribution window for latency look-back to accommodate delays in data reconciliation.
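The date-windowing and attribution look-back described above can be sketched as follows. This is a minimal illustration, not the tap's actual code: the function names `resume_date` and `date_windows` are hypothetical, and it assumes that `from_date` and `to_date` are inclusive day boundaries.

```python
from datetime import date, timedelta


def resume_date(bookmark, start_date, attribution_window):
    """Pick the first day to request (hypothetical helper).

    With no bookmark, fall back to start_date. Otherwise step back by the
    attribution window so late-arriving data is fetched again, but never
    go earlier than start_date.
    """
    if bookmark is None:
        return start_date
    return max(start_date, bookmark - timedelta(days=attribution_window))


def date_windows(first_day, last_day, date_window_size):
    """Yield (from_date, to_date) pairs of at most date_window_size days.

    Assumption: both ends of each window are inclusive, so the next
    window starts the day after the previous one ends.
    """
    cursor = first_day
    while cursor <= last_day:
        window_end = min(cursor + timedelta(days=date_window_size - 1), last_day)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


if __name__ == "__main__":
    first = resume_date(date(2019, 9, 27), date(2019, 1, 1), attribution_window=5)
    for from_date, to_date in date_windows(first, date(2019, 11, 15), date_window_size=30):
        print(from_date.isoformat(), to_date.isoformat())
```

A smaller `date_window_size` produces more, shorter requests, which is the trade-off suggested for projects with large event volumes.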
Streams
Export (Events)
- Endpoint: https://data.mixpanel.com/api/2.0/export
- Primary key fields: event, time, distinct_id
- Replication strategy: INCREMENTAL (query filtered)
  - Bookmark: time
  - Bookmark query field: from_date, to_date
- Transformations: De-nest properties to root-level, re-name properties with leading $... to mp_reserved_..., convert datetimes from project timezone to UTC.
- Optional parameters
  - export_events to export only certain events
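The export transformations listed above might look roughly like this in Python. It is a sketch under stated assumptions rather than the tap's implementation: `transform_event` and `project_epoch_to_utc` are hypothetical names, the sample record is invented, and the integer `time` is assumed to encode wall-clock time in the project timezone.

```python
from datetime import datetime, timedelta, timezone

RESERVED_PREFIX = "mp_reserved_"


def project_epoch_to_utc(epoch_seconds, project_tz):
    """Convert a project-timezone integer time to an aware UTC datetime.

    Assumption: reading the integer as if it were UTC gives the
    wall-clock time in the project timezone.
    """
    wall_clock = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).replace(tzinfo=None)
    return wall_clock.replace(tzinfo=project_tz).astimezone(timezone.utc)


def transform_event(record, project_tz):
    """De-nest `properties` to root level and rename `$...` keys."""
    flat = {"event": record.get("event")}
    for key, value in record.get("properties", {}).items():
        if key.startswith("$"):
            key = RESERVED_PREFIX + key[1:]
        flat[key] = value
    if isinstance(flat.get("time"), int):
        flat["time"] = project_epoch_to_utc(flat["time"], project_tz)
    return flat


if __name__ == "__main__":
    raw = {
        "event": "Signed up",
        "properties": {"time": 1569600000, "distinct_id": "u1", "$city": "Oslo"},
    }
    # A fixed UTC-7 offset stands in for the project timezone here.
    print(transform_event(raw, timezone(timedelta(hours=-7))))
```

On Python 3.9 or later, a `zoneinfo.ZoneInfo("US/Pacific")` object can be passed as `project_tz` instead of a fixed offset, so that daylight saving time is handled.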
Engage (People/Users)
- Endpoint: https://mixpanel.com/api/2.0/engage
- Primary key fields: distinct_id
- Replication strategy: FULL_TABLE (all records, every load)
- Transformations: De-nest $properties to root-level, re-name properties with leading $... to mp_reserved_....
Funnels
- Endpoint 1 (name, id): https://mixpanel.com/api/2.0/funnels/list
- Endpoint 2 (date, measures): https://mixpanel.com/api/2.0/funnels
- Primary key fields: funnel_id, date
- Parameters: funnel_id: {funnel_id} (from Endpoint 1), unit: day
- Replication strategy: INCREMENTAL (query filtered)
  - Bookmark: date
  - Bookmark query field: from_date, to_date
- Transformations: Combine Endpoint 1 & 2 results, convert date keys to a results list-array.
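The "date keys to list-array" transformation can be illustrated with a short sketch. The input shape (a dict keyed by date string), the measure names, and the helper name `date_keys_to_list` are assumptions made for illustration; they are not taken from the tap's code or from a real API response.

```python
def date_keys_to_list(by_date, extra=None):
    """Turn {"2019-09-01": {...}, ...} into a list of rows with a `date` field.

    `extra` holds fields copied onto every row, for example the funnel id
    and name obtained from Endpoint 1 (hypothetical field names).
    """
    results = []
    for day in sorted(by_date):
        row = {"date": day}
        row.update(extra or {})
        row.update(by_date[day])
        results.append(row)
    return results


if __name__ == "__main__":
    # Invented sample data in the assumed shape.
    measures = {
        "2019-09-02": {"completion": 3},
        "2019-09-01": {"completion": 1},
    }
    for row in date_keys_to_list(measures, {"funnel_id": 7, "funnel_name": "Signup"}):
        print(row)
```

Each output row then carries the primary key fields (funnel_id, date) at root level, which is what a downstream target needs to upsert records.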
Revenue
- Endpoint: https://mixpanel.com/api/2.0/engage/revenue
- Primary key fields: date
- Parameters: unit: day
- Replication strategy: INCREMENTAL (query filtered)
  - Bookmark: date
  - Bookmark query field: from_date, to_date
- Transformations: Convert date keys to a results list-array.
Annotations
- Endpoint: https://mixpanel.com/api/2.0/annotations
- Primary key fields: date
- Replication strategy: FULL_TABLE
- Transformations: None.
Cohorts
- Endpoint: https://mixpanel.com/api/2.0/cohorts/list
- Primary key fields: id
- Replication strategy: FULL_TABLE
- Transformations: None.
Cohort Members
- Endpoint: https://mixpanel.com/api/2.0/engage
- Primary key fields: distinct_id, cohort_id
- Parameters: filter_by_cohort: {cohort_id} (from cohorts endpoint)
- Replication strategy: FULL_TABLE
- Transformations: For each cohort_id in the cohorts endpoint, query the engage endpoint with the filter_by_cohort parameter to create a list of distinct_id for each cohort_id.
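The cohort members loop described above amounts to a nested iteration. The sketch below shows only that shape: `cohort_member_rows` is a hypothetical name, and `fetch_engage` is a caller-supplied stand-in for whatever actually queries the engage endpoint, assumed here to yield records that already expose `distinct_id` at root level.

```python
def cohort_member_rows(cohorts, fetch_engage):
    """Yield one {cohort_id, distinct_id} row per member of each cohort.

    `cohorts` is an iterable of records with an `id` field.
    `fetch_engage` is a hypothetical callable taking `filter_by_cohort`
    and returning an iterable of engage records.
    """
    for cohort in cohorts:
        cohort_id = cohort["id"]
        for profile in fetch_engage(filter_by_cohort=cohort_id):
            yield {"cohort_id": cohort_id, "distinct_id": profile["distinct_id"]}


if __name__ == "__main__":
    # In-memory stand-in for the API, for illustration only.
    members = {10: ["u1", "u2"], 20: ["u2"]}

    def fake_fetch_engage(filter_by_cohort):
        return [{"distinct_id": user} for user in members[filter_by_cohort]]

    for row in cohort_member_rows([{"id": 10}, {"id": 20}], fake_fetch_engage):
        print(row)
```

Because the same user can belong to several cohorts, both fields together form the primary key.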
Authentication
The Mixpanel API uses Basic Authorization with the api_secret from the tap config in base-64 encoded format. It is slightly different from normal Basic Authorization with username/password. All requests should include this header with the api_secret as the username, with no password:
- Authorization: Basic <base-64 encoded api_secret>
More details may be found in the Mixpanel API Authentication instructions.
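As an illustration of the header described above, here is a minimal sketch using only the standard library. It assumes the conventional HTTP Basic layout, in which the encoded credentials string is `username:password` and the password is left empty, so a trailing colon follows the secret. The helper name `basic_auth_header` is hypothetical, and the tap itself may build the header differently.

```python
import base64


def basic_auth_header(api_secret):
    """Build an Authorization header with api_secret as the username.

    Assumption: standard HTTP Basic encoding of "username:password"
    with an empty password.
    """
    credentials = f"{api_secret}:".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # "ABCdef123" is the placeholder secret used in this README.
    print(basic_auth_header("ABCdef123"))
```

The resulting dictionary can be passed as the headers of any HTTP client request.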
Quick Start
- Install
    python3 -m venv venv
    . venv/bin/activate
    pip install --upgrade pip
    pip install .
- Create your tap's config.json file. The tap config file for this tap should include these entries:
  - start_date (rfc3339 date string): the default value to use if no bookmark exists for an endpoint.
  - user_agent (string, optional): Process and email for API logging purposes. Example: tap-mixpanel <api_user_email@your_company.com>
  - api_secret (string, ABCdef123): an API secret for each project in Mixpanel. This can be found in the Mixpanel Console under the upper-right Settings (gear icon), Organization Settings > Projects, in the Access Keys section. For this tap, only the api_secret is needed (the api_key is legacy and the token is used only for uploading data). Each Mixpanel project has a different api_secret; therefore each Singer tap pipeline instance is for a single project.
  - date_window_size (integer, 30): Number of days in each date window when looping through transactional endpoints with from_date and to_date. The default is 30 days. Clients with large volumes of events may want to decrease this to 14, 7, or even down to 1-2 days.
  - attribution_window (integer, 5): Minimum number of days to look back to account for delays in attributing accurate results. The default is 5 days.
  - project_timezone (string like US/Pacific): Time zone in which integer date times are stored. The project timezone may be found in the project settings in the Mixpanel console. More info about timezones.
  - select_properties_by_default (true or false): Mixpanel properties are not fixed and depend on the data being uploaded. During Discovery mode and catalog.json setup, all current/existing properties will be captured. Setting this config parameter to true ensures that new properties on events and engage records are captured. Otherwise new properties will be ignored.

    {
      "api_secret": "YOUR_API_SECRET",
      "date_window_size": "30",
      "attribution_window": "5",
      "project_timezone": "US/Pacific",
      "select_properties_by_default": "true",
      "start_date": "2019-01-01T00:00:00Z",
      "user_agent": "tap-mixpanel <api_user_email@your_company.com>"
    }

  If you want to export only certain events from the Raw export API, add the export_events option to config.json and list the required event names:

    "export_events": ["event_one", "event_two"]
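The sample config passes integers and booleans as strings ("30", "true"). A consumer of such a file would typically coerce them before use. The sketch below shows one way to do that. It is not the tap's own parsing logic: `normalize_config` is a hypothetical helper, and treating api_secret and start_date as required is an assumption.

```python
import json

# Defaults stated in this README for the two integer settings.
INTEGER_DEFAULTS = {"date_window_size": 30, "attribution_window": 5}


def normalize_config(raw):
    """Return a copy of the config with typed values (hypothetical helper)."""
    config = dict(raw)
    # Assumption: these two entries must be present and non-empty.
    missing = [key for key in ("api_secret", "start_date") if not config.get(key)]
    if missing:
        raise ValueError("missing config entries: " + ", ".join(missing))
    for key, default in INTEGER_DEFAULTS.items():
        config[key] = int(config.get(key, default))
    if "select_properties_by_default" in config:
        flag = str(config["select_properties_by_default"]).strip().lower()
        config["select_properties_by_default"] = flag == "true"
    return config


if __name__ == "__main__":
    sample = json.loads(
        '{"api_secret": "YOUR_API_SECRET", "date_window_size": "30",'
        ' "select_properties_by_default": "true",'
        ' "start_date": "2019-01-01T00:00:00Z"}'
    )
    config = normalize_config(sample)
    # Avoid printing the secret itself.
    print({key: value for key, value in config.items() if key != "api_secret"})
```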
  Optionally, also create a state.json file. currently_syncing is an optional attribute used for identifying the last object to be synced in case the job is interrupted mid-stream. The next run would begin where the last job left off.

    {
      "currently_syncing": "engage",
      "bookmarks": {
        "export": "2019-09-27T22:34:39.000000Z",
        "funnels": "2019-09-28T15:30:26.000000Z",
        "revenue": "2019-09-28T18:23:53Z"
      }
    }
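To make the relationship between state.json bookmarks and start_date concrete, here is a small sketch of the fallback rule: use the stream's bookmark if one exists, otherwise start_date. The helper names `parse_rfc3339` and `resume_point` are hypothetical, and the sketch does not model currently_syncing.

```python
import json
from datetime import datetime


def parse_rfc3339(value):
    """Parse timestamps like 2019-09-28T18:23:53Z into aware datetimes."""
    # fromisoformat() only accepts a trailing "Z" on newer Python
    # versions, so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def resume_point(state, stream, start_date):
    """Return the stream's bookmark, or start_date when none exists."""
    bookmark = state.get("bookmarks", {}).get(stream)
    return parse_rfc3339(bookmark if bookmark else start_date)


if __name__ == "__main__":
    state = json.loads(
        '{"currently_syncing": "engage", "bookmarks": {'
        '"export": "2019-09-27T22:34:39.000000Z",'
        '"revenue": "2019-09-28T18:23:53Z"}}'
    )
    for stream in ("export", "funnels", "revenue"):
        print(stream, resume_point(state, stream, "2019-01-01T00:00:00Z").isoformat())
```

In this sample, funnels has no bookmark, so it falls back to the configured start_date.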
- Run the Tap in Discovery Mode
  This creates a catalog.json for selecting objects/fields to integrate:
    tap-mixpanel --config config.json --discover > catalog.json
  See the Singer docs on discovery mode here.
- Run the Tap in Sync Mode (with catalog) and write out to state file
  For Sync mode:
    > tap-mixpanel --config config.json --catalog catalog.json
  Messages are written to standard output following the Singer specification. The resultant stream of JSON data can be consumed by a Singer target. To load to JSON files to verify outputs:
    > tap-mixpanel --config config.json --catalog catalog.json | target-json > state.json
    > tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
  To pseudo-load to Stitch Import API with dry run:
    > tap-mixpanel --config config.json --catalog catalog.json | target-stitch --config target_config.json --dry-run > state.json
    > tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
Test
- Install Python test dependencies in a virtual env and run nose unit and integration tests
    python3 -m venv venv
    . venv/bin/activate
    pip install --upgrade pip
    pip install .[test]
- Run unit tests
    pytest tests/unittests