
Data pipeline for TikTok analytics. Your exports in, real metrics out.


tokpipe

Python 3.10+ | License: MIT

Data pipeline for TikTok analytics. Import your exported data, clean it, classify content, compute real metrics, and visualize what actually works.

No APIs, no scraping, no third-party tokens. Just your TikTok export files (CSV/XLSX) and Python.


Why tokpipe?

TikTok gives you a spreadsheet with raw numbers. That's it. No insights, no trends, no "why did this video work?".

tokpipe takes that file and builds a full analytics pipeline: cleans the data, classifies your content by topic, computes real metrics (engagement rate, best posting hour, growth trends), and generates an interactive dashboard, an Excel report with formulas, and static charts. One command, all outputs.

It's built for creators who want to understand their data without depending on third-party tools that ask for your credentials.


What you get

tokpipe analyze TikTok_Analytics.xlsx --followers 8728

| Output | What it is |
| --- | --- |
| report.csv | Your cleaned data plus engagement rate, completion rate, and category per video |
| analytics.xlsx | Excel report with native formulas; open in Excel or Google Sheets |
| dashboard.html | Interactive Plotly dashboard; open in any browser, hover for details |
| engagement.png | Engagement rate distribution across all your videos |
| best_hours.png | Which hours get the best engagement |
| growth.png | 7-day rolling average of your views |

Architecture

tokpipe follows a classic ETL pipeline structure:

  Export (TikTok XLSX/CSV)
        |
        v
  +-----------+
  |  ingest   |  --> Load and validate raw export files
  +-----------+
        |
        v
  +-----------+
  |  clean    |  --> Normalize columns, fix types, handle nulls
  +-----------+
        |
        v
  +-----------+
  | classify  |  --> Tag each video with a topic/category
  +-----------+
        |
        v
  +-----------+
  |  metrics  |  --> Compute engagement rate, retention, trends
  +-----------+
        |
        v
  +-----------+    +-----------+    +-----------+
  |  output   |    |   excel   |    | dashboard |
  | (CSV/PNG) |    |  (.xlsx)  |    |  (.html)  |
  +-----------+    +-----------+    +-----------+
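
The staged structure above can be sketched as a chain of functions, each taking the previous stage's rows and returning new ones. This is a minimal illustration using only the standard library, not tokpipe's actual code: the stage bodies, the `like_rate` field, and `run_pipeline` are hypothetical stand-ins, and the real modules work on DataFrames.

```python
from functools import reduce
from typing import Callable

Rows = list[dict]
Stage = Callable[[Rows], Rows]


def clean(rows: Rows) -> Rows:
    # Coerce counts to int and drop rows whose counts cannot be parsed.
    cleaned = []
    for row in rows:
        try:
            views = int(row["views"])
            likes = int(row.get("likes", 0))
        except (KeyError, TypeError, ValueError):
            continue
        cleaned.append({**row, "views": views, "likes": likes})
    return cleaned


def classify(rows: Rows) -> Rows:
    # Toy rule: tag a video "coding" when its caption mentions python.
    return [
        {**row, "category": "coding" if "python" in str(row.get("caption", "")).lower() else "other"}
        for row in rows
    ]


def metrics(rows: Rows) -> Rows:
    # Simplified metric for illustration only: likes divided by views.
    return [
        {**row, "like_rate": row["likes"] / row["views"] if row["views"] else 0.0}
        for row in rows
    ]


def run_pipeline(rows: Rows, stages: list[Stage]) -> Rows:
    # Feed the output of each stage into the next one, in order.
    return reduce(lambda data, stage: stage(data), stages, rows)


raw = [
    {"views": "1000", "likes": "50", "caption": "Python tips"},
    {"views": "n/a", "likes": "3", "caption": "broken row"},
]
result = run_pipeline(raw, [clean, classify, metrics])
print(result)
```

Keeping each stage a plain function of rows makes it easy to test stages in isolation or to skip one, which is the main appeal of the ETL layout.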

Modules

| Module | What it does |
| --- | --- |
| tokpipe.ingest | Reads TikTok export files (XLSX, CSV). Detects format, validates columns, returns a raw DataFrame. |
| tokpipe.clean | Normalizes column names, converts date/number types, drops corrupted rows, fills missing values. |
| tokpipe.classify | Assigns a topic/category to each video. Configurable via YAML rules or a custom function. |
| tokpipe.metrics | Computes derived metrics: engagement rate, average watch time, best posting hour, growth trends. |
| tokpipe.output | Exports results to CSV/JSON. Generates matplotlib/seaborn PNG charts. |
| tokpipe.excel | Generates an Excel report with native formulas, formatting, and embedded charts. |
| tokpipe.dashboard | Generates an interactive Plotly HTML dashboard with all visualizations. |
| tokpipe.cli | Command-line interface. Entry point for tokpipe analyze. |

Prerequisites

You need two things before installing tokpipe:

Python 3.10+

Check your version:

python --version
# or
python3 --version
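
If you prefer to check from inside Python (for example, the interpreter your virtual environment will use), a small sketch like the following compares the running version against the 3.10 minimum stated above. The helper name `meets_minimum` is illustrative and not part of tokpipe.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print(f"OK: Python {sys.version_info.major}.{sys.version_info.minor}")
else:
    print("Python 3.10 or newer is required")
```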

If you don't have it:

# macOS (Homebrew)
brew install python

# Ubuntu/Debian
sudo apt install python3 python3-venv python3-pip

# Windows (winget)
winget install Python.Python.3.12

Or download directly from python.org.

git (optional)

Only needed to clone the repo. You can also download the ZIP from GitHub.

git --version

Get your TikTok data

tokpipe works with the analytics files that TikTok lets you export. No API keys, no scraping — just the file TikTok gives you.

How to export:

  1. Open TikTok on desktop (not the app) or go to tiktok.com
  2. Go to your profile > Creator tools > Analytics
  3. Select the date range you want to analyze
  4. Click Export data (top right)
  5. Download the XLSX or CSV file

What the file should contain:

| Required columns | Optional columns |
| --- | --- |
| Views | Watch time |
| Likes | Video duration |
| Comments | Post date/time |
| Shares | Caption/description |

tokpipe auto-detects column names in both English and Spanish. If your export uses other names, the pipeline still tries to match them; if it cannot find a views column, it stops with a ValueError saying so (see Troubleshooting).
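
To give a feel for how this kind of column matching can work, here is a hedged sketch that maps export headers to canonical names through an alias table. The alias lists and the function names are assumptions made for illustration; the names tokpipe actually recognizes may differ.

```python
import unicodedata

# Hypothetical alias table; tokpipe's real list may be different.
COLUMN_ALIASES = {
    "views": {"views", "video views", "visualizaciones", "vistas"},
    "likes": {"likes", "me gusta"},
    "comments": {"comments", "comentarios"},
    "shares": {"shares", "compartidos"},
}


def normalize(name: str) -> str:
    # Strip accents, lowercase, and collapse whitespace before comparing.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match_columns(headers: list[str]) -> dict[str, str]:
    # Map each canonical name to the first header that matches one of its aliases.
    mapping: dict[str, str] = {}
    for header in headers:
        key = normalize(header)
        for canonical, aliases in COLUMN_ALIASES.items():
            if key in aliases and canonical not in mapping:
                mapping[canonical] = header
    if "views" not in mapping:
        raise ValueError("Could not find a 'views' column")
    return mapping


print(match_columns(["Visualizaciones", " Me gusta ", "Comentarios", "Compartidos"]))
```

Normalizing both sides before the lookup keeps the alias table short, since casing, accents, and stray spaces do not need their own entries.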


Installation

# Clone the repo
git clone https://github.com/aroaxinping/tokpipe.git
cd tokpipe

# Create a virtual environment
python3 -m venv .venv

# Activate it
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows (cmd)
# .venv\Scripts\Activate.ps1     # Windows (PowerShell)

# Install tokpipe and all its dependencies
pip install -e .

This installs: pandas, openpyxl, matplotlib, seaborn, plotly, and pyyaml.

Verify it worked:

tokpipe --version
# tokpipe 0.1.0

Usage

Try it with sample data

Don't have a TikTok export yet? Use the included sample:

tokpipe analyze examples/sample_data.csv --output sample_results/

Quick start

# Make sure your venv is active
source .venv/bin/activate

# Run the pipeline on your export file
tokpipe analyze ~/Downloads/TikTok_Analytics.xlsx

That's it. It will create a results/ folder with everything.

Output files

results/
  report.csv           # Your data cleaned + engagement rate, completion rate, category
  analytics.xlsx       # Excel with native formulas (open in Excel/Google Sheets)
  dashboard.html       # Interactive dashboard (open in any browser)
  engagement.png       # Engagement rate distribution chart
  best_hours.png       # Which hours get the best engagement
  growth.png           # How your views are trending over time

All CLI options

tokpipe analyze <file> [options]

| Option | What it does | Example |
| --- | --- | --- |
| --output, -o | Output directory (default: results/) | --output my_report/ |
| --followers | Your follower count (shown in reports) | --followers 8728 |
| --period | Label for the date range you're analyzing | --period "24 Feb - 23 Mar 2026" |
| --rules | Path to YAML file with custom classification rules | --rules my_rules.yaml |
| --no-charts | Skip PNG chart generation | |
| --no-dashboard | Skip HTML dashboard generation | |
| --no-excel | Skip Excel report generation | |
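
For readers curious how an option set like this maps onto Python's standard argparse module, the following sketch mirrors the table. It is an illustration only and not tokpipe's real cli.py; the help strings and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser mirroring the documented options (not tokpipe's own code).
    parser = argparse.ArgumentParser(prog="tokpipe")
    sub = parser.add_subparsers(dest="command", required=True)

    analyze = sub.add_parser("analyze", help="Run the pipeline on an export file")
    analyze.add_argument("file", help="Path to the TikTok export (XLSX or CSV)")
    analyze.add_argument("--output", "-o", default="results/", help="Output directory")
    analyze.add_argument("--followers", type=int, help="Follower count shown in reports")
    analyze.add_argument("--period", help="Label for the analyzed date range")
    analyze.add_argument("--rules", help="Path to a YAML file with classification rules")
    analyze.add_argument("--no-charts", action="store_true", help="Skip PNG charts")
    analyze.add_argument("--no-dashboard", action="store_true", help="Skip HTML dashboard")
    analyze.add_argument("--no-excel", action="store_true", help="Skip Excel report")
    return parser


args = build_parser().parse_args(
    ["analyze", "data.xlsx", "--followers", "8728", "--no-excel"]
)
print(args.file, args.output, args.followers, args.no_excel)
```

Note that argparse turns `--no-excel` into the attribute `no_excel`, and that options left out fall back to their defaults (`results/` for the output directory here).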

Full example

tokpipe analyze TikTok_Analytics.xlsx \
  --output results/ \
  --followers 8728 \
  --period "24 Feb - 23 Mar 2026" \
  --rules rules.yaml

Only want the CSV?

tokpipe analyze data.xlsx --no-dashboard --no-excel --no-charts

Python API

from tokpipe import ingest, clean, classify, metrics, output, excel, dashboard

# Load and clean
raw = ingest.load("TikTok_Analytics.xlsx")
df = clean.normalize(raw)

# Classify content
df["category"] = classify.classify(df)

# Compute metrics
report = metrics.compute(df)
print(report.summary())

# Export
output.to_csv(report, "report.csv")
excel.to_excel(report, "analytics.xlsx", followers=8728)
dashboard.generate(report, "dashboard.html")

Content classification

By default, tokpipe classifies videos into: setup, coding, data, study, tech, other.

Custom rules via YAML

Create a rules.yaml:

setup:
  - keyboard
  - monitor
  - desk
  - compra
coding:
  - python
  - debug
  - script
data:
  - dataset
  - pandas
  - sql
study:
  - exam
  - uni
  - homework

Then pass the file to the CLI:

tokpipe analyze data.xlsx --rules rules.yaml
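
Conceptually, a rules file like this is a mapping from category to keywords. The sketch below shows one plausible way such rules could be applied to a caption, assuming simple case-insensitive substring matching and first-match-wins ordering; tokpipe's actual matching logic may differ. The dictionary is written inline here, on the assumption that loading rules.yaml with PyYAML's `yaml.safe_load` would yield the same structure.

```python
# Same structure as the rules.yaml example: category -> list of keywords.
RULES = {
    "setup": ["keyboard", "monitor", "desk", "compra"],
    "coding": ["python", "debug", "script"],
    "data": ["dataset", "pandas", "sql"],
    "study": ["exam", "uni", "homework"],
}


def classify_caption(caption: str, rules: dict = RULES, default: str = "other") -> str:
    # Return the first category with a keyword found in the caption.
    text = caption.lower()
    for category, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default


for caption in ["Debugging my Python script", "New desk tour", "Cooking pasta"]:
    print(caption, "->", classify_caption(caption))
```

Substring matching is simple but coarse: a short keyword such as "uni" would also match "community". If that matters for your captions, matching on whole words is a reasonable refinement.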

Custom function (Python API)

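from tokpipe import classify

def my_classifier(text: str) -> str:
    # Lowercase first so "Python" and "python" both match
    text = text.lower()
    if "python" in text:
        return "coding"
    if "setup" in text:
        return "setup"
    return "other"

# df is the cleaned DataFrame from the Python API example above
df["category"] = classify.classify(df, classifier_fn=my_classifier)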

SQL queries

The sql/ directory contains reference queries for analyzing your exported CSV with DuckDB, SQLite, or any SQL engine:

# Example with DuckDB
duckdb -c "
  CREATE TABLE videos AS SELECT * FROM read_csv_auto('results/report.csv');
  SELECT * FROM videos ORDER BY engagement_rate DESC LIMIT 10;
"

See sql/queries.sql for the full set.
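
If you would rather stay in Python, the standard library's sqlite3 module can run the same kind of query. The sketch below uses a few made-up rows in place of results/report.csv; the `video_id` column is a hypothetical name, while `engagement_rate` follows the DuckDB example above.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for results/report.csv.
SAMPLE_CSV = """video_id,views,engagement_rate
a,1000,0.08
b,500,0.12
c,2000,0.05
"""


def top_videos(csv_text: str, limit: int = 10) -> list[tuple]:
    # Load CSV rows into an in-memory SQLite table and rank by engagement.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE videos (video_id TEXT, views INTEGER, engagement_rate REAL)"
    )
    conn.executemany(
        "INSERT INTO videos VALUES (:video_id, :views, :engagement_rate)", rows
    )
    result = conn.execute(
        "SELECT video_id, engagement_rate FROM videos "
        "ORDER BY engagement_rate DESC LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result


for video_id, rate in top_videos(SAMPLE_CSV):
    print(video_id, rate)
```

To use a real report, read the file's text instead of `SAMPLE_CSV` and adjust the table definition to the columns your report.csv actually contains.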


Available metrics

| Metric | Formula / Description |
| --- | --- |
| Engagement rate | (likes + comments + shares) / views |
| Average watch time | Total watch time / views |
| Completion rate | Average watch time / video duration |
| Best posting hour | Hour with highest median engagement |
| Growth trend | Rolling 7-day average of views |
| Top performers | Videos above 90th percentile engagement |
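
As a worked example of the per-video formulas, the sketch below computes them for a single made-up video. It assumes watch time and duration are both in seconds, and it uses one particular percentile definition (the "inclusive" method of `statistics.quantiles`); tokpipe may compute the cutoff differently. The function names are illustrative.

```python
from statistics import quantiles


def engagement_rate(likes: int, comments: int, shares: int, views: int) -> float:
    # (likes + comments + shares) / views, guarding against zero views.
    return (likes + comments + shares) / views if views else 0.0


def average_watch_time(total_watch_time_s: float, views: int) -> float:
    # Total watch time divided by views (seconds per view, by assumption).
    return total_watch_time_s / views if views else 0.0


def completion_rate(avg_watch_time_s: float, duration_s: float) -> float:
    # Share of the video watched on average.
    return avg_watch_time_s / duration_s if duration_s else 0.0


def top_performers(rates: list[float], percentile: int = 90) -> list[float]:
    # Keep the values strictly above the chosen percentile cutoff.
    cutoff = quantiles(rates, n=100, method="inclusive")[percentile - 1]
    return [rate for rate in rates if rate > cutoff]


# Made-up video: 1000 views, 80 likes, 12 comments, 8 shares,
# 12000 s of total watch time, 30 s long.
rate = engagement_rate(80, 12, 8, 1000)   # 100 / 1000 = 0.1
avg = average_watch_time(12000, 1000)     # 12.0 seconds per view
done = completion_rate(avg, 30)           # 12 / 30 = 0.4
print(rate, avg, done)
```

So a video with 100 interactions on 1000 views has a 10% engagement rate, and viewers who watch 12 of its 30 seconds on average give a 40% completion rate.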

Project structure

tokpipe/
  .github/
    workflows/
      ci.yml              # GitHub Actions CI (tests on Python 3.10-3.13)
    ISSUE_TEMPLATE/
      bug_report.md       # Bug report template
      feature_request.md  # Feature request template
  src/
    tokpipe/
      __init__.py         # Package init, version
      cli.py              # Command-line interface
      ingest.py           # Load TikTok exports
      clean.py            # Normalize and clean data
      classify.py         # Content classifier (YAML / custom function)
      metrics.py          # Compute derived metrics
      output.py           # CSV/JSON export + matplotlib charts
      excel.py            # Excel report with formulas
      dashboard.py        # Interactive Plotly HTML dashboard
  tests/
    test_ingest.py
    test_clean.py
    test_metrics.py
  sql/
    queries.sql           # Reference SQL queries
  examples/
    basic_analysis.py     # Minimal working example
    sample_data.csv       # Fake data to test without a TikTok account
  pyproject.toml
  LICENSE
  CONTRIBUTING.md
  README.md

Troubleshooting

ModuleNotFoundError: No module named 'tokpipe'

Your virtual environment is not activated. Run:

source .venv/bin/activate        # macOS / Linux
.venv\Scripts\activate           # Windows

ModuleNotFoundError: No module named 'pandas'

Dependencies are not installed. Run:

pip install -e .

ValueError: Could not find a 'views' column

tokpipe couldn't match any column in your export to "views". This happens when the export uses a language tokpipe doesn't recognize yet. Open your file, check the column name for views, and open an issue with the column names so we can add support.

FileNotFoundError: File not found

Check that the path to your export file is correct. Use the full path:

tokpipe analyze /Users/you/Downloads/TikTok_Analytics.xlsx

Charts are not generated

If you see -- Skipping best hours or -- Skipping growth trend, your export file doesn't have a date/time column. tokpipe needs a column with post dates to generate time-based charts. The engagement distribution chart will still work.
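
The reason these two charts depend on post dates becomes clear from how such metrics are typically computed. The following sketch is an assumption-laden illustration using only the standard library, not tokpipe's implementation: the best hour groups engagement by the hour of the post timestamp, and the growth trend averages views over consecutive days.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def best_posting_hour(videos: list[tuple[datetime, float]]):
    # videos: (posted_at, engagement_rate) pairs. Returns None without timestamps.
    by_hour = defaultdict(list)
    for posted_at, rate in videos:
        by_hour[posted_at.hour].append(rate)
    if not by_hour:
        return None
    # Pick the hour whose videos have the highest median engagement.
    return max(by_hour, key=lambda hour: median(by_hour[hour]))


def rolling_mean(daily_views: list[float], window: int = 7) -> list[float]:
    # One value per consecutive day; early days average over what is available.
    out = []
    for i in range(len(daily_views)):
        chunk = daily_views[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


videos = [
    (datetime(2026, 3, 1, 18, 0), 0.10),
    (datetime(2026, 3, 2, 18, 30), 0.06),
    (datetime(2026, 3, 3, 9, 0), 0.07),
]
print(best_posting_hour(videos))
print(rolling_mean([7, 14, 21], window=2))
```

Without a timestamp per video there is nothing to group or order by, which is why only the engagement distribution chart can still be drawn.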

pip install -e . fails

If you're on Python 3.14+, try installing without editable mode:

pip install .

Or install dependencies manually and run with PYTHONPATH:

pip install pandas openpyxl matplotlib seaborn plotly pyyaml
PYTHONPATH=src tokpipe analyze data.xlsx

Contributing

See CONTRIBUTING.md.


License

MIT. See LICENSE.
