A Python toolkit for TikTok data extraction and analysis using TikAPI

tiktools

Extract post metadata and transcribe videos using TikTok's built-in subtitles.

Features

  • Fetch post metadata - Download complete post data for any TikTok user
  • Extract transcripts - Get speech-to-text from videos using TikTok's ASR subtitles
  • Download thumbnails - Save video thumbnails in multiple sizes and formats
  • Translate transcripts - Translate transcripts using AWS Translate or other services
  • Incremental updates - Only fetch new content to save API costs
  • Audio detection - Flag videos with non-original audio (songs vs. speech)
  • Pip-installable - Easy to install and use in your projects
  • Extensible - Build custom analysis tools on top of the core toolkit

Installation

pip install tiktools

Quick start

1. Set up API keys

tiktools requires a TikAPI key for core functionality:

export TIKAPI_KEY="your_tikapi_key_here"

For testing, you can use the sandbox key: DemoAPIKeyTokenSeHYGXDfd4SFD320Sc39Asd0Sc39Asd4s
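
Before making any calls, it can help to confirm that the key is actually set. The following is a minimal sketch that assumes only the TIKAPI_KEY variable named above; the helper `require_api_key` is hypothetical and is not part of tiktools:

```python
import os


def require_api_key(env=None, name="TIKAPI_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of tiktools.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running")
    return key
```

Calling `require_api_key()` at the top of a script fails early with a readable message instead of partway through a fetch.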

2. Fetch posts, extract transcripts and download thumbnails

from tiktools import fetch_user_posts, extract_transcripts, download_thumbnails
from pathlib import Path

# Fetch posts
posts_data = fetch_user_posts(
    username="davis_big_dawg",
    output_file=Path("data/davis_big_dawg/posts.json")
)

# Extract transcripts
transcript_results = extract_transcripts(
    posts_file=Path("data/davis_big_dawg/posts.json"),
    language="eng"
)

# Download thumbnails
thumbnail_results = download_thumbnails(
    posts_file=Path("data/davis_big_dawg/posts.json"),
    thumbnail_type="origin"  # High quality
)

print(f"Extracted {transcript_results['transcripts_downloaded']} transcripts")
print(f"Downloaded {thumbnail_results['thumbnails_downloaded']} thumbnails")

3. Use the CLI scripts

# Fetch all posts
python scripts/fetch_posts.py davis_big_dawg

# Extract transcripts
python scripts/extract_transcripts.py data/davis_big_dawg/davis_big_dawg_posts.json

# Download thumbnails
python scripts/download_thumbnails.py data/davis_big_dawg/davis_big_dawg_posts.json --type origin

# Translate transcripts (requires AWS credentials)
python scripts/translate_transcripts.py data/davis_big_dawg/transcripts/davis_big_dawg_transcripts.json --target en

# Run the generic analysis template
python scripts/analyze.py data/davis_big_dawg/transcripts/davis_big_dawg_transcripts.json

Incremental updates

Save API costs by only fetching new content:

# Only fetch NEW posts
python scripts/fetch_posts.py davis_big_dawg --update

# Only transcribe NEW posts
python scripts/extract_transcripts.py data/davis_big_dawg/davis_big_dawg_posts.json --update

# Only download thumbnails for NEW posts
python scripts/download_thumbnails.py data/davis_big_dawg/davis_big_dawg_posts.json --update

# Only translate NEW transcripts
python scripts/translate_transcripts.py data/davis_big_dawg/transcripts/davis_big_dawg_transcripts.json --target en --update
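
The idea behind an update mode can be sketched as comparing post IDs already on disk with the IDs just fetched. This is an illustration only: it assumes the posts file is a flat JSON list of dicts with an `id` key, which may not match the real tiktools layout, and `merge_new_posts` is a hypothetical name:

```python
import json
from pathlib import Path


def merge_new_posts(existing_file, fetched_posts, id_key="id"):
    """Return (all_posts, new_posts), keeping only unseen IDs from fetched_posts.

    Assumes a flat JSON list of post dicts keyed by `id_key`; the real
    tiktools file layout may differ.
    """
    existing_file = Path(existing_file)
    existing = []
    if existing_file.exists():
        existing = json.loads(existing_file.read_text(encoding="utf-8"))
    seen = {post[id_key] for post in existing}
    new_posts = []
    for post in fetched_posts:
        if post[id_key] not in seen:
            seen.add(post[id_key])  # also guards against duplicates in the batch
            new_posts.append(post)
    return existing + new_posts, new_posts
```

Only the entries in `new_posts` would then need transcripts, thumbnails or translations, which is where the API savings come from.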

Output structure

data/
└── davis_big_dawg/
    ├── davis_big_dawg_posts.json       # Post metadata
    ├── thumbnails/
    │   ├── 7575304937580547342.jpg     # Individual thumbnails
    │   └── davis_big_dawg_thumbnails.json  # Thumbnail metadata
    └── transcripts/
        ├── 7575304937580547342.txt     # Individual transcripts
        ├── 7575304937580547342.en.txt  # Translated transcripts
        ├── davis_big_dawg_transcripts.json  # All transcripts
        └── davis_big_dawg_translations.json # All translations
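
For scripts that need to locate these files, the layout above can be expressed with pathlib. This is a sketch based only on the tree shown; `output_paths` is a hypothetical helper, not a tiktools function:

```python
from pathlib import Path


def output_paths(username, post_id, base_dir="data", lang="en"):
    """Build the paths from the tree above for one user and one post.

    Hypothetical helper; it mirrors the documented layout and nothing more.
    """
    root = Path(base_dir) / username
    transcripts = root / "transcripts"
    return {
        "posts": root / f"{username}_posts.json",
        "thumbnail": root / "thumbnails" / f"{post_id}.jpg",
        "transcript": transcripts / f"{post_id}.txt",
        "translation": transcripts / f"{post_id}.{lang}.txt",
        "all_transcripts": transcripts / f"{username}_transcripts.json",
    }
```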

Example: Food reviews analysis

See examples/food_reviews/ for a complete example analyzing @davis_big_dawg's school lunch reviews.

The example includes a standalone script that demonstrates the full workflow:

cd examples/food_reviews

# Install tiktools and dependencies
pip install tiktools openai

# Set up API keys
export TIKAPI_KEY="your_tikapi_key_here"
export OPENAI_API_KEY="your_openai_key_here"

# Fetch posts and transcripts, then extract review data
python fetch_davis_archive.py --extract-reviews

# Calculate statistics
python calculate_stats.py data/davis_big_dawg/davis_big_dawg_reviews.json

Features:

  • Fetches all posts and transcripts for a TikTok user
  • Extracts structured review data using OpenAI
  • Calculates statistics by category and day
  • Update mode to only fetch new content

See the example README for full documentation and customization options.

API Reference

Core functions

fetch_user_posts()

Fetch TikTok post metadata for a user.

from tiktools import fetch_user_posts
from pathlib import Path

data = fetch_user_posts(
    username="davis_big_dawg",
    api_key=None,  # Falls back to the TIKAPI_KEY env var
    max_posts=100,  # Limit number of posts
    output_file=Path("output.json"),
    sandbox=False,
    update_mode=False  # Set True to fetch only new posts
)

extract_transcripts()

Extract transcripts from TikTok videos using subtitle files.

from tiktools import extract_transcripts
from pathlib import Path

results = extract_transcripts(
    posts_file=Path("posts.json"),
    output_dir=None,  # Defaults to posts_file.parent/transcripts
    output_format="individual",  # or "combined" or "both"
    language="eng",
    update_mode=False  # Set True to process only new posts
)

get_best_subtitle()

Get the best available subtitle for a post (prioritizes ASR over MT).

from tiktools import get_best_subtitle

# `post` is the metadata for a single post
subtitle = get_best_subtitle(post, preferred_language="eng")
if subtitle:
    print(f"Found {subtitle['LanguageCodeName']} ({subtitle['Source']})")
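
The ASR-over-MT preference can be illustrated with a small standalone function. This is not the library's implementation: it assumes each subtitle entry is a dict with the `LanguageCodeName` and `Source` keys used above, that `Source` takes the values "ASR" and "MT", and that language codes begin with the preferred language (for example "eng-US"). The name `pick_subtitle` is hypothetical:

```python
def pick_subtitle(subtitles, preferred_language="eng"):
    """Pick a subtitle in the preferred language, ranking ASR ahead of MT.

    Illustrative only; field values are assumptions, not confirmed API.
    """
    priority = {"ASR": 0, "MT": 1}
    candidates = [
        s for s in subtitles
        if s.get("LanguageCodeName", "").startswith(preferred_language)
    ]
    if not candidates:
        return None
    # Unknown sources rank after both ASR and MT.
    return min(candidates, key=lambda s: priority.get(s.get("Source"), len(priority)))
```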

download_thumbnails()

Download video thumbnails for posts.

from tiktools import download_thumbnails
from pathlib import Path

results = download_thumbnails(
    posts_file=Path("posts.json"),
    output_dir=None,  # Defaults to posts_file.parent/thumbnails
    thumbnail_type="cover",  # or "origin", "dynamic", "zoom_960"
    update_mode=False,  # Set True to download only for new posts
    skip_existing=False  # Set True to skip files that already exist
)

Important: TikTok's thumbnail URLs have time-limited signatures and expire quickly (typically within hours). For best results, download thumbnails immediately after fetching posts:

from tiktools import fetch_user_posts
from pathlib import Path

# Recommended: Download thumbnails while URLs are fresh
data = fetch_user_posts(
    username="davis_big_dawg",
    output_file=Path("data/davis_big_dawg/posts.json"),
    download_thumbnails=True,
    thumbnail_type="origin"
)

Or use the CLI:

python scripts/fetch_posts.py davis_big_dawg --download-thumbnails --thumbnail-type origin
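
If thumbnails must be downloaded later, it may be worth checking whether a saved URL has already expired. The sketch below assumes the signed URL carries its expiry as a Unix timestamp in an `x-expires` query parameter; that parameter name is an assumption about the URL format, and `url_expired` is a hypothetical helper rather than part of tiktools:

```python
import time
from urllib.parse import parse_qs, urlparse


def url_expired(url, now=None, param="x-expires"):
    """Return True or False when an expiry timestamp is found, else None.

    Assumes `param` holds a Unix timestamp; this is not a documented
    guarantee of the URL format.
    """
    now = time.time() if now is None else now
    values = parse_qs(urlparse(url).query).get(param)
    if not values:
        return None
    try:
        expires_at = int(values[0])
    except ValueError:
        return None
    return now >= expires_at
```

A `None` result means the expiry could not be determined, in which case the only reliable test is to attempt the download.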

Thumbnail types:

  • cover - Standard cover image (default)
  • origin - Original quality cover (highest quality)
  • dynamic - Animated cover
  • zoom_240, zoom_480, zoom_720, zoom_960 - Specific resolutions

translate_transcripts()

Translate transcripts to multiple languages.

from tiktools import translate_transcripts
from pathlib import Path

results = translate_transcripts(
    transcripts_file=Path("transcripts/user_transcripts.json"),
    target_languages=["en", "es"],
    service="aws",  # AWS Translate
    output_dir=None,  # Defaults to transcripts_file.parent
    update_mode=False,  # Set True to translate only new transcripts
    estimate_only=False,  # Set True to estimate costs without translating
    source_language=None  # Auto-detected if None
)

Translation service setup:

For AWS Translate, configure AWS credentials:

# Option 1: AWS CLI
aws configure

# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export AWS_DEFAULT_REGION="us-east-1"

Cost estimation:

from tiktools import translate_transcripts
from pathlib import Path

# Estimate costs before translating
results = translate_transcripts(
    transcripts_file=Path("transcripts/user_transcripts.json"),
    target_languages=["en", "es", "fr"],
    estimate_only=True
)
print(f"Estimated cost: ${results['estimated_cost']:.4f} USD")
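
As a rough cross-check on such an estimate, the arithmetic can be done by hand: total characters times the number of target languages times a per-character rate. The sketch below assumes a rate of 15 USD per million characters, which should be verified against current AWS Translate pricing; `estimate_translation_cost` is a hypothetical helper and may not match how tiktools computes its figure:

```python
def estimate_translation_cost(texts, target_languages, usd_per_million_chars=15.0):
    """Estimate cost as characters x languages x rate.

    The default rate is an assumption; check current pricing before relying on it.
    """
    total_chars = sum(len(text) for text in texts)
    billed_chars = total_chars * len(target_languages)
    return billed_chars / 1_000_000 * usd_per_million_chars
```

For example, 1,000 characters of transcript translated into two languages is 2,000 billed characters, or about 0.03 USD at the assumed rate.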

API Client

from tiktools import TikAPIClient

client = TikAPIClient()  # Uses TIKAPI_KEY env var

# Get profile
profile = client.get_profile("davis_big_dawg")
print(profile['nickname'], profile['videoCount'])

# Iterate through posts
for post in client.get_posts(profile['secUid'], max_count=10):
    print(post['desc'])

Transcript limitations

TikTok's automatic speech recognition (ASR) has some limitations:

  1. Speech recognition errors: May misinterpret words (e.g., "Baja Blast" → "Brawha Blast")
  2. Non-speech audio: Videos using TikTok sounds may contain song lyrics instead of speech

Recommendations:

  • Filter by is_original_audio: true for speech-only content
  • Manually verify proper nouns and brand names for journalistic work
  • Check the needs_review flag if using AI extraction
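
The first and third recommendations can be applied with a simple filter. This sketch assumes each transcript record is a dict carrying boolean `is_original_audio` and `needs_review` keys, as the flag names above suggest; the exact record layout is an assumption, and both function names are hypothetical:

```python
def speech_only(records):
    """Keep records explicitly flagged as original audio (likely speech)."""
    return [r for r in records if r.get("is_original_audio") is True]


def flagged_for_review(records):
    """Keep records whose needs_review flag is set."""
    return [r for r in records if r.get("needs_review")]
```

Records that lack the `is_original_audio` key are dropped by `speech_only`, which errs on the side of excluding content whose audio source is unknown.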

Requirements

  • Python 3.8+
  • TikAPI key (get one at tikapi.io)
  • Optional: OpenAI API key (for AI-powered analysis examples)

Dependencies

Core dependencies:

  • tikapi - TikTok API client
  • requests - HTTP requests
  • pathlib - File path handling (part of the Python standard library, no separate install needed)

Optional dependencies:

  • boto3 - AWS Translate for translation features (install with: pip install boto3)

Development

# Clone the repository
git clone https://github.com/stiles/tiktools.git
cd tiktools

# Install in development mode
pip install -e .

# Run tests
pytest tests/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built on top of TikAPI
  • Inspired by the need for more TikTok research tools

Citation

If you use this toolkit in your research, please cite:

@software{tiktools2025,
  author = {Matt Stiles},
  title = {tiktools: A Python toolkit for TikTok data extraction and analysis},
  year = {2025},
  url = {https://github.com/stiles/tiktools}
}
