tiktools

A Python toolkit for TikTok data extraction and analysis using TikAPI.

Extract post metadata, transcribe videos using TikTok's built-in subtitles and analyze content at scale. Perfect for researchers, journalists and data analysts.

Features

  • Fetch post metadata - Download complete post data for any TikTok user
  • Extract transcripts - Get speech-to-text from videos using TikTok's ASR subtitles
  • Incremental updates - Only fetch new content to save API costs
  • Audio detection - Flag videos with non-original audio (songs vs. speech)
  • Pip-installable - Easy to install and use in your projects
  • Extensible - Build custom analysis tools on top of the core toolkit

Installation

pip install tiktools

Quick start

1. Set up API keys

tiktools requires a TikAPI key for core functionality:

export TIKAPI_KEY="your_tikapi_key_here"

For testing, you can use the sandbox key: DemoAPIKeyTokenSeHYGXDfd4SFD320Sc39Asd0Sc39Asd4s
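The key lookup described above can be sketched in a few lines. This is an illustration only, not the package's own code: `resolve_api_key` is a hypothetical helper, and the fallback to the sandbox key is an assumption about how you might want to wire things up in a test script.

```python
import os

# Sandbox key quoted in this README; intended for testing only.
SANDBOX_KEY = "DemoAPIKeyTokenSeHYGXDfd4SFD320Sc39Asd0Sc39Asd4s"


def resolve_api_key(env=None):
    """Return (key, is_sandbox). Hypothetical helper, not part of tiktools."""
    env = os.environ if env is None else env
    key = env.get("TIKAPI_KEY", "").strip()
    if key:
        # A real key was exported, so use it as-is.
        return key, False
    # Nothing set: fall back to the sandbox key and say so.
    return SANDBOX_KEY, True


key, is_sandbox = resolve_api_key()
print("Using sandbox key" if is_sandbox else "Using TIKAPI_KEY from environment")
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.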

2. Fetch posts and extract transcripts

from tiktools import fetch_user_posts, extract_transcripts
from pathlib import Path

# Fetch posts
posts_data = fetch_user_posts(
    username="davis_big_dawg",
    output_file=Path("data/davis_big_dawg/posts.json")
)

# Extract transcripts
results = extract_transcripts(
    posts_file=Path("data/davis_big_dawg/posts.json"),
    language="eng"
)

print(f"Extracted {results['transcripts_downloaded']} transcripts")

3. Use the CLI scripts

# Fetch all posts
python scripts/fetch_posts.py davis_big_dawg

# Extract transcripts
python scripts/extract_transcripts.py data/davis_big_dawg/davis_big_dawg_posts.json

# Run the generic analysis template
python scripts/analyze.py data/davis_big_dawg/transcripts/davis_big_dawg_transcripts.json

Incremental updates

Save API costs by only fetching new content:

# Only fetch NEW posts
python scripts/fetch_posts.py davis_big_dawg --update

# Only transcribe NEW posts
python scripts/extract_transcripts.py data/davis_big_dawg/davis_big_dawg_posts.json --update
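To make the idea behind `--update` concrete, here is a minimal sketch of de-duplicating a freshly fetched batch against posts already on disk. It is not the toolkit's implementation: `select_new_posts` is a hypothetical name, and the assumption that every post dict carries an `id` key is mine.

```python
def select_new_posts(existing_posts, fetched_posts):
    """Return only the fetched posts whose id is not already stored.

    Hypothetical helper; assumes each post is a dict with an "id" key.
    """
    seen = {str(post["id"]) for post in existing_posts}
    new_posts = []
    for post in fetched_posts:
        post_id = str(post["id"])
        if post_id not in seen:
            seen.add(post_id)  # also guards against duplicates within the batch
            new_posts.append(post)
    return new_posts


stored = [{"id": "1"}, {"id": "2"}]
fetched = [{"id": "2"}, {"id": "3"}, {"id": "3"}]
print(select_new_posts(stored, fetched))
```

Because profile feeds are typically returned newest first, a real client could stop paginating as soon as it meets a known id, which is where the API savings would come from.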

Output structure

data/
└── davis_big_dawg/
    ├── davis_big_dawg_posts.json       # Post metadata
    └── transcripts/
        ├── 7575304937580547342.txt     # Individual transcripts
        └── davis_big_dawg_transcripts.json  # All transcripts
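If you need these locations in your own scripts, the layout above can be rebuilt with `pathlib`. The helper below is a sketch derived only from the tree shown here; `output_paths` is a hypothetical name and not a function exported by tiktools.

```python
from pathlib import Path


def output_paths(username, base_dir="data"):
    """Build the paths shown in the tree above (hypothetical helper)."""
    user_dir = Path(base_dir) / username
    transcripts_dir = user_dir / "transcripts"
    return {
        "posts": user_dir / f"{username}_posts.json",
        "transcripts_dir": transcripts_dir,
        "combined": transcripts_dir / f"{username}_transcripts.json",
    }


paths = output_paths("davis_big_dawg")
print(paths["posts"].as_posix())
print(paths["combined"].as_posix())
```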

Example: Food reviews analysis

See examples/food_reviews/ for a complete example that:

  • Extracts structured review data using OpenAI
  • Calculates statistics by category and day
  • Handles scoring and categorization

cd examples/food_reviews
python extract_reviews.py ../../data/davis_big_dawg/transcripts/davis_big_dawg_transcripts.json
python calculate_stats.py ../../data/davis_big_dawg/davis_big_dawg_reviews.json

API Reference

Core functions

fetch_user_posts()

Fetch TikTok post metadata for a user.

from tiktools import fetch_user_posts
from pathlib import Path

data = fetch_user_posts(
    username="davis_big_dawg",
    api_key=None,  # Uses TIKAPI_KEY env var
    max_posts=100,  # Limit number of posts
    output_file=Path("output.json"),
    sandbox=False,
    update_mode=False  # Set to True to fetch only new posts
)

extract_transcripts()

Extract transcripts from TikTok videos using subtitle files.

from tiktools import extract_transcripts
from pathlib import Path

results = extract_transcripts(
    posts_file=Path("posts.json"),
    output_dir=None,  # Defaults to posts_file.parent/transcripts
    output_format="individual",  # or "combined" or "both"
    language="eng",
    update_mode=False  # Set to True to process only new posts
)
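For a sense of what turning a subtitle file into a transcript involves, the sketch below flattens WebVTT text into a single string. It assumes the subtitle files are WebVTT, which this README does not state, and `vtt_to_text` is a hypothetical helper rather than the function tiktools uses internally. Cue identifiers and NOTE blocks are not handled.

```python
def vtt_to_text(vtt):
    """Flatten WebVTT cues into plain text (hypothetical helper).

    Skips the WEBVTT header, blank lines and timing lines.
    """
    parts = []
    header_seen = False
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not header_seen and line.startswith("WEBVTT"):
            header_seen = True  # only the first such line is the header
            continue
        if "-->" in line:
            continue  # timing line such as 00:00:00.000 --> 00:00:01.500
        parts.append(line)
    return " ".join(parts)


sample = (
    "WEBVTT\n\n"
    "00:00:00.000 --> 00:00:01.500\nHello there\n\n"
    "00:00:01.500 --> 00:00:03.000\nwelcome back\n"
)
print(vtt_to_text(sample))
```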

get_best_subtitle()

Get the best available subtitle for a post. Subtitles produced by automatic speech recognition (ASR) are preferred over machine-translated (MT) ones.

from tiktools import get_best_subtitle

subtitle = get_best_subtitle(post, preferred_language="eng")
if subtitle:
    print(f"Found {subtitle['LanguageCodeName']} ({subtitle['Source']})")
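The ASR-over-MT preference can be illustrated with a small ranking function. This is a sketch of the idea, not the body of `get_best_subtitle`: `pick_subtitle` is a hypothetical name, it takes a plain list of subtitle dicts, and matching the language by prefix (so that `"eng"` matches a code such as `"eng-US"`) is an assumption.

```python
def pick_subtitle(subtitles, preferred_language="eng"):
    """Pick one subtitle dict, preferring ASR over MT (hypothetical helper)."""
    source_rank = {"ASR": 0, "MT": 1}  # lower is better; unknown sources rank last
    candidates = [
        s for s in subtitles
        if str(s.get("LanguageCodeName", "")).startswith(preferred_language)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: source_rank.get(s.get("Source"), 2))


subs = [
    {"LanguageCodeName": "eng-US", "Source": "MT"},
    {"LanguageCodeName": "eng-US", "Source": "ASR"},
    {"LanguageCodeName": "spa-ES", "Source": "ASR"},
]
print(pick_subtitle(subs))
```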

API Client

from tiktools import TikAPIClient

client = TikAPIClient()  # Uses TIKAPI_KEY env var

# Get profile
profile = client.get_profile("davis_big_dawg")
print(profile['nickname'], profile['videoCount'])

# Iterate through posts
for post in client.get_posts(profile['secUid'], max_count=10):
    print(post['desc'])

Transcript limitations

TikTok's automatic speech recognition (ASR) has some limitations:

  1. Speech recognition errors: May misinterpret words (e.g., "Baha Blast" → "Brawha Blast")
  2. Non-speech audio: Videos using TikTok sounds may contain song lyrics instead of speech

Recommendations:

  • Filter by is_original_audio: true for speech-only content
  • Manually verify proper nouns and brand names for journalistic work
  • Check the needs_review flag if using AI extraction
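The first and third recommendations can be applied in one pass over your records. The sketch below assumes each record is a dict carrying the `is_original_audio` and `needs_review` fields named above; the record layout, the `video_id` key and the `split_for_review` helper are hypothetical.

```python
def split_for_review(records):
    """Return (speech_only, flagged) lists (hypothetical helper).

    speech_only: records whose is_original_audio is exactly True.
    flagged: the subset of speech_only with a truthy needs_review.
    """
    speech_only = [r for r in records if r.get("is_original_audio") is True]
    flagged = [r for r in speech_only if r.get("needs_review")]
    return speech_only, flagged


records = [
    {"video_id": "a", "is_original_audio": True, "needs_review": False},
    {"video_id": "b", "is_original_audio": False, "needs_review": False},
    {"video_id": "c", "is_original_audio": True, "needs_review": True},
]
speech_only, flagged = split_for_review(records)
print([r["video_id"] for r in speech_only], [r["video_id"] for r in flagged])
```

Records with a missing `is_original_audio` value are excluded, which errs on the side of dropping song lyrics rather than keeping them.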

Requirements

  • Python 3.8+
  • TikAPI key (get one at tikapi.io)
  • Optional: OpenAI API key (for AI-powered analysis examples)

Dependencies

  • tikapi - TikTok API client
  • requests - HTTP requests
  • pathlib - File path handling (part of the Python standard library, no separate install needed)

Development

# Clone the repository
git clone https://github.com/stiles/tiktools.git
cd tiktools

# Install in development mode
pip install -e .

# Run tests
pytest tests/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built on top of TikAPI
  • Inspired by the need for more TikTok research tools

Citation

If you use this toolkit in your research, please cite:

@software{tiktools2025,
  author = {Matt Stiles},
  title = {tiktools: A Python toolkit for TikTok data extraction and analysis},
  year = {2025},
  url = {https://github.com/stiles/tiktools}
}
