tiktools

A Python toolkit for TikTok data extraction and analysis using TikAPI.

Extract post metadata and transcribe videos using TikTok's built-in subtitles.

Features

  • Fetch post metadata - Download complete post data for any TikTok user
  • Extract transcripts - Get speech-to-text from videos using TikTok's ASR subtitles
  • Incremental updates - Only fetch new content to save API costs
  • Audio detection - Flag videos with non-original audio (songs vs. speech)
  • Pip-installable - Easy to install and use in your projects
  • Extensible - Build custom analysis tools on top of the core toolkit

Installation

pip install tiktools

Quick start

1. Set up API keys

tiktools requires a TikAPI key for core functionality:

export TIKAPI_KEY="your_tikapi_key_here"

For testing, you can use the sandbox key: DemoAPIKeyTokenSeHYGXDfd4SFD320Sc39Asd0Sc39Asd4s
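
A script can fail early with a readable message when the key is missing. The following is a minimal sketch, assuming only that the key lives in the TIKAPI_KEY environment variable as described above; the helper name require_tikapi_key is hypothetical and not part of tiktools.

```python
import os


def require_tikapi_key(env=None):
    """Return the TikAPI key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of tiktools.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("TIKAPI_KEY", "").strip()
    if not key:
        raise RuntimeError("TIKAPI_KEY is not set; export it before running.")
    return key
```

Calling require_tikapi_key() at the top of a script surfaces a missing key before any API request is attempted.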

2. Fetch posts and extract transcripts

from tiktools import fetch_user_posts, extract_transcripts
from pathlib import Path

# Fetch posts
posts_data = fetch_user_posts(
    username="davis_big_dawg",
    output_file=Path("data/davis_big_dawg/posts.json")
)

# Extract transcripts
results = extract_transcripts(
    posts_file=Path("data/davis_big_dawg/posts.json"),
    language="eng"
)

print(f"Extracted {results['transcripts_downloaded']} transcripts")

3. Use the CLI scripts

# Fetch all posts
python scripts/fetch_posts.py davis_big_dawg

# Extract transcripts
python scripts/extract_transcripts.py data/davis_big_dawg/davis_big_dawg_posts.json

# Run the generic analysis template
python scripts/analyze.py data/davis_big_dawg/transcripts/davis_big_dawg_transcripts.json

Incremental updates

Save API costs by only fetching new content:

# Only fetch NEW posts
python scripts/fetch_posts.py davis_big_dawg --update

# Only transcribe NEW posts
python scripts/extract_transcripts.py data/davis_big_dawg/davis_big_dawg_posts.json --update
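
The idea behind update mode can be pictured as an ID comparison: keep the posts already on disk and append only those whose IDs have not been seen before. This is an illustration, not the tiktools implementation; the function name merge_new_posts and the assumption that each post is a dict with an "id" key are hypothetical.

```python
def merge_new_posts(existing, fetched):
    """Return (merged, added): existing posts plus unseen fetched posts.

    Assumes each post is a dict with an "id" key (hypothetical schema).
    """
    seen = {post["id"] for post in existing}
    added = []
    for post in fetched:
        if post["id"] not in seen:
            seen.add(post["id"])  # also guards against duplicates in `fetched`
            added.append(post)
    return existing + added, added
```

Only the posts in the second return value would need further work, such as transcript extraction, which is where the savings come from.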

Output structure

data/
└── davis_big_dawg/
    ├── davis_big_dawg_posts.json       # Post metadata
    └── transcripts/
        ├── 7575304937580547342.txt     # Individual transcripts
        └── davis_big_dawg_transcripts.json  # All transcripts
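
Given this layout, the individual transcript files can be read back with the standard library alone. The sketch below assumes that each .txt file name is the post ID and that the files are UTF-8 text; load_transcripts is a hypothetical helper, not a tiktools function.

```python
from pathlib import Path


def load_transcripts(transcripts_dir):
    """Map post ID (the file stem) to transcript text for every .txt file.

    Assumes file names are post IDs and contents are UTF-8 (assumptions).
    """
    transcripts = {}
    for path in sorted(Path(transcripts_dir).glob("*.txt")):
        transcripts[path.stem] = path.read_text(encoding="utf-8").strip()
    return transcripts
```

For example, load_transcripts("data/davis_big_dawg/transcripts") would return a dict keyed by post ID. The combined JSON file in the same directory is skipped because it does not match *.txt.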

Example: Food reviews analysis

See examples/food_reviews/ for a complete example analyzing @davis_big_dawg's school lunch reviews.

The example includes a standalone script that demonstrates the full workflow:

cd examples/food_reviews

# Install tiktools and dependencies
pip install tiktools openai

# Set up API keys
export TIKAPI_KEY="your_tikapi_key_here"
export OPENAI_API_KEY="your_openai_key_here"

# Fetch posts and transcripts, then extract review data
python fetch_davis_archive.py --extract-reviews

# Calculate statistics
python calculate_stats.py data/davis_big_dawg/davis_big_dawg_reviews.json

Features:

  • Fetches all posts and transcripts for a TikTok user
  • Extracts structured review data using OpenAI
  • Calculates statistics by category and day
  • Update mode to only fetch new content
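
As a rough picture of the statistics step, the sketch below averages a numeric field per group. It is not the code in calculate_stats.py; the function name average_by and the record fields "category" and "rating" are hypothetical, since the review schema is not described here.

```python
from collections import defaultdict
from statistics import mean


def average_by(reviews, key, value="rating"):
    """Average `value` for each distinct `key` across review dicts.

    Field names are assumptions; records missing either field are skipped.
    """
    groups = defaultdict(list)
    for review in reviews:
        if key in review and value in review:
            groups[review[key]].append(review[value])
    return {name: mean(values) for name, values in groups.items()}
```

The same function could group by a day field instead of a category, provided the records carry one.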

See the example README for full documentation and customization options.

API Reference

Core functions

fetch_user_posts()

Fetch TikTok post metadata for a user.

from tiktools import fetch_user_posts
from pathlib import Path

data = fetch_user_posts(
    username="davis_big_dawg",
    api_key=None,  # Uses TIKAPI_KEY env var
    max_posts=100,  # Limit number of posts
    output_file=Path("output.json"),
    sandbox=False,
    update_mode=False  # Only fetch new posts
)

extract_transcripts()

Extract transcripts from TikTok videos using subtitle files.

from tiktools import extract_transcripts
from pathlib import Path

results = extract_transcripts(
    posts_file=Path("posts.json"),
    output_dir=None,  # Defaults to posts_file.parent/transcripts
    output_format="individual",  # or "combined" or "both"
    language="eng",
    update_mode=False  # Only process new posts
)

get_best_subtitle()

Get the best available subtitle for a post. Subtitles from automatic speech recognition (ASR) are preferred over machine-translated (MT) ones.

from tiktools import get_best_subtitle

subtitle = get_best_subtitle(post, preferred_language="eng")
if subtitle:
    print(f"Found {subtitle['LanguageCodeName']} ({subtitle['Source']})")
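
The prioritization can be pictured as a two-level ranking. The sketch below is illustrative and is not the library's implementation: pick_subtitle is a hypothetical name, and it assumes each subtitle is a dict with the "LanguageCodeName" and "Source" keys shown above, with "Source" set to "ASR" or "MT" and the language code matched exactly.

```python
def pick_subtitle(subtitles, preferred_language="eng"):
    """Pick a subtitle, preferring the requested language, then ASR over MT.

    Illustrative only. Assumes "LanguageCodeName" and "Source" keys, and
    that "Source" is either "ASR" or "MT" (assumptions).
    """
    def rank(sub):
        # False sorts before True, so matches rank ahead of misses.
        language_miss = sub.get("LanguageCodeName") != preferred_language
        source_miss = sub.get("Source") != "ASR"
        return (language_miss, source_miss)

    return min(subtitles, key=rank) if subtitles else None
```

In this sketch a language match outranks the source, so a machine-translated subtitle in the preferred language would be chosen over an ASR subtitle in another language. Whether get_best_subtitle() makes the same trade-off is not stated here.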

API Client

from tiktools import TikAPIClient

client = TikAPIClient()  # Uses TIKAPI_KEY env var

# Get profile
profile = client.get_profile("davis_big_dawg")
print(profile['nickname'], profile['videoCount'])

# Iterate through posts
for post in client.get_posts(profile['secUid'], max_count=10):
    print(post['desc'])

Transcript limitations

TikTok's automatic speech recognition (ASR) has some limitations:

  1. Speech recognition errors: May misinterpret words (e.g., "Baja Blast" → "Brawha Blast")
  2. Non-speech audio: Videos using TikTok sounds may contain song lyrics instead of speech

Recommendations:

  • Filter by is_original_audio: true for speech-only content
  • Manually verify proper nouns and brand names for journalistic work
  • Check the needs_review flag if using AI extraction
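
These recommendations can be applied as a simple filter. The sketch below assumes each record is a dict that may carry boolean "is_original_audio" and "needs_review" fields; the exact record shape and the helper name split_for_review are assumptions, not part of tiktools.

```python
def split_for_review(records):
    """Split records into (ready, to_review), dropping non-original audio.

    Assumes optional boolean fields "is_original_audio" and "needs_review"
    on each record dict (record shape is an assumption).
    """
    ready, to_review = [], []
    for record in records:
        if not record.get("is_original_audio", False):
            continue  # likely a TikTok sound or song rather than speech
        if record.get("needs_review", False):
            to_review.append(record)
        else:
            ready.append(record)
    return ready, to_review
```

Records missing the is_original_audio field are dropped here, which is the conservative choice when speech-only content matters.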

Requirements

  • Python 3.8+
  • TikAPI key (get one at tikapi.io)
  • Optional: OpenAI API key (for AI-powered analysis examples)

Dependencies

  • tikapi - TikTok API client
  • requests - HTTP requests
  • pathlib - File path handling (part of the Python standard library; no separate install needed)

Development

# Clone the repository
git clone https://github.com/stiles/tiktools.git
cd tiktools

# Install in development mode
pip install -e .

# Run tests
pytest tests/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built on top of TikAPI
  • Inspired by the need for more TikTok research tools

Citation

If you use this toolkit in your research, please cite:

@software{tiktools2025,
  author = {Matt Stiles},
  title = {tiktools: A Python toolkit for TikTok data extraction and analysis},
  year = {2025},
  url = {https://github.com/stiles/tiktools}
}
