Skip to main content

Scrape YouTube search results, channels, transcripts, and playlists — no API key needed. Sync + async. CLI + REST API included.

Project description

TubeScrape

Scrape YouTube search results, channels, transcripts, and playlists — no API key needed.

PyPI version Python versions CI License: MIT Downloads

Built on YouTube's internal InnerTube API. Three interfaces: Python SDK, CLI, and REST API.


Features

  • Search with filters — type, duration, upload date, sort order, features (4K, HDR, live, CC)
  • Channel browsing — videos, shorts, playlists, and in-channel search
  • Transcripts — fetch, translate, format (SRT / WebVTT / JSON), save to file
  • Playlists — full pagination with position tracking and metadata
  • Async-first — every method has an async variant for concurrent workloads
  • JSON-ready — every result object has .to_dict() for instant serialization
  • Proxy rotation — built-in support for residential proxy lists
  • Three interfaces — Python SDK, CLI with Rich tables, REST API with Swagger docs
  • Zero config — no API key, no OAuth, no setup
  • Lightweight — only httpx as a core dependency

Installation

recommended to install using pip

You can also integrate it into an existing project or use it via a CLI.

# Core SDK only
pip install tubescrape

# With CLI (adds click + rich)
pip install "tubescrape[cli]"

# With REST API server (adds fastapi + uvicorn)
pip install "tubescrape[api]"

# Everything
pip install "tubescrape[all]"

Requirements: Python 3.10+


Quick Start

import json
from tubescrape import YouTube

with YouTube() as yt:
    # Search YouTube
    results = yt.search('python tutorial', max_results=5)
    for video in results.videos:
        print(f'{video.title}{video.duration}{video.view_count}')

    # Browse a channel
    channel = yt.get_channel_videos('@lexfridman', max_results=10)
    for video in channel.videos:
        print(f'{video.title} ({video.published_text})')

    # Get a transcript and save as subtitles
    transcript = yt.get_transcript('dQw4w9WgXcQ')
    print(transcript.text)
    transcript.save('subtitles.srt')

    # Every result serializes to JSON instantly
    data = results.to_dict()
    print(json.dumps(data, indent=2))

Every method accepts plain IDs, full URLs, or @handles — parsed automatically.


Why tubescrape?

tubescrape youtube-transcript-api pytube / pytubefix yt-dlp
Search videos Yes No No Limited
Channel browse Yes No No Yes
Transcripts Yes Yes No Yes
Playlists Yes No Yes Yes
Async support Yes No No No
Built-in REST API Yes No No No
CLI tool Yes No No Yes
Core dependencies 1 (httpx) 1 (requests) 0 Many
API key needed No No No No

Search

results = yt.search('python tutorial', max_results=5)

for video in results.videos:
    print(f'{video.title}{video.url}')
    print(f'  {video.view_count} | {video.duration} | {video.published_text}')
    print(f'  Channel: {video.channel} (verified: {video.is_verified})')

Search Filters

All filters can be combined in a single call:

results = yt.search(
    'podcast interview',
    max_results=10,
    type='video',              # video | channel | playlist | movie
    duration='long',           # short (<4m) | medium (4-20m) | long (>20m)
    upload_date='this_month',  # last_hour | today | this_week | this_month | this_year
    sort_by='view_count',      # relevance | upload_date | view_count | rating
    features=['hd', 'subtitles'],  # live | 4k | hd | subtitles | cc | creative_commons | 360 | vr180 | 3d | hdr
)

Channel Browsing

All channel methods accept @handle, channel ID (UC...), or full URL.

# Videos (newest first, with pagination)
videos = yt.get_channel_videos('@lexfridman', max_results=10)
all_videos = yt.get_channel_videos('@lexfridman', max_results=0)  # fetch ALL

# Shorts
shorts = yt.get_channel_shorts('@lexfridman')
for short in shorts.shorts:
    print(f'{short.title}{short.view_count}{short.url}')

# Playlists
playlists = yt.get_channel_playlists('@lexfridman')
for pl in playlists.playlists:
    print(f'{pl.title}{pl.video_count}{pl.url}')

# Search within a channel
results = yt.search_channel('@lexfridman', 'artificial intelligence', max_results=10)

Transcripts

# Fetch transcript (auto-detects best language)
transcript = yt.get_transcript('dQw4w9WgXcQ')
print(transcript.text)  # full text as a single string

# Choose language (priority order fallback)
transcript = yt.get_transcript('dQw4w9WgXcQ', languages=['de', 'en'])

# Translate to any language
transcript = yt.get_transcript('dQw4w9WgXcQ', translate_to='es')

# Without timestamps (plain text blob)
transcript = yt.get_transcript('dQw4w9WgXcQ', timestamps=False)

# List available languages
languages = yt.list_transcripts('dQw4w9WgXcQ')
for entry in languages:
    print(f'{entry.language} ({entry.language_code}) — {"auto" if entry.is_generated else "manual"}')

Formatting & Saving

transcript = yt.get_transcript('dQw4w9WgXcQ')

# Format as SRT, WebVTT, JSON, or plain text
srt = YouTube.format_transcript(transcript, fmt='srt')
vtt = YouTube.format_transcript(transcript, fmt='vtt')

# Save to file (format auto-detected from extension)
transcript.save('subtitles.srt')
transcript.save('subtitles.vtt')
transcript.save('transcript.json')
transcript.save('transcript.txt')

Playlists

# Accepts playlist ID or full URL
playlist = yt.get_playlist('PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf')

print(f'{playlist.title} by {playlist.channel}{len(playlist.videos)} videos')

for entry in playlist.videos:
    print(f'#{entry.position}  {entry.title}{entry.duration}')

Serialization (.to_dict())

Every result object converts to a plain Python dictionary — ready for JSON, databases, or APIs:

import json

# Works on every result type
results.to_dict()       # SearchResult → dict
video.to_dict()         # VideoResult → dict
channel.to_dict()       # BrowseResult → dict
shorts.to_dict()        # ShortsResult → dict
playlists.to_dict()     # ChannelPlaylistsResult → dict
playlist.to_dict()      # PlaylistResult → dict
transcript.to_dict()    # Transcript → dict

# Sparse output: optional fields excluded when empty/default
# is_verified=False → omitted | badges=[] → omitted | None fields → omitted
print(json.dumps(results.to_dict(), indent=2, ensure_ascii=False))

Async Support

Every method has an async variant prefixed with a. Use in FastAPI, Discord bots, or any async application:

import asyncio
from tubescrape import YouTube

async def main():
    async with YouTube() as yt:
        # All methods have async variants
        results = await yt.asearch('python', max_results=5)
        transcript = await yt.aget_transcript('dQw4w9WgXcQ')

        # Run multiple requests concurrently
        r1, r2, r3 = await asyncio.gather(
            yt.asearch('python'),
            yt.asearch('javascript'),
            yt.asearch('rust'),
        )

asyncio.run(main())

Proxy Support

# Single proxy
yt = YouTube(proxy='http://user:pass@proxy.example.com:8080')

# Proxy rotation (round-robin per request)
yt = YouTube(proxies=[
    'http://user:pass@proxy1:8080',
    'http://user:pass@proxy2:8080',
])

# SOCKS5
yt = YouTube(proxy='socks5://user:pass@proxy:1080')

# Custom timeout and retries
yt = YouTube(proxy='http://proxy:8080', timeout=60.0, max_retries=5)

Tip: YouTube blocks cloud IPs aggressively. Use rotating residential proxies (BrightData, SmartProxy, Oxylabs) for production.


CLI

Install with pip install "tubescrape[cli]".

tubescrape search tubescrape channel search tubescrape transcript tubescrape transcript srt

tubescrape search "python tutorial" -n 5
tubescrape search "podcast" --type video --duration long --sort-by view_count
tubescrape search "python" --json                    # JSON output

tubescrape channel @lexfridman                       # videos (default)
tubescrape channel @lexfridman shorts                # shorts
tubescrape channel @lexfridman playlists             # playlists
tubescrape channel @lexfridman search "podcast"      # search within channel

tubescrape playlist PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf

tubescrape transcript dQw4w9WgXcQ                    # plain text
tubescrape transcript dQw4w9WgXcQ --format srt       # SRT subtitles
tubescrape transcript dQw4w9WgXcQ --translate es      # translate
tubescrape transcript dQw4w9WgXcQ --save output.srt  # save to file
tubescrape transcript dQw4w9WgXcQ --list-languages   # available languages
tubescrape --proxy http://user:pass@host:port search "python"  # with proxy
export TUBESCRAPE_PROXY="http://user:pass@host:port"           # env variable

REST API

Install with pip install "tubescrape[api]".

tubescrape serve                          # starts on localhost:8000
tubescrape serve --host 0.0.0.0 --port 3000

Interactive Swagger docs at http://localhost:8000/docs.

Method Endpoint Description
GET /api/v1/search?q=python Search videos
GET /api/v1/channel/{id}/videos Channel videos
GET /api/v1/channel/{id}/shorts Channel shorts
GET /api/v1/channel/{id}/playlists Channel playlists
GET /api/v1/channel/{id}/search?q=... Search within channel
GET /api/v1/playlist/{id} Fetch playlist
GET /api/v1/transcript/{video_id} Fetch transcript
GET /api/v1/transcript/{video_id}/languages List languages
GET /health Health check
curl "http://localhost:8000/api/v1/search?q=python+tutorial&max_results=5"
curl "http://localhost:8000/api/v1/transcript/dQw4w9WgXcQ?format=srt&translate_to=es"

Error Handling

All exceptions inherit from YouTubeError:

YouTubeError
├── RequestError
│   ├── RateLimitError          # HTTP 429
│   └── BotDetectedError        # HTTP 403
├── VideoUnavailableError       # private, deleted, region-locked
│   └── AgeRestrictedError
├── TranscriptsDisabledError
├── TranscriptsNotAvailableError
├── TranscriptFetchError
├── TranslationNotAvailableError
├── ChannelNotFoundError
├── PlaylistNotFoundError
├── APIKeyNotFoundError
└── ParsingError
from tubescrape import YouTube, YouTubeError, RateLimitError

try:
    results = yt.search('python')
except RateLimitError:
    print('Rate limited — use a proxy')
except YouTubeError as e:
    print(f'YouTube error: {e}')

Full Documentation

For detailed examples, all field references, and advanced usage, see the Complete Usage Guide.


Warning

This library uses YouTube's undocumented InnerTube API. It may break if YouTube changes their internal API. If it does, please open an issue.


Contributing

git clone https://github.com/zaidkx37/tubescrape.git
cd tubescrape
pip install -e ".[all,dev]"

pytest                    # run tests
ruff check src/           # lint
mypy src/tubescrape/      # type check

License

MIT License. See LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tubescrape-0.1.2.tar.gz (644.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tubescrape-0.1.2-py3-none-any.whl (52.8 kB view details)

Uploaded Python 3

File details

Details for the file tubescrape-0.1.2.tar.gz.

File metadata

  • Download URL: tubescrape-0.1.2.tar.gz
  • Upload date:
  • Size: 644.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tubescrape-0.1.2.tar.gz
Algorithm Hash digest
SHA256 043e37bd0c9a59293882c4da731342b64cbf1608734a2bd659b72a92a5e64208
MD5 84e4777edd3a46c9e5a9034132b867fd
BLAKE2b-256 08aa1fa9b5d432b958ed25d0cd5ab1d656c70a51cc5668c249daef704594e7af

See more details on using hashes here.

Provenance

The following attestation bundles were made for tubescrape-0.1.2.tar.gz:

Publisher: publish.yml on zaidkx37/tubescrape

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tubescrape-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: tubescrape-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 52.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tubescrape-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e068fe4037fe73c822909664ea83520ddbc3f20cb1164a1b3db26d885e5996bd
MD5 cd0e9552b865af8847c9dd2267ba5a1f
BLAKE2b-256 fe19d1b8805a0dd7a24e7174a11c9abeff481a311a1401dc302f582de52cde7d

See more details on using hashes here.

Provenance

The following attestation bundles were made for tubescrape-0.1.2-py3-none-any.whl:

Publisher: publish.yml on zaidkx37/tubescrape

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page