Extract and monitor metadata from Apple App Store applications

Apple App Store Metadata Extractor

Extract and monitor metadata from Apple App Store applications with ease.

Features

  • 📱 Extract comprehensive app metadata - title, description, version, ratings, and more
  • 🔎 Keyword & genre search (v0.2.0) - find apps via iTunes Search API
  • 📝 Review mining (v0.2.0) - paginated reviews via Apple's RSS feed (~500 per app)
  • 📊 Chart rankings (v0.2.0) - current top-free/top-paid/top-grossing snapshots
  • 🌍 Storefront support (v0.2.0) - first-class country parameter on every extractor
  • 💰 In-App Purchase details - extract names and prices of all IAP items
  • 🔗 Support links - app support, privacy policy, and developer website URLs
  • 🔄 Track version changes - monitor app updates and metadata changes over time
  • 🚀 Async support - fast concurrent extraction for multiple apps
  • 💪 Robust error handling - automatic retries and graceful error recovery
  • 🛡️ Rate limiting - respect API limits and prevent blocking
  • 🎨 Rich CLI - beautiful command-line interface with progress tracking
  • 📊 Multiple output formats - JSON, pretty-printed, or custom formatting

Installation

pip install apple-appstore-metadata-extractor

What's New in v0.2.0

v0.2.0 closes three gaps that previously required a third-party service:

  • Search - AppStoreSearcher / appstore-extractor search
  • Reviews - AppStoreReviewExtractor / appstore-extractor reviews
  • Rankings - AppStoreRankingFetcher / appstore-extractor chart & rank

It also adds:

  • CompositeAppStoreClient - one client bundling all four extractors with a shared rate limiter (default 20 req/min, the iTunes per-IP cap).
  • Country parameter on every existing extractor (defaults to "us").

Recommended entry point:

import asyncio
from appstore_metadata_extractor import CompositeAppStoreClient

async def discover():
    async with CompositeAppStoreClient(country="us") as client:
        # 1. Find candidate apps.
        hits = await client.search.search("habit tracker", limit=20)

        for hit in hits.hits[:3]:
            # 2. Pull full metadata (existing CombinedExtractor).
            meta = client.metadata.fetch(hit.url)

            # 3. Mine reviews (5 pages here; the feed allows up to 10 pages ≈ 500 reviews).
            reviews = await client.reviews.fetch_reviews(
                hit.app_id, max_pages=5
            )

            # 4. Look up current chart rank.
            rank = await client.rankings.find_app_rank(
                hit.app_id, chart="top-free"
            )

            print(f"{hit.name}: rank={rank}, reviews={reviews.total_reviews}")

asyncio.run(discover())

The four sub-extractors (AppStoreSearcher, AppStoreReviewExtractor, AppStoreRankingFetcher, CombinedExtractor) are also importable individually if you need finer control.

Quick Start

Command Line

Extract metadata for a single app:

appstore-extractor extract https://apps.apple.com/us/app/example/id123456789

Extract from multiple apps:

appstore-extractor extract-batch apps.json

Monitor apps for changes:

appstore-extractor watch apps.json --interval 3600

Python Library

from appstore_metadata_extractor import AppStoreScraper

# Initialize scraper
scraper = AppStoreScraper()

# Extract single app metadata
metadata = scraper.extract("https://apps.apple.com/us/app/example/id123456789")
print(f"App: {metadata.title}")
print(f"Version: {metadata.version}")
print(f"Rating: {metadata.rating}")

# Access In-App Purchases
if metadata.in_app_purchases:
    print(f"\nIn-App Purchases ({len(metadata.in_app_purchase_list)} items):")
    for iap in metadata.in_app_purchase_list:
        print(f"  - {iap['name']}: {iap['price']}")

# Access Support Links
print("\nSupport Links:")
print(f"  App Support: {metadata.app_support_url}")
print(f"  Privacy Policy: {metadata.privacy_policy_url}")
print(f"  Developer Website: {metadata.developer_website_url}")

# Access Screenshots (NEW in v0.1.10)
print("\nScreenshots:")
print(f"  iPhone: {len(metadata.screenshots)} screenshots")
print(f"  iPad: {len(metadata.ipad_screenshots)} screenshots")
if metadata.ipad_screenshots:
    print(f"  First iPad screenshot: {metadata.ipad_screenshots[0]}")

# Extract multiple apps
urls = [
    "https://apps.apple.com/us/app/app1/id111111111",
    "https://apps.apple.com/us/app/app2/id222222222"
]
results = scraper.extract_batch(urls)

Async Usage

import asyncio
from appstore_metadata_extractor import CombinedExtractor

async def main():
    extractor = CombinedExtractor()

    # Extract single app
    result = await extractor.extract("https://apps.apple.com/us/app/example/id123456789")

    # Extract multiple apps concurrently
    urls = ["url1", "url2", "url3"]
    results = await extractor.extract_batch(urls)

asyncio.run(main())

Standalone Review Mining (v0.2.0)

import asyncio
from appstore_metadata_extractor import AppStoreReviewExtractor

async def mine_reviews():
    extractor = AppStoreReviewExtractor()
    try:
        # Single app - up to 10 pages ≈ 500 reviews.
        batch = await extractor.fetch_reviews(
            app_id="310633997",       # WhatsApp Messenger
            country="us",
            sort="mostrecent",        # or "mosthelpful"
            max_pages=10,
        )
        print(f"Got {batch.total_reviews} reviews across {batch.pages_fetched} page(s)")
        for review in batch.reviews[:3]:
            print(f"  {review.rating}★ - {review.author}: {review.title}")

        # Batch - many apps at once, with a concurrency cap.
        batches = await extractor.fetch_reviews_batch(
            app_ids=["310633997", "284882215", "454638411"],
            country="us",
            max_pages=3,
            max_concurrent=3,
        )
        for app_id, b in batches.items():
            print(f"{app_id}: {b.total_reviews} reviews")
    finally:
        await extractor.close()

asyncio.run(mine_reviews())
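
The max_concurrent cap in fetch_reviews_batch is not explained further in this README. One common way to bound concurrency in asyncio is a semaphore; the following is a minimal sketch of that general technique, not the library's implementation. The names gather_limited and fake_fetch are hypothetical, and fake_fetch stands in for a real network call.

```python
import asyncio


async def gather_limited(items, worker, max_concurrent=3):
    """Run worker(item) for every item, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(item):
        # Only max_concurrent coroutines can hold the semaphore at once.
        async with semaphore:
            return item, await worker(item)

    pairs = await asyncio.gather(*(run_one(item) for item in items))
    return dict(pairs)


# Hypothetical worker that records how many calls overlap.
stats = {"active": 0, "peak": 0}


async def fake_fetch(app_id):
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # stand-in for a network request
    stats["active"] -= 1
    return f"reviews for {app_id}"


results = asyncio.run(
    gather_limited(["1", "2", "3", "4", "5"], fake_fetch, max_concurrent=2)
)
print(f"fetched {len(results)} apps, peak concurrency {stats['peak']}")
```

The result is keyed by the input item, which mirrors the app_id-keyed mapping that the batch example above iterates over.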

Standalone Chart Rankings (v0.2.0)

import asyncio
from appstore_metadata_extractor import AppStoreRankingFetcher

async def check_rankings():
    fetcher = AppStoreRankingFetcher()
    try:
        # Fetch a full chart snapshot.
        snapshot = await fetcher.fetch_chart(
            chart="top-free",         # "top-free" | "top-paid" | "top-grossing"
            country="us",
            limit=100,
        )
        print("Top 3 free apps:")
        for entry in snapshot.entries[:3]:
            print(f"  #{entry.rank} - {entry.name} ({entry.developer_name})")

        # Or just look up one app's rank.
        rank = await fetcher.find_app_rank(
            app_id="310633997", chart="top-free", country="us", limit=100
        )
        print(f"WhatsApp rank: {rank if rank else 'not in top 100'}")
    finally:
        await fetcher.close()

asyncio.run(check_rankings())

Standalone Search (v0.2.0)

import asyncio
from appstore_metadata_extractor import AppStoreSearcher

async def search_competitors():
    searcher = AppStoreSearcher()
    try:
        # Keyword search.
        results = await searcher.search("habit tracker", country="us", limit=25)
        print(f"Found {results.total_count} matches; top {len(results.hits)} returned:")
        for hit in results.hits[:5]:
            print(f"  {hit.name} - {hit.developer_name} ({hit.formatted_price})")

        # Genre-only browse (6017 = Lifestyle).
        top_lifestyle = await searcher.search_by_genre(6017, country="us", limit=10)
        for hit in top_lifestyle.hits:
            print(f"  {hit.primary_category}: {hit.name}")
    finally:
        await searcher.close()

asyncio.run(search_competitors())

CLI Commands

extract - Extract single app metadata

appstore-extractor extract [OPTIONS] URL

Options:
  -o, --output PATH         Output file path
  -f, --format [json|pretty]  Output format (default: pretty)
  --no-cache               Disable caching
  --country TEXT           Country code (default: us)

extract-batch - Extract multiple apps

appstore-extractor extract-batch [OPTIONS] INPUT_FILE

Options:
  -o, --output PATH         Output file path
  -f, --format [json|pretty]  Output format
  --concurrent INTEGER     Max concurrent requests (default: 5)
  --delay FLOAT           Delay between requests in seconds

watch - Monitor apps for changes

appstore-extractor watch [OPTIONS] INPUT_FILE

Options:
  --interval INTEGER       Check interval in seconds (default: 3600)
  --output-dir PATH       Directory for history files
  --notify               Enable notifications for changes
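
How watch decides that something changed is not documented here. As a rough illustration only, two metadata snapshots could be compared field by field. This sketch assumes snapshots are plain dicts keyed by the field names listed under "Extracted Fields"; diff_metadata and VOLATILE_FIELDS are hypothetical names, not part of the package.

```python
from typing import Any

# Fields that change on every run and would otherwise produce noise.
# The field names come from the "Metadata" group; the set itself is an assumption.
VOLATILE_FIELDS = {"extracted_at", "scraped_at", "raw_data"}


def diff_metadata(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (old_value, new_value)} for every field that differs."""
    changes = {}
    for key in sorted((old.keys() | new.keys()) - VOLATILE_FIELDS):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


previous = {"current_version": "2.3.0", "price": 0.0, "extracted_at": "2024-01-01T00:00:00Z"}
current = {"current_version": "2.4.0", "price": 0.0, "extracted_at": "2024-01-02T00:00:00Z"}

for field, (before, after) in diff_metadata(previous, current).items():
    print(f"{field}: {before!r} -> {after!r}")
```

Excluding timestamp fields keeps a run with no real changes from being reported as a change.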

search - Find apps via the iTunes Search API (v0.2.0)

appstore-extractor search "habit tracker" --limit 25
appstore-extractor search --genre-id 6017 --limit 20  # Lifestyle category

Options:
  --country TEXT          Storefront code (default: us)
  --limit INTEGER         Max results (1–200, default: 50)
  --genre-id INTEGER      Optional category filter
  -o, --output PATH       Write JSON to file instead of stdout

reviews / reviews-batch - Mine app reviews (v0.2.0)

appstore-extractor reviews 310633997 --max-pages 5
appstore-extractor reviews-batch ids.txt --max-pages 5 --concurrent 3

Options (reviews):
  --country TEXT                          Storefront code (default: us)
  --max-pages INTEGER                     1–10 (Apple's cap, default: 10)
  --sort [mostrecent|mosthelpful]        Sort order (default: mostrecent)
  -o, --output PATH                       Write JSON to file

reviews-batch reads one app ID per line from IDS_FILE.

chart / rank - Chart snapshot and per-app rank (v0.2.0)

appstore-extractor chart top-free --limit 50
appstore-extractor chart top-paid --country us --genre-id 6017 --limit 25
appstore-extractor rank 310633997 --chart top-free --country us

Input File Format

For batch operations, use a JSON file:

{
  "apps": [
    {
      "name": "Example App 1",
      "url": "https://apps.apple.com/us/app/example-1/id123456789"
    },
    {
      "name": "Example App 2",
      "url": "https://apps.apple.com/us/app/example-2/id987654321"
    }
  ]
}
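
If you generate or validate this file yourself, a short stdlib-only helper can read it and pull the numeric app ID out of each URL. This is a sketch under the assumption that every URL ends in an "id<digits>" segment, as in the examples above; load_app_list and APP_ID_PATTERN are hypothetical names and not part of the package.

```python
import json
import re
import tempfile
from pathlib import Path

# App Store URLs contain "/id<digits>", e.g. .../app/example-1/id123456789
APP_ID_PATTERN = re.compile(r"/id(\d+)")


def load_app_list(path):
    """Read the batch input file and return a list of (name, url, app_id) tuples."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    apps = []
    for entry in data["apps"]:
        match = APP_ID_PATTERN.search(entry["url"])
        if match is None:
            raise ValueError(f"No app ID found in URL: {entry['url']}")
        apps.append((entry["name"], entry["url"], match.group(1)))
    return apps


# Demo: write a small input file to a temporary directory and read it back.
sample = {
    "apps": [
        {
            "name": "Example App 1",
            "url": "https://apps.apple.com/us/app/example-1/id123456789",
        }
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    input_file = Path(tmp) / "apps.json"
    input_file.write_text(json.dumps(sample), encoding="utf-8")
    loaded = load_app_list(input_file)

print(loaded)
```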

Extracted Fields

The extractor provides comprehensive app metadata including:

Basic Information

  • app_id - Apple App Store ID
  • bundle_id - App bundle identifier
  • url - App Store URL
  • name - App name
  • subtitle - App subtitle/tagline (web scraping required)
  • developer_name - Developer name
  • developer_id - Developer ID
  • developer_url - Developer page URL

Categories

  • category / primary_category - Primary category name
  • category_id / primary_category_id - Primary category ID
  • categories - List of all categories
  • category_ids - List of all category IDs

Pricing & Purchases

  • price - App price (numeric value)
  • formatted_price - Formatted price string (e.g., "$4.99" or "Free")
  • currency - Currency code (e.g., "USD")
  • in_app_purchases - Boolean indicating if app has IAPs
  • in_app_purchase_list - Detailed list of IAPs (web scraping required):
    • name - IAP item name
    • price - Formatted price
    • price_value - Numeric price
    • type - IAP type (auto_renewable_subscription, non_consumable, etc.)
    • currency - Currency code

Version Information

  • current_version - Current version number
  • version_date / current_version_release_date - Release date
  • whats_new / release_notes - What's new in this version
  • version_history - List of previous versions (web scraping required)
  • initial_release_date - First release date
  • last_updated - Last update to any field

Content & Description

  • description - Full app description
  • content_rating - Age rating (e.g., "4+", "12+")
  • content_advisories - List of content warnings

Languages (web scraping required)

  • languages - Human-readable language names (e.g., "English", "Spanish")
  • language_codes - ISO language codes (e.g., "EN", "ES")

Ratings & Reviews

  • average_rating - Average user rating (0-5)
  • rating_count - Total number of ratings
  • average_rating_current_version - Rating for current version
  • rating_count_current_version - Ratings for current version
  • rating_distribution - Star breakdown (web scraping required)
  • reviews - User reviews list (web scraping required)

Media Assets

  • icon_url - App icon URL (512x512)
  • icon_urls - Dictionary of multiple icon sizes
  • screenshots - List of iPhone screenshot URLs
  • ipad_screenshots - List of iPad screenshot URLs (NEW in v0.1.10 - from iTunes API and web scraping)

Support Links (web scraping required)

  • app_support_url - Direct link to app support page
  • privacy_policy_url - Link to privacy policy
  • developer_website_url - Main developer website
  • support_url - Support website (alias)
  • marketing_url - Marketing website

Technical Details

  • file_size_bytes - Size in bytes
  • file_size_formatted - Human-readable size (e.g., "245.8 MB")
  • minimum_os_version - Minimum iOS version required
  • supported_devices - List of compatible devices

Features & Capabilities

  • features - List of app features/capabilities
  • is_game_center_enabled - Game Center support
  • is_vpp_device_based_licensing_enabled - VPP device licensing

Privacy Information (web scraping required)

  • privacy - Detailed privacy information including:
    • data_used_to_track
    • data_linked_to_you
    • data_not_linked_to_you
    • privacy_details_url

Related Content (web scraping required)

  • developer_apps - Other apps by the same developer
  • similar_apps - "You might also like" recommendations
  • rankings - Chart positions (e.g., {"Games": 5, "Overall": 23})

Metadata

  • data_source - Source of the data (itunes_api, web_scrape, combined)
  • extracted_at / scraped_at - When data was collected
  • raw_data - Raw response data (optional, for debugging)

v0.2.0 Models - Search, Reviews, Rankings

The new extractors return their own typed Pydantic models, separate from AppMetadata / ExtendedAppMetadata.

SearchHit (from AppStoreSearcher.search / search_by_genre)

Per-result fields populated from the iTunes Search API.

  • app_id - Apple track ID (string)
  • bundle_id - App bundle identifier (optional)
  • name - App name (trackName)
  • developer_name - Developer / artist name
  • developer_id - Artist ID (optional)
  • url - apps.apple.com URL (trackViewUrl)
  • icon_url - Best available artwork URL (512 → 100 → 60 fallback)
  • average_rating - averageUserRating (optional)
  • rating_count - userRatingCount (optional)
  • price - Numeric price in the storefront currency
  • formatted_price - Price as a display string (e.g. "Free", "$4.99")
  • primary_category - primaryGenreName
  • primary_category_id - primaryGenreId
  • description - Full description (iTunes Search returns this inline)
  • country - Storefront the result came from
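
To make the mapping concrete, here is a rough sketch of how one raw iTunes Search API result could be turned into these fields, including the 512 → 100 → 60 artwork fallback. It uses a plain dict rather than the package's Pydantic model; to_search_hit, ARTWORK_KEYS, and the sample payload are illustrative assumptions, and the library's own conversion may differ.

```python
from typing import Any, Optional

# Artwork keys in order of preference (largest first).
ARTWORK_KEYS = ("artworkUrl512", "artworkUrl100", "artworkUrl60")


def to_search_hit(raw: dict[str, Any], country: str = "us") -> dict[str, Any]:
    """Map one iTunes Search API result onto the SearchHit field names."""
    # Take the first artwork URL that is present and non-empty.
    icon_url: Optional[str] = next(
        (raw[key] for key in ARTWORK_KEYS if raw.get(key)), None
    )
    return {
        "app_id": str(raw["trackId"]),
        "bundle_id": raw.get("bundleId"),
        "name": raw["trackName"],
        "developer_name": raw.get("artistName"),
        "developer_id": raw.get("artistId"),
        "url": raw.get("trackViewUrl"),
        "icon_url": icon_url,
        "average_rating": raw.get("averageUserRating"),
        "rating_count": raw.get("userRatingCount"),
        "price": raw.get("price", 0.0),
        "formatted_price": raw.get("formattedPrice"),
        "primary_category": raw.get("primaryGenreName"),
        "primary_category_id": raw.get("primaryGenreId"),
        "description": raw.get("description"),
        "country": country,
    }


# Made-up payload with no 512px artwork, so the 100px URL should be chosen.
raw_result = {
    "trackId": 123456789,
    "trackName": "Example Habit Tracker",
    "artistName": "Example Dev",
    "trackViewUrl": "https://apps.apple.com/us/app/example/id123456789",
    "artworkUrl100": "https://example.com/icon100.png",
    "artworkUrl60": "https://example.com/icon60.png",
    "price": 0.0,
    "formattedPrice": "Free",
    "primaryGenreName": "Lifestyle",
    "primaryGenreId": 6017,
}
hit = to_search_hit(raw_result)
print(hit["name"], hit["icon_url"], hit["formatted_price"])
```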

SearchResults (wrapper for a single query)

  • query - The original search term (or "genre:<id>" for genre-only searches)
  • country - Storefront code
  • total_count - resultCount from the API (may exceed len(hits) if the API truncated)
  • hits - List of SearchHit
  • fetched_at - UTC timestamp the query was issued

Review (one user review, reused from models_combined.Review)

  • author - Reviewer's screen name
  • rating - Star rating, 1–5 (validated)
  • title - Review title (optional)
  • content - Review body
  • date - datetime parsed from the RSS updated field
  • version - App version the review was written against (optional)
  • helpful_count - im:voteSum count (coerced to int; 0 if missing)

ReviewBatch (from AppStoreReviewExtractor.fetch_reviews)

  • app_id - The target app ID
  • country - Storefront code
  • sort - "mostrecent" or "mosthelpful"
  • pages_fetched - How many pages were actually fetched (≤ requested max_pages)
  • total_reviews - len(reviews) after dedup
  • reviews - List of Review
  • fetched_at - UTC timestamp
  • has_more - True if we stopped at the requested cap rather than end-of-data
  • notes - Diagnostic strings (e.g. "page 5: 404 - end of data")
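
The relationship between pages_fetched, total_reviews, has_more, and notes can be illustrated with a small pagination loop. This is a sketch of the general pattern only, assuming each raw entry carries a unique "id" key; collect_reviews and the fake page data are hypothetical, and the extractor's real logic may differ.

```python
from typing import Callable, Optional


def collect_reviews(fetch_page: Callable[[int], Optional[list]], max_pages: int = 10) -> dict:
    """Fetch pages 1..max_pages, dedup by review id, stop early at end of data."""
    seen_ids = set()
    reviews = []
    notes = []
    pages_fetched = 0
    has_more = True  # stays True only if the loop ends at the requested cap
    for page in range(1, max_pages + 1):
        entries = fetch_page(page)
        if not entries:
            notes.append(f"page {page}: no entries - end of data")
            has_more = False
            break
        pages_fetched += 1
        for entry in entries:
            if entry["id"] not in seen_ids:  # the same review can appear on two pages
                seen_ids.add(entry["id"])
                reviews.append(entry)
    return {
        "pages_fetched": pages_fetched,
        "total_reviews": len(reviews),
        "reviews": reviews,
        "has_more": has_more,
        "notes": notes,
    }


# Two fake pages; review "b" is duplicated across them.
fake_pages = {
    1: [{"id": "a"}, {"id": "b"}],
    2: [{"id": "b"}, {"id": "c"}],
}
full = collect_reviews(fake_pages.get, max_pages=10)
capped = collect_reviews(fake_pages.get, max_pages=1)
print(full["pages_fetched"], full["total_reviews"], full["has_more"])
print(capped["pages_fetched"], capped["total_reviews"], capped["has_more"])
```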

RankingEntry (one entry in a chart, 1-indexed)

  • rank - Position in the chart, starting at 1
  • app_id - Apple track ID
  • name - App name
  • developer_name - Developer / artist name
  • genre_ids - List of genre IDs (often empty for the overall chart)
  • artwork_url - Icon URL (artworkUrl100, optional)
  • url - apps.apple.com URL if Apple includes it

ChartSnapshot (from AppStoreRankingFetcher.fetch_chart)

  • chart - "top-free", "top-paid", or "top-grossing"
  • country - Storefront code
  • genre_id - None for the overall chart, otherwise the genre filter applied
  • fetched_at - UTC timestamp; stitch these together to build history
  • entries - Ordered list of RankingEntry

Snapshots are point-in-time only. The package returns the chart as it is right now; storing daily snapshots for trend history is the consumer's responsibility.
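
One possible way to keep that history is a small SQLite table with one row per app per snapshot. The sketch below assumes each snapshot has been converted to a plain dict with the ChartSnapshot field names and an ISO 8601 fetched_at string; the table layout and the helpers init_store, save_snapshot, and rank_history are hypothetical, and genre_id is left out for brevity.

```python
import sqlite3


def init_store(conn):
    """Create the history table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chart_history (
               fetched_at TEXT NOT NULL,
               chart      TEXT NOT NULL,
               country    TEXT NOT NULL,
               app_id     TEXT NOT NULL,
               rank       INTEGER NOT NULL,
               PRIMARY KEY (fetched_at, chart, country, app_id)
           )"""
    )


def save_snapshot(conn, snapshot):
    """Store one row per chart entry; re-saving the same snapshot is harmless."""
    rows = [
        (snapshot["fetched_at"], snapshot["chart"], snapshot["country"],
         entry["app_id"], entry["rank"])
        for entry in snapshot["entries"]
    ]
    conn.executemany("INSERT OR REPLACE INTO chart_history VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def rank_history(conn, app_id, chart="top-free", country="us"):
    """Return [(fetched_at, rank), ...] ordered by time for one app."""
    cursor = conn.execute(
        "SELECT fetched_at, rank FROM chart_history "
        "WHERE app_id = ? AND chart = ? AND country = ? ORDER BY fetched_at",
        (app_id, chart, country),
    )
    return cursor.fetchall()


# Demo with two made-up daily snapshots in an in-memory database.
conn = sqlite3.connect(":memory:")
init_store(conn)
save_snapshot(conn, {
    "fetched_at": "2024-01-01T00:00:00Z", "chart": "top-free", "country": "us",
    "entries": [{"app_id": "111", "rank": 1}, {"app_id": "310633997", "rank": 3}],
})
save_snapshot(conn, {
    "fetched_at": "2024-01-02T00:00:00Z", "chart": "top-free", "country": "us",
    "entries": [{"app_id": "310633997", "rank": 1}, {"app_id": "111", "rank": 2}],
})
history = rank_history(conn, "310633997")
print(history)
```

ISO 8601 timestamps in a single timezone sort correctly as text, which is why ordering by fetched_at is enough here.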

Migration Guide

v0.1.10 - Screenshot Updates

The iTunes API extractor now returns ExtendedAppMetadata instead of basic AppMetadata, which includes:

  • ipad_screenshots - Separate field for iPad screenshots
  • developer_url - Developer page URL from iTunes
  • initial_release_date - When the app was first released
  • average_rating_current_version and rating_count_current_version - Rating and rating count for the current version

# The screenshots field still contains iPhone screenshots
iphone_screenshots = metadata.screenshots  # iPhone only

# NEW: iPad screenshots are now separate
ipad_screenshots = metadata.ipad_screenshots  # iPad only (if available)

v0.1.6 - CombinedExtractor Migration

If you were using CombinedAppStoreScraper, it has been consolidated into CombinedExtractor. The old class name still works via an alias, but we recommend updating your code:

# Old way (still works via alias)
from appstore_metadata_extractor import CombinedAppStoreScraper
scraper = CombinedAppStoreScraper()
result = scraper.fetch(url)

# New way (recommended)
from appstore_metadata_extractor import CombinedExtractor
extractor = CombinedExtractor()
metadata = extractor.fetch(url)  # Synchronous method
# or
result = await extractor.extract(url)  # Async method (call from inside a coroutine)

The new CombinedExtractor offers:

  • Full backward compatibility
  • Better type safety
  • Support for extraction modes (iTunes-only vs combined)
  • Both sync and async interfaces

Advanced Usage

Custom Extraction Modes

import asyncio
from appstore_metadata_extractor import CombinedExtractor, ExtractionMode

async def main():
    extractor = CombinedExtractor()
    url = "https://apps.apple.com/us/app/example/id123456789"

    # API-only mode (faster, less data)
    result = await extractor.extract(url, mode=ExtractionMode.API_ONLY)

    # Web scraping mode (slower, more complete)
    result = await extractor.extract(url, mode=ExtractionMode.WEB_SCRAPE)

    # Combined mode (default - best of both)
    result = await extractor.extract(url, mode=ExtractionMode.COMBINED)

asyncio.run(main())

Rate Limiting Configuration

from appstore_metadata_extractor import AppStoreScraper, RateLimiter

# Configure custom rate limits
rate_limiter = RateLimiter(
    calls_per_minute=20,  # iTunes API limit
    min_delay=1.0         # Minimum delay between calls, in seconds
)

scraper = AppStoreScraper(rate_limiter=rate_limiter)
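
To show how a per-minute cap and a minimum delay can interact, here is a minimal sliding-window sketch. It is not the package's RateLimiter: the class SlidingWindowLimiter, its methods, and the injectable clock are assumptions made for illustration, and the real implementation may use a different algorithm.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most calls_per_minute calls per 60 s, spaced by min_delay seconds."""

    def __init__(self, calls_per_minute=20, min_delay=1.0, clock=time.monotonic):
        self.calls_per_minute = calls_per_minute
        self.min_delay = min_delay
        self.clock = clock
        self._calls = deque()  # timestamps of calls inside the current window

    def wait_time(self):
        """Seconds the caller should wait before the next call is allowed."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self._calls and now - self._calls[0] >= 60.0:
            self._calls.popleft()
        wait = 0.0
        if self._calls:
            # Enforce spacing relative to the most recent call.
            wait = max(wait, self.min_delay - (now - self._calls[-1]))
        if len(self._calls) >= self.calls_per_minute:
            # Window is full: wait until the oldest call expires.
            wait = max(wait, 60.0 - (now - self._calls[0]))
        return wait

    def record_call(self):
        self._calls.append(self.clock())


# Demo with a fake clock so no real waiting is needed.
fake_now = [0.0]
limiter = SlidingWindowLimiter(calls_per_minute=2, min_delay=1.0, clock=lambda: fake_now[0])

limiter.record_call()                 # call at t=0
spacing_wait = limiter.wait_time()    # min_delay has not elapsed yet
fake_now[0] = 1.0
limiter.record_call()                 # call at t=1, window now full
fake_now[0] = 5.0
window_wait = limiter.wait_time()     # must wait for the t=0 call to expire
fake_now[0] = 60.0
expired_wait = limiter.wait_time()    # t=0 call has left the window
print(spacing_wait, window_wait, expired_wait)
```

In real use the caller would sleep for wait_time() seconds and then call record_call() right before issuing the request.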

Caching

from appstore_metadata_extractor import AppStoreScraper, CacheManager

# Configure cache
cache = CacheManager(
    ttl=300,       # Cache TTL in seconds
    max_size=1000  # Maximum cache entries
)

scraper = AppStoreScraper(cache_manager=cache)
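
For readers unfamiliar with the two settings, the sketch below shows one simple way a TTL plus a size cap can behave: entries expire after ttl seconds, and the oldest entry is evicted once max_size is exceeded. TTLCache is a hypothetical stand-in written for this example; it does not describe how CacheManager is implemented.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small in-memory cache with per-entry expiry and a maximum size."""

    def __init__(self, ttl=300, max_size=1000, clock=time.monotonic):
        self.ttl = ttl
        self.max_size = max_size
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._entries.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return default
        return value

    def set(self, key, value):
        self._entries.pop(key, None)  # re-inserting moves the key to the end
        self._entries[key] = (self.clock() + self.ttl, value)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the oldest insertion


# Demo with a fake clock and a tiny cache.
fake_now = [0.0]
cache = TTLCache(ttl=300, max_size=2, clock=lambda: fake_now[0])
cache.set("id111", "metadata A")
cache.set("id222", "metadata B")
cache.set("id333", "metadata C")      # exceeds max_size, evicts "id111"
evicted = cache.get("id111")
fresh = cache.get("id222")
fake_now[0] = 300.0                   # TTL has elapsed
stale = cache.get("id222")
print(evicted, fresh, stale)
```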

Error Handling

The library provides robust error handling with automatic retries:

from appstore_metadata_extractor import AppNotFoundError, AppStoreScraper, RateLimitError

scraper = AppStoreScraper()
url = "https://apps.apple.com/us/app/example/id123456789"

try:
    metadata = scraper.extract(url)
except AppNotFoundError:
    print("App not found")
except RateLimitError:
    print("Rate limit exceeded, please wait")
except Exception as e:
    print(f"Extraction failed: {e}")
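
If you want your own retry layer on top of the built-in one, exponential backoff with jitter is a common pattern. The sketch below is generic and self-contained: the RateLimitError class defined here is a local stand-in for the library's exception, and retry_with_backoff, flaky_extract, and the delay values are hypothetical choices rather than the package's retry policy.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for the library's RateLimitError (assumption for this sketch)."""


def retry_with_backoff(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError wait base_delay, 2x, 4x, ... and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle it
            delay = base_delay * 2 ** (attempt - 1)
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))


# Demo: a function that fails twice, then succeeds. Sleeps are recorded, not performed.
attempts = {"count": 0}
delays = []


def flaky_extract():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"


result = retry_with_backoff(flaky_extract, max_attempts=3, base_delay=1.0, sleep=delays.append)
print(result, attempts["count"], len(delays))
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.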

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development

See DEVELOPMENT.md for detailed development setup and workflow instructions.

Quick Start:

# Clone and setup
git clone https://github.com/yourusername/appstore-metadata-extractor-python.git
cd appstore-metadata-extractor-python
./dev-setup.sh

# Activate environment and develop
source venv/bin/activate

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This tool is for educational and research purposes only. Make sure to comply with Apple's Terms of Service and robots.txt when using this tool. Be respectful of rate limits and implement appropriate delays between requests.

Related Projects

For a full-featured solution with web API, authentication, and UI, check out the parent project.
