Python wrapper for Il Post newspaper API

ilpost-api-wrapper

A Python wrapper for the Il Post public search API. It searches articles, podcast episodes, and newsletters, and requires no authentication.

Installation

pip install ilpost-api-wrapper

Requires Python 3.9+. No third-party dependencies.

Quick start

from ilpost import IlPostClient, SortOrder, ContentType, DateRange

client = IlPostClient()

result = client.search("berlusconi")
for doc in result.docs:
    print(doc.title, doc.link)

API reference

IlPostClient(timeout=10)

Method                          Description
search(query, ...)              General search across all content types
search_articles(query, ...)     Articles only
search_podcasts(query, ...)     Podcast episodes only
search_newsletters(query, ...)  Newsletter issues only
paginate(query, ...)            Generator that yields one SearchResult per page
get_by_date(date, ...)          All articles published on a specific date (scrapes the date-archive page)

Common parameters

Parameter      Type             Default    Description
query          str              -          Search term
page           int              1          Page number (1-based)
hits           int              10         Results per page
sort           SortOrder        RELEVANCE  Sort order
content_type   ContentType      None       Filter by content type
category       str | list[str]  None       Editorial category filter (articles only). Pass a list for OR union, e.g. ["politica", "economia"]
date_range     DateRange        None       Publication date filter
fetch_content  bool             False      Scrape and return full article text for each article result (see Document.content)

fetch_content is available on search(), search_articles(), paginate(), and get_by_date(). It has no effect on podcast or newsletter results — doc.content will be None for those types.

get_by_date(date, *, fetch_content=False)

Returns all articles published on date by scraping the Il Post date-archive page (https://www.ilpost.it/YYYY/MM/DD/), paginating through all pages automatically. Each article is then enriched with API fields (id, tags, category, subscriber status, full timestamp) via a title-based search lookup (with a summary-keyword fallback). Articles that cannot be matched in the search index are excluded from the results.
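
The archive URL pattern quoted above can be built from a date with plain string formatting. This is a minimal sketch: the helper name archive_url is hypothetical and not part of the package, and how further archive pages are addressed is not covered here.

```python
import datetime

ARCHIVE_BASE = "https://www.ilpost.it"


def archive_url(day: datetime.date) -> str:
    """Build the date-archive URL for a day (hypothetical helper).

    Month and day are zero-padded, matching the YYYY/MM/DD pattern.
    """
    return f"{ARCHIVE_BASE}/{day:%Y/%m/%d}/"


print(archive_url(datetime.date(2025, 11, 12)))
# prints https://www.ilpost.it/2025/11/12/
```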

The following recurring article types are automatically skipped before enrichment, as they are not available in the search API: post-it external-link articles (pre-2017), Peanuts comic strips, daily photo roundups, newspaper front-page galleries (le-prime-pagine-*), and weather forecast (meteo) articles.

Parameter      Type           Default  Description
date           datetime.date  -        Publication date to fetch
fetch_content  bool           False    Scrape the full article body for each result

Raises ValueError if date is in the future or within the last 5 days (the search index has a ~5 day lag; recent dates cannot be enriched).
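
To avoid the ValueError, a caller can check the date before calling get_by_date. The sketch below is an assumption about where the boundary lies, not the library's own check: it conservatively treats a date as safe only when it is more than 5 days old. The helper name is_enrichable is hypothetical.

```python
import datetime

INDEX_LAG_DAYS = 5  # approximate search-index lag quoted above


def is_enrichable(day, today=None, lag_days=INDEX_LAG_DAYS):
    """Return True if `day` is old enough to be past the index lag.

    Future dates give a negative age and are rejected as well.
    """
    today = today or datetime.date.today()
    return (today - day).days > lag_days


today = datetime.date(2025, 11, 20)
print(is_enrichable(datetime.date(2025, 11, 12), today))  # 8 days old
print(is_enrichable(datetime.date(2025, 11, 18), today))  # 2 days old
# prints True, then False
```

A caller would then only invoke client.get_by_date(day) when the check returns True, or catch ValueError as a fallback.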

Enums

SortOrder

Value      Description
RELEVANCE  Sort by relevance score (default)
NEWEST     Most recent first
OLDEST     Oldest first

ContentType

Value        Description
ARTICLES     Articles and news posts
PODCASTS     Podcast episodes
NEWSLETTERS  Newsletter issues

DateRange

Value         Description
ALL_TIME      Entire archive (default)
PAST_YEAR     Past 12 months
PAST_30_DAYS  Past 30 days

SearchResult

Attribute      Type               Description
total          int                Total number of matching results
docs           list[Document]     Results for this page
filters        list[FilterGroup]  Available filters with counts
sort           str                Active sort value
hits           int                Page size
page           int                Current page number
total_pages    int                Total number of pages
has_next_page  bool               Whether a next page exists
has_prev_page  bool               Whether a previous page exists
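
The pagination attributes are related in the usual way. The sketch below shows one plausible derivation from total, hits, and page; it is an assumption for illustration, and PageInfo is a hypothetical stand-in rather than the package's SearchResult class.

```python
import math
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical stand-in for the pagination fields of SearchResult."""

    total: int  # total number of matching results
    hits: int   # page size
    page: int   # current page number (1-based)

    @property
    def total_pages(self) -> int:
        # Round up so a partially filled last page still counts.
        return math.ceil(self.total / self.hits) if self.hits > 0 else 0

    @property
    def has_next_page(self) -> bool:
        return self.page < self.total_pages

    @property
    def has_prev_page(self) -> bool:
        return self.page > 1


info = PageInfo(total=43, hits=10, page=5)
print(info.total_pages, info.has_next_page, info.has_prev_page)
# prints 5 False True
```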

Document

Attribute      Type        Description
id             int         Unique content identifier
type           str         "post", "flashes", "blog_post", "episodes", or "newsletter"
title          str         Content title
link           str         URL to the content page
timestamp      str         Publication date (ISO 8601, Italian local time)
summary        str         Short excerpt
image          str         Cover image URL
score          float       Relevance score (0.0 when sorting by date)
subscriber     bool        True if content is paywalled
highlight      str | None  Snippet with matched term in <span> tags
category       str | None  Editorial category (articles only)
post_tag_text  list[str]   Tags (articles only)
derived_info   dict        Extra data: episode or newsletter metadata
content        str | None  Full article body text, populated when fetch_content=True (articles only)
is_article     bool        Convenience property
is_podcast     bool        Convenience property
is_newsletter  bool        Convenience property
is_paywalled   bool        Alias for subscriber
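
Because highlight wraps the matched term in <span> tags, it usually needs cleaning before being shown in a terminal or log. The following is a minimal sketch using only the standard library. The helper name plain_highlight and the sample snippet (including its class attribute) are made up for illustration; the real markup may differ.

```python
import html
import re

# Matches opening and closing <span> tags, with or without attributes.
_SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def plain_highlight(highlight):
    """Strip <span> markers from a highlight snippet (hypothetical helper).

    Returns an empty string when the document has no highlight.
    """
    if highlight is None:
        return ""
    return html.unescape(_SPAN_TAG.sub("", highlight))


sample = 'Il governo <span class="hl">Berlusconi</span> cadde'
print(plain_highlight(sample))
# prints Il governo Berlusconi cadde
```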

Examples

from ilpost import IlPostClient, SortOrder, ContentType, DateRange

client = IlPostClient()

# Most recent articles in politics
result = client.search_articles(
    "renzi",
    sort=SortOrder.NEWEST,
    category="politica",
    date_range=DateRange.PAST_30_DAYS,
)

# Podcast search
result = client.search_podcasts("cacao", sort=SortOrder.NEWEST)

# Paginate through results, 5 per page (at most 10 pages)
for page in client.paginate("sicilia", hits=5, max_pages=10):
    print(f"Page {page.page}/{page.total_pages}")
    for doc in page.docs:
        print(f"  [{doc.type}] {doc.title}")

# Access filter counts from a response
result = client.search("europa")
for group in result.filters:
    print(f"{group.label}:")
    for opt in group.options:
        print(f"  {opt.label}: {opt.doc_count}")

# Fetch full article text alongside search results
result = client.search_articles("economia", hits=5, fetch_content=True)
for doc in result.docs:
    print(doc.title)
    if doc.content:
        print(doc.content[:300])

# Use the scraper directly (e.g. for parallel fetching)
from ilpost import fetch_article_content
text = fetch_article_content("https://www.ilpost.it/2026/04/02/...")

# Fetch all articles published on a specific date
import datetime
docs = client.get_by_date(datetime.date(2025, 11, 12))
for doc in docs:
    print(doc.title, doc.timestamp)
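
The parallel fetching mentioned in the examples could be arranged with a thread pool. This sketch is an assumption about usage, not part of the package: fetch_many is a hypothetical helper that takes the fetch function as an argument, so it is demonstrated here with a stand-in instead of real network calls.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL using a pool of threads.

    Returns a dict mapping each URL to its text, or to None if the
    fetch raised an exception.
    """
    urls = list(urls)  # iterated twice below

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs URLs with results.
        return dict(zip(urls, pool.map(safe_fetch, urls)))


def fake_fetch(url):
    """Stand-in fetcher used only for this demonstration."""
    if url.endswith("/bad"):
        raise ValueError("simulated failure")
    return f"text of {url}"


print(fetch_many(["a", "b/bad"], fake_fetch))
# prints {'a': 'text of a', 'b/bad': None}
```

With the library, fetch_article_content would be passed as fetch, and the URLs could come from [doc.link for doc in result.docs]. A small max_workers value keeps the load on the site modest.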

CLI

The ilpost-search command is included with the package:

usage: ilpost-search [-h] [--type {articles,podcasts,newsletters}]
                     [--sort {relevance,newest,oldest}]
                     [--date {all,year,month}] [--category CATEGORY [CATEGORY ...]]
                     [--page PAGE] [--hits HITS] [--all-pages] [--max-pages N]
                     [--fetch-content] [--output-json] [--output-dir DIR]
                     [--archive-date YYYY-MM-DD]
                     [query]

Each result is printed with labelled fields: type, category, tags, title, link, date, score, summary, and either content (when --fetch-content is used) or excerpt (search highlight).

--output-json suppresses the normal result listing and writes the results to a JSON file instead. The filename is generated automatically from the timestamp and query (e.g. 20260411_143000_matteo_zuppi.json). Use --output-dir to choose the target directory (defaults to the current working directory). Only the path of the written file is printed to stdout, so scripts can capture it.
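
A filename in the shape of the example above can be produced as follows. This is a sketch of one plausible scheme, not the CLI's actual implementation: the helper name output_filename and the slug rule (lowercase, with runs of non-alphanumeric ASCII characters collapsed to underscores) are assumptions.

```python
import datetime
import re


def output_filename(query, now=None):
    """Build a '<timestamp>_<query>.json' filename (hypothetical helper)."""
    now = now or datetime.datetime.now()
    stamp = f"{now:%Y%m%d_%H%M%S}"
    # Assumed slug rule: lowercase, non-alphanumerics become underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_")
    return f"{stamp}_{slug}.json" if slug else f"{stamp}.json"


when = datetime.datetime(2026, 4, 11, 14, 30, 0)
print(output_filename("Matteo Zuppi", when))
# prints 20260411_143000_matteo_zuppi.json
```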

# Basic search
ilpost-search berlusconi

# Most recent articles in politics
ilpost-search renzi --type articles --sort newest --category politica

# Podcast search, past 30 days
ilpost-search cacao --type podcasts --date month

# Page 2, 5 results per page, oldest first
ilpost-search sicilia --sort oldest --hits 5 --page 2

# Fetch all pages of newsletter results (up to 3 pages)
ilpost-search economia --type newsletters --all-pages --max-pages 3

# Fetch full article text for each result
ilpost-search bondi --type articles --hits 3 --fetch-content

# Filter by multiple categories (OR union)
ilpost-search cultivar --category scienza italia mondo

# Save results to a JSON file in the current directory
ilpost-search berlusconi --hits 20 --output-json

# Save results to a specific directory
ilpost-search berlusconi --all-pages --output-json --output-dir ~/Desktop

# Fetch all articles published on a specific date
ilpost-search --archive-date 2025-11-12

# Fetch articles by date and save to JSON
ilpost-search --archive-date 2025-11-12 --output-json

# Fetch articles by date with full content
ilpost-search --archive-date 2025-11-12 --fetch-content

Notes

  • Paywalled content (subscriber: true) is included in search results — title, summary, and highlight are visible, but the full article requires an active ilpost.it subscription.
  • When sorting by date (NEWEST or OLDEST), score is always 0.0.
  • The category filter only applies to articles. It is ignored by the server when content_type=PODCASTS.
  • Timestamps are in Italian local time (CET/CEST) with no UTC offset.
  • The search index has a ~5 day lag. Articles published in the last few days will not appear in search results.
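
Since timestamps carry no UTC offset, comparing them with times from other sources requires attaching the Italian time zone first. The sketch below uses the standard-library zoneinfo module (Python 3.9+, and it needs time zone data on the system). The helper name to_utc and the sample timestamp format are assumptions; the exact string format returned by the API is not shown in this README.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ROME = ZoneInfo("Europe/Rome")  # handles the CET/CEST switch


def to_utc(naive_timestamp: str) -> datetime:
    """Interpret an offset-less ISO 8601 string as Italian local time
    and convert it to UTC (hypothetical helper)."""
    local = datetime.fromisoformat(naive_timestamp).replace(tzinfo=ROME)
    return local.astimezone(timezone.utc)


print(to_utc("2025-11-12T08:30:00").isoformat())
# prints 2025-11-12T07:30:00+00:00
```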
