Python SDK for the HKFilings API — parse HKEX annual / interim PDFs into source-traced financial facts, industry signals and supply-chain graphs.

hkfilings — turn HKEX annual report PDFs into source-traced JSON

hkfilings · Python SDK for HKEX annual & interim reports

Stop hand-copying numbers from 300-page HKEX PDFs.
Get every fact — with its source page — in 5 lines of Python.

English · 中文
Install · Quickstart · Data shape · Cookbook · API · Plans · FAQ


Why this exists

If you've ever tried to extract revenue, segment breakdowns or capex guidance from a 300-page HKEX PDF, you know the failure modes: numbers without page references, OCR drift, units mixed between HKD and RMB, hallucinated year-over-year changes that don't reconcile, segment totals that don't sum to the consolidated figure.

This SDK is a client for an API that returns every fact with:

  • source_page + source_text — point at the exact location in the PDF (page-level bbox lands in extra when the parser can resolve it)
  • 13 deterministic validators — YoY recalc, accounting equation, segment reconciliation, currency / unit consistency, cashflow sign checks, EPS sign, free-cashflow definition, …
  • Frozen v1 schema with forward-compat extra dict — new backend fields land in extra; your existing code never breaks on a release
  • Layer-2 signals — 11 enumerated signal types (margin driver, upstream cost pressure, downstream demand, inventory & orderbook, capex, …), each bound to evidence with anti-hallucination rules
  • Supply-chain graph — suppliers, customers, competitors, regulators, substitutes, partners — with exposure share and direction
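The validators themselves run server-side and are not part of this SDK, so the following is only a rough illustration of what two of the named checks (segment reconciliation and the accounting equation) could look like if you wanted to repeat them locally. The function names, the tolerance, and the numbers are assumptions made for this sketch.

```python
import math


def segments_reconcile(segment_values, consolidated, rel_tol=0.005):
    """Return True if the segment figures sum to the consolidated total within rel_tol."""
    return math.isclose(sum(segment_values), consolidated, rel_tol=rel_tol)


def balance_sheet_balances(assets, liabilities, equity, rel_tol=0.005):
    """Return True if assets equal liabilities plus equity within rel_tol."""
    return math.isclose(assets, liabilities + equity, rel_tol=rel_tol)


# Illustrative numbers only; they are not taken from any filing.
print(segments_reconcile([120.0, 80.0, 45.5], 245.5))  # True
print(segments_reconcile([120.0, 80.0, 45.5], 260.0))  # False
print(balance_sheet_balances(1000.0, 600.0, 400.0))    # True
```

A relative tolerance is used rather than exact equality because reported figures are usually rounded to thousands or millions.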

vs the alternatives

Source-traced Validated HK coverage Schema stability Free tier
Manual analyst depends manual full n/a
Wind / Bloomberg / CIQ proprietary full shifts each release
Generic LLM on a PDF hallucinates any none API costs
hkfilings page + text 13 validators full HKEX frozen v1

Install

pip install hkfilings

Requires Python ≥ 3.10. The only runtime dependency is httpx. Works great in Jupyter / Colab.

Quickstart

from hkfilings import HKFilingsClient

client = HKFilingsClient(api_key="ak_...")

task   = client.analyze(ticker="9988", year=2026)   # Alibaba 9988.HK
report = client.wait(task.task_id, timeout=600)

for fact in report.facts:
    print(f"{fact.metric_key:32}  {fact.value:>18,.0f}  p.{fact.source_page}")

Free tier: 20 tasks/month, no credit card required. Get a key at hkfilings.app/signup.

What you get back

Every fact carries its provenance:

fact.metric_key       # "revenue"
fact.metric_label     # "Revenue"
fact.value            # 245_864_000_000.0
fact.comparable_value # 224_500_000_000.0    (prior period)
fact.yoy_change       # 0.0952
fact.source_page      # 87
fact.source_text      # "Revenue for the year increased to RMB245,864..."
fact.confidence       # 0.98
fact.extra            # forward-compat fields (e.g. bbox, unit) land here
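Because value, comparable_value and yoy_change travel together, a reported YoY figure can be recomputed on the client side. The sketch below assumes yoy_change is a fraction (0.0952 meaning 9.52%), which matches the example values above; the helper names and the tolerance are hypothetical, and a SimpleNamespace stands in for a real fact object.

```python
from types import SimpleNamespace


def recalc_yoy(value, comparable_value):
    """Recompute year-over-year change as value / comparable_value - 1."""
    if not comparable_value:
        return None  # no prior-period figure (or zero), so YoY is undefined
    return value / comparable_value - 1


def yoy_matches(fact, tol=5e-4):
    """Return True if the reported yoy_change agrees with the recomputed one."""
    expected = recalc_yoy(fact.value, fact.comparable_value)
    if expected is None or fact.yoy_change is None:
        return False
    return abs(expected - fact.yoy_change) <= tol


# Stand-in object carrying the example field values shown above.
sample_fact = SimpleNamespace(
    value=245_864_000_000.0,
    comparable_value=224_500_000_000.0,
    yoy_change=0.0952,
)
print(round(recalc_yoy(sample_fact.value, sample_fact.comparable_value), 4))  # 0.0952
print(yoy_matches(sample_fact))  # True
```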

Layer-2 signals come with direction + evidence:

signal.signal_type     # "margin_driver"
signal.direction       # "up" | "down" | "flat"
signal.summary         # "Cloud margin lifted on lower GPU procurement..."
signal.evidence        # [{"page": 42, "text": "..."}, ...]
signal.review_status   # "approved" | "auto_passed" | "pending"

Supply-chain nodes come with role + exposure share:

node.node_label             # "TSMC"
node.node_role              # "supplier" | "customer" | "competitor" | "regulator" | "substitute" | "partner"
node.exposure_share         # 0.18    (18% of revenue / cost tied to this node)
node.direction_to_company   # "inflow" | "outflow"
node.evidence_page          # 142
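As one way to use these fields, the sketch below totals exposure_share per node_role. It assumes exposure_share may be missing (None) for some nodes, which the documentation does not confirm; the node labels and shares are made up for illustration, and SimpleNamespace objects stand in for real nodes.

```python
from collections import defaultdict
from types import SimpleNamespace


def exposure_by_role(nodes):
    """Sum exposure_share per node_role, skipping nodes that carry no share."""
    totals = defaultdict(float)
    for node in nodes:
        if node.exposure_share is not None:
            totals[node.node_role] += node.exposure_share
    return dict(totals)


# Hypothetical nodes; labels and shares are placeholders.
sample_nodes = [
    SimpleNamespace(node_label="Supplier A", node_role="supplier", exposure_share=0.25),
    SimpleNamespace(node_label="Supplier B", node_role="supplier", exposure_share=0.125),
    SimpleNamespace(node_label="Customer C", node_role="customer", exposure_share=0.5),
    SimpleNamespace(node_label="Regulator D", node_role="regulator", exposure_share=None),
]
role_totals = exposure_by_role(sample_nodes)
print(role_totals)  # {'supplier': 0.375, 'customer': 0.5}
```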

Cookbook

Tip: every snippet below runs in a Jupyter notebook once you've exported HKFILINGS_API_KEY in your shell. Snippets that reference client or task reuse the objects created in the Quickstart.

Compare gross margin: Alibaba (9988) vs Tencent (0700), last 3 fiscal years

from hkfilings import HKFilingsClient
import pandas as pd

client = HKFilingsClient(api_key="ak_...")
rows = []
for tk in ("9988", "0700"):
    m = client.company_matrix(tk, metrics=["revenue", "gross_profit"])
    rows.extend({"ticker": tk, **cell} for cell in m.cells)

df = pd.DataFrame(rows).pivot_table(
    index="period", columns=["ticker", "metric_key"], values="value"
)
print(df)
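The pivot above lines up revenue and gross profit but stops short of the margin itself. A minimal pure-Python sketch of that last step follows. It assumes each row is a dict with ticker, period, metric_key and value keys, as the pivot call implies; the period label format and the numbers are placeholders rather than reported figures.

```python
def gross_margins(rows):
    """Map (ticker, period) to gross_profit / revenue."""
    by_key = {}
    for row in rows:
        key = (row["ticker"], row["period"])
        by_key.setdefault(key, {})[row["metric_key"]] = row["value"]

    margins = {}
    for key, metrics in by_key.items():
        revenue = metrics.get("revenue")
        gross_profit = metrics.get("gross_profit")
        # Skip periods where either input is missing or revenue is zero.
        if revenue and gross_profit is not None:
            margins[key] = gross_profit / revenue
    return margins


# Hypothetical rows; the values are placeholders.
sample_rows = [
    {"ticker": "9988", "period": "FY2025", "metric_key": "revenue", "value": 1000.0},
    {"ticker": "9988", "period": "FY2025", "metric_key": "gross_profit", "value": 400.0},
    {"ticker": "0700", "period": "FY2025", "metric_key": "revenue", "value": 800.0},
    {"ticker": "0700", "period": "FY2025", "metric_key": "gross_profit", "value": 400.0},
]
for (ticker, period), margin in sorted(gross_margins(sample_rows).items()):
    print(f"{ticker} {period} {margin:.1%}")
# 0700 FY2025 50.0%
# 9988 FY2025 40.0%
```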

Read industry signals + evidence

sigs = client.task_signals(task.task_id, signal_type="margin_driver")
for s in sigs.signals:
    print(f"[{(s.direction or '-'):>4}] {s.summary}")
    for ev in s.evidence:
        text = (ev.get("text") or "")[:80]
        print(f"        p.{ev.get('page')} - {text}")

# [  up] Cloud Intelligence margin lifted 4.2pp YoY on lower GPU procurement...
#         p.42 - During the year, Cloud Intelligence Group recorded segment...

Render the supply-chain graph

import networkx as nx

graph = nx.DiGraph()
sc = client.company_supply_chain("9988")
for node in sc.nodes:
    graph.add_edge(
        "9988",
        node.node_label,
        role=node.node_role,
        exposure=node.exposure_share,
    )
print(graph)   # → "DiGraph with N nodes and M edges"  (networkx ≥ 3.0)

Export to CSV for Excel

with open("baba_2026_facts.csv", "wb") as fh:
    fh.write(client.facts_csv(task.task_id))
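If you would rather process the CSV in memory than write it to disk, the bytes can be decoded with the standard library. This sketch assumes the payload is UTF-8 with a header row; the column names in the sample are guesses based on the fact fields listed earlier, not a documented CSV layout.

```python
import csv
import io


def parse_facts_csv(data: bytes):
    """Decode CSV bytes and return a list of dicts, one per data row."""
    text = data.decode("utf-8-sig")  # also strips a BOM if one is present
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical payload; real column names may differ.
sample_csv = b"metric_key,value,source_page\r\nrevenue,245864000000.0,87\r\n"
parsed_rows = parse_facts_csv(sample_csv)
first = parsed_rows[0]
print(first["metric_key"], float(first["value"]), int(first["source_page"]))
# revenue 245864000000.0 87
```

With a real task, the input would be the return value of client.facts_csv(task.task_id).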

More runnable examples in examples/.

API reference

| Method | What it does |
| --- | --- |
| analyze(ticker, year, …) | Auto-discover and parse a report by ticker + year |
| create_task(pdf_url, …) | Parse a PDF by URL |
| upload(file_path, …) | Upload a local PDF |
| task_status(task_id) | Poll task progress |
| wait(task_id, timeout=600) | Block until the task reaches a terminal state |
| result(task_id) | Return the Layer-1 financial-facts envelope |
| facts_csv(task_id) | Same data, as CSV bytes |
| company_matrix(ticker, metrics=…) | Cross-period matrix for a ticker |
| task_signals(task_id, …) | Layer-2 signals for one report |
| company_signals(ticker, …) | Cross-period signal feed |
| task_supply_chain(task_id) | Supply-chain nodes for one report |
| company_supply_chain(ticker, …) | Cross-period supply-chain feed |
| task_catalysts(task_id) | Forward-looking catalysts (1–4Q horizon) |
| company_catalysts(ticker, …) | Cross-period catalyst feed |
| intelligence_brief(task_id) | Executive brief (rich nested) |
| review_diff(task_id, …) | Diff between review versions |
| patch_fact(fact_id, **fields) | Update a fact (review action) |
| patch_signal(signal_id, **fields) | Update a signal (review action) |
| fact_comment(fact_id, body, …) | Attach a review comment |
| schema(name="financial_fact") | Fetch a JSON Schema document |
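For readers who want their own polling cadence instead of wait(), a generic loop around a status call might look like the sketch below. It is not the SDK's implementation: the terminal state names, the assumption that the status is a plain string, and the wait_until_terminal helper are all hypothetical, and a fake status source stands in for client.task_status(task_id).

```python
import time

TERMINAL_STATES = {"succeeded", "failed"}  # assumed names, not confirmed by the docs


def wait_until_terminal(get_status, timeout=600.0, interval=2.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal state or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        sleep(interval)


# Fake status source standing in for a real status call.
fake_statuses = iter(["queued", "running", "succeeded"])
final = wait_until_terminal(
    lambda: next(fake_statuses), timeout=5.0, sleep=lambda seconds: None
)
print(final)  # succeeded
```

Passing sleep as a parameter keeps the loop testable without real delays.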

Full docs: https://docs.hkfilings.app/python

Plans & limits

The SDK itself is free under MIT. The managed API runs on a usage-based plan with a free tier:

| Plan | Tasks / month | Layer-2 access | Export |
| --- | --- | --- | --- |
| Free | 20 | Limited | JSON / CSV (watermark) |
| Pro | 200 | Full | JSON / CSV |
| Team | 1,000 | Full + multi-seat | JSON / CSV / MD |
| Enterprise | Custom | Full + webhooks + SLA | Custom |

See current pricing & upgrade on hkfilings.app

You can also self-host the parsing service if you have your own LLM budget — set base_url="https://your-host" when constructing the client. Get in touch (sales@hkfilings.app) for on-prem deployments.

Schema contract

The v1 public schema is frozen. JSON Schema documents can be fetched through the client, for example schema(name="financial_fact").

New backend fields land in each dataclass's extra dict — you do not need to upgrade the SDK to see them. We never remove or rename a documented field within v1.
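To make the forward-compat idea concrete, here is one way a dataclass can route unknown payload keys into an extra dict. FactSketch and from_payload are hypothetical stand-ins written for this illustration; the SDK's real classes may be built differently.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class FactSketch:
    """Hypothetical stand-in, not the SDK's real class."""

    metric_key: str
    value: float
    source_page: int
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "FactSketch":
        # Declared fields are passed through; everything else lands in extra.
        known = {f.name for f in fields(cls)} - {"extra"}
        core = {k: v for k, v in payload.items() if k in known}
        extra = {k: v for k, v in payload.items() if k not in known}
        return cls(**core, extra=extra)


payload = {
    "metric_key": "revenue",
    "value": 245_864_000_000.0,
    "source_page": 87,
    "unit": "RMB",  # a field this sketch's class does not declare
}
parsed = FactSketch.from_payload(payload)
print(parsed.extra)              # {'unit': 'RMB'}
print(parsed.extra.get("bbox"))  # None
```

Reading optional fields with extra.get(...) keeps calling code working whether or not the backend has started sending them.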

Configuration

| Setting | Constructor arg | Env var |
| --- | --- | --- |
| API base URL | base_url= | HKFILINGS_BASE_URL |
| API key | api_key= | HKFILINGS_API_KEY |
| Request timeout (s) | timeout=60.0 | |
| User-Agent | user_agent= | |

The client honors HTTPS_PROXY / HTTP_PROXY via httpx — useful behind corporate firewalls.
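The table does not say which source wins when both a constructor argument and an environment variable are set. The sketch below shows one common convention (explicit argument first, then environment, then default) purely as an assumption; resolve_setting is a hypothetical helper and the default URL is a placeholder, not the real API endpoint.

```python
import os


def resolve_setting(explicit, env_var, default=None, environ=os.environ):
    """Prefer the explicit argument, then the environment variable, then the default."""
    if explicit is not None:
        return explicit
    return environ.get(env_var, default)


# A fake environment keeps the example independent of your shell.
fake_env = {"HKFILINGS_API_KEY": "ak_from_env"}
print(resolve_setting(None, "HKFILINGS_API_KEY", environ=fake_env))           # ak_from_env
print(resolve_setting("ak_explicit", "HKFILINGS_API_KEY", environ=fake_env))  # ak_explicit
print(
    resolve_setting(
        None, "HKFILINGS_BASE_URL", default="https://example.invalid", environ=fake_env
    )
)  # https://example.invalid
```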

Roadmap

  • v0.2 — Async client (HKFilingsAsyncClient), retry with exponential backoff, client.facts_to_dataframe() helper.
  • v0.3 — Streaming task events (SSE wrapper), webhook signature helpers, CLI (hkfilings analyze 9988 2026).
  • v1.0 — Stable surface, deprecation warnings removed, TypeScript SDK companion.

Open issues for what you'd like prioritized.

FAQ

Q: Can I self-host the parsing engine? A: Yes, for Enterprise customers. Contact sales@hkfilings.app. The parsing service requires an LLM API key (DeepSeek, OpenAI, or Anthropic) and a Postgres database.

Q: How is this different from Wind / Bloomberg / Capital IQ? A: We focus on Hong Kong listings only and ship the entire evidence chain (source page + text) alongside every number. Our schema is public and frozen; theirs are proprietary and shift between releases. We're priced for individuals and small funds.

Q: A-shares? US listings? A: Not yet. We may add them once the HK coverage is rock-solid.

Q: Is the parsing logic open-source? A: This SDK is open-source under MIT. The parsing engine — LLM prompts, validators, anti-hallucination rules — is closed-source. We publish the JSON Schema contract as a stability commitment.

Q: How do I report a parsing bug for a specific report? A: Open a GitHub issue with the task_id. We'll triage on the SaaS side and either fix the engine or update the report.

Q: I got rate-limited. What now? A: Free tier is 20 tasks/month. Upgrade or wait for the next monthly reset. HKFilingsError will include status_code=429 and payload["upgrade_url"].
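A small handler for that case could look like the following sketch. RateLimitedSketch is a hypothetical stand-in that only mimics the two attributes mentioned above (status_code and payload); in real code you would catch HKFilingsError instead. The upgrade URL is a placeholder.

```python
class RateLimitedSketch(Exception):
    """Hypothetical stand-in for HKFilingsError with status_code and payload."""

    def __init__(self, status_code, payload):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.payload = payload


def rate_limit_hint(err):
    """Return a human-readable hint for a 429 error, or None for anything else."""
    if getattr(err, "status_code", None) != 429:
        return None
    url = (getattr(err, "payload", None) or {}).get("upgrade_url")
    if url:
        return f"Monthly quota reached; upgrade at {url}"
    return "Monthly quota reached"


try:
    raise RateLimitedSketch(429, {"upgrade_url": "https://example.invalid/upgrade"})
except RateLimitedSketch as err:
    hint = rate_limit_hint(err)
    print(hint)
# Monthly quota reached; upgrade at https://example.invalid/upgrade
```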

License

MIT. See SECURITY.md for vulnerability reporting, CONTRIBUTING.md to help out.


If this saves you time, star the repo — it's the cheapest way to support the project.
Get a free key → hkfilings.app/signup

Download files

Source Distribution

hkfilings-0.1.2.tar.gz (28.3 kB)

Uploaded Source

Built Distribution

hkfilings-0.1.2-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file hkfilings-0.1.2.tar.gz.

File metadata

  • Download URL: hkfilings-0.1.2.tar.gz
  • Upload date:
  • Size: 28.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hkfilings-0.1.2.tar.gz
Algorithm Hash digest
SHA256 80b5d350b49d24dd8c19035c5fcb9057a766aeb7b3bfa789efb2393723d00093
MD5 27dc49ecf0e2fcd1132c4f88fda2719a
BLAKE2b-256 ed7517c58fd05e03d3f4b876eaca0cda2dc9d16c90743f079f180b8d6ac82e92

Provenance

The following attestation bundles were made for hkfilings-0.1.2.tar.gz:

Publisher: publish.yml on mylovelycodes/hkfilings-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hkfilings-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: hkfilings-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hkfilings-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ba138b0999864f508d1ba3fb4e7678b797f7148c4085e5975603638a17c59bbb
MD5 130a73a1e4764a458190bac873c3b7ab
BLAKE2b-256 af693120a881b6d0f76f549232ecf142169932ac23d21e2364e2968ad03b89c9

Provenance

The following attestation bundles were made for hkfilings-0.1.2-py3-none-any.whl:

Publisher: publish.yml on mylovelycodes/hkfilings-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
