
Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.


Valuein Quants

Requires Python 3.10+.

Point-in-Time accurate. Survivorship-bias free. 105M+ facts from 1990 to present.

US fundamental data from SEC EDGAR — 10-K, 10-Q, 8-K, 20-F and amendments — covering 10,000+ active and delisted entities. Cleaned, standardized, and queryable via SQL with zero local downloads.


What You Can Do With This Repository

Use Case | Who | Where to Start
Query financial data via Python | Quants, data engineers | Quickstart
Run 39 pre-built financial signals | Analysts, quants | SQL Templates
Learn with interactive notebooks | Students, new users | Examples & Notebooks
Pull data into Excel | Financial analysts | Excel Integration
Prove data quality to stakeholders | Institutional buyers, compliance | Research & Quality Proofs
Read methodology and compliance docs | Due diligence, enterprise | Documentation
Contribute templates, examples, research | Open-source contributors | Contributing

Quickstart

1. Install

pip install valuein-sdk

2. Get a free API token (S&P 500 coverage, no credit card required)

Register at valuein.biz

3. Set your key and run

export VALUEIN_API_KEY="your_token"

Then, in Python:

from valuein_sdk import ValueinClient

client = ValueinClient()
print(client)
# ValueinClient(plan='sp500', status='active', snapshot='2026-03-14', tables=6)

me = client.me()
print(me["plan"], me["email"])

Data Plans

Plan | Entities | Coverage | Price
S&P 500 | ~605 current + historical members, full history | All 6 tables, all columns | Free (Register)
US Core Fundamentals | 10,000+ active and delisted | Full universe 1990–present, PIT, restatements | Subscription (Buy now)

Use Case 1 — Python SDK

Install valuein-sdk, authenticate once, run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process against Parquet files on Cloudflare R2.

Ticker lookup

from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)

Revenue trends

client = ValueinClient(tables=["entity", "security", "filing", "fact"])

df = client.query("""
    SELECT fa.fiscal_year,
           round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact     fa
    JOIN   filing   f  ON fa.accession_id = f.accession_id
    JOIN   security s  ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'MSFT'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept = 'Revenues'
      AND  f.form_type         = '10-K'
    ORDER  BY fa.fiscal_year DESC
    LIMIT  10
""")
print(df)

Point-in-Time backtest (the most important pattern)

Filter by filing_date, not report_date. Apple's quarter ending Sep 30, 2023 (its fiscal Q4) was not filed until Nov 3, so filtering on report_date leaks 34 days of future data into your backtest.

client = ValueinClient(tables=["security", "filing", "fact"])

TRADE_DATE = "2024-01-15"

df = client.query(f"""
    SELECT fa.standard_concept, fa.fiscal_year,
           f.filing_date,
           round(fa.numeric_value / 1e9, 2) AS value_bn
    FROM   fact     fa
    JOIN   filing   f  ON fa.accession_id = f.accession_id
    JOIN   security s  ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'NVDA'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept IN ('Revenues', 'NetIncomeLoss')
      AND  f.form_type         = '10-K'
      AND  f.filing_date      <= '{TRADE_DATE}'   -- PIT gate
    ORDER  BY f.filing_date DESC
""")
print(df)
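To make the leak concrete, here is a minimal offline sketch that uses Python's built-in sqlite3 module instead of the SDK. The one-row filing table and the accession id 'toy-0001' are made up for illustration; only the column names report_date and filing_date come from the schema described here.

```python
import sqlite3
from datetime import date

# Toy data: one filing whose period ended Sep 30 but reached the SEC on Nov 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filing (accession_id TEXT, report_date TEXT, filing_date TEXT)")
conn.execute("INSERT INTO filing VALUES ('toy-0001', '2023-09-30', '2023-11-03')")


def visible_filings(date_column: str, trade_date: str) -> list[str]:
    """Return accession_ids whose chosen date column is on or before trade_date."""
    if date_column not in {"report_date", "filing_date"}:
        raise ValueError(f"unexpected column: {date_column}")
    rows = conn.execute(
        f"SELECT accession_id FROM filing WHERE {date_column} <= ? ORDER BY accession_id",
        (trade_date,),  # the date itself is bound as a parameter, not interpolated
    ).fetchall()
    return [r[0] for r in rows]


trade_date = "2023-10-15"  # after the period end, before the filing was public
leaky = visible_filings("report_date", trade_date)  # look-ahead: sees an unfiled report
safe = visible_filings("filing_date", trade_date)   # PIT-correct: nothing visible yet
lag_days = (date(2023, 11, 3) - date(2023, 9, 30)).days

print(leaky, safe, lag_days)
```

On the trade date the report_date filter already returns the filing, while the filing_date filter returns nothing until Nov 3. ISO-formatted date strings compare correctly as text, which is why the toy table can store them as TEXT.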

Date columns reference

Column | Table | Use for
report_date / period_end | filing / fact | Aligning to fiscal calendar
filing_date | filing | PIT backtest filter: when the SEC received the filing
knowledge_at | fact | Millisecond-precision PIT for intraday signal research

API reference

client = ValueinClient(
    api_key="...",       # or read from VALUEIN_API_KEY env var
    gateway_url="...",   # override for local dev only
    tables=["entity"],   # load specific tables; omit to load all
)

client.me()                           # dict: plan, status, email, createdAt
client.manifest()                     # dict: snapshot, last_updated, tables
client.health()                       # dict: ok, worker, env (no auth required)
client.query(sql)                     # DuckDB SQL → pandas DataFrame
client.get(table)                     # Download full table → pandas DataFrame
client.run_template(name, **kwargs)   # Named SQL template → pandas DataFrame
client.tables()                       # List loaded table names

Error handling

from valuein_sdk import (
    ValueinAuthError,      # HTTP 401/403 — invalid or expired token
    ValueinPlanError,      # HTTP 403 — endpoint requires a higher plan
    ValueinRateLimitError, # HTTP 429 — includes .retry_after (seconds)
    ValueinAPIError,       # HTTP 5xx — includes .status_code
)

try:
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinRateLimitError as e:
    print(f"Rate limited. Retry in {e.retry_after}s")
except ValueinPlanError:
    print("Upgrade to Full plan for this data")
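One way to act on .retry_after is a small retry wrapper. The sketch below is an assumption-laden illustration, not part of the SDK: RateLimitError is a local stand-in that mirrors only the .retry_after attribute documented above, and call_with_retry is a hypothetical helper name. In real code you would catch ValueinRateLimitError instead.

```python
import time


class RateLimitError(Exception):
    """Stand-in for ValueinRateLimitError; only .retry_after is mirrored."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(fn, max_attempts=3, sleep=time.sleep):
    """Call fn(); on RateLimitError wait retry_after seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(exc.retry_after)


# Demo with a fake query that is rate limited twice, then succeeds.
calls = {"n": 0}


def fake_query():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimitError(retry_after=0.5)
    return "ok"


waits = []
result = call_with_retry(fake_query, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Passing sleep as a parameter keeps the helper testable without real delays. With the SDK, the call would look like call_with_retry(lambda: client.query(sql)) once the except clause targets ValueinRateLimitError.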

Use Case 2 — SQL Templates

39 production-ready SQL templates bundled with the SDK. Run complex financial signals in one line — no SQL required.

# DuPont decomposition for MSFT
df = client.run_template("16_dupont_decomposition", ticker="MSFT")

# Piotroski F-Score screen for a list of tickers (pass a quoted, comma-separated string)
df = client.run_template("17_piotroski_f_score", tickers="'AAPL', 'MSFT', 'NVDA'")

# Altman Z-Score (financial distress score, not a literal probability)
df = client.run_template("18_altman_z_score", ticker="TSLA")

# Trailing-twelve-months revenue
df = client.run_template("06_trailing_twelve_months_ttm", ticker="AMZN")
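The tickers argument above is a pre-quoted SQL list passed as a string. If your tickers start out as a Python list, a small helper can build that string and reject anything that does not look like a symbol. This is a sketch: sql_ticker_list is a hypothetical name, and the allowed character set is an assumption rather than an SDK rule.

```python
import re

# Assumed symbol shape: 1 to 10 upper-case letters, digits, dots, or hyphens.
_TICKER = re.compile(r"[A-Z0-9.\-]{1,10}")


def sql_ticker_list(tickers):
    """Build a quoted, comma-separated list such as "'AAPL', 'MSFT'"."""
    cleaned = []
    for t in tickers:
        symbol = t.strip().upper()
        if not _TICKER.fullmatch(symbol):
            raise ValueError(f"suspicious ticker symbol: {t!r}")
        cleaned.append(f"'{symbol}'")
    if not cleaned:
        raise ValueError("at least one ticker is required")
    return ", ".join(cleaned)


tickers_arg = sql_ticker_list(["aapl", "MSFT", " nvda "])
print(tickers_arg)  # 'AAPL', 'MSFT', 'NVDA'
```

The result can then be passed as tickers=tickers_arg. Validating before quoting matters because the string ends up inside SQL text.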

Template categories

Range | Category | Examples
01–04 | Data Access | Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen
05–09 | Income Statement | YoY revenue growth, TTM, margin analysis, FCF, R&D intensity
10–15 | Balance Sheet | Liquidity, solvency, interest coverage, cash conversion, capex ratios
16–20 | Investment Scores | DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly
21–26 | Valuation & Screening | Sector aggregates, peer ranking, dilution, arbitrage signals
27–33 | Short Signals | Late filers, restatements, 8-K events, ghost companies
34–39 | Advanced Analytics | PIT backtest engine, Z-score outliers, seasonality, XBRL audit

See valuein_sdk/queries/SQL_CHEATSHEET.md for the full template reference.
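For readers new to the scores named above, the textbook three-step DuPont identity splits return on equity into net margin, asset turnover, and equity multiplier. The sketch below shows that arithmetic on made-up inputs. It is not the bundled template's SQL, which may differ (for example by using average rather than end-of-period balances).

```python
from dataclasses import dataclass


@dataclass
class DuPont:
    net_margin: float         # net income / revenue
    asset_turnover: float     # revenue / total assets
    equity_multiplier: float  # total assets / shareholders' equity

    @property
    def roe(self) -> float:
        """Return on equity as the product of the three components."""
        return self.net_margin * self.asset_turnover * self.equity_multiplier


def dupont(net_income, revenue, total_assets, equity) -> DuPont:
    """Decompose ROE using end-of-period balances (a simplifying assumption)."""
    return DuPont(
        net_margin=net_income / revenue,
        asset_turnover=revenue / total_assets,
        equity_multiplier=total_assets / equity,
    )


# Illustrative inputs only, not real company data.
d = dupont(net_income=10.0, revenue=100.0, total_assets=200.0, equity=80.0)
print(d, round(d.roe, 4))
```

The product collapses to net income divided by equity (10 / 80 here), which is a handy sanity check on any DuPont output.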


Use Case 3 — Examples & Notebooks

Six standalone Python scripts and four Jupyter notebooks, designed to go from install to insight in under 3 minutes.

Python scripts (examples/python/)

Script | Level | What it demonstrates
getting_started.py | Beginner | Auth check, first query, entity counts by sector
usage.py | Reference | Every public SDK method demonstrated
entity_screening.py | Beginner | Screen by sector, SIC code, active vs inactive status
financial_analysis.py | Intermediate | Revenue trends, margins, concept normalization, peer comparison
pit_backtest.py | Intermediate | Correct PIT discipline, restatement impact, filing_date vs report_date
survivorship_bias.py | Intermediate | Delisted/bankrupt companies, index_membership, bias quantification

Run any script standalone:

VALUEIN_API_KEY=your_token python examples/python/getting_started.py

Jupyter notebooks (examples/notebooks/)

Each notebook can be opened in Colab:

  • Quickstart
  • Fundamental Analysis
  • PIT Backtest
  • Survivorship Bias

Use Case 4 — Excel Integration

Pull live SEC fundamental data into Microsoft Excel via Power Query. No Python required.

Requirements: Microsoft 365 (build 16.0.17531 or later)

  1. Download excel/valuein-fundamentals.xlsx
  2. Open the workbook and enter your API token in the Connectivity Guide sheet
  3. Click Refresh All — data streams directly from Parquet files on Cloudflare R2

The workbook includes 8 pre-configured sheets: Income Statement, Balance Sheet, Cash Flow, Entities, Securities, Filings, Index Membership, and a Data Dictionary.

For DIY Power Query connections, the M-language source files are in excel/power-query/.

Full setup walkthrough: docs/excel-guide.md


Use Case 5 — Research & Quality Proofs

16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.

# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py

Research modules

research/fundamental/ — Financial statement analysis workflows

  • Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score

research/quantitative/ — Factor model and strategy research

  • PIT correctness proof, survivorship bias quantification, restatement tracking as short signal, sector rotation

research/data_engineering/ — XBRL normalization and pipeline analysis

  • Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector

research/quality_proof/ — Automated data quality validation

  • Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check

See research/README.md for a full breakdown of what each module proves and the key metric it validates.
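The balance sheet equation check mentioned above can be pictured as a relative-tolerance comparison. This is a sketch under an assumption: the 1% is measured relative to total assets, which the text does not spell out. balance_sheet_ok is a hypothetical helper, not the module's actual code.

```python
def balance_sheet_ok(assets, liabilities, equity, tolerance=0.01):
    """True when Assets equals Liabilities + Equity within `tolerance` of Assets."""
    if assets == 0:
        # Avoid dividing by zero: an empty balance sheet must balance exactly.
        return liabilities + equity == 0
    gap = abs(assets - (liabilities + equity))
    return gap / abs(assets) <= tolerance


# Illustrative numbers only.
within = balance_sheet_ok(1000.0, 600.0, 395.0)   # gap of 5 is 0.5% of assets
outside = balance_sheet_ok(1000.0, 600.0, 380.0)  # gap of 20 is 2% of assets
print(within, outside)
```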


Data Schema

Six tables, 90+ columns, fully documented in docs/DATA_CATALOG.xlsx and docs/schema.json.

Table | Primary Key | Description
entity | cik | Legal company: name, sector, SIC code, fiscal year end, status
security | id | Ticker symbols with SCD Type 2 date ranges (valid_from, valid_to, is_active)
filing | accession_id | SEC filing metadata: form type, filing_date, report_date, accepted_at
fact | (entity_id, accession_id, concept, period_end, unit) | Every financial fact from every filing, with knowledge_at PIT timestamp
taxonomy_guide | standard_concept | Definitions for 150 canonical concept names
index_membership | | Historical index constituent records (S&P 500 entry and exit dates)

Key joins

security.entity_id        →  entity.cik
filing.entity_id          →  entity.cik
fact.entity_id            →  entity.cik
fact.accession_id         →  filing.accession_id
index_membership.security_id  →  security.id
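Because security uses SCD Type 2 date ranges, filtering on is_active = TRUE finds only a company's current symbol. The offline sketch below, built on sqlite3 with made-up rows, shows how the security.entity_id → entity.cik join combines with valid_from and valid_to to resolve the symbol in force on a given date. It assumes open-ended rows store NULL in valid_to and that valid_to is inclusive; check the data catalog before relying on either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (cik INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE security (
        id INTEGER PRIMARY KEY, entity_id INTEGER, symbol TEXT,
        valid_from TEXT, valid_to TEXT, is_active INTEGER
    );
    INSERT INTO entity VALUES (1, 'Toy Corp');
    -- One company, two ticker eras (two SCD Type 2 rows).
    INSERT INTO security VALUES (10, 1, 'OLDCO', '2010-01-01', '2020-06-30', 0);
    INSERT INTO security VALUES (11, 1, 'NEWCO', '2020-07-01', NULL, 1);
""")


def symbol_as_of(cik, as_of):
    """Return the ticker that was valid for `cik` on the ISO date `as_of`."""
    row = conn.execute(
        """
        SELECT s.symbol
        FROM   security s
        JOIN   entity   e ON s.entity_id = e.cik
        WHERE  e.cik = ?
          AND  s.valid_from <= ?
          AND  (s.valid_to IS NULL OR s.valid_to >= ?)
        """,
        (cik, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


print(symbol_as_of(1, "2015-03-01"), symbol_as_of(1, "2024-01-15"))
```

The same WHERE pattern should carry over to DuckDB SQL in client.query, since it uses only standard comparisons.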

Standard concept names

Raw XBRL tags (15,000+) are normalized to canonical standard_concept values. Use these exact strings:

Concept | standard_concept
Revenue / Sales | 'Revenues'
Net Income | 'NetIncomeLoss'
Total Assets | 'Assets'
Gross Profit | 'GrossProfit'
Operating Income | 'OperatingIncomeLoss'

Both the raw concept tag and the normalized standard_concept are on the fact table — no join to a separate mapping table needed.


Why Valuein

Point-in-Time (PIT) | Every fact links to its filing's filing_date (SEC receipt) and carries its own knowledge_at timestamp (millisecond precision). Filter filing_date <= trade_date to eliminate look-ahead bias. Most providers silently overwrite restated numbers; we append every revision.
Survivorship-bias free | 10,000+ entities including every delisted, bankrupt, and acquired company in the SEC record back to 1990. A backtest on survivors only is not a real backtest.
Standardized concepts | 15,000+ raw XBRL tags mapped to ~150 canonical standard_concept values. One concept name works across every filer, regardless of what tag they filed.
DuckDB SQL | In-process DuckDB with authenticated Parquet streaming. Queries run in milliseconds, with no local downloads required.
39 SQL templates | Production-ready queries for Altman Z-Score, DuPont, Piotroski F-Score, TTM, FCF, sector screening, restatement signals, PIT backtest engine, and more.
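Appending every revision means the same fact can appear more than once, and a backtest should read the version that was known at the time. Here is a minimal pure-Python sketch of that as-of selection. The two revision rows and their values are invented for illustration, and value_as_of is a hypothetical helper; only the idea of ordering by knowledge_at comes from the description above.

```python
from datetime import datetime

# (concept, period_end, knowledge_at, value): append-only revisions of one fact.
revisions = [
    ("Revenues", "2022-12-31", datetime(2023, 2, 10, 16, 5), 100.0),  # as first filed
    ("Revenues", "2022-12-31", datetime(2023, 8, 1, 9, 30), 97.0),    # later restatement
]


def value_as_of(rows, concept, period_end, as_of):
    """Return the latest value known at `as_of`, or None if nothing was known yet."""
    known = [
        r for r in rows
        if r[0] == concept and r[1] == period_end and r[2] <= as_of
    ]
    if not known:
        return None
    return max(known, key=lambda r: r[2])[3]  # most recent knowledge_at wins


before_restatement = value_as_of(revisions, "Revenues", "2022-12-31", datetime(2023, 3, 1))
after_restatement = value_as_of(revisions, "Revenues", "2022-12-31", datetime(2023, 9, 1))
print(before_restatement, after_restatement)
```

A backtest dated March 2023 sees the originally filed figure, and one dated September 2023 sees the restated one. An overwrite-in-place dataset would return the restated figure for both.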

Documentation

Document | Description
docs/METHODOLOGY.md | Data sourcing, PIT architecture, restatement handling, XBRL normalization logic
docs/COMPLIANCE_AND_DDQ.md | Data provenance, MNPI policy, PIT integrity, security, SLA summary
docs/SLA.md | Uptime targets, data freshness SLAs, support response times, SLA credits
docs/excel-guide.md | Full Excel / Power Query setup walkthrough
docs/DATA_CATALOG.xlsx | All columns, types, definitions, sample values
docs/schema.json | Machine-readable JSON schema
CHANGELOG.md | Full release history

Contributing

Contributions are welcome — new SQL templates, example scripts, research modules, and documentation improvements.

git clone https://github.com/valuein/quants.git
cd quants
uv sync --group dev
uv run pytest tests/ -k "not integration"   # all tests pass offline

See CONTRIBUTING.md for code standards, naming conventions, and the PR process.


For research and educational purposes only. Not financial advice.

Apache-2.0 License — see LICENSE.

