Skip to main content

Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.

Project description

Valuein

PyPI version Python 3.10+ License CI

Valuein US Core Fundamentals

Institutional-grade SEC financial data. Point-in-Time accurate. Survivorship-bias free.

Standardized fundamentals from SEC EDGAR (10-K, 10-Q, 8-K, 20-F). 10,000+ active and delisted entities. ~105M facts. Coverage from 1990 to present. Query with DuckDB SQL — no downloads, no local database.


Install & first query

pip install valuein-sdk
export VALUEIN_API_KEY="your_token"   # get one at valuein.biz
from valuein_sdk import ValueinClient

client = ValueinClient(tables=["security", "filing", "fact"])

df = client.query("""
    SELECT fa.fiscal_year, round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact fa
    JOIN   filing   f ON fa.accession_id = f.accession_id
    JOIN   security s ON f.entity_id     = s.entity_id
    WHERE  s.symbol = 'NVDA' AND s.is_active = TRUE
      AND  f.form_type = '10-K' AND fa.standard_concept = 'TotalRevenue'
      AND  fa.fiscal_period = 'FY'
    QUALIFY row_number() OVER (PARTITION BY fa.fiscal_year ORDER BY fa.period_end DESC) = 1
    ORDER  BY fa.fiscal_year DESC LIMIT 5
""")
print(df)
#    fiscal_year  revenue_bn
# 0         2026      215.94
# 1         2025      130.50
# 2         2024       60.92
# 3         2023       26.97
# 4         2022       26.91

Why Valuein

Point-in-Time (PIT) Every fact has filing_date (SEC receipt date) and knowledge_at (millisecond precision). Filter filing_date <= trade_date to eliminate look-ahead bias. Most providers silently overwrite restated numbers — we append every revision.
Survivorship-bias free 10,000+ entities including every delisted, bankrupt, and acquired company in the SEC record. A strategy back-tested on survivors only is not a real back-test.
Standardized concepts 15,000+ raw XBRL tags mapped to ~150 canonical standard_concept values via waterfall priority logic. One concept name works across every filer, regardless of what tag they chose.
DuckDB SQL An in-process DuckDB connection with authenticated access. Your SQL queries run in milliseconds against Parquet files — no local downloads required.
40+ SQL templates Production-ready queries for Altman Z-score, DuPont decomposition, Piotroski F-score, TTM, FCF, sector screening, restatement signals, PIT backtest engine, and more.

Installation

# pip
pip install valuein-sdk

# uv (recommended)
uv add valuein-sdk

# with research extras (numpy, matplotlib, Jupyter)
pip install "valuein-sdk[research]"

Authentication — set your token in any of these ways:

# environment variable (recommended for scripts)
export VALUEIN_API_KEY="your_token"

# .env file in your project root (auto-loaded by the SDK)
echo 'VALUEIN_API_KEY=your_token' >> .env
# or pass directly
client = ValueinClient(api_key="your_token")

Get a token at valuein.biz.


Core examples

1 — Load only what you need

# Instant auth check — no tables downloaded
client = ValueinClient(tables=[])
print(client.me())      # {'plan': 'sp500', 'status': 'active', 'email': '...'}
print(client.manifest()) # {'last_updated': '...', 'tables': [...]}

# Load specific tables — much faster than loading all 7
client = ValueinClient(tables=["entity", "security"])

Always pass tables=. The fact table is large; load it only when you need financials.


2 — Ticker lookup and company metadata

client = ValueinClient(tables=["entity", "security"])

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.industry, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")

Use is_active = TRUE for current tickers. Historical and delisted companies have is_active = FALSE.


3 — Point-in-Time backtesting

The most important pattern. Always filter by filing_date, not report_date.

client = ValueinClient(tables=["security", "filing", "fact"])

TRADE_DATE = "2024-01-15"

# CORRECT — only data the market had on the trade date
df = client.query(f"""
    SELECT fa.standard_concept, fa.fiscal_year,
           f.filing_date, round(fa.numeric_value / 1e9, 2) AS value_bn
    FROM   fact    fa
    JOIN   filing  f  ON fa.accession_id = f.accession_id
    JOIN   security s ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'NVDA'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept IN ('TotalRevenue', 'NetIncome')
      AND  f.form_type         = '10-K'
      AND  f.filing_date      <= '{TRADE_DATE}'   -- ← PIT gate
    ORDER  BY f.filing_date DESC
""")
Column Use for
period_end / report_date Aligning to fiscal calendar
filing_date PIT backtest filter — when the SEC received the filing
knowledge_at Millisecond-precision PIT for intraday signal research

Filtering by report_date instead of filing_date leaks future data into your backtest. Apple's Q3 2023 ended Sep 30 but was filed Nov 3 — using report_date gives you 34 days of look-ahead.


4 — Annual financial metrics (deduplication)

10-K filings include comparative prior-year figures. Use QUALIFY to keep only the current fiscal year row.

client = ValueinClient(tables=["security", "filing", "fact"])

df = client.query("""
    SELECT
        s.symbol,
        fa.fiscal_year,
        round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact    fa
    JOIN   filing  f  ON fa.accession_id = f.accession_id
    JOIN   security s ON f.entity_id     = s.entity_id
    WHERE  s.symbol            IN ('NVDA', 'TSLA', 'AMZN')
      AND  s.is_active         = TRUE
      AND  f.form_type         = '10-K'
      AND  fa.standard_concept = 'TotalRevenue'
      AND  fa.fiscal_period    = 'FY'
    QUALIFY row_number() OVER (
        PARTITION BY s.symbol, fa.fiscal_year
        ORDER BY fa.period_end DESC          -- latest period = current year
    ) = 1
    ORDER BY fa.fiscal_year DESC, s.symbol
""")

5 — Cross-sectional screening

client = ValueinClient(tables=["entity", "security", "filing", "fact"])

# All companies with revenue > $10B in their most recent 10-K
df = client.query("""
    WITH latest AS (
        SELECT entity_id, accession_id,
               row_number() OVER (PARTITION BY entity_id ORDER BY filing_date DESC) AS rn
        FROM filing WHERE form_type = '10-K'
    )
    SELECT e.name, e.sector::VARCHAR AS sector,
           round(fa.numeric_value / 1e9, 1) AS revenue_bn
    FROM   latest  l
    JOIN   entity  e  ON l.entity_id    = e.cik
    JOIN   fact    fa ON fa.accession_id = l.accession_id
    WHERE  l.rn = 1
      AND  fa.standard_concept = 'TotalRevenue'
      AND  fa.numeric_value    > 10e9
    QUALIFY row_number() OVER (PARTITION BY e.cik ORDER BY fa.period_end DESC) = 1
    ORDER  BY revenue_bn DESC
    LIMIT  20
""")

6 — Survivorship-bias free universe

Most providers delete companies that went bankrupt or were delisted. We keep every one.

client = ValueinClient(tables=["entity"])

# Count active vs inactive — see the full picture
df = client.query("""
    SELECT status, count(*) AS companies,
           round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
    FROM entity GROUP BY status ORDER BY companies DESC
""")

# Financial data for inactive companies works exactly the same
# as for active ones — just remove the is_active filter
df_inactive = client.query("""
    SELECT e.name, e.status, fa.fiscal_year,
           round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact    fa
    JOIN   filing  f ON fa.accession_id = f.accession_id
    JOIN   entity  e ON f.entity_id     = e.cik
    WHERE  e.status IN ('INACTIVE', 'DELISTED')
      AND  fa.standard_concept = 'TotalRevenue'
      AND  fa.fiscal_period    = 'FY'
    QUALIFY row_number() OVER (PARTITION BY e.cik, fa.fiscal_year ORDER BY fa.period_end DESC) = 1
    ORDER  BY e.name, fa.fiscal_year DESC
    LIMIT  30
""")

7 — Pre-built SQL templates

40+ production-ready templates. Run by name, pass parameters.

client = ValueinClient(tables=["entity", "security", "filing", "fact"])

df = client.run_template(
    "01_fundamentals_by_ticker",
    ticker="NVDA",
    form_types=["10-K"],
    metrics=["TotalRevenue", "NetIncome", "OperatingIncome"],
    start_date="2020-01-01",
    end_date="2026-01-01",
)

# List all available templates
print(client.query_cache.keys())

Template catalogue:

Template What it does
01_fundamentals_by_ticker Core financials for a single ticker
02_peer_group_comparison Side-by-side multi-ticker comparison
04_altman_z_score_inputs Inputs for Altman Z-score distress model
05_piotroski_f_score_inputs Inputs for Piotroski F-score quality screen
07_trailing_twelve_months_ttm TTM revenue/income from quarterly filings
09_free_cash_flow FCF = operating cash flow − capex
15_earnings_quality_accruals_anomaly Accruals-based earnings quality signal
18_true_point_in_time_backtest_engine Full PIT engine keyed on knowledge_at
19_cross_sector_dupont_roe_breakdown DuPont decomposition across sectors
06_restatement_negative_revision_alpha Short signal from negative restatements

See all 40+ templates in valuein_sdk/queries/.


Data schema

Tables

Table Primary key Description
entity cik Legal company entity — name, sector, SIC code, status
security id Ticker symbols with SCD Type 2 date ranges (valid_from, valid_to, is_active)
filing accession_id SEC filing metadata — form type, filing_date, report_date
fact (entity_id, accession_id, concept, period_end, unit) Every financial fact from every filing
taxonomy_guide standard_concept Human-readable definitions for 150+ standard concepts
concept_mapping xbrl_tag Maps 15,000+ raw XBRL tags → standard_concept
index_membership Historical S&P 500 and index constituent records

Key joins

security.entity_id  →  entity.cik
filing.entity_id    →  entity.cik
fact.entity_id      →  entity.cik
fact.accession_id   →  filing.accession_id

The fact table — critical columns

Column Description
standard_concept Canonical metric name — use this, not concept
numeric_value The number (USD, shares, ratio, etc.)
unit USD, shares, pure, etc.
fiscal_year / fiscal_period e.g. 2024 / FY, Q1, Q2
period_start / period_end Reporting period boundaries
filing_date When the SEC received the filing (use for PIT)
knowledge_at Millisecond-precision public knowledge timestamp
statement_type income, balance, cashflow, notes

Common standard_concept values

Category Concepts
Income TotalRevenue, GrossProfit, OperatingIncome, NetIncome, EPS_Diluted
Balance sheet TotalAssets, TotalLiabilities, StockholdersEquity, CurrentAssets
Cash flow OperatingCashFlow, InvestingCashFlow, FinancingCashFlow
Efficiency ResearchAndDevelopment, CostOfRevenue

API reference

client = ValueinClient(
    api_key="...",       # defaults to VALUEIN_API_KEY env var
    gateway_url="...",   # override for local dev only
    tables=["entity"],   # load specific tables (omit to load all)
)

client.query(sql)                    # DuckDB SQL → pandas DataFrame
client.get(table)                    # Download full table → pandas DataFrame
client.run_template(name, **kwargs)  # Named SQL template → pandas DataFrame
client.tables()                      # List loaded table names
client.me()                          # Token: plan, status, email
client.manifest()                    # Snapshot: last_updated, tables

Exception hierarchy

from valuein_sdk import (
    ValueinError,          # base — catch-all
    ValueinAuthError,      # 401/403 — invalid, expired, or revoked token
    ValueinPlanError,      # 403 — resource requires Full Dataset plan
    ValueinNotFoundError,  # 404 — table or path not found
    ValueinRateLimitError, # 429 — includes .retry_after (seconds)
    ValueinAPIError,       # 5xx — gateway server error
)

try:
    df = client.get("fact")
except ValueinPlanError:
    print("Upgrade to Full Dataset at valuein.biz/pricing")
except ValueinAuthError:
    print("Check your API key at valuein.biz/portal")

The transport layer retries automatically (3× with exponential backoff) on 429, 502, 503, 504.


Examples

Notebooks Open In Colab

Notebook Description Colab
quickstart.ipynb Auth, table loading, first financial query Colab
fundamental_analysis.ipynb Mag 7 revenue, income statement, concept normalisation, $10B screen Colab
pit_backtest.ipynb PIT methodology deep-dive, restatement history, correct vs incorrect patterns Colab
survivorship_bias.ipynb Bankrupt and delisted company data — scale, sector failure rates Colab

Python scripts

Script Tables used
getting_started.py entity, security
usage.py All — full SDK walkthrough
01_entity_screening.py entity, security
02_financial_analysis.py entity, security, filing, fact
03_pit_backtest.py security, filing, fact
04_survivorship_bias.py entity, security, filing, fact

Plans

Plan Coverage Price
S&P 500 Full history for S&P 500 constituents, all tables Free (register at valuein.biz)
Full Dataset All 10,000+ entities from 1990, active + inactive Subscription — valuein.biz/pricing
# Check your current plan
print(client.me())
# {'plan': 'sp500', 'status': 'active', 'email': 'you@example.com'}

Explore the Code

examples/ — Start here. Seven copy-paste scripts that cover auth, entity screening, financial analysis, PIT backtesting, and survivorship bias. Runnable in minutes.

research/ — Advanced. Full investment workflows across four domains: fundamental analysis, quantitative strategies, data engineering, and automated quality proofs.


Contributing / data issues


For research and educational purposes only. Not financial advice.

Apache-2.0 License — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

valuein_sdk-0.5.3.tar.gz (29.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

valuein_sdk-0.5.3-py3-none-any.whl (45.4 kB view details)

Uploaded Python 3

File details

Details for the file valuein_sdk-0.5.3.tar.gz.

File metadata

  • Download URL: valuein_sdk-0.5.3.tar.gz
  • Upload date:
  • Size: 29.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for valuein_sdk-0.5.3.tar.gz
Algorithm Hash digest
SHA256 70ae69d031686f9ee7af78f3fae85f4ded113f58fe9dde273e61847cadc4e7a0
MD5 0e64e7e4c52a8c2f569fc386b28bf13d
BLAKE2b-256 d722354583937f2f8d538a978f87271dd48e844bb7250a1d3a81955f4a133b31

See more details on using hashes here.

Provenance

The following attestation bundles were made for valuein_sdk-0.5.3.tar.gz:

Publisher: publish.yml on valuein/quants

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file valuein_sdk-0.5.3-py3-none-any.whl.

File metadata

  • Download URL: valuein_sdk-0.5.3-py3-none-any.whl
  • Upload date:
  • Size: 45.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for valuein_sdk-0.5.3-py3-none-any.whl
Algorithm Hash digest
SHA256 01d15d027ae795e0e05e00a857e3ce703725bc79ad4fde88f64d373a382b9e56
MD5 469b85f2b9f319bb77ff29e5434760af
BLAKE2b-256 c8e944c54249499a0e3e7a14e5aed515d789bc021e2917f6c8c56646f7297e10

See more details on using hashes here.

Provenance

The following attestation bundles were made for valuein_sdk-0.5.3-py3-none-any.whl:

Publisher: publish.yml on valuein/quants

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page