Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.
Project description
Valuein US Core Fundamentals
Institutional-grade SEC financial data. Point-in-Time accurate. Survivorship-bias free.
Standardized fundamentals from SEC EDGAR (10-K, 10-Q, 8-K, 20-F). 10,000+ active and delisted entities. ~105M facts. Coverage from 1990 to present. Query with DuckDB SQL — no downloads, no local database.
Install & first query
pip install valuein-sdk
export VALUEIN_API_KEY="your_token" # get one at valuein.biz
from valuein_sdk import ValueinClient
client = ValueinClient(tables=["security", "filing", "fact"])
df = client.query("""
SELECT fa.fiscal_year, round(fa.numeric_value / 1e9, 2) AS revenue_bn
FROM fact fa
JOIN filing f ON fa.accession_id = f.accession_id
JOIN security s ON f.entity_id = s.entity_id
WHERE s.symbol = 'NVDA' AND s.is_active = TRUE
AND f.form_type = '10-K' AND fa.standard_concept = 'TotalRevenue'
AND fa.fiscal_period = 'FY'
QUALIFY row_number() OVER (PARTITION BY fa.fiscal_year ORDER BY fa.period_end DESC) = 1
ORDER BY fa.fiscal_year DESC LIMIT 5
""")
print(df)
# fiscal_year revenue_bn
# 0 2026 215.94
# 1 2025 130.50
# 2 2024 60.92
# 3 2023 26.97
# 4 2022 26.91
Why Valuein
| Point-in-Time (PIT) | Every fact has filing_date (SEC receipt date) and knowledge_at (millisecond precision). Filter filing_date <= trade_date to eliminate look-ahead bias. Most providers silently overwrite restated numbers — we append every revision. |
| Survivorship-bias free | 10,000+ entities including every delisted, bankrupt, and acquired company in the SEC record. A strategy back-tested on survivors only is not a real back-test. |
| Standardized concepts | 15,000+ raw XBRL tags mapped to ~150 canonical standard_concept values via waterfall priority logic. One concept name works across every filer, regardless of what tag they chose. |
| DuckDB SQL | An in-process DuckDB connection with authenticated access. Your SQL queries run in milliseconds against Parquet files — no local downloads required. |
| 40+ SQL templates | Production-ready queries for Altman Z-score, DuPont decomposition, Piotroski F-score, TTM, FCF, sector screening, restatement signals, PIT backtest engine, and more. |
Installation
# pip
pip install valuein-sdk
# uv (recommended)
uv add valuein-sdk
# with research extras (numpy, matplotlib, Jupyter)
pip install "valuein-sdk[research]"
Authentication — set your token in any of these ways:
# environment variable (recommended for scripts)
export VALUEIN_API_KEY="your_token"
# .env file in your project root (auto-loaded by the SDK)
echo 'VALUEIN_API_KEY=your_token' >> .env
# or pass directly
client = ValueinClient(api_key="your_token")
Get a token at valuein.biz.
Core examples
1 — Load only what you need
# Instant auth check — no tables downloaded
client = ValueinClient(tables=[])
print(client.me()) # {'plan': 'sp500', 'status': 'active', 'email': '...'}
print(client.manifest()) # {'last_updated': '...', 'tables': [...]}
# Load specific tables — much faster than loading all 7
client = ValueinClient(tables=["entity", "security"])
Always pass tables=. The fact table is large; load it only when you need financials.
2 — Ticker lookup and company metadata
client = ValueinClient(tables=["entity", "security"])
df = client.query("""
SELECT e.cik, e.name, e.sector, e.industry, e.status,
s.symbol, s.exchange
FROM security s
JOIN entity e ON s.entity_id = e.cik
WHERE s.symbol = 'AAPL' AND s.is_active = TRUE
""")
Use is_active = TRUE for current tickers. Historical and delisted companies have is_active = FALSE.
3 — Point-in-Time backtesting
The most important pattern. Always filter by filing_date, not report_date.
client = ValueinClient(tables=["security", "filing", "fact"])
TRADE_DATE = "2024-01-15"
# CORRECT — only data the market had on the trade date
df = client.query(f"""
SELECT fa.standard_concept, fa.fiscal_year,
f.filing_date, round(fa.numeric_value / 1e9, 2) AS value_bn
FROM fact fa
JOIN filing f ON fa.accession_id = f.accession_id
JOIN security s ON f.entity_id = s.entity_id
WHERE s.symbol = 'NVDA'
AND s.is_active = TRUE
AND fa.standard_concept IN ('TotalRevenue', 'NetIncome')
AND f.form_type = '10-K'
AND f.filing_date <= '{TRADE_DATE}' -- ← PIT gate
ORDER BY f.filing_date DESC
""")
| Column | Use for |
|---|---|
period_end / report_date |
Aligning to fiscal calendar |
filing_date |
PIT backtest filter — when the SEC received the filing |
knowledge_at |
Millisecond-precision PIT for intraday signal research |
Filtering by
report_dateinstead offiling_dateleaks future data into your backtest. Apple's Q3 2023 ended Sep 30 but was filed Nov 3 — usingreport_dategives you 34 days of look-ahead.
4 — Annual financial metrics (deduplication)
10-K filings include comparative prior-year figures. Use QUALIFY to keep only the current fiscal year row.
client = ValueinClient(tables=["security", "filing", "fact"])
df = client.query("""
SELECT
s.symbol,
fa.fiscal_year,
round(fa.numeric_value / 1e9, 2) AS revenue_bn
FROM fact fa
JOIN filing f ON fa.accession_id = f.accession_id
JOIN security s ON f.entity_id = s.entity_id
WHERE s.symbol IN ('NVDA', 'TSLA', 'AMZN')
AND s.is_active = TRUE
AND f.form_type = '10-K'
AND fa.standard_concept = 'TotalRevenue'
AND fa.fiscal_period = 'FY'
QUALIFY row_number() OVER (
PARTITION BY s.symbol, fa.fiscal_year
ORDER BY fa.period_end DESC -- latest period = current year
) = 1
ORDER BY fa.fiscal_year DESC, s.symbol
""")
5 — Cross-sectional screening
client = ValueinClient(tables=["entity", "security", "filing", "fact"])
# All companies with revenue > $10B in their most recent 10-K
df = client.query("""
WITH latest AS (
SELECT entity_id, accession_id,
row_number() OVER (PARTITION BY entity_id ORDER BY filing_date DESC) AS rn
FROM filing WHERE form_type = '10-K'
)
SELECT e.name, e.sector::VARCHAR AS sector,
round(fa.numeric_value / 1e9, 1) AS revenue_bn
FROM latest l
JOIN entity e ON l.entity_id = e.cik
JOIN fact fa ON fa.accession_id = l.accession_id
WHERE l.rn = 1
AND fa.standard_concept = 'TotalRevenue'
AND fa.numeric_value > 10e9
QUALIFY row_number() OVER (PARTITION BY e.cik ORDER BY fa.period_end DESC) = 1
ORDER BY revenue_bn DESC
LIMIT 20
""")
6 — Survivorship-bias free universe
Most providers delete companies that went bankrupt or were delisted. We keep every one.
client = ValueinClient(tables=["entity"])
# Count active vs inactive — see the full picture
df = client.query("""
SELECT status, count(*) AS companies,
round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM entity GROUP BY status ORDER BY companies DESC
""")
# Financial data for inactive companies works exactly the same
# as for active ones — just remove the is_active filter
df_inactive = client.query("""
SELECT e.name, e.status, fa.fiscal_year,
round(fa.numeric_value / 1e9, 2) AS revenue_bn
FROM fact fa
JOIN filing f ON fa.accession_id = f.accession_id
JOIN entity e ON f.entity_id = e.cik
WHERE e.status IN ('INACTIVE', 'DELISTED')
AND fa.standard_concept = 'TotalRevenue'
AND fa.fiscal_period = 'FY'
QUALIFY row_number() OVER (PARTITION BY e.cik, fa.fiscal_year ORDER BY fa.period_end DESC) = 1
ORDER BY e.name, fa.fiscal_year DESC
LIMIT 30
""")
7 — Pre-built SQL templates
40+ production-ready templates. Run by name, pass parameters.
client = ValueinClient(tables=["entity", "security", "filing", "fact"])
df = client.run_template(
"01_fundamentals_by_ticker",
ticker="NVDA",
form_types=["10-K"],
metrics=["TotalRevenue", "NetIncome", "OperatingIncome"],
start_date="2020-01-01",
end_date="2026-01-01",
)
# List all available templates
print(client.query_cache.keys())
Template catalogue:
| Template | What it does |
|---|---|
01_fundamentals_by_ticker |
Core financials for a single ticker |
02_peer_group_comparison |
Side-by-side multi-ticker comparison |
04_altman_z_score_inputs |
Inputs for Altman Z-score distress model |
05_piotroski_f_score_inputs |
Inputs for Piotroski F-score quality screen |
07_trailing_twelve_months_ttm |
TTM revenue/income from quarterly filings |
09_free_cash_flow |
FCF = operating cash flow − capex |
15_earnings_quality_accruals_anomaly |
Accruals-based earnings quality signal |
18_true_point_in_time_backtest_engine |
Full PIT engine keyed on knowledge_at |
19_cross_sector_dupont_roe_breakdown |
DuPont decomposition across sectors |
06_restatement_negative_revision_alpha |
Short signal from negative restatements |
See all 40+ templates in valuein_sdk/queries/.
Data schema
Tables
| Table | Primary key | Description |
|---|---|---|
entity |
cik |
Legal company entity — name, sector, SIC code, status |
security |
id |
Ticker symbols with SCD Type 2 date ranges (valid_from, valid_to, is_active) |
filing |
accession_id |
SEC filing metadata — form type, filing_date, report_date |
fact |
(entity_id, accession_id, concept, period_end, unit) |
Every financial fact from every filing |
taxonomy_guide |
standard_concept |
Human-readable definitions for 150+ standard concepts |
concept_mapping |
xbrl_tag |
Maps 15,000+ raw XBRL tags → standard_concept |
index_membership |
— | Historical S&P 500 and index constituent records |
Key joins
security.entity_id → entity.cik
filing.entity_id → entity.cik
fact.entity_id → entity.cik
fact.accession_id → filing.accession_id
The fact table — critical columns
| Column | Description |
|---|---|
standard_concept |
Canonical metric name — use this, not concept |
numeric_value |
The number (USD, shares, ratio, etc.) |
unit |
USD, shares, pure, etc. |
fiscal_year / fiscal_period |
e.g. 2024 / FY, Q1, Q2 |
period_start / period_end |
Reporting period boundaries |
filing_date |
When the SEC received the filing (use for PIT) |
knowledge_at |
Millisecond-precision public knowledge timestamp |
statement_type |
income, balance, cashflow, notes |
Common standard_concept values
| Category | Concepts |
|---|---|
| Income | TotalRevenue, GrossProfit, OperatingIncome, NetIncome, EPS_Diluted |
| Balance sheet | TotalAssets, TotalLiabilities, StockholdersEquity, CurrentAssets |
| Cash flow | OperatingCashFlow, InvestingCashFlow, FinancingCashFlow |
| Efficiency | ResearchAndDevelopment, CostOfRevenue |
API reference
client = ValueinClient(
api_key="...", # defaults to VALUEIN_API_KEY env var
gateway_url="...", # override for local dev only
tables=["entity"], # load specific tables (omit to load all)
)
client.query(sql) # DuckDB SQL → pandas DataFrame
client.get(table) # Download full table → pandas DataFrame
client.run_template(name, **kwargs) # Named SQL template → pandas DataFrame
client.tables() # List loaded table names
client.me() # Token: plan, status, email
client.manifest() # Snapshot: last_updated, tables
Exception hierarchy
from valuein_sdk import (
ValueinError, # base — catch-all
ValueinAuthError, # 401/403 — invalid, expired, or revoked token
ValueinPlanError, # 403 — resource requires Full Dataset plan
ValueinNotFoundError, # 404 — table or path not found
ValueinRateLimitError, # 429 — includes .retry_after (seconds)
ValueinAPIError, # 5xx — gateway server error
)
try:
df = client.get("fact")
except ValueinPlanError:
print("Upgrade to Full Dataset at valuein.biz/pricing")
except ValueinAuthError:
print("Check your API key at valuein.biz/portal")
The transport layer retries automatically (3× with exponential backoff) on 429, 502, 503, 504.
Examples
Notebooks 
| Notebook | Description | Colab |
|---|---|---|
quickstart.ipynb |
Auth, table loading, first financial query | |
fundamental_analysis.ipynb |
Mag 7 revenue, income statement, concept normalisation, $10B screen | |
pit_backtest.ipynb |
PIT methodology deep-dive, restatement history, correct vs incorrect patterns | |
survivorship_bias.ipynb |
Bankrupt and delisted company data — scale, sector failure rates |
Python scripts
| Script | Tables used |
|---|---|
getting_started.py |
entity, security |
usage.py |
All — full SDK walkthrough |
01_entity_screening.py |
entity, security |
02_financial_analysis.py |
entity, security, filing, fact |
03_pit_backtest.py |
security, filing, fact |
04_survivorship_bias.py |
entity, security, filing, fact |
Plans
| Plan | Coverage | Price |
|---|---|---|
| S&P 500 | Full history for S&P 500 constituents, all tables | Free (register at valuein.biz) |
| Full Dataset | All 10,000+ entities from 1990, active + inactive | Subscription — valuein.biz/pricing |
# Check your current plan
print(client.me())
# {'plan': 'sp500', 'status': 'active', 'email': 'you@example.com'}
Contributing / data issues
- Data quality report: open a ticket
- SDK bug: open an issue
- Questions: support@valuein.biz
For research and educational purposes only. Not financial advice.
Apache-2.0 License — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file valuein_sdk-0.4.1.tar.gz.
File metadata
- Download URL: valuein_sdk-0.4.1.tar.gz
- Upload date:
- Size: 29.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a8e32cb7c16cecf1e0c990b51b155734a7431fce98ff7864793d4f9919602b17
|
|
| MD5 |
6f6af97da78dd7e7ed166c4c02952755
|
|
| BLAKE2b-256 |
7dcc120a94b13e247fd98bf1455fe761a851b3986cde2aea103f9389e5946316
|
Provenance
The following attestation bundles were made for valuein_sdk-0.4.1.tar.gz:
Publisher:
publish.yml on valuein/quants
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
valuein_sdk-0.4.1.tar.gz -
Subject digest:
a8e32cb7c16cecf1e0c990b51b155734a7431fce98ff7864793d4f9919602b17 - Sigstore transparency entry: 1092435562
- Sigstore integration time:
-
Permalink:
valuein/quants@1f8c948301eb72696e149159b5c5937c5a9e2060 -
Branch / Tag:
refs/tags/v0.4.1 - Owner: https://github.com/valuein
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@1f8c948301eb72696e149159b5c5937c5a9e2060 -
Trigger Event:
push
-
Statement type:
File details
Details for the file valuein_sdk-0.4.1-py3-none-any.whl.
File metadata
- Download URL: valuein_sdk-0.4.1-py3-none-any.whl
- Upload date:
- Size: 45.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ff75608ce8977f3b9a36a4139f7ba67fce68b5307bda328b9f83d3cb2b1fabfa
|
|
| MD5 |
9abde88f0991fe17ae1c358746d4d99a
|
|
| BLAKE2b-256 |
aa2bde8b1224c5c17e477f5d39a66eb29e98e9ae33e77f407461adb281b5aeb1
|
Provenance
The following attestation bundles were made for valuein_sdk-0.4.1-py3-none-any.whl:
Publisher:
publish.yml on valuein/quants
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
valuein_sdk-0.4.1-py3-none-any.whl -
Subject digest:
ff75608ce8977f3b9a36a4139f7ba67fce68b5307bda328b9f83d3cb2b1fabfa - Sigstore transparency entry: 1092435563
- Sigstore integration time:
-
Permalink:
valuein/quants@1f8c948301eb72696e149159b5c5937c5a9e2060 -
Branch / Tag:
refs/tags/v0.4.1 - Owner: https://github.com/valuein
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@1f8c948301eb72696e149159b5c5937c5a9e2060 -
Trigger Event:
push
-
Statement type: