# Valuein Quants

Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.

Point-in-Time accurate. Survivorship-bias free. 105M+ facts from 1990 to present.

US fundamental data from SEC EDGAR — 10-K, 10-Q, 8-K, 20-F and amendments — covering 10,000+ active and delisted entities. Cleaned, standardized, and queryable via SQL with zero local downloads.
## What You Can Do With This Repository
| Use Case | Who | Where to Start |
|---|---|---|
| Query financial data via Python | Quants, data engineers | Quickstart |
| Run 39 pre-built financial signals | Analysts, quants | SQL Templates |
| Learn with interactive notebooks | Students, new users | Examples & Notebooks |
| Pull data into Excel | Financial analysts | Excel Integration |
| Prove data quality to stakeholders | Institutional buyers, compliance | Research & Quality Proofs |
| Read methodology and compliance docs | Due diligence, enterprise | Documentation |
| Contribute templates, examples, research | Open-source contributors | Contributing |
## Quickstart

### 1. Install

```bash
pip install valuein-sdk
```

### 2. Get a free API token (S&P 500 coverage, no credit card required)

### 3. Set your key and run

```bash
export VALUEIN_API_KEY="your_token"
```

```python
from valuein_sdk import ValueinClient

client = ValueinClient()
print(client)
# ValueinClient(plan='sp500', status='active', snapshot='2026-03-14', tables=6)

me = client.me()
print(me["plan"], me["email"])
```
## Data Plans
| Plan | Entities | Coverage | Price |
|---|---|---|---|
| S&P 500 | ~605 current + historical members, full history | All 6 tables, all columns | Free — Register |
| US Core Fundamentals | 10,000+ active and delisted | Full universe 1990–present, PIT, restatements | Subscription — Buy now |
## Use Case 1 — Python SDK

Install valuein-sdk, authenticate once, run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process against Parquet files on Cloudflare R2.

### Ticker lookup

```python
from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM security s
    JOIN entity e ON s.entity_id = e.cik
    WHERE s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)
```
### Revenue trends

```python
client = ValueinClient(tables=["entity", "security", "filing", "fact"])

df = client.query("""
    SELECT fa.fiscal_year,
           round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM fact fa
    JOIN filing f ON fa.accession_id = f.accession_id
    JOIN security s ON f.entity_id = s.entity_id
    WHERE s.symbol = 'MSFT'
      AND s.is_active = TRUE
      AND fa.standard_concept = 'Revenues'
      AND f.form_type = '10-K'
    ORDER BY fa.fiscal_year DESC
    LIMIT 10
""")
print(df)
```
### Point-in-Time backtest (the most important pattern)

Filter by filing_date, not report_date. Apple's fiscal 2023 ended Sep 30, but the 10-K was filed Nov 3 — using report_date leaks 34 days of future data into your backtest.
```python
client = ValueinClient(tables=["security", "filing", "fact"])

TRADE_DATE = "2024-01-15"

df = client.query(f"""
    SELECT fa.standard_concept, fa.fiscal_year,
           f.filing_date,
           round(fa.numeric_value / 1e9, 2) AS value_bn
    FROM fact fa
    JOIN filing f ON fa.accession_id = f.accession_id
    JOIN security s ON f.entity_id = s.entity_id
    WHERE s.symbol = 'NVDA'
      AND s.is_active = TRUE
      AND fa.standard_concept IN ('Revenues', 'NetIncomeLoss')
      AND f.form_type = '10-K'
      AND f.filing_date <= '{TRADE_DATE}'  -- PIT gate
    ORDER BY f.filing_date DESC
""")
print(df)
```
### Date columns reference

| Column | Table | Use for |
|---|---|---|
| report_date / period_end | filing / fact | Aligning to fiscal calendar |
| filing_date | filing | PIT backtest filter — when the SEC received the filing |
| knowledge_at | fact | Millisecond-precision PIT for intraday signal research |
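To see why the distinction matters, here is a small standalone calculation of the look-ahead window from the Apple example above (dates only; no SDK required):

```python
from datetime import date

# Apple's fiscal 2023: period ended Sep 30, 10-K filed Nov 3
report_date = date(2023, 9, 30)   # what report_date / period_end records
filing_date = date(2023, 11, 3)   # what filing_date records (SEC receipt)

# A backtest keyed on report_date sees the numbers this many days too early
leak_days = (filing_date - report_date).days
print(leak_days)  # 34
```

Any signal computed in that 34-day window on report_date-keyed data is using information the market did not yet have.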
### API reference

```python
client = ValueinClient(
    api_key="...",       # or read from VALUEIN_API_KEY env var
    gateway_url="...",   # override for local dev only
    tables=["entity"],   # load specific tables; omit to load all
)

client.me()        # dict: plan, status, email, createdAt
client.manifest()  # dict: snapshot, last_updated, tables
client.health()    # dict: ok, worker, env (no auth required)
client.query(sql)  # DuckDB SQL → pandas DataFrame
client.get(table)  # Download full table → pandas DataFrame
client.run_template(name, **kwargs)  # Named SQL template → pandas DataFrame
client.tables()    # List loaded table names
```
### Error handling

```python
from valuein_sdk import (
    ValueinAuthError,       # HTTP 401/403 — invalid or expired token
    ValueinPlanError,       # HTTP 403 — endpoint requires a higher plan
    ValueinRateLimitError,  # HTTP 429 — includes .retry_after (seconds)
    ValueinAPIError,        # HTTP 5xx — includes .status_code
)

try:
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinRateLimitError as e:
    print(f"Rate limited. Retry in {e.retry_after}s")
except ValueinPlanError:
    print("Upgrade to Full plan for this data")
```
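The .retry_after hint makes a simple backoff wrapper easy to write. The helper below is a sketch, not part of the SDK; it is written generically over the exception class so it works with ValueinRateLimitError (or any exception carrying a retry_after attribute):

```python
import time

def with_rate_limit_retry(call, rate_limit_exc, max_retries=3):
    """Run call(); on rate_limit_exc, sleep for its retry_after hint and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except rate_limit_exc as e:
            if attempt == max_retries:
                raise  # out of retries: propagate the rate-limit error
            time.sleep(getattr(e, "retry_after", 1))

# Intended usage (names from the error-handling section above):
#   df = with_rate_limit_retry(
#       lambda: client.query("SELECT * FROM fact LIMIT 1000"),
#       ValueinRateLimitError,
#   )
```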
## Use Case 2 — SQL Templates

39 production-ready SQL templates are bundled with the SDK. Run complex financial signals in one line — no SQL required.

```python
# DuPont decomposition for MSFT
df = client.run_template("16_dupont_decomposition", ticker="MSFT")

# Piotroski F-Score screen across S&P 500
df = client.run_template("17_piotroski_f_score", tickers="'AAPL', 'MSFT', 'NVDA'")

# Altman Z-Score (bankruptcy probability)
df = client.run_template("18_altman_z_score", ticker="TSLA")

# Trailing-twelve-months revenue
df = client.run_template("06_trailing_twelve_months_ttm", ticker="AMZN")
```
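Because templates return plain pandas DataFrames, looping a single-ticker template over several tickers and stacking the results is a one-liner. A sketch (the run argument stands in for client.run_template, and the column-shape of each result is an assumption):

```python
import pandas as pd

def template_across_tickers(run, template, tickers):
    """Run a single-ticker template once per ticker and stack the results."""
    frames = [run(template, ticker=t).assign(ticker=t) for t in tickers]
    return pd.concat(frames, ignore_index=True)

# Intended usage:
#   df = template_across_tickers(client.run_template, "18_altman_z_score",
#                                ["TSLA", "F", "GM"])
```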
### Template categories
| Range | Category | Examples |
|---|---|---|
| 01–04 | Data Access | Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen |
| 05–09 | Income Statement | YoY revenue growth, TTM, margin analysis, FCF, R&D intensity |
| 10–15 | Balance Sheet | Liquidity, solvency, interest coverage, cash conversion, capex ratios |
| 16–20 | Investment Scores | DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly |
| 21–26 | Valuation & Screening | Sector aggregates, peer ranking, dilution, arbitrage signals |
| 27–33 | Short Signals | Late filers, restatements, 8-K events, ghost companies |
| 34–39 | Advanced Analytics | PIT backtest engine, Z-score outliers, seasonality, XBRL audit |
See valuein_sdk/queries/SQL_CHEATSHEET.md for the full template reference.
## Use Case 3 — Examples & Notebooks

Six standalone Python scripts and four Jupyter notebooks, designed to go from install to insight in under 3 minutes.

### Python scripts (examples/python/)

| Script | Level | What it demonstrates |
|---|---|---|
| getting_started.py | Beginner | Auth check, first query, entity counts by sector |
| usage.py | Reference | Every public SDK method demonstrated |
| entity_screening.py | Beginner | Screen by sector, SIC code, active vs inactive status |
| financial_analysis.py | Intermediate | Revenue trends, margins, concept normalization, peer comparison |
| pit_backtest.py | Intermediate | Correct PIT discipline, restatement impact, filing_date vs report_date |
| survivorship_bias.py | Intermediate | Delisted/bankrupt companies, index_membership, bias quantification |
Run any script standalone:

```bash
VALUEIN_API_KEY=your_token python examples/python/getting_started.py
```
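What survivorship_bias.py quantifies on real delisting records can be illustrated with a toy calculation (all numbers below are made up; the real script uses index_membership and entity status):

```python
import pandas as pd

# Toy annual returns: three survivors and two delisted companies (made-up numbers)
returns = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "annual_return": [0.12, 0.08, 0.15, -0.60, -0.45],
    "delisted": [False, False, False, True, True],
})

full_universe = returns["annual_return"].mean()
survivors_only = returns.loc[~returns["delisted"], "annual_return"].mean()

# A survivors-only backtest overstates the average return
print(f"full universe:  {full_universe:+.2%}")
print(f"survivors only: {survivors_only:+.2%}")
```

Dropping the two delisted names flips a losing average into a winning one, which is exactly the bias a survivors-only dataset bakes into every backtest.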
### Jupyter notebooks (examples/notebooks/)

- Quickstart
- Fundamental Analysis
- PIT Backtest
- Survivorship Bias

Each notebook can be opened directly in Google Colab.
## Use Case 4 — Excel Integration

Pull live SEC fundamental data into Microsoft Excel via Power Query. No Python required.

Requirements: Microsoft 365 (build 16.0.17531 or later)

1. Download excel/valuein-fundamentals.xlsx
2. Open the workbook and enter your API token in the Connectivity Guide sheet
3. Click Refresh All — data streams directly from Parquet files on Cloudflare R2

The workbook includes 8 pre-configured sheets: Income Statement, Balance Sheet, Cash Flow, Entities, Securities, Filings, Index Membership, and a Data Dictionary.

For DIY Power Query connections, the M-language source files are in excel/power-query/.

Full setup walkthrough: docs/excel-guide.md
## Use Case 5 — Research & Quality Proofs

16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.

```bash
# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py
```
### Research modules

**research/fundamental/** — Financial statement analysis workflows
- Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score

**research/quantitative/** — Factor model and strategy research
- PIT correctness proof, survivorship bias quantification, restatement tracking as a short signal, sector rotation

**research/data_engineering/** — XBRL normalization and pipeline analysis
- Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector

**research/quality_proof/** — Automated data quality validation
- Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check
See research/README.md for a full breakdown of what each module proves and the key metric it validates.
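The balance sheet equation check is simple to sketch. The snippet below is a standalone illustration of the 1% tolerance test, not the actual module; the values are illustrative and the pivoted column names are assumptions:

```python
import pandas as pd

# One row per (entity, period) with the three concepts already pivoted
bs = pd.DataFrame({
    "cik": [1001, 1002, 1003],
    "Assets": [100.0, 250.0, 500.0],
    "Liabilities": [60.0, 148.0, 300.0],
    "StockholdersEquity": [40.0, 100.0, 180.0],
})

# Relative error of Assets vs Liabilities + Equity, flagged at 1% tolerance
rhs = bs["Liabilities"] + bs["StockholdersEquity"]
bs["rel_error"] = (bs["Assets"] - rhs).abs() / bs["Assets"]
bs["passes"] = bs["rel_error"] <= 0.01

print(bs[["cik", "rel_error", "passes"]])
```

The third row misses by 4% and is flagged; in practice small residuals arise from rounding and minority-interest presentation, which is why the check uses a tolerance rather than exact equality.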
## Data Schema

Six tables, 90+ columns, fully documented in docs/DATA_CATALOG.xlsx and docs/schema.json.

| Table | Primary Key | Description |
|---|---|---|
| entity | cik | Legal company — name, sector, SIC code, fiscal year end, status |
| security | id | Ticker symbols with SCD Type 2 date ranges (valid_from, valid_to, is_active) |
| filing | accession_id | SEC filing metadata — form type, filing_date, report_date, accepted_at |
| fact | (entity_id, accession_id, concept, period_end, unit) | Every financial fact from every filing, with knowledge_at PIT timestamp |
| taxonomy_guide | standard_concept | Definitions for 150 canonical concept names |
| index_membership | — | Historical index constituent records (S&P 500 entry and exit dates) |
### Key joins

```
security.entity_id           → entity.cik
filing.entity_id             → entity.cik
fact.entity_id               → entity.cik
fact.accession_id            → filing.accession_id
index_membership.security_id → security.id
```
### Standard concept names
Raw XBRL tags (15,000+) are normalized to canonical standard_concept values. Use these exact strings:
| Concept | standard_concept |
|---|---|
| Revenue / Sales | 'Revenues' |
| Net Income | 'NetIncomeLoss' |
| Total Assets | 'Assets' |
| Gross Profit | 'GrossProfit' |
| Operating Income | 'OperatingIncomeLoss' |
Both the raw concept tag and the normalized standard_concept are on the fact table — no join to a separate mapping table needed.
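Since the canonical names are exact strings, a small lookup on the caller's side avoids typos in hand-written SQL. This helper is illustrative only (the mapping rows are copied from the table above; the alias names are made up):

```python
# Friendly aliases → canonical standard_concept strings (from the table above)
STANDARD_CONCEPTS = {
    "revenue": "Revenues",
    "net_income": "NetIncomeLoss",
    "total_assets": "Assets",
    "gross_profit": "GrossProfit",
    "operating_income": "OperatingIncomeLoss",
}

def concept_filter(*aliases):
    """Build a SQL IN-list predicate on fact.standard_concept from aliases."""
    names = ", ".join(f"'{STANDARD_CONCEPTS[a]}'" for a in aliases)
    return f"fa.standard_concept IN ({names})"

print(concept_filter("revenue", "net_income"))
# fa.standard_concept IN ('Revenues', 'NetIncomeLoss')
```

An unknown alias raises a KeyError immediately, which is cheaper to debug than a silently empty query result.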
## Why Valuein

| Feature | Why it matters |
|---|---|
| Point-in-Time (PIT) | Every fact carries filing_date (SEC receipt) and knowledge_at (millisecond precision). Filter filing_date <= trade_date to eliminate look-ahead bias. Most providers silently overwrite restated numbers — we append every revision. |
| Survivorship-bias free | 10,000+ entities including every delisted, bankrupt, and acquired company in the SEC record back to 1990. A backtest on survivors only is not a real backtest. |
| Standardized concepts | 15,000+ raw XBRL tags mapped to ~150 canonical standard_concept values. One concept name works across every filer, regardless of what tag they filed. |
| DuckDB SQL | In-process DuckDB with authenticated Parquet streaming. Queries run in milliseconds — no local downloads required. |
| 39 SQL templates | Production-ready queries for Altman Z-Score, DuPont, Piotroski F-Score, TTM, FCF, sector screening, restatement signals, PIT backtest engine, and more. |
## Documentation

| Document | Description |
|---|---|
| docs/METHODOLOGY.md | Data sourcing, PIT architecture, restatement handling, XBRL normalization logic |
| docs/COMPLIANCE_AND_DDQ.md | Data provenance, MNPI policy, PIT integrity, security, SLA summary |
| docs/SLA.md | Uptime targets, data freshness SLAs, support response times, SLA credits |
| docs/excel-guide.md | Full Excel / Power Query setup walkthrough |
| docs/DATA_CATALOG.xlsx | All columns, types, definitions, sample values |
| docs/schema.json | Machine-readable JSON schema |
| CHANGELOG.md | Full release history |
## Contributing

Contributions are welcome — new SQL templates, example scripts, research modules, and documentation improvements.

```bash
git clone https://github.com/valuein/quants.git
cd quants
uv sync --group dev
uv run pytest tests/ -k "not integration"  # all tests pass offline
```

See CONTRIBUTING.md for code standards, naming conventions, and the PR process.
For research and educational purposes only. Not financial advice.
Apache-2.0 License — see LICENSE.