Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.
💎 Valuein Python SDK: Frictionless Financial Data
A high-performance toolkit for querying point-in-time US fundamentals from SEC EDGAR, built for quants, analysts, and data engineers.
The Valuein SDK is a complete infrastructure solution for consuming point-in-time-accurate US Core financial fundamentals (facts) in your daily workflow. Whether you are building complex asynchronous Python pipelines, performing forensic financial research in Excel, or executing templated SQL, this library provides frictionless, zero-setup access to institutional-grade data.
The Data Engine
Powered by survivorship-bias-free data containing 12M+ filings and 108M+ facts from 10-Ks, 10-Qs, 8-Ks, 20-Fs, and amendments across 10,000+ active and delisted US companies since 1990.
Why use this toolkit?
⚡ Lightning-Fast Python SDK: Execute blazing-fast queries against remote Parquet files hosted on R2, powered entirely by DuckDB under the hood. No database setup, no massive local downloads.
📊 Excel & Power Query Ready: Not a Python developer? Fetch, transform, and analyze the data directly within Excel spreadsheets using our provided integrations.
🛠️ Plug-and-Play SQL Templates: Skip the boilerplate. Use our pre-built SQL templates to immediately extract insights, calculate intrinsic values, or model standardized financial statements.
📚 Comprehensive Context: Deep-dive documentation mapping out table schemas, primary keys, and field definitions to support your specific financial research use cases.
🚀 Why Valuein Data and SDK Library
Ease of use and intelligence.
| Feature | Benefit |
|---|---|
| 🕒 Point‑in‑Time Data | Eliminate look‑ahead bias in backtests |
| ⚖️ Survivorship‑Bias Free | Includes bankrupt, delisted, and acquired firms |
| 📊 Standardized Concepts | 15k+ XBRL tags mapped to ~150 canonical financial concepts |
| 🚀 DuckDB SQL Engine | Millisecond analytics directly in Python |
| ☁️ Cloud Parquet Streaming | No local data downloads required |
| 🧩 Financial Templates | Production‑ready investment signals |
🧠 What You Can Do With This Repository
| Use Case | Who | Where to Start |
|---|---|---|
| Query financial data via Python | Quants, data engineers | Quickstart |
| Run 39 pre-built financial signals | Analysts, quants | SQL Templates |
| Learn with interactive notebooks | Students, new users | Examples & Notebooks |
| Pull data into Excel | Financial analysts | Excel Integration |
| Prove data quality to stakeholders | Institutional buyers, compliance | Research & Quality Proofs |
| Read methodology and compliance docs | Due diligence, enterprise | Documentation |
| Contribute templates, examples, research | Open-source contributors | Contributing |
⚡ Quickstart
1. Install Package
```bash
pip install valuein-sdk
```
or
```bash
uv add valuein-sdk
```
🔑 2. Get Your API Token
| Data Plan | Coverage | Price | Get Access |
|---|---|---|---|
| Free | S&P 500 coverage, full history, active & inactive companies | Free | Register |
| Pro | Full universe of US stocks, US core fundamentals, point-in-time & survivorship-bias free | $200 / month | Subscribe |
| Pro (Annual) | Same as Pro plan | $1,920 / year | Subscribe |
🔐 3. Set Your API Token
```bash
export VALUEIN_API_KEY="your_token"
```
or
```bash
echo 'VALUEIN_API_KEY="your_token"' >> .env
```
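The SDK resolves the token from the environment. As a rough sketch of the same lookup pattern (this helper is illustrative, not the SDK's internal code; the hand-rolled `.env` fallback assumes simple `KEY=value` lines):

```python
import os

def load_token(env_var="VALUEIN_API_KEY", dotenv_path=".env"):
    """Resolve the API token: environment first, then a simple .env fallback."""
    token = os.environ.get(env_var)
    if token:
        return token
    try:
        with open(dotenv_path) as fh:
            for line in fh:
                key, _, value = line.strip().partition("=")
                if key == env_var:
                    return value.strip().strip('"')
    except FileNotFoundError:
        pass
    return None

os.environ["VALUEIN_API_KEY"] = "your_token"  # normally exported in your shell
print(load_token())  # your_token
```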
▶️ 4. Run Your First Query
The ValueinClient handles authentication, table discovery, and local caching in a high-performance DuckDB instance.
The Recommended Way (Context Manager)
Using a with statement ensures that temporary files and database connections are closed automatically, even if your script crashes.
```python
from valuein_sdk import ValueinClient

try:
    # Load specific tables, or omit the argument to load all tables
    with ValueinClient() as client:
        client.health()    # dict: ok, connection status (no token required)
        client.me()        # dict: plan, status, email, createdAt
        client.manifest()  # dict: snapshot, last_updated, tables
        client.tables()    # list loaded table names

        sql = "SELECT COUNT(cik) FROM entity"  # Build a SQL query
        result_df = client.query(sql)          # Execute it with DuckDB → pandas DataFrame
        f_df = client.get(table="filing")      # Download a full table → pandas DataFrame

        name, tickers = "01_fundamentals_by_ticker", "'AAPL','MSFT'"
        df = client.run_template(name, tickers)  # Named SQL template → pandas DataFrame
except Exception as e:
    # Catch-all for unexpected errors (network issues, API errors, etc.).
    # You can also catch them individually; see the error handling section at the bottom.
    print(f"Unexpected error while initializing ValueinClient: {e}")
```
> [!NOTE]
> The first run takes about a minute while all tables are fetched.
📊 Excel Integration
Stream live SEC fundamental data directly into your spreadsheets. No Python and no complex scripts: just pure data power.
⚡ Quick Start
Get up and running in less than 60 seconds.
- Download Template: Get the `valuein-fundamentals.xlsx` workbook.
- Authorize: Open the workbook and enter your API token in the Connectivity Guide sheet.
- Sync Data: Click `Data > Refresh All`.
> [!TIP]
> Data streams directly from Parquet files on Cloudflare R2, ensuring high-speed transfers and minimal local overhead.
🛠 Requirements
- Microsoft 365 (Build 16.0.17531 or later)
- Active API Token
📂 Pre-Configured Sheets
The workbook includes 8 high-performance sheets ready for analysis:
- Financials: Income Statement, Balance Sheet, Cash Flow
- Metadata: Entities, Securities, Filings
- Reference: Index Membership, Data Dictionary
🧑‍💻 Developer Customization
For those who prefer a "Do It Yourself" approach, the raw M-language source files for Power Query are available in the excel/power-query/ directory. You can use these to build custom connections in your existing workbooks.
📖 Need more help? View the Full Setup Walkthrough.
🗂️ Data Schema
Find more information on all tables and their fields in docs/schema.json.
| Table | Description | Records |
|---|---|---|
| references | Start here. Flat join of entity + security + index_membership. One row per security. Boolean flags (is_sp500) replace three joins. | ~7K |
| entity | Company metadata | 19K+ |
| security | Ticker history (SCD Type 2) | 7K+ |
| filing | Filing metadata | 12M+ |
| fact | Financial statement facts | 108M+ |
| valuation | Two-stage DCF + DDM intrinsic value per entity per period | 19K+ |
| taxonomy_guide | 2026 US GAAP Taxonomy Guide | 11,966 |
| index_membership | Historical index membership with start/end dates | 8K+ |
🔗 Key Joins
```text
references.cik               → entity.cik   (references is the fast entry point)
security.entity_id           → entity.cik
filing.entity_id             → entity.cik
fact.entity_id               → entity.cik
fact.accession_id            → filing.accession_id
index_membership.security_id → security.id
```
⚡ DuckDB Query Patterns
Three patterns that eliminate redundant joins and scans on every cross-company query:
1. references replaces entity + security + index_membership
```sql
-- Filter S&P 500 tech companies — zero joins
SELECT symbol, name, sector
FROM references
WHERE is_sp500 = TRUE AND sector ILIKE '%technology%' AND is_active = TRUE
```
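To see why the flat table helps, here is the denormalization pattern itself on toy data. This sketch uses Python's stdlib `sqlite3` rather than DuckDB, and the table contents and view name `references_flat` are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entity(cik INTEGER, name TEXT, sector TEXT);
    CREATE TABLE security(entity_id INTEGER, symbol TEXT, is_active INTEGER);
    CREATE TABLE index_membership(entity_id INTEGER, idx TEXT);
    INSERT INTO entity VALUES (1, 'Acme Corp', 'Technology'), (2, 'Bolt Inc', 'Energy');
    INSERT INTO security VALUES (1, 'ACME', 1), (2, 'BOLT', 1);
    INSERT INTO index_membership VALUES (1, 'sp500');
    -- Denormalized view: one row per security, boolean flag instead of a join
    CREATE VIEW references_flat AS
    SELECT s.symbol, e.name, e.sector, s.is_active,
           EXISTS(SELECT 1 FROM index_membership m
                  WHERE m.entity_id = e.cik AND m.idx = 'sp500') AS is_sp500
    FROM security s JOIN entity e ON s.entity_id = e.cik;
""")
# Every downstream query now filters on flags instead of re-joining three tables
rows = con.execute(
    "SELECT symbol FROM references_flat WHERE is_sp500 = 1 AND is_active = 1"
).fetchall()
print(rows)  # [('ACME',)]
```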
2. LATERAL for the latest filing per company
```sql
SELECT r.symbol, f.accession_id, f.filing_date
FROM references r
JOIN LATERAL (
  SELECT accession_id, filing_date
  FROM filing
  WHERE entity_id = r.cik AND form_type = '10-K'
  ORDER BY filing_date DESC
  LIMIT 1
) f ON true
```
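The same "latest filing per company" result can also come from a window function, which works in engines without LATERAL. An offline sketch on toy data with stdlib `sqlite3` (not the SDK's DuckDB engine; rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE filing(entity_id INTEGER, accession_id TEXT, form_type TEXT, filing_date TEXT)"
)
con.executemany("INSERT INTO filing VALUES (?,?,?,?)", [
    (1, "A-2022", "10-K", "2022-02-01"),
    (1, "A-2023", "10-K", "2023-02-01"),
    (2, "B-2023", "10-K", "2023-03-15"),
])
# Rank each company's 10-Ks newest-first, then keep rank 1
rows = con.execute("""
    SELECT entity_id, accession_id, filing_date
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY entity_id ORDER BY filing_date DESC
        ) AS rn
        FROM filing
        WHERE form_type = '10-K'
    )
    WHERE rn = 1
    ORDER BY entity_id
""").fetchall()
print(rows)  # [(1, 'A-2023', '2023-02-01'), (2, 'B-2023', '2023-03-15')]
```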
3. Pivot multiple concepts in one fact scan
```sql
-- Debt + equity in one pass — no self-join
SELECT
  MAX(CASE WHEN standard_concept = 'LongTermDebt' THEN numeric_value END) AS debt,
  MAX(CASE WHEN standard_concept = 'StockholdersEquity' THEN numeric_value END) AS equity
FROM fact
WHERE standard_concept IN ('LongTermDebt', 'StockholdersEquity')
GROUP BY accession_id
```
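The conditional-aggregation pivot is plain SQL, so it can be verified offline on toy data. This sketch uses stdlib `sqlite3` in place of DuckDB, with invented accession IDs and values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fact(accession_id TEXT, standard_concept TEXT, numeric_value REAL)"
)
con.executemany("INSERT INTO fact VALUES (?,?,?)", [
    ("A-1", "LongTermDebt", 100.0),
    ("A-1", "StockholdersEquity", 250.0),
    ("B-1", "LongTermDebt", 40.0),
    ("B-1", "StockholdersEquity", 80.0),
])
# One scan of fact pivots both concepts into columns — no self-join
rows = con.execute("""
    SELECT accession_id,
           MAX(CASE WHEN standard_concept = 'LongTermDebt' THEN numeric_value END) AS debt,
           MAX(CASE WHEN standard_concept = 'StockholdersEquity' THEN numeric_value END) AS equity
    FROM fact
    WHERE standard_concept IN ('LongTermDebt', 'StockholdersEquity')
    GROUP BY accession_id
    ORDER BY accession_id
""").fetchall()
print(rows)  # [('A-1', 100.0, 250.0), ('B-1', 40.0, 80.0)]
```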
For quarterly cash flow metrics, use `COALESCE(derived_quarterly_value, numeric_value)` — Q2/Q3 10-Qs report YTD values; this column isolates the single quarter.
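The COALESCE behaves the same way in any SQL engine; a tiny offline illustration with stdlib `sqlite3` and made-up numbers:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fact(period TEXT, numeric_value REAL, derived_quarterly_value REAL)"
)
con.executemany("INSERT INTO fact VALUES (?,?,?)", [
    ("Q1", 10.0, None),   # Q1 10-Q already covers a single quarter
    ("Q2", 25.0, 15.0),   # Q2 10-Q reports YTD; derived column isolates Q2 alone
])
vals = [r[0] for r in con.execute(
    "SELECT COALESCE(derived_quarterly_value, numeric_value) FROM fact ORDER BY period"
)]
print(vals)  # [10.0, 15.0]
```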
See valuein_sdk/queries/SQL_CHEATSHEET.md for 8 complete patterns including FCF screens, PIT backtesting, and restatement auditing.
🏷️ Standard Concept Names
> [!NOTE]
> Raw XBRL tags (11,966 unique) are normalized to canonical `standard_concept` values. We standardized the 3,200 most-used concepts from the US GAAP Taxonomy, which categorizes 95% of all facts; the rest fall into the 'Other' category.

Both the raw `concept` tag (xbrl_tag) and the normalized `standard_concept` are on the `fact` table — no join to a separate mapping table needed.
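Conceptually, the normalization is a many-to-one lookup with an 'Other' fallback. A minimal sketch — the mapping entries below are invented examples, not the dataset's actual table:

```python
# Hypothetical tag → concept entries for illustration only;
# the real mapping is precomputed into the fact table's standard_concept column.
CONCEPT_MAP = {
    "us-gaap:Revenues": "Revenue",
    "us-gaap:SalesRevenueNet": "Revenue",
    "us-gaap:LongTermDebtNoncurrent": "LongTermDebt",
}

def standardize(xbrl_tag):
    """Map a raw XBRL tag to its canonical concept, defaulting to 'Other'."""
    return CONCEPT_MAP.get(xbrl_tag, "Other")

print(standardize("us-gaap:SalesRevenueNet"))  # Revenue
print(standardize("us-gaap:SomeRareTag"))      # Other
```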
📅 Date Columns Reference
| Column | Table | Use for |
|---|---|---|
| `report_date` / `period_end` | filing / fact | Aligning to the fiscal calendar |
| `filing_date` | filing | PIT backtest filter — when the SEC received the filing |
| `knowledge_at` | fact | Millisecond-precision PIT for intraday signal research |
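The PIT discipline the `filing_date` column enables boils down to one filter: only use facts whose filing date is on or before your as-of date. A minimal sketch with invented filings (`known_as_of` is an illustrative helper, not an SDK function):

```python
from datetime import date

# Toy filings: (report_date, filing_date, revenue) — values invented for the demo.
# Note the lag between fiscal period end and when the SEC received the filing.
filings = [
    (date(2023, 12, 31), date(2024, 2, 15), 100.0),
    (date(2024, 3, 31),  date(2024, 5, 10), 110.0),
]

def known_as_of(filings, as_of):
    """PIT filter: keep only facts whose filing_date is on/before as_of."""
    return [f for f in filings if f[1] <= as_of]

# On 2024-03-01 a backtest may only see the FY2023 filing; using report_date
# instead would leak the Q1-2024 numbers two months early (look-ahead bias).
print(len(known_as_of(filings, date(2024, 3, 1))))  # 1
```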
🧩 Template Categories
| Range | Category | Examples |
|---|---|---|
| 01–04 | Data Access | Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen |
| 05–09 | Income Statement | YoY revenue growth, TTM, margin analysis, FCF, R&D intensity |
| 10–15 | Balance Sheet | Liquidity, solvency, interest coverage, cash conversion, capex ratios |
| 16–20 | Investment Scores | DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly |
| 21–26 | Valuation & Screening | Sector aggregates, peer ranking, dilution, arbitrage signals |
| 27–33 | Short Signals | Late filers, restatements, 8-K events, ghost companies |
| 34–39 | Advanced Analytics | PIT backtest engine, Z-score outliers, seasonality, XBRL audit |
See valuein_sdk/queries/SQL_CHEATSHEET.md for the full template reference.
📚 Documentation
| Document | Description | Format |
|---|---|---|
| `docs/METHODOLOGY.md` | Data sourcing, PIT architecture, restatement handling, XBRL normalization logic | Markdown |
| `docs/COMPLIANCE_AND_DDQ.md` | Data provenance, MNPI policy, PIT integrity, security, SLA summary | Markdown |
| `docs/SLA.md` | Uptime targets, data freshness SLAs, support response times, SLA credits | Markdown |
| `docs/excel-guide.md` | Full Excel / Power Query setup walkthrough | Markdown |
| `docs/DATA_CATALOG.xlsx` | All columns, types, definitions, sample values | Excel |
| `docs/schema.json` | Machine-readable JSON schema | JSON |
🐍 Python Examples
Standalone Python scripts and four Jupyter notebooks, designed to take you from install to insight in under 3 minutes.
Ticker lookup example
Run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process.
```python
from valuein_sdk import ValueinClient

# This client fetches only these two tables, making it faster
client = ValueinClient(tables=["entity", "security"])

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM security s
    JOIN entity e ON s.entity_id = e.cik
    WHERE s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)
```
You are now querying SEC financial statements directly from the cloud.
Python scripts (examples/python/)
| Script | Level | What it demonstrates |
|---|---|---|
| `getting_started.py` | Beginner | Auth check, first query, entity counts by sector |
| `usage.py` | Reference | Every public SDK method demonstrated |
| `entity_screening.py` | Beginner | Screen by sector, SIC code, active vs inactive status |
| `financial_analysis.py` | Intermediate | Revenue trends, margins, concept normalization, peer comparison |
| `pit_backtest.py` | Intermediate | Correct PIT discipline, restatement impact, filing_date vs report_date |
| `survivorship_bias.py` | Intermediate | Delisted/bankrupt companies, index_membership, bias quantification |
Jupyter notebooks (examples/notebooks/)
| Notebook | Open in Colab |
|---|---|
| Quickstart | |
| Fundamental Analysis | |
| PIT Backtest | |
| Survivorship Bias | |
🛡️ Error Handling
```python
from valuein_sdk import (
    ValueinAuthError,       # HTTP 401/403 — invalid or expired token
    ValueinPlanError,       # HTTP 403 — endpoint requires a higher plan
    ValueinNotFoundError,   # HTTP 404 — no table found
    ValueinRateLimitError,  # HTTP 429 — includes .retry_after (seconds)
    ValueinAPIError,        # HTTP 5xx — includes .status_code
    ValueinClient,
)

client = None
try:
    client = ValueinClient()
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinAuthError:
    print("Check your VALUEIN_API_KEY. It might be expired or invalid.")
except ValueinPlanError:
    print("This requires a higher-tier plan. Upgrade at valuein.biz.")
except ValueinRateLimitError as e:
    print(f"Slow down! Retry allowed in {e.retry_after}s.")
except ValueinNotFoundError as e:
    print(f"That table or endpoint doesn't exist: {e}")
except ConnectionError as e:
    print(f"Physical network issue: {e}")
except ValueinAPIError as e:
    print(f"The Gateway is having a bad day (Status {e.status_code}).")
except Exception as e:
    print(f"Non-SDK error (Python/Logic): {e}")
finally:
    # Always close manually if not using a context manager and a client was created
    if client is not None:
        client.close()
```
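Since `ValueinRateLimitError` carries `.retry_after`, a natural pattern is to wait that long and retry. This offline sketch uses a stand-in exception class so it runs without the SDK installed; the `with_retry` helper and `flaky` callable are illustrative, not part of the library:

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for ValueinRateLimitError, carrying retry_after seconds."""
    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after

def with_retry(fn, max_attempts=3):
    """Call fn; on rate limiting, sleep for the server-suggested delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise
            time.sleep(e.retry_after)

# Demo: fails twice with a zero-second delay, then succeeds on the third attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=0)
    return "ok"

print(with_retry(flaky))  # ok
```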
🔬 Research & Quality Proofs
16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.
```bash
# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py
```
📊 Research Modules
research/fundamental/ — Financial statement analysis workflows
- Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score
research/quantitative/ — Factor model and strategy research
- PIT correctness proof, survivorship bias quantification, restatement tracking as short signal, sector rotation
research/data_engineering/ — XBRL normalization and pipeline analysis
- Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector
research/quality_proof/ — Automated data quality validation
- Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check
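The balance sheet equation check above reduces to a relative-tolerance comparison. A minimal sketch of that validation rule (the helper name and numbers are illustrative, not the module's actual code):

```python
def balance_sheet_ok(assets, liabilities, equity, tol=0.01):
    """Check Assets ≈ Liabilities + Equity within a relative tolerance (default 1%)."""
    if assets == 0:
        return liabilities + equity == 0
    return abs(assets - (liabilities + equity)) / abs(assets) <= tol

print(balance_sheet_ok(100.0, 60.0, 39.5))  # True — only a 0.5% discrepancy
print(balance_sheet_ok(100.0, 60.0, 30.0))  # False — a 10% gap fails the check
```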
See research/README.md for a full breakdown of what each module proves and the key metric it validates.
🤝 Contributing
We welcome contributions including SQL templates, notebooks, scripts, research modules, and documentation improvements.
See CONTRIBUTING.md for code standards, naming conventions, and the PR process.
📄 License
Apache-2.0 License — see LICENSE.
Disclosure: This repository is for research and educational purposes only and does not constitute financial advice.
File details

Details for the file `valuein_sdk-0.6.0.tar.gz` (source distribution):

- Size: 31.9 kB
- Tags: Source
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `973b98795cd58150d37f44bc46be0a41c565924aca63d1c11526a124d60d6369` |
| MD5 | `875c26e7404f58f30a0844206f86a492` |
| BLAKE2b-256 | `97dc8e5aeb0c429d2f6b3f5f45094bd184d89418075d1861eb151beecca44eb0` |
Provenance

The following attestation bundle was made for `valuein_sdk-0.6.0.tar.gz`:

- Publisher: `publish.yml` on valuein/quants
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: valuein_sdk-0.6.0.tar.gz
- Subject digest: 973b98795cd58150d37f44bc46be0a41c565924aca63d1c11526a124d60d6369
- Sigstore transparency entry: 1186642720
- Permalink: valuein/quants@74a9aeac667e58cd41888f744a16658e579ac555
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/valuein
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@74a9aeac667e58cd41888f744a16658e579ac555
- Trigger Event: push
File details

Details for the file `valuein_sdk-0.6.0-py3-none-any.whl` (built distribution):

- Size: 48.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f9842f27a5fc7308f17f982d6d80245ca9f1ed30ad9b674b5001307988e79a1d` |
| MD5 | `479aedfd984833dbf7f0ad306e940933` |
| BLAKE2b-256 | `d5d2f4900484c4bb0cc3deec08612bd4ee25a4b83263864ec722ba89300bc0d1` |
Provenance

The following attestation bundle was made for `valuein_sdk-0.6.0-py3-none-any.whl`:

- Publisher: `publish.yml` on valuein/quants
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: valuein_sdk-0.6.0-py3-none-any.whl
- Subject digest: f9842f27a5fc7308f17f982d6d80245ca9f1ed30ad9b674b5001307988e79a1d
- Sigstore transparency entry: 1186642821
- Permalink: valuein/quants@74a9aeac667e58cd41888f744a16658e579ac555
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/valuein
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@74a9aeac667e58cd41888f744a16658e579ac555
- Trigger Event: push