Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.

Requires Python 3.10+. Apache-2.0 licensed.

Point-in-Time accurate. Survivorship-bias free. 12M+ filings with 108M+ facts from 1990 to present.

US fundamental data from SEC EDGAR — 10-K, 10-Q, 8-K, 20-F and amendments — covering 10,000+ active and delisted entities. Cleaned, standardized, and queryable via SQL with zero local downloads.


What You Can Do With This Repository

Use Case Who Where to Start
Query financial data via Python Quants, data engineers Quickstart
Run 39 pre-built financial signals Analysts, quants SQL Templates
Learn with interactive notebooks Students, new users Examples & Notebooks
Pull data into Excel Financial analysts Excel Integration
Prove data quality to stakeholders Institutional buyers, compliance Research & Quality Proofs
Read methodology and compliance docs Due diligence, enterprise Documentation
Contribute templates, examples, research Open-source contributors Contributing

Quickstart

1. Install

pip install valuein-sdk

2. Get your API token

A. Get a free API token (S&P 500 coverage, full history, active and inactive companies) → register at valuein.biz
B. Get a paid API token (full universe of US stocks, US Core Fundamentals, PIT and survivorship-bias free) → $200/month or $1,920/year

3. Set your key and run

export VALUEIN_API_KEY="your_token"

from valuein_sdk import ValueinClient

client = ValueinClient()
me = client.me()

print(me["plan"], me["email"])
print(client.manifest())

Data Plans

Plan Entities Coverage Price
S&P500 ~605 current + historical members Full history Free — Register
US Core Fundamentals 10,000+ active and delisted Full universe 1990–present, PIT, restatements Subscription — Buy now

Use Case 1 — Python SDK

Install valuein-sdk, authenticate once, run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process against Parquet files on Cloudflare R2.

Ticker lookup

from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)

Revenue trends

from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security", "filing", "fact"])

df = client.query("""
    SELECT fa.fiscal_year,
           round(fa.numeric_value / 1e9, 2) AS revenue_bn
    FROM   fact     fa
    JOIN   filing   f  ON fa.accession_id = f.accession_id
    JOIN   security s  ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'MSFT'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept = 'Revenues'
      AND  f.form_type         = '10-K'
    ORDER  BY fa.fiscal_year DESC
    LIMIT  10
""")
print(df)

Point-in-Time backtest (the most important pattern)

Filter by filing_date, not report_date. Apple's Q3 2023 ended Sep 30 but was filed Nov 3 — using report_date leaks 34 days of future data into your backtest.

from valuein_sdk import ValueinClient

client = ValueinClient(tables=["security", "filing", "fact"])

TRADE_DATE = "2024-01-15"

df = client.query(f"""
    SELECT fa.standard_concept, fa.fiscal_year,
           f.filing_date,
           round(fa.numeric_value / 1e9, 2) AS value_bn
    FROM   fact     fa
    JOIN   filing   f  ON fa.accession_id = f.accession_id
    JOIN   security s  ON f.entity_id     = s.entity_id
    WHERE  s.symbol            = 'NVDA'
      AND  s.is_active         = TRUE
      AND  fa.standard_concept IN ('Revenues', 'NetIncomeLoss')
      AND  f.form_type         = '10-K'
      AND  f.filing_date      <= '{TRADE_DATE}'   -- PIT gate
    ORDER  BY f.filing_date DESC
""")
print(df)

Date columns reference

Column Table Use for
report_date / period_end filing / fact Aligning to fiscal calendar
filing_date filing PIT backtest filter — when the SEC received the filing
knowledge_at fact Millisecond-precision PIT for intraday signal research
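The knowledge_at column is what makes restatement-safe backtests possible: because every revision is appended rather than overwritten, an as-of query keeps only rows knowable on the trade date and then takes the latest revision per concept and fiscal year. A minimal pandas sketch on toy data (the column names mirror the fact table above; the values are invented):

```python
import pandas as pd

# Toy fact history: FY2022 revenue filed once, then restated months later
facts = pd.DataFrame({
    "standard_concept": ["Revenues", "Revenues"],
    "fiscal_year": [2022, 2022],
    "numeric_value": [100.0, 98.0],  # the restatement lowered revenue
    "knowledge_at": pd.to_datetime(["2023-02-01", "2023-09-15"]),
})

TRADE_DATE = pd.Timestamp("2023-06-30")

# Keep only what was knowable on the trade date, then take the latest revision
knowable = facts[facts["knowledge_at"] <= TRADE_DATE]
latest = (knowable.sort_values("knowledge_at")
                  .groupby(["standard_concept", "fiscal_year"], as_index=False)
                  .last())
print(latest["numeric_value"].iloc[0])  # → 100.0 (restatement not yet known)
```

As of 2023-06-30 only the originally filed 100.0 is visible; a query dated after 2023-09-15 would return the restated 98.0.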

API reference

from valuein_sdk import ValueinClient

client = ValueinClient(
    api_key="...",       # or read from VALUEIN_API_KEY env var
    gateway_url="...",   # override for local dev only
    tables=["entity"],   # load specific tables; omit to load all
)

client.me()                           # dict: plan, status, email, createdAt
client.manifest()                     # dict: snapshot, last_updated, tables
client.health()                       # dict: ok, worker, env (no auth required)
client.query(sql)                     # DuckDB SQL → pandas DataFrame
client.get(table)                     # Download full table → pandas DataFrame
client.run_template(name, **kwargs)   # Named SQL template → pandas DataFrame
client.tables()                       # List loaded table names

Error handling

from valuein_sdk import (
    ValueinAuthError,      # HTTP 401/403 — invalid or expired token
    ValueinPlanError,      # HTTP 403 — endpoint requires a higher plan
    ValueinRateLimitError, # HTTP 429 — includes .retry_after (seconds)
    ValueinAPIError,       # HTTP 5xx — includes .status_code
    ValueinClient
)

try:
    client = ValueinClient()
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinRateLimitError as e:
    print(f"Rate limited. Retry in {e.retry_after}s")
except ValueinPlanError:
    print("Upgrade to Full plan for this data")
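Since ValueinRateLimitError carries a .retry_after attribute, a caller can sleep and retry automatically. A sketch of that pattern, using a local stand-in exception so the snippet runs without the SDK installed (the real code would catch ValueinRateLimitError instead):

```python
import time

class RateLimited(Exception):
    """Stand-in for ValueinRateLimitError (carries .retry_after seconds)."""
    def __init__(self, retry_after):
        super().__init__(f"retry in {retry_after}s")
        self.retry_after = retry_after

def query_with_retry(run_query, max_attempts=3):
    """Run a query callable, sleeping .retry_after seconds on rate limits."""
    for attempt in range(max_attempts):
        try:
            return run_query()
        except RateLimited as e:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(e.retry_after)

# Simulated query that is rate-limited once, then succeeds
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimited(retry_after=0)  # 0s so the demo runs instantly
    return "ok"

print(query_with_retry(flaky_query))  # → ok
```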

Use Case 2 — SQL Templates

39 production-ready SQL templates bundled with the SDK. Run complex financial signals in one line — no SQL required.

from valuein_sdk import ValueinClient

client = ValueinClient()
# DuPont decomposition for MSFT
df = client.run_template("16_dupont_analysis_inputs", ticker="MSFT")
print("Result: ", df)

# Piotroski F-Score screen across S&P500
df = client.run_template("18_piotroski_f_score", tickers="'AAPL', 'MSFT', 'NVDA'")
print("Result: ", df)

# Altman Z-Score (bankruptcy probability)
df = client.run_template("19_altman_z_score", ticker="TSLA")
print("Result: ", df)

# Trailing-twelve-months revenue
df = client.run_template("06_trailing_twelve_months_ttm", ticker="AMZN")
print("Result: ", df)

Template categories

Range Category Examples
01–04 Data Access Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen
05–09 Income Statement YoY revenue growth, TTM, margin analysis, FCF, R&D intensity
10–15 Balance Sheet Liquidity, solvency, interest coverage, cash conversion, capex ratios
16–20 Investment Scores DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly
21–26 Valuation & Screening Sector aggregates, peer ranking, dilution, arbitrage signals
27–33 Short Signals Late filers, restatements, 8-K events, ghost companies
34–39 Advanced Analytics PIT backtest engine, Z-score outliers, seasonality, XBRL audit

See valuein_sdk/queries/SQL_CHEATSHEET.md for the full template reference.


Use Case 3 — Examples & Notebooks

Six standalone Python scripts and four Jupyter notebooks, designed to go from install to insight in under 3 minutes.

Python scripts (examples/python/)

Script Level What it demonstrates
getting_started.py Beginner Auth check, first query, entity counts by sector
usage.py Reference Every public SDK method demonstrated
entity_screening.py Beginner Screen by sector, SIC code, active vs inactive status
financial_analysis.py Intermediate Revenue trends, margins, concept normalization, peer comparison
pit_backtest.py Intermediate Correct PIT discipline, restatement impact, filing_date vs report_date
survivorship_bias.py Intermediate Delisted/bankrupt companies, index_membership, bias quantification

Run any script standalone:

VALUEIN_API_KEY=your_token python examples/python/getting_started.py

Jupyter notebooks (examples/notebooks/)

Quickstart, Fundamental Analysis, PIT Backtest, and Survivorship Bias — each notebook opens directly in Google Colab.

Use Case 4 — Excel Integration

Pull live SEC fundamental data into Microsoft Excel via Power Query. No Python required.

Requirements: Microsoft 365 (build 16.0.17531 or later)

  1. Download excel/valuein-fundamentals.xlsx
  2. Open the workbook and enter your API token in the Connectivity Guide sheet
  3. Click Refresh All — data streams directly from Parquet files on Cloudflare R2

The workbook includes 8 pre-configured sheets: Income Statement, Balance Sheet, Cash Flow, Entities, Securities, Filings, Index Membership, and a Data Dictionary.

For DIY Power Query connections, the M-language source files are in excel/power-query/.

Full setup walkthrough: docs/excel-guide.md


Use Case 5 — Research & Quality Proofs

16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.

# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py

Research modules

research/fundamental/ — Financial statement analysis workflows

  • Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score

research/quantitative/ — Factor model and strategy research

  • PIT correctness proof, survivorship bias quantification, restatement tracking as short signal, sector rotation

research/data_engineering/ — XBRL normalization and pipeline analysis

  • Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector

research/quality_proof/ — Automated data quality validation

  • Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check
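The balance sheet equation check can be sketched in a few lines of pandas: pivot long-format facts into one row per entity and flag rows where Assets differs from Liabilities + Equity by more than 1%. The values are invented, and 'Liabilities' and 'StockholdersEquity' are assumed concept names for illustration:

```python
import pandas as pd

# Toy long-format facts shaped like the fact table
facts = pd.DataFrame({
    "entity_id": [1, 1, 1, 2, 2, 2],
    "standard_concept": ["Assets", "Liabilities", "StockholdersEquity"] * 2,
    "numeric_value": [100.0, 60.0, 40.0, 200.0, 150.0, 45.0],
})

wide = facts.pivot(index="entity_id", columns="standard_concept",
                   values="numeric_value")
# Relative gap between Assets and Liabilities + Equity; pass if within 1%
wide["gap"] = (wide["Assets"]
               - (wide["Liabilities"] + wide["StockholdersEquity"])).abs() / wide["Assets"]
wide["passes_1pct"] = wide["gap"] <= 0.01
print(wide[["gap", "passes_1pct"]])  # entity 1 passes, entity 2 fails
```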

See research/README.md for a full breakdown of what each module proves and the key metric it validates.


Data Schema

Six tables, 90+ columns, fully documented in docs/DATA_CATALOG.xlsx and docs/schema.json.

Table Primary Key Description
entity cik Legal company — name, sector, SIC code, fiscal year end, status
security id Ticker symbols with SCD Type 2 date ranges (valid_from, valid_to, is_active)
filing accession_id SEC filing metadata — form type, filing_date, report_date, accepted_at
fact (entity_id, accession_id, concept, period_end, unit) Every financial fact from every filing, with knowledge_at PIT timestamp
taxonomy_guide standard_concept Definitions for 150 canonical concept names
index_membership Historical index constituent records (S&P 500 entry and exit dates)

Key joins

security.entity_id        →  entity.cik
filing.entity_id          →  entity.cik
fact.entity_id            →  entity.cik
fact.accession_id         →  filing.accession_id
index_membership.security_id  →  security.id
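The join graph above can be exercised end to end with plain SQL. A self-contained sketch using an in-memory SQLite database with toy rows (the production engine is DuckDB, but the join pattern is identical; the Apple revenue figure is illustrative only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE entity  (cik INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE security(id INTEGER PRIMARY KEY, entity_id INTEGER, symbol TEXT);
CREATE TABLE filing  (accession_id TEXT PRIMARY KEY, entity_id INTEGER,
                      form_type TEXT, filing_date TEXT);
CREATE TABLE fact    (accession_id TEXT, standard_concept TEXT,
                      numeric_value REAL);

INSERT INTO entity   VALUES (320193, 'Apple Inc.');
INSERT INTO security VALUES (1, 320193, 'AAPL');
INSERT INTO filing   VALUES ('acc-1', 320193, '10-K', '2023-11-03');
INSERT INTO fact     VALUES ('acc-1', 'Revenues', 383.3e9);
""")

# fact → filing → entity ← security, exactly as in the key-joins table
row = cur.execute("""
    SELECT e.name, s.symbol, f.form_type, fa.numeric_value
    FROM   fact     fa
    JOIN   filing   f ON fa.accession_id = f.accession_id
    JOIN   entity   e ON f.entity_id     = e.cik
    JOIN   security s ON s.entity_id     = e.cik
""").fetchone()
print(row)  # → ('Apple Inc.', 'AAPL', '10-K', 383300000000.0)
```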

Standard concept names

Raw XBRL tags (15,000+) are normalized to canonical standard_concept values. Use these exact strings:

Concept standard_concept
Revenue / Sales 'Revenues'
Net Income 'NetIncomeLoss'
Total Assets 'Assets'
Gross Profit 'GrossProfit'
Operating Income 'OperatingIncomeLoss'

Both the raw concept tag and the normalized standard_concept are on the fact table — no join to a separate mapping table needed.
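Because one standard_concept string works across every filer, cross-sectional ratios reduce to a pivot on that column. A toy sketch with invented values, using the exact concept strings from the table above:

```python
import pandas as pd

# Toy fact rows using the canonical standard_concept strings
facts = pd.DataFrame({
    "fiscal_year": [2023, 2023, 2023],
    "standard_concept": ["Revenues", "GrossProfit", "OperatingIncomeLoss"],
    "numeric_value": [100.0, 60.0, 25.0],
})

# One column per concept, one row per fiscal year
wide = facts.pivot(index="fiscal_year", columns="standard_concept",
                   values="numeric_value")
wide["gross_margin"] = wide["GrossProfit"] / wide["Revenues"]
wide["operating_margin"] = wide["OperatingIncomeLoss"] / wide["Revenues"]
print(wide[["gross_margin", "operating_margin"]])  # → 0.60 and 0.25
```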


Why Valuein

Point-in-Time (PIT) Every fact carries filing_date (SEC receipt) and knowledge_at (millisecond precision). Filter filing_date <= trade_date to eliminate look-ahead bias. Most providers silently overwrite restated numbers — we append every revision.
Survivorship-bias free 10,000+ entities including every delisted, bankrupt, and acquired company in the SEC record back to 1990. A backtest on survivors only is not a real backtest.
Standardized concepts 15,000+ raw XBRL tags mapped to ~150 canonical standard_concept values. One concept name works across every filer, regardless of what tag they filed.
DuckDB SQL In-process DuckDB with authenticated Parquet streaming. Queries run in milliseconds — no local downloads required.
39 SQL templates Production-ready queries for Altman Z-Score, DuPont, Piotroski F-Score, TTM, FCF, sector screening, restatement signals, PIT backtest engine, and more.

Documentation

Document Description
docs/METHODOLOGY.md Data sourcing, PIT architecture, restatement handling, XBRL normalization logic
docs/COMPLIANCE_AND_DDQ.md Data provenance, MNPI policy, PIT integrity, security, SLA summary
docs/SLA.md Uptime targets, data freshness SLAs, support response times, SLA credits
docs/excel-guide.md Full Excel / Power Query setup walkthrough
docs/DATA_CATALOG.xlsx All columns, types, definitions, sample values
docs/schema.json Machine-readable JSON schema
CHANGELOG.md Full release history

Contributing

Contributions are welcome — new SQL templates, example scripts, research modules, and documentation improvements.

See CONTRIBUTING.md for code standards, naming conventions, and the PR process.

Disclosure: This repository is for research and educational purposes only. Not financial advice.

Apache-2.0 License — see LICENSE.
