
Official Python SDK for the Valuein US Core Fundamentals dataset — SEC EDGAR financials via API.


💎 Valuein Python SDK: Frictionless Financial Data

A high-performance toolkit for querying point-in-time US fundamentals from SEC EDGAR, built for quants, analysts, and data engineers.

The Valuein SDK is a complete infrastructure solution for consuming point-in-time-accurate US Core financial fundamentals (facts) in your daily workflow. Whether you are building complex asynchronous Python pipelines, performing forensic financial research in Excel, or executing templated SQL, this library provides frictionless, zero-setup access to institutional-grade data.

The Data Engine

Powered by survivorship-bias-free data containing 12M+ filings and 108M+ facts from 10-Ks, 10-Qs, 8-Ks, 20-Fs, and amendments across 10,000+ active and delisted US companies since 1990.

New: Semantic search over filing narratives. Risk Factors, MD&A, Business description, Legal Proceedings, and Controls & Procedures sections from every 10-K / 10-Q / 20-F are extracted as XBRL TextBlocks, chunked, and published as filing_text.parquet. Two query paths:

  • Agent-first (MCP) — the search_filing_text tool at mcp.valuein.biz answers "find companies discussing AI capex pressure" in <100 ms via Cloudflare Vectorize + Workers AI.
  • Programmatic (SDK) — query filing_text directly in DuckDB for custom tokenization, chunk reassembly, or your own embedding pipeline. See the filing_text_by_ticker template.

Why use this toolkit?

⚡ Lightning-Fast Python SDK: Execute blazing-fast queries against remote Parquet files hosted on R2, powered entirely by DuckDB under the hood. No database setup, no massive local downloads.

📊 Excel & Power Query Ready: Not a Python developer? Fetch, transform, and analyze the data directly within Excel spreadsheets using our provided integrations.

🛠️ Plug-and-Play SQL Templates: Skip the boilerplate. Use our pre-built SQL templates to immediately extract insights, calculate intrinsic values, or model standardized financial statements.

📚 Comprehensive Context: Deep-dive documentation mapping out table schemas, primary keys, and field definitions to support your specific financial research use cases.

🚀 Why Valuein Data and SDK Library

Ease of use and intelligence.

| Feature | Benefit |
| --- | --- |
| 🕒 Point‑in‑Time Data | Eliminate look‑ahead bias in backtests |
| ⚖️ Survivorship‑Bias Free | Includes bankrupt, delisted, and acquired firms |
| 📊 Standardized Concepts | 15k+ XBRL tags mapped to ~150 canonical financial concepts |
| 🚀 DuckDB SQL Engine | Millisecond analytics directly in Python |
| ☁️ Cloud Parquet Streaming | No local data downloads required |
| 🧩 Financial Templates | Production‑ready investment signals |

🧠 What You Can Do With This Repository

| Use Case | Who | Where to Start |
| --- | --- | --- |
| Query financial data via Python | Quants, data engineers | Quickstart |
| Run 39 pre-built financial signals | Analysts, quants | SQL Templates |
| Learn with interactive notebooks | Students, new users | Examples & Notebooks |
| Pull data into Excel | Financial analysts | Excel Integration |
| Prove data quality to stakeholders | Institutional buyers, compliance | Research & Quality Proofs |
| Read methodology and compliance docs | Due diligence, enterprise | Documentation |
| Contribute templates, examples, research | Open-source contributors | Contributing |

⚡ Quickstart

1. Install Package

pip install valuein-sdk

Run the script — no token required

Install and query real data immediately, no registration needed for the SAMPLE dataset:

from valuein_sdk import ValueinClient

client = ValueinClient()                            # load all tables by default
print("Me: ", client.me())                          # dict: plan, status, email, createdAt
print("Manifest: ", client.manifest())              # dict: snapshot, last_updated, tables (cached 5 min)
print("Tables: ", client.tables())                  # list of loaded table names
print("Fact Schema: ", client.get_schema("fact"))  # dict: column → DuckDB type for any loaded table

sql_query = "SELECT COUNT(cik) FROM entity"
result_df = client.query(sql_query)             # SQL → pandas DataFrame
print(f"Query Results: {result_df}")

Add a token at any time to unlock the full dataset — no code changes needed.

🔑 2. Get Your API Token

| Data Plan | Coverage | Price | Get Access |
| --- | --- | --- | --- |
| Sample | S&P 500 coverage, 5 years only, active & inactive companies | Free | No registration |
| S&P500 | S&P 500 coverage, full history, active & inactive companies | Free | Register |
| Pro | Full universe of US stocks, US core fundamentals | $200 / month | Subscribe |
| Pro (Annual) | Same as Pro plan (20% discount) | $1,920 / year | Subscribe |

🔐 3. Set Your API Token

echo 'VALUEIN_API_KEY="your_token"' >> .env
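The SDK picks the token up from the VALUEIN_API_KEY environment variable. If you want to see what loading that .env line involves without pulling in a dotenv package, here is a minimal stdlib-only sketch (illustration only — the SDK's own loading logic may differ):

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Tiny KEY="value" .env parser (sketch only)."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Never override variables already set in the real environment.
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))

load_dotenv_minimal()
print("Token loaded:", "VALUEIN_API_KEY" in os.environ)
```

An exported shell variable works just as well; the environment variable is what matters, not how it got there.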

▶️ 4. Production-ready code

The ValueinClient handles authentication, table discovery, and local caching in a high-performance DuckDB instance.

For production, the recommended pattern is the context manager block: it ensures that temporary files and database connections are closed automatically, even if your script crashes.

from valuein_sdk import ValueinClient, ValueinError

try:
    # load specific tables OR omit to load all tables
    with ValueinClient() as client:
                
        f_df = client.get(table="filing")               # download full table → pandas DataFrame
        print(f"Filings: {f_df.head()}")
                
        try:
            result_df = client.run_template(            # named SQL template → pandas DataFrame
                "fundamentals_by_ticker",
                ticker="AAPL",
                start_date="2020-01-01",
                end_date="2024-01-01",
                form_types=["10-K", "10-Q"],
                metrics=["TotalRevenue", "NetIncome", "OperatingCashFlow"],  # required — standard_concept filter
            )
            print(f"Template Results: {result_df}")
        except ValueinError as e:
            print(f"Catch all error types: {e}")       
except Exception as e:
    print(f"Initialization error: {e}")

🗂️ Data Schema

Find more information on all tables and their fields in docs/schema.json.

| Table | Description | Records |
| --- | --- | --- |
| references | Start here. Flat join of entity + security + index_membership. One row per security. Boolean flags (is_sp500) replace three joins. | ~7K |
| entity | Company metadata | 19K+ |
| security | Ticker history (SCD Type 2) | 7K+ |
| filing | Filing metadata | 12M+ |
| fact | Financial statement facts | 108M+ |
| valuation | Two-stage DCF + DDM intrinsic value per entity per period | 19K+ |
| taxonomy_guide | 2026 US GAAP Taxonomy Guide | 11,966 |
| index_membership | Historical index membership with start/end dates | 8K+ |
| filing_text | 🆕 10-K / 10-Q / 20-F narrative chunks (Risk Factors, MD&A, …) extracted from iXBRL TextBlocks. Source of the Vectorize index powering search_filing_text on MCP. PIT-correct via knowledge_at. See Semantic Search. | growing |

🔗 Key Joins

references.cik                  →  entity.cik  (references is the fast entry point)
security.entity_id              →  entity.cik
filing.entity_id                →  entity.cik
fact.entity_id                  →  entity.cik
fact.accession_id               →  filing.accession_id
index_membership.security_id    →  security.id
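In pandas terms, these joins are plain equi-joins on the listed columns. A toy sketch of the fact → filing → entity chain (column names from the key-join list above; the rows are invented stand-ins, not real data):

```python
import pandas as pd

# Minimal stand-ins for the real tables — structure only, values invented.
entity = pd.DataFrame({"cik": [320193], "name": ["Apple Inc."]})
filing = pd.DataFrame({"accession_id": ["0000320193-24-000001"],
                       "entity_id": [320193]})
fact = pd.DataFrame({"accession_id": ["0000320193-24-000001"],
                     "entity_id": [320193],
                     "standard_concept": ["TotalRevenue"],
                     "numeric_value": [383_285_000_000]})

# fact → filing on accession_id (+ entity_id), then → entity on cik
df = (fact.merge(filing, on=["accession_id", "entity_id"])
          .merge(entity, left_on="entity_id", right_on="cik"))
print(df[["name", "standard_concept", "numeric_value"]])
```

In practice you would express the same joins in SQL via client.query() and let DuckDB do the work remotely.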

⚡ DuckDB Query Patterns

Three patterns that eliminate redundant joins and scans on every cross-company query:

1. references replaces entity + security + index_membership

-- Filter S&P 500 tech companies — zero joins
SELECT symbol, name, sector
FROM   "references"  -- quoted: REFERENCES is a reserved SQL keyword
WHERE  is_sp500 = TRUE AND sector ILIKE '%technology%' AND is_active = TRUE

2. LATERAL for the latest filing per company

-- Latest 10-K per company (pattern shown as a complete query)
SELECT r.symbol, f.accession_id, f.filing_date
FROM   "references" r
JOIN   LATERAL (
  SELECT accession_id, filing_date
  FROM   filing
  WHERE  entity_id = r.cik AND form_type = '10-K'
  ORDER  BY filing_date DESC
  LIMIT  1
) f ON true

3. Pivot multiple concepts in one fact scan

-- Debt + equity in one pass — no self-join
SELECT
  MAX(CASE WHEN standard_concept = 'LongTermDebt'       THEN numeric_value END) AS debt,
  MAX(CASE WHEN standard_concept = 'StockholdersEquity' THEN numeric_value END) AS equity
FROM fact WHERE standard_concept IN ('LongTermDebt', 'StockholdersEquity')
GROUP BY accession_id

For quarterly cash flow metrics, use COALESCE(derived_quarterly_value, numeric_value) — Q2/Q3 10-Qs report YTD; this column isolates the single quarter.
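The YTD mechanics behind derived_quarterly_value can be illustrated with a toy series: Q2 and Q3 10-Qs report cumulative year-to-date totals, so a single quarter is the difference from the prior cumulative value (an illustration of the idea, not the SDK's derivation code; all numbers invented):

```python
import pandas as pd

# Reported YTD cash flow per filing (invented numbers)
ytd = pd.Series([100.0, 220.0, 350.0, 500.0], index=["Q1", "Q2", "Q3", "Q4"])

# Single-quarter value = YTD minus prior YTD (Q1 is already a single quarter)
quarterly = ytd.diff().fillna(ytd.iloc[0])
print(quarterly.tolist())  # [100.0, 120.0, 130.0, 150.0]
```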

See valuein_sdk/queries/SQL_CHEATSHEET.md for 8 complete patterns including FCF screens, PIT backtesting, and restatement auditing.


🏷️ Standard Concept Names

[!Note] Raw XBRL tags (11,966 unique) are normalized to canonical standard_concept values. We standardized the 3,200 most-used concepts from the US GAAP Taxonomy, which covers 95% of all facts; the remainder falls into the 'Other' category.

Both the raw concept tag (xbrl_tag) and the normalized standard_concept are on the fact table — no join to a separate mapping table needed.

📅 Date Columns Reference

| Column | Table | Use for |
| --- | --- | --- |
| report_date / period_end | filing / fact | Aligning to the fiscal calendar |
| filing_date | filing | PIT backtest filter (when the SEC received the filing) |
| knowledge_at | fact | Millisecond-precision PIT for intraday signal research |
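A point-in-time cutoff is just a comparison against these columns. A sketch with a toy DataFrame (column names from the table above, data invented; the SDK's client-side PIT filter applies an equivalent cutoff for you):

```python
import pandas as pd

fact = pd.DataFrame({
    "standard_concept": ["TotalRevenue", "TotalRevenue"],
    "numeric_value": [100.0, 105.0],  # 105.0 is a later restatement
    "knowledge_at": pd.to_datetime(["2021-02-01 16:30:05",
                                    "2021-08-10 09:12:44"]),
})

as_of = pd.Timestamp("2021-06-30")
visible = fact[fact["knowledge_at"] <= as_of]  # only what was known at as_of
print(visible["numeric_value"].tolist())       # [100.0]
```

The restated value only becomes visible once as_of moves past its knowledge_at, which is exactly what keeps a backtest free of look-ahead bias.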

🧩 Template Categories

| Range | Category | Examples |
| --- | --- | --- |
| 01–04 | Data Access | Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen |
| 05–09 | Income Statement | YoY revenue growth, TTM, margin analysis, FCF, R&D intensity |
| 10–15 | Balance Sheet | Liquidity, solvency, interest coverage, cash conversion, capex ratios |
| 16–20 | Investment Scores | DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly |
| 21–26 | Valuation & Screening | Sector aggregates, peer ranking, dilution, arbitrage signals |
| 27–33 | Short Signals | Late filers, restatements, 8-K events, ghost companies |
| 34–39 | Advanced Analytics | PIT backtest engine, Z-score outliers, seasonality, XBRL audit |

See valuein_sdk/queries/SQL_CHEATSHEET.md for the full template reference.

🔎 Semantic Search — table filing_text

filing_text is a chunk-level table of narrative text pulled from every 10-K / 10-Q / 20-F's iXBRL TextBlock concepts: Risk Factors, MD&A, Business, Legal Proceedings, Controls & Procedures. Each chunk is ≤ 1,800 characters and reassembles into the full section via (accession_id, section, chunk_no).

Two ways to use it:

1. Semantic search (via MCP, no embedding work required). The Valuein MCP server at mcp.valuein.biz exposes a search_filing_text tool that:

  • embeds your query with @cf/baai/bge-base-en-v1.5 (Cloudflare Workers AI)
  • queries a Vectorize index populated from filing_text.parquet
  • returns top-K passages with ticker, accession, section, and preview
# Natural language from any MCP client (Claude, Cursor, agents)
> Which S&P 500 companies discussed AI capex pressure in their most recent 10-K?

No SDK code needed — the agent calls the MCP tool directly.

2. Raw chunks (via SDK) for your own retrieval pipeline.

from valuein_sdk import ValueinClient

client = ValueinClient(tables=["references", "filing_text"])

# Pull every Risk Factors chunk for Apple since 2020
df = client.run_template(
    "filing_text_by_ticker",
    ticker="AAPL",
    section="risk_factors",
    start_date="2020-01-01",
)

# Reassemble the full section in numeric chunk order
# (assumes df holds a single filing's section; sort numerically —
#  a string sort would place chunk "10" before chunk "2")
full_text = " ".join(
    df.assign(chunk_no=df["chunk_no"].astype(int))
      .sort_values("chunk_no")["text"]
      .tolist()
)
print(full_text)

Typed wrappers are available via FilingTextRow in valuein_sdk.models.

Coverage: Every 10-K / 10-Q / 20-F filed since our RSS sentinel went live, plus backfill for 2019-present. Growth is real-time as new filings are accepted by SEC EDGAR — typically within 60 seconds of acceptance.

PIT contract: filing_text.knowledge_at equals the filing's SEC acceptedDateTime, same as fact.knowledge_at. Apply your as_of cutoff to filing_text the same way you do for fact — the SDK's client-side PIT filter handles it automatically.

📚 Documentation

| Document | Description | Format |
| --- | --- | --- |
| docs/METHODOLOGY.md | Data sourcing, PIT architecture, restatement handling, XBRL normalization logic | Markdown |
| docs/COMPLIANCE_AND_DDQ.md | Data provenance, MNPI policy, PIT integrity, security, SLA summary | Markdown |
| docs/SLA.md | Uptime targets, data freshness SLAs, support response times, SLA credits | Markdown |
| docs/excel-guide.md | Full Excel / Power Query setup walkthrough | Markdown |
| docs/DATA_CATALOG.xlsx | All columns, types, definitions, sample values | Excel |
| docs/schema.json | Machine-readable JSON schema | JSON |

🐍 Python Examples

Standalone Python scripts and four Jupyter notebooks, designed to take you from install to insight in under 3 minutes.

Ticker lookup example

Run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process.

from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])
# This client only fetches these two tables, making startup faster!

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)

You are now querying SEC financial statements directly from the cloud.

Python scripts (examples/python/)

| Script | Level | What it demonstrates |
| --- | --- | --- |
| getting_started.py | Beginner | Auth check, first query, entity counts by sector |
| usage.py | Reference | Every public SDK method demonstrated |
| entity_screening.py | Beginner | Screen by sector, SIC code, active vs inactive status |
| financial_analysis.py | Intermediate | Revenue trends, margins, concept normalization, peer comparison |
| pit_backtest.py | Intermediate | Correct PIT discipline, restatement impact, filing_date vs report_date |
| survivorship_bias.py | Intermediate | Delisted/bankrupt companies, index_membership, bias quantification |

Jupyter notebooks (examples/notebooks/)

  • Quickstart (Open in Colab)
  • Fundamental Analysis (Open in Colab)
  • PIT Backtest (Open in Colab)
  • Survivorship Bias (Open in Colab)

🛡️ Error Handling

from valuein_sdk import (
    ValueinAuthError,      # HTTP 401/403 — invalid or expired token
    ValueinPlanError,      # HTTP 403 — endpoint requires a higher plan
    ValueinNotFoundError,  # HTTP 404 — table or endpoint not found
    ValueinRateLimitError, # HTTP 429 — includes .retry_after (seconds)
    ValueinAPIError,       # HTTP 5xx — includes .status_code
    ValueinClient
)

client = None
try:
    client = ValueinClient()
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinAuthError:
    print("Check your VALUEIN_API_KEY. It might be expired or invalid.")
except ValueinPlanError:
    print("This requires a higher-tier plan. Upgrade at valuein.biz.")
except ValueinRateLimitError as e:
    print(f"Slow down! Retry allowed in {e.retry_after}s.")
except ValueinNotFoundError as e:
    print(f"That table or endpoint doesn't exist: {e}")
except ConnectionError as e:
    print(f"Physical network issue: {e}")
except ValueinAPIError as e:
    print(f"The Gateway is having a bad day (Status {e.status_code}).")
except Exception as e:
    print(f"Non-SDK error (Python/Logic): {e}")
finally:
    # Always close manually if not using a context manager and if a client was created.
    if client is not None:
        client.close()
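The retry_after attribute on the rate-limit exception lends itself to a simple backoff wrapper. A generic sketch — the RateLimited class below is a stand-in for ValueinRateLimitError so the pattern stays runnable without the SDK:

```python
import time

class RateLimited(Exception):  # stand-in for ValueinRateLimitError
    def __init__(self, retry_after: float):
        super().__init__(f"rate limited; retry in {retry_after}s")
        self.retry_after = retry_after

def with_retry(fn, max_attempts: int = 3):
    """Call fn(), sleeping for retry_after between rate-limited attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as e:
            if attempt == max_attempts - 1:
                raise  # out of attempts — let the caller handle it
            time.sleep(e.retry_after)

# Toy usage: fail once, then succeed.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimited(retry_after=0.01)
    return "ok"

print(with_retry(flaky))  # ok
```

With the real SDK you would catch ValueinRateLimitError in the same place and pass a lambda wrapping your client.query call.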

🔬 Research & Quality Proofs

16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.

# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py

📊 Excel Integration

Stream live SEC fundamental data directly into your spreadsheets. No Python, no complex scripts: just pure data power.

⚡ Quick Start

Get up and running in less than 60 seconds.

  1. Download Template: Get the valuein-fundamentals.xlsx workbook.
  2. Authorize: Open the workbook and enter your API token in the Connectivity Guide sheet.
  3. Sync Data: Click Data > Refresh All.

[!TIP] Data streams directly from Parquet files on Cloudflare R2, ensuring high-speed transfers and minimal local overhead.

🛠 Requirements

  • Microsoft 365 (Build 16.0.17531 or later)
  • Active API Token

📂 Pre-Configured Sheets

The workbook includes 8 high-performance sheets ready for analysis:

  • Financials: Income Statement, Balance Sheet, Cash Flow
  • Metadata: Entities, Securities, Filings
  • Reference: Index Membership, Data Dictionary

🧑‍💻 Developer Customization

For those who prefer a "Do It Yourself" approach, the raw M-language source files for Power Query are available in the excel/power-query/ directory. You can use these to build custom connections in your existing workbooks.

📖 Need more help? View the Full Setup Walkthrough.

📊 Research Modules

research/fundamental/ — Financial statement analysis workflows

  • Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score

research/quantitative/ — Factor model and strategy research

  • PIT correctness proof, survivorship bias quantification, restatement tracking as short signal, sector rotation

research/data_engineering/ — XBRL normalization and pipeline analysis

  • Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector

research/quality_proof/ — Automated data quality validation

  • Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check

See research/README.md for a full breakdown of what each module proves and the key metric it validates.
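The balance-sheet equation check mentioned above reduces to a relative-tolerance comparison. A standalone sketch (numbers invented; the actual quality-proof module runs this per filing against real facts):

```python
def balance_sheet_holds(assets: float, liabilities: float, equity: float,
                        tolerance: float = 0.01) -> bool:
    """True if Assets ≈ Liabilities + Equity within the given relative tolerance."""
    return abs(assets - (liabilities + equity)) <= tolerance * abs(assets)

print(balance_sheet_holds(352_755, 287_912, 64_843))  # balances exactly: True
print(balance_sheet_holds(352_755, 287_912, 60_000))  # off by ~1.4%: False
```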


🤝 Contributing

We welcome contributions including SQL templates, notebooks, scripts, research modules, and documentation improvements.

See CONTRIBUTING.md for code standards, naming conventions, and the PR process.

📄 License

Apache-2.0 License — see LICENSE.

Disclosure: This repository is for research and educational purposes only and does not constitute financial advice.
