Deterministic relationship discovery for structured datasets.

smartjoin: data relationship discovery in seconds

Stop guessing how your tables connect. smartjoin automatically discovers relationships between structured datasets: no schema, no docs, no manual SQL detective work.

When working with unfamiliar datasets, one of the hardest problems is understanding how files relate to each other.

smartjoin helps by scanning structured datasets, identifying candidate relationships, producing explainable outputs instead of opaque guesses, and giving you an explorer to inspect and review the results.
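
To make "candidate relationship" concrete, here is a minimal sketch of one common heuristic: value containment between a column and a key-like column in another table. It is an illustration only and is not claimed to be smartjoin's actual algorithm; the function names, the sample tables, and the 0.9 threshold are all assumptions made for this example.

```python
from itertools import permutations


def containment(child_values, parent_values):
    """Fraction of distinct non-empty child values that also appear in parent."""
    child = {v for v in child_values if v not in (None, "")}
    parent = {v for v in parent_values if v not in (None, "")}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def candidate_relationships(tables, threshold=0.9):
    """Return (child, parent, score) tuples for column pairs across tables.

    `tables` maps table name -> {column name -> list of values}.
    A pair is kept when the parent column is unique (key-like) and the
    containment score reaches `threshold` (a hypothetical cut-off).
    """
    columns = [
        (table, column, values)
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    found = []
    for (ct, cc, cv), (pt, pc, pv) in permutations(columns, 2):
        if ct == pt:
            continue  # only compare columns from different tables
        if len(set(pv)) != len(pv):
            continue  # the parent side must look like a key
        score = containment(cv, pv)
        if score >= threshold:
            found.append((f"{ct}.{cc}", f"{pt}.{pc}", score))
    return found


# Hypothetical sample data, not produced by smartjoin.
tables = {
    "customers": {"id": ["1", "2", "3"], "name": ["Ann", "Bob", "Cy"]},
    "orders": {
        "order_id": ["10", "11", "12", "13"],
        "customer_id": ["1", "1", "3", "2"],
    },
}
relationships = candidate_relationships(tables)
```

In this sample, the only pair that passes both checks is orders.customer_id referencing customers.id. The score and the uniqueness check together form the kind of explainable evidence a reviewer can inspect, as opposed to an opaque guess.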

Quickstart

Installation

pip install smartjoin-py

Run

smartjoin run <path> <out_dir>

This analyzes the structured datasets in <path> and writes results to <out_dir>.
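
If you want to trigger the analysis from a Python script rather than a shell, one option is to call the CLI through subprocess. The sketch below uses only the documented `smartjoin run <path> <out_dir>` form; the helper names `build_run_command` and `run_smartjoin` are hypothetical and are not part of smartjoin.

```python
import shutil
import subprocess
from pathlib import Path


def build_run_command(path, out_dir):
    """Build the argument list for `smartjoin run <path> <out_dir>`."""
    return ["smartjoin", "run", str(Path(path)), str(Path(out_dir))]


def run_smartjoin(path, out_dir):
    """Invoke the CLI; raises CalledProcessError on a non-zero exit code."""
    if shutil.which("smartjoin") is None:
        raise FileNotFoundError(
            "smartjoin is not on PATH; try `pip install smartjoin-py`"
        )
    return subprocess.run(build_run_command(path, out_dir), check=True)


# Building the command does not require smartjoin to be installed.
command = build_run_command("data", "out")
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.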

Generate test datasets

To explore how smartjoin works, you can generate synthetic test datasets:

smartjoin generate-test-datasets --output-dir <output-dir>

Explorer

In addition to the output files, smartjoin generates an interactive HTML-based explorer that helps you inspect detected relationships visually.

Limitations

smartjoin identifies candidate relationships across structured datasets. It does not guarantee semantic correctness.

Please keep in mind:

  • inferred relationships should be reviewed before being relied on downstream
  • domain-specific meaning may still require human interpretation
  • output quality depends on the quality, consistency, and structure of the input data
  • the tool is intended for structured dataset analysis, not as a general-purpose data processing platform
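
As an example of what reviewing an inferred relationship can look like, the sketch below checks two basic properties of a candidate link: whether the parent column is unique, and which child values have no match in the parent. The function name `review_candidate`, the report keys, and the sample values are assumptions for illustration; this does not read smartjoin's output files.

```python
from collections import Counter


def review_candidate(child_values, parent_values):
    """Summarize how well a child column references a parent column."""
    parent_counts = Counter(parent_values)
    duplicated = sorted(v for v, n in parent_counts.items() if n > 1)
    orphans = sorted({v for v in child_values if v not in parent_counts})
    return {
        "parent_is_unique": not duplicated,
        "duplicated_parent_values": duplicated,
        "orphan_child_values": orphans,
        "orphan_rows": sum(1 for v in child_values if v not in parent_counts),
    }


# Hypothetical columns: one child row points at a parent value that is missing.
report = review_candidate(
    child_values=["1", "1", "3", "9"],
    parent_values=["1", "2", "3"],
)
```

A clean report does not prove that the relationship is semantically right (two unrelated ID columns can overlap by coincidence), so domain knowledge is still needed for the final call.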

Currently supported input formats include: .csv, .xlsx, .json, .parquet.
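
Before running an analysis, it can help to see which files in a folder match the supported formats. The following is a rough sketch using only the four extensions listed above; the helper name `split_by_support`, the case-insensitive suffix match, and the non-recursive scan are assumptions, since how smartjoin itself walks the input path is not stated here.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".csv", ".xlsx", ".json", ".parquet"}


def split_by_support(directory):
    """Return (supported, unsupported) file lists for one directory level."""
    supported, unsupported = [], []
    for file in sorted(Path(directory).iterdir()):
        if not file.is_file():
            continue  # skip sub-directories in this sketch
        # Case-insensitive matching is an assumption of this example.
        if file.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(file)
        else:
            unsupported.append(file)
    return supported, unsupported
```

Calling `split_by_support("data")` returns two lists of paths, so you can spot files that would need converting first.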

Roadmap

Future development may include:

  • stronger semantic matching across columns and tables
  • optional AI-assisted reasoning and scoring
  • improved explorer and debugging capabilities
  • broader support for real-world edge cases and heterogeneous datasets

Contributing

See CONTRIBUTING.md.

License

Licensed under the MIT License.

Download files

Source Distribution

smartjoin_py-0.1.0.tar.gz (788.0 kB view details)

Uploaded Source

Built Distribution

smartjoin_py-0.1.0-py3-none-any.whl (58.7 kB view details)

Uploaded Python 3

File details

Details for the file smartjoin_py-0.1.0.tar.gz.

File metadata

  • Download URL: smartjoin_py-0.1.0.tar.gz
  • Size: 788.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for smartjoin_py-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c7fe7e9bd8b0f0533152b166c4bf74360fc2198799f3695587b47828b1cb6f2e
MD5 333e4b9d562bb7aad7c02b0d5b5f07a1
BLAKE2b-256 3c2a71ce81c3bc9e9b9576be44735543941f80c04b99b960bd1fbe9b0779d94d

Provenance

The following attestation bundles were made for smartjoin_py-0.1.0.tar.gz:

Publisher: release.yml on tbrus/smartjoin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file smartjoin_py-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: smartjoin_py-0.1.0-py3-none-any.whl
  • Size: 58.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for smartjoin_py-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 15af88797643a7ecfe777f1650ccc82301a1e6a00e789629d0a74d4c147c441f
MD5 5eb9631039dfce5d75b227dbca108ead
BLAKE2b-256 20739a3543d6fcfd3026b22c8bb1b0a62b5028df3bb0e8a0e76ae6403b63047c

Provenance

The following attestation bundles were made for smartjoin_py-0.1.0-py3-none-any.whl:

Publisher: release.yml on tbrus/smartjoin
