Deterministic relationship discovery for structured datasets.

smartjoin: data relationship discovery in seconds

Stop guessing how your tables connect. smartjoin automatically discovers relationships between structured datasets: no schema, no docs, no manual SQL detective work.

When working with unfamiliar datasets, one of the hardest problems is understanding how files relate to each other.

smartjoin helps by scanning structured datasets, identifying candidate relationships, producing explainable outputs instead of opaque guesses, and giving you an explorer to inspect and review the results.

Quickstart

Installation

pip install smartjoin-py

Run

smartjoin run <path> <out_dir>

This analyzes the structured datasets in <path> and writes results to <out_dir>.
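If you want to call the command from a script instead of a shell, a minimal Python sketch could look like the following. It only wraps the `smartjoin run <path> <out_dir>` command shown above. The helper names `build_run_command` and `run_smartjoin` are hypothetical and not part of smartjoin, and no Python API of the package is assumed.

```python
import shutil
import subprocess
from pathlib import Path


def build_run_command(data_dir, out_dir):
    """Return the argv list for `smartjoin run <path> <out_dir>`."""
    return ["smartjoin", "run", str(Path(data_dir)), str(Path(out_dir))]


def run_smartjoin(data_dir, out_dir):
    """Invoke the CLI; raises CalledProcessError if it exits non-zero."""
    # Hypothetical guard: fail early with a clear message if the CLI is missing.
    if shutil.which("smartjoin") is None:
        raise FileNotFoundError("smartjoin not found on PATH; install smartjoin-py first")
    return subprocess.run(build_run_command(data_dir, out_dir), check=True)
```

Calling `run_smartjoin("data", "out")` then behaves like typing `smartjoin run data out` in a terminal.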

Generate test datasets

To explore how smartjoin works, you can generate synthetic test datasets:

smartjoin generate-test-datasets --output-dir <output-dir>

Explorer

In addition to the output files, smartjoin generates an interactive HTML-based explorer that helps you inspect detected relationships visually.

Limitations

smartjoin identifies candidate relationships across structured datasets. It does not guarantee semantic correctness.

Please keep in mind:

  • inferred relationships should be reviewed before being relied on downstream
  • domain-specific meaning may still require human interpretation
  • output quality depends on the quality, consistency, and structure of the input data
  • the tool is intended for structured dataset analysis, not as a general-purpose data processing platform
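To see why review matters, consider a generic value-overlap check. This is an illustration only and is not claimed to be how smartjoin scores relationships. The function `containment` and the toy columns are hypothetical.

```python
def containment(child_values, parent_values):
    """Fraction of distinct non-null child values that also appear in the parent."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


# Toy data (hypothetical): a customers table and an orders table.
customer_ids = [1, 2, 3, 4, 5]
order_customer_ids = [1, 1, 3, 5, 5]   # a genuine reference to customers
order_quantities = [2, 1, 4, 3, 1]     # unrelated small integers

fk_score = containment(order_customer_ids, customer_ids)
spurious_score = containment(order_quantities, customer_ids)
```

Both scores come out as 1.0 on this toy data, even though only the first pair is a meaningful relationship. A purely value-based signal cannot tell them apart, which is why a candidate should be confirmed by someone who knows the domain.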

Currently supported input formats include: .csv, .xlsx, .json, .parquet.
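Before a run, it can help to check which files in a folder have one of these extensions. The sketch below is a standalone helper, not part of smartjoin. The name `split_by_support` is hypothetical, and the case-insensitive extension match is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extensions listed in this README as supported input formats.
SUPPORTED_SUFFIXES = {".csv", ".xlsx", ".json", ".parquet"}


def split_by_support(paths):
    """Split file paths into (supported, other) file names by extension."""
    supported, other = [], []
    for p in map(Path, paths):
        # Assumption: extensions are compared case-insensitively.
        if p.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(p.name)
        else:
            other.append(p.name)
    return supported, other
```

For a real folder, pass something like `(p for p in Path("data").iterdir() if p.is_file())`. Whether smartjoin itself descends into subfolders is not stated here, so this example only looks at the top level.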

Roadmap

Future development may include:

  • stronger semantic matching across columns and tables
  • optional AI-assisted reasoning and scoring
  • improved explorer and debugging capabilities
  • broader support for real-world edge cases and heterogeneous datasets

Contributing

See CONTRIBUTING.md.

License

Licensed under the MIT License.
