Spatiotemporal Index Extraction from Unstructured Text

Maintainers

moebuta

STIndex - Spatiotemporal Information Extraction

STIndex is a multi-dimensional information extraction system that uses LLMs to extract temporal, spatial, and custom dimensional data from unstructured text. It provides an end-to-end pipeline covering preprocessing, extraction, and visualization.

🌐 Try the Demo Dashboard

Quick Start

Installation

pip install stindex

# Install spaCy language model (required for NER)
python -m spacy download en_core_web_sm

Basic Extraction

# Extract spatiotemporal entities
stindex extract "On March 15, 2022, a cyclone hit Broome, Western Australia."

# Use specific LLM provider
stindex extract "Text here..." --config openai  # or anthropic, hf

End-to-End Pipeline

from stindex import InputDocument, STIndexPipeline

# Create input documents (URL, file, or text)
docs = [
    InputDocument.from_url("https://example.com/article"),
    InputDocument.from_file("/path/to/document.pdf"),
    InputDocument.from_text("Your text here")
]

# Run full pipeline: preprocessing → extraction → warehouse → visualization
pipeline = STIndexPipeline(
    dimension_config="dimensions",
    output_dir="data/output",
    enable_warehouse=True,  # NEW in v0.6.0: Load data into warehouse
    warehouse_config="warehouse"
)
results = pipeline.run_pipeline(docs, load_to_warehouse=True)
# Automatically generates zip archive: data/visualizations/stindex_report_{timestamp}.zip
# Contains: HTML report + all plots, maps, and source files

Python API (Direct Extraction)

from stindex import DimensionalExtractor

# Initialize with default config (cfg/extract.yml)
extractor = DimensionalExtractor()

# Or specify a config
extractor = DimensionalExtractor(config_path="openai")

# Extract entities
result = extractor.extract("March 15, 2022 in Broome, Australia")

# Access results
print(f"Temporal: {len(result.temporal_entities)} entities")
print(f"Spatial: {len(result.spatial_entities)} entities")

# Raw LLM output is available for debugging
if result.extraction_config:
    cfg = result.extraction_config
    # extraction_config may be a dict or an object; handle both
    raw_output = cfg.get("raw_llm_output") if isinstance(cfg, dict) else cfg.raw_llm_output
    print(f"Raw output: {raw_output}")

Server Deployment

MS-SWIFT Server (Model Sharding with Tensor Parallelism)

Deploy a single MS-SWIFT server that uses all available GPUs via tensor parallelism:

# Deploy server (auto-detects GPUs by default)
./scripts/deploy_ms_swift.sh

# Stop server
./scripts/stop_ms_swift.sh

# Check logs
tail -f logs/hf_server.log

Configuration (cfg/hf.yml):

deployment.port: Server port (default: 8001)
deployment.model: HuggingFace model ID or local path
deployment.result_path: Directory for inference logs (default: data/output/result)
deployment.vllm.tensor_parallel_size:
- auto (default): Auto-detect all available GPUs
- Or set manually: 1, 2, 4, etc.
deployment.vllm.gpu_memory_utilization: GPU memory fraction (default: 0.7)
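
The dotted names above describe nested keys in cfg/hf.yml. As a rough illustration of that nesting, the sketch below expands the dotted keys into a nested dictionary. The `nest` helper is hypothetical and not part of STIndex, and the `deployment.model` value is a placeholder because no default is documented.

```python
def nest(dotted):
    """Expand {"a.b.c": v} into {"a": {"b": {"c": v}}}."""
    tree = {}
    for key, value in dotted.items():
        *parents, leaf = key.split(".")
        node = tree
        for part in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


# Defaults as listed above; the model value is a placeholder, not a default.
settings = nest({
    "deployment.port": 8001,
    "deployment.model": "<model id or local path>",
    "deployment.result_path": "data/output/result",
    "deployment.vllm.tensor_parallel_size": "auto",
    "deployment.vllm.gpu_memory_utilization": 0.7,
})

print(settings["deployment"]["vllm"])
```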

Output Logs:

Server logs: logs/hf_server.log
Inference logs: data/output/result/{model_name}/deploy_result/{timestamp}.jsonl

Each inference log contains:

response: Complete LLM output (including <think> tags and JSON)
infer_request: Input messages and generation config
generation_config: Sampling parameters used
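
One way to inspect these logs is sketched below. It assumes each line of the .jsonl file is a single JSON object with the fields listed above, and that reasoning in `response` is wrapped in `<think>...</think>`. The helper names are hypothetical, and the text left after removing the tags may still need cleanup before it parses as JSON.

```python
import json
import re
from pathlib import Path

# Assumption: reasoning is wrapped in <think>...</think> inside `response`.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(response: str) -> tuple[str, str]:
    """Return (reasoning, remainder) for one logged `response` string."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(response))
    remainder = THINK_RE.sub("", response).strip()
    return reasoning, remainder


def read_inference_log(path):
    """Yield one dict per non-empty line of a JSONL inference log."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


# Made-up sample string, shaped like the `response` field described above.
sample = '<think>Broome is a town.</think>{"spatial": ["Broome"]}'
reasoning, payload = split_response(sample)
print(reasoning)
print(json.loads(payload))
```

To process a real log, iterate over `read_inference_log(path)` and pass each record's `response` value to `split_response`.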

Configuration

Configuration files in cfg/:

extract.yml: Main configuration (sets LLM provider)
evaluate.yml: Evaluation settings
dimensions.yml: Multi-dimensional extraction configuration
warehouse.yml: Data warehouse configuration (connection, ETL, embeddings)
openai.yml: OpenAI API settings (GPT-4)
anthropic.yml: Anthropic API settings (Claude)
hf.yml: HuggingFace/MS-SWIFT server settings
- Client config (llm): API endpoint and generation parameters
- Server config (deployment): Model deployment settings
  - result_path: Inference log directory (default: data/output/result)
  - vllm.tensor_parallel_size: GPU configuration (auto or number)

Switching Providers

Edit cfg/extract.yml:

llm:
  llm_provider: hf  # or openai, anthropic

Or specify at runtime:

extractor = DimensionalExtractor(config_path="openai")
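
If the provider should vary per environment without editing cfg/extract.yml, the config name can be chosen before the extractor is built. The sketch below is one possible approach: `pick_config` and the `STINDEX_LLM_CONFIG` variable are hypothetical names that STIndex itself does not read, and the `hf` fallback is an arbitrary choice for illustration.

```python
import os

# Provider names taken from the options shown in this README.
KNOWN_CONFIGS = ("hf", "openai", "anthropic")


def pick_config(default="hf", env_var="STINDEX_LLM_CONFIG"):
    """Return a config name from the environment, falling back to `default`.

    STINDEX_LLM_CONFIG is a hypothetical variable name, not one STIndex reads.
    """
    name = os.environ.get(env_var, default).strip().lower()
    if name not in KNOWN_CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {KNOWN_CONFIGS}")
    return name
```

The returned name can then be passed as `DimensionalExtractor(config_path=pick_config())`, so a mistyped provider fails early with a clear message.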

Quick Evaluation

# Sequential mode (default)
stindex evaluate

# With specific config
stindex evaluate --llm-config openai

# Limit samples
stindex evaluate --sample-limit 10

Output Structure

Results are organized by dataset and model:

data/output/evaluations/
└── {dataset_name}-{model_name}/
    ├── eval_{timestamp}_{config}.csv         # Detailed results
    └── eval_{timestamp}_{config}.summary.json # Aggregate metrics
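
Given this layout, the most recent summary for each dataset and model pair can be collected with the standard library. This is a minimal sketch, assuming `{timestamp}` is formatted so that file names sort chronologically as text. `latest_summaries` is a hypothetical helper, and the keys inside each summary are not documented here, so the function returns the parsed JSON unchanged.

```python
import json
from pathlib import Path


def latest_summaries(root="data/output/evaluations"):
    """Map each {dataset_name}-{model_name} directory to its newest summary.

    Assumption: {timestamp} sorts chronologically as text, so the last file
    name in sorted order belongs to the most recent run.
    """
    root = Path(root)
    if not root.is_dir():
        return {}
    latest = {}
    for run_dir in sorted(root.iterdir()):
        if not run_dir.is_dir():
            continue
        summaries = sorted(run_dir.glob("eval_*.summary.json"))
        if summaries:
            latest[run_dir.name] = json.loads(summaries[-1].read_text(encoding="utf-8"))
    return latest


for name, summary in latest_summaries().items():
    print(name, summary)
```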

TODOs

Backend server implementation
Data warehouse integration

License

MIT License

Release history

1.1.1 (Dec 11, 2025)
1.1.0 (Dec 11, 2025)
1.0.2 (Nov 17, 2025)
1.0.1 (Nov 17, 2025), this version

Download files

Source Distribution

stindex-1.0.1.tar.gz (143.0 kB), uploaded Nov 17, 2025, Source

Built Distribution

stindex-1.0.1-py3-none-any.whl (174.1 kB), uploaded Nov 17, 2025, Python 3

File details

Details for the file stindex-1.0.1.tar.gz.

File metadata

Download URL: stindex-1.0.1.tar.gz
Upload date: Nov 17, 2025
Size: 143.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stindex-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`5440253cef002025997856de0bb28f0b5720cabc866940a93659ac998f6c21f1`
MD5	`86b84bf19f307b945eb0530c3a1042fb`
BLAKE2b-256	`6c62381c799b08c9cf367dc6525510a645dbbfd7c20e38ad8e4675a12c55aa87`
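
A downloaded archive can be checked against the SHA256 digest above with Python's `hashlib`. The sketch below assumes the file has already been downloaded to a local path. The helper names are illustrative, and the same check applies to the wheel if its own digest is passed as `expected`.

```python
import hashlib

# SHA256 digest published above for stindex-1.0.1.tar.gz.
EXPECTED_SHA256 = "5440253cef002025997856de0bb28f0b5720cabc866940a93659ac998f6c21f1"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches("stindex-1.0.1.tar.gz")` returns `True` only when the local file is byte-for-byte identical to the published archive.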

Provenance

The following attestation bundles were made for stindex-1.0.1.tar.gz:

Publisher: publish.yml on MoeBuTa/STIndex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stindex-1.0.1.tar.gz
- Subject digest: 5440253cef002025997856de0bb28f0b5720cabc866940a93659ac998f6c21f1
- Sigstore transparency entry: 704345805
- Sigstore integration time: Nov 17, 2025
Source repository:
- Permalink: MoeBuTa/STIndex@26a4ff5d73b892155323953e73b378555c5533a6
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/MoeBuTa
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@26a4ff5d73b892155323953e73b378555c5533a6
- Trigger Event: release

File details

Details for the file stindex-1.0.1-py3-none-any.whl.

File metadata

Download URL: stindex-1.0.1-py3-none-any.whl
Upload date: Nov 17, 2025
Size: 174.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stindex-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f0d96d7aaf525ef2d05c817c1409f235861f0071f87b518d0ee97fc786d0eaeb`
MD5	`5dca65f08326328269fb6bea73962f16`
BLAKE2b-256	`28896eeff1f4bf138a4d1b6dd6c1c20d4373d7f89c966804ca822448c0ded1d0`

Provenance

The following attestation bundles were made for stindex-1.0.1-py3-none-any.whl:

Publisher: publish.yml on MoeBuTa/STIndex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stindex-1.0.1-py3-none-any.whl
- Subject digest: f0d96d7aaf525ef2d05c817c1409f235861f0071f87b518d0ee97fc786d0eaeb
- Sigstore transparency entry: 704345810
- Sigstore integration time: Nov 17, 2025
Source repository:
- Permalink: MoeBuTa/STIndex@26a4ff5d73b892155323953e73b378555c5533a6
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/MoeBuTa
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@26a4ff5d73b892155323953e73b378555c5533a6
- Trigger Event: release
