ArchiTXT is a tool for structuring textual data into a valid database model. It is guided by a meta-grammar and uses an iterative process of tree rewriting.
ArchiTXT: Text-to-Database Structuring Tool
ArchiTXT converts unstructured text into structured data that is ready for database storage. It generates a database schema and the corresponding data instances automatically, which simplifies integrating text-based information into database systems.
Unstructured text is hard to store and query in a structured database. ArchiTXT bridges this gap by transforming raw text into organized, query-friendly structures.
Installation
To install ArchiTXT, make sure you have Python 3.10+ and pip installed. Then, run:
pip install architxt
To install the development version, install it directly from the Git repository:
pip install git+https://github.com/Neplex/ArchiTXT.git
Usage
ArchiTXT works with BRAT-annotated corpora that include pre-labeled named entities. It also requires access to a CoreNLP server, which you can set up using the Docker configuration available in the source repository.
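For readers unfamiliar with BRAT, the pre-labeled named entities live in standoff `.ann` files, where each entity line has the form `ID<TAB>Label start end<TAB>text`. The following is a minimal sketch of reading those entity lines. It is not ArchiTXT's own loader; the `Entity` class and `parse_brat_entities` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One BRAT text-bound annotation (hypothetical helper type)."""
    id: str
    label: str
    spans: tuple[tuple[int, int], ...]  # character offsets; several if discontinuous
    text: str


def parse_brat_entities(ann_text: str) -> list[Entity]:
    """Extract entity ("T") lines from the content of a BRAT .ann file."""
    entities = []
    for line in ann_text.splitlines():
        # Only text-bound annotations start with "T"; relations, events,
        # attributes and notes use other prefixes and are skipped here.
        if not line.startswith("T"):
            continue
        ent_id, header, text = line.split("\t", 2)
        label, _, offsets = header.partition(" ")
        # Discontinuous entities separate their fragments with ";".
        spans = tuple(
            (int(start), int(end))
            for start, end in (frag.split(" ") for frag in offsets.split(";"))
        )
        entities.append(Entity(ent_id, label, spans, text))
    return entities


if __name__ == "__main__":
    sample = "T1\tPerson 0 5\tAlice\nT2\tOrganization 15 19\tAcme\n"
    for entity in parse_brat_entities(sample):
        print(entity.label, entity.spans, entity.text)
```

A check like this can help confirm that a corpus actually contains entity annotations before handing it to ArchiTXT.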
$ architxt --help
Usage: architxt [OPTIONS] COMMAND [ARGS]...
ArchiTXT is a tool for structuring textual data into a valid database model.
It is guided by a meta-grammar and uses an iterative process of tree rewriting.
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ run Extract a database schema form a corpus. │
│ ui Launch the web-based UI. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
$ architxt run --help
Usage: architxt run [OPTIONS] CORPUS_PATH
Extract a database schema form a corpus.
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * corpus_path PATH Path to the input corpus. [default: None] [required] │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --tau FLOAT The similarity threshold. [default: 0.7] │
│ --epoch INTEGER Number of iteration for tree rewriting. [default: 100] │
│ --min-support INTEGER Minimum support for tree patterns. [default: 20] │
│ --corenlp-url TEXT URL of the CoreNLP server. [default: http://localhost:9000] │
│ --gen-instances INTEGER Number of synthetic instances to generate. [default: 0] │
│ --language TEXT Language of the input corpus. [default: French] │
│ --debug --no-debug Enable debug mode for more verbose output. [default: no-debug] │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
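The options in the help output above can be combined into a single `architxt run` invocation. As a minimal sketch, the helper below assembles such a command line from Python, using only the flags and defaults listed in the help text. The function name `build_run_command` and the `corpus/` path are hypothetical, and the flags may differ in other versions of the tool.

```python
import shlex


def build_run_command(
    corpus_path: str,
    *,
    tau: float = 0.7,
    epoch: int = 100,
    min_support: int = 20,
    corenlp_url: str = "http://localhost:9000",
    gen_instances: int = 0,
    language: str = "French",
    debug: bool = False,
) -> list[str]:
    """Build the argument list for `architxt run` (hypothetical helper)."""
    return [
        "architxt", "run", corpus_path,
        "--tau", str(tau),
        "--epoch", str(epoch),
        "--min-support", str(min_support),
        "--corenlp-url", corenlp_url,
        "--gen-instances", str(gen_instances),
        "--language", language,
        "--debug" if debug else "--no-debug",
    ]


if __name__ == "__main__":
    # "corpus/" is a placeholder path to a BRAT-annotated corpus.
    command = build_run_command("corpus/", tau=0.5, min_support=10)
    # Print the shell-ready command; to execute it instead, pass the list to
    # subprocess.run(command, check=True) with architxt installed.
    print(shlex.join(command))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the corpus path contains spaces.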
To start the CoreNLP server from a checkout of the source repository, use Docker Compose:
docker compose up -d corenlp
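Because the container starts in the background, it can be useful to confirm that the server accepts connections before running ArchiTXT. The sketch below only checks that a TCP connection to the URL's host and port succeeds; it does not verify that the service behind it is CoreNLP. The function name `server_is_reachable` is hypothetical, and the URL is the `--corenlp-url` default from the help output.

```python
import socket
from urllib.parse import urlsplit


def server_is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return False
    # Fall back to the scheme's usual port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    url = "http://localhost:9000"  # default value of --corenlp-url
    state = "reachable" if server_is_reachable(url) else "not reachable"
    print(f"CoreNLP server at {url} is {state}")
```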
Sponsors
This work has received support under the JUNON Program, with financial support from Région Centre-Val de Loire (France).