Python library for SOLI data generation
SOLI Data Generator
SOLI Data Generator is a Python package for generating synthetic legal data using the SOLI (Standards for Open Legal Information) knowledge graph. It provides both procedural and LLM-based generation techniques to create realistic legal text and data.
Features
- Procedural generation using templates with SOLI and Faker tags
- LLM-based text generation using various AI models
- Easy integration with the SOLI knowledge graph
- Flexible and extensible architecture
Installation
You can install SOLI Data Generator using pip:
pip install soli-data-generator
Usage
Procedural Template Generation
from soli import SOLI
from soli_data_generator.procedural.template import TemplateFormatter
# Initialize the SOLI graph
soli_graph = SOLI()
# Initialize the TemplateFormatter
formatter = TemplateFormatter()
# Define a template with SOLI and Faker tags
template = """
Company: <|company|>
Industry: <|industry|>
Legal Issue: <|area_of_law|>
Date: <|date|>
Document Type: <|document_artifact|>
"""
# Format the template
formatted_text = formatter(template)
print(formatted_text)
Output:
Company: Griffith-Mahoney
Industry: Electric Power Generation, Transmission and Distribution Industry
Legal Issue: Privacy
Date: 2024-08-19
Document Type: Request to Take Judicial Notice
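The substitution step above can be pictured as a regular-expression pass over the template. The following is a minimal sketch only, not the library's implementation: SAMPLE_VALUES, TAG_PATTERN, and fill_template are hypothetical names, and the hard-coded value pools stand in for what the real package draws from the SOLI knowledge graph and Faker.

```python
import random
import re

# Hypothetical stand-in value pools (assumption): the real library draws
# values from the SOLI knowledge graph and Faker, not from fixed lists.
SAMPLE_VALUES = {
    "company": ["Griffith-Mahoney", "Dorsey Ltd"],
    "area_of_law": ["Privacy", "Antitrust"],
    "document_artifact": ["Request to Take Judicial Notice", "Subpoena"],
}

# Matches tags of the form <|tag_name|>
TAG_PATTERN = re.compile(r"<\|([a-z_]+)\|>")


def fill_template(template: str, rng: random.Random) -> str:
    """Replace each known <|tag|> with a random value; leave unknown tags as-is."""

    def substitute(match: re.Match) -> str:
        values = SAMPLE_VALUES.get(match.group(1))
        return rng.choice(values) if values else match.group(0)

    return TAG_PATTERN.sub(substitute, template)


result = fill_template(
    "Company: <|company|>\nLegal Issue: <|area_of_law|>\nDate: <|date|>",
    random.Random(0),
)
print(result)
```

In this sketch the date tag has no entry in the pool, so it passes through unchanged, which makes unsupported tags easy to spot in the output.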
Multiple Values per Type
A suffix after a colon, as in <|name:1|> and <|name:2|>, requests separate values of the same type within one template. In the example below, the two name tags and the two email tags each produce different values.
template = """
From: <|name:1|>
To: <|name:2|>, <|email:1|>, <|email:b|>
Date: <|date|>
Subject: <|company|> matter updates
"""
print(formatter(template))
Output:
From: David Henry
To: Jean Vance, obryant@example.com, landrysamuel@example.com
Date: 2024-08-31
Subject: Dorsey Ltd matter updates
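One way to think about keyed tags is as a small cache: each (type, key) pair is assigned a value once, and different keys receive different values. The sketch below illustrates that idea under stated assumptions. KEYED_TAG, fill_keyed, and the NAMES pool are hypothetical, and whether the library reuses a value when the same key appears twice is an assumption made here, not something the examples above confirm.

```python
import random
import re

# Hypothetical pool of names (assumption); the real library uses Faker.
NAMES = ["David Henry", "Jean Vance", "Maria Lopez", "Sam Patel"]

# Matches <|tag|> and <|tag:key|>; the key is optional.
KEYED_TAG = re.compile(r"<\|([a-z_]+)(?::(\w+))?\|>")


def fill_keyed(template: str, pools: dict[str, list[str]], rng: random.Random) -> str:
    """Fill keyed tags so distinct keys of one type get distinct values."""
    assigned: dict[tuple[str, str], str] = {}
    # Shuffled copy of each pool; popping from it guarantees distinct values.
    # A template with more keys than pool entries would raise IndexError.
    remaining = {tag: rng.sample(values, len(values)) for tag, values in pools.items()}

    def substitute(match: re.Match) -> str:
        tag, key = match.group(1), match.group(2)
        if tag not in pools:
            return match.group(0)  # unknown tag: leave untouched
        if key is None:
            return rng.choice(pools[tag])  # unkeyed tag: independent draw
        if (tag, key) not in assigned:
            assigned[(tag, key)] = remaining[tag].pop()
        return assigned[(tag, key)]

    return KEYED_TAG.sub(substitute, template)


text = fill_keyed(
    "From: <|name:1|>\nTo: <|name:2|>\nCc: <|name:1|>",
    {"name": NAMES},
    random.Random(0),
)
print(text)
```

Here the From and Cc lines share the key 1 and so receive the same name, while the To line uses key 2 and receives a different one.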
LLM-based Text Generation
from alea_llm_client import VLLMModel
from soli_data_generator.llm.text import TextGenerator
# Initialize the VLLM model
model = VLLMModel()
# Initialize the TextGenerator
generator = TextGenerator(model)
# Generate text
generated_text = generator()
print(generated_text)
Output with llama3.1 8B:
Be it known that White, Johnson and Morgan is in good standing, and I, the undersigned,
hereby attest to this fact. Were I to have knowledge of any reason why the said company
should not be considered in good standing, I would bring such to the attention of the
proper authorities.
Were the company not in good standing, I would not be able to issue this certificate. Were
there any outstanding matters or issues that would prevent the company from being
considered in good standing, I would be aware of them. Were this not the case, I would not
be able to provide this certification.
Were I to have knowledge of any reason why the said company should not be considered in
good standing, I would take immediate action to rectify the situation. Were this not
possible, I would report the matter to the relevant authorities. Were the company to be
found in bad standing, I would not be able to provide this certification.
It is hereby certified that White, Johnson and Morgan is in good standing as of the date
of this certificate. Were this certification to be found to be false or misleading, I
would be subject to penalties and consequences. Were I to have any knowledge that would
prevent the company from being considered in good standing, I would be obligated to report
such to the proper authorities.
The quality of the generated text varies with the model and the generation parameters used.
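To give a sense of why generation parameters matter, the sketch below shows the effect of sampling temperature, a parameter most LLM backends expose. It is a generic illustration and not part of this package: softmax_with_temperature and the example logits are hypothetical, and which parameters TextGenerator or the underlying model actually accepts is not covered here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)  # low temperature
flat = softmax_with_temperature(logits, 2.0)   # high temperature
print(sharp)
print(flat)
```

A low temperature concentrates probability on the top-scoring token, which tends to give more repetitive text, while a high temperature spreads probability across candidates and gives more varied but less predictable text.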
Examples
For more detailed examples, see the examples/ directory in this repository.
Contributing
We welcome contributions to all SOLI libraries!
If you'd like to contribute, please follow these steps:
- Fork the repository
- Create a new branch for your feature or bug fix
- Make your changes and write tests if applicable
- Run the test suite to ensure everything is working
- Submit a pull request with a clear description of your changes
SOLI Python library
This library relies on the SOLI Python library for interacting with the SOLI knowledge graph. For more information about the SOLI Python library, please visit the SOLI Python library repository.
SOLI API
A public, freely-accessible API is available for the SOLI ontology.
The API is hosted at https://soli.openlegalstandard.org/.
The source code for the API is available on GitHub at https://github.com/alea-institute/soli-api.
License
The SOLI data generation library is released under the MIT License. See the LICENSE file for details.
Support
If you encounter any issues or have questions about using SOLI Data Generator, please open an issue on GitHub.
Learn More
To learn more about SOLI, its development, and how you can get involved, visit the SOLI website or join the SOLI community forum.