
Python library for SOLI data generation

SOLI Data Generator

SOLI Data Generator is a Python package for generating synthetic legal data using the SOLI (Standards for Open Legal Information) knowledge graph. It provides both procedural and LLM-based generation techniques to create realistic legal text and data.

Features

  • Procedural generation using templates with SOLI and Faker tags
  • LLM-based text generation using various AI models
  • Easy integration with the SOLI knowledge graph
  • Flexible and extensible architecture

Installation

You can install SOLI Data Generator using pip:

pip install soli-data-generator
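
To confirm the install from Python, a small helper built on the standard library's importlib.metadata can report the installed distribution version. This is a generic sketch, not part of the package, and the helper name installed_version is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version string, or None if pip has not installed the package.
    print(installed_version("soli-data-generator"))
```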

Usage

Procedural Template Generation

from soli import SOLI
from soli_data_generator.procedural.template import TemplateFormatter

# Initialize the SOLI graph
soli_graph = SOLI()

# Initialize the TemplateFormatter
formatter = TemplateFormatter()

# Define a template with SOLI and Faker tags
template = """
Company: <|company|>
Industry: <|industry|>
Legal Issue: <|area_of_law|>
Date: <|date|>
Document Type: <|document_artifact|>
"""

# Format the template
formatted_text = formatter(template)
print(formatted_text)

Output:

Company: Griffith-Mahoney
Industry: Electric Power Generation, Transmission and Distribution Industry
Legal Issue: Privacy
Date: 2024-08-19
Document Type: Request to Take Judicial Notice
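
The library resolves the <|tag|> placeholders above against SOLI and Faker. As a rough illustration of the substitution step only, the sketch below fills placeholders using a regular expression and a dictionary of value providers. The fill_template function and the providers mapping are hypothetical stand-ins: they do not reproduce how TemplateFormatter is implemented, and the providers here return fixed strings rather than SOLI or Faker samples.

```python
import re
from typing import Callable, Dict

# Matches <|tag|> and <|tag:label|>; group 1 is the tag, group 2 the optional label.
PLACEHOLDER = re.compile(r"<\|(\w+)(?::(\w+))?\|>")


def fill_template(template: str, providers: Dict[str, Callable[[], str]]) -> str:
    """Replace each placeholder with a value from the provider registered for its tag.

    The optional label is matched but ignored in this sketch. Unknown tags are
    left untouched so that missing providers are easy to spot in the output.
    """
    def substitute(match: re.Match) -> str:
        provider = providers.get(match.group(1))
        return provider() if provider else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


# Fixed stand-in providers; a real generator would sample SOLI classes or Faker values.
providers = {
    "company": lambda: "Griffith-Mahoney",
    "area_of_law": lambda: "Privacy",
    "date": lambda: "2024-08-19",
}

template = "Company: <|company|>\nLegal Issue: <|area_of_law|>\nDate: <|date|>"
print(fill_template(template, providers))
```

Passing re.sub a function instead of a replacement string lets each match be resolved independently, which is what allows every placeholder to draw its own value.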

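Multiple Values per Type

To include several distinct values of the same type, add a suffix after a colon, such as <|name:1|> and <|name:2|>. Each suffix receives its own value, as the output below shows.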

template = """
From: <|name:1|>
To: <|name:2|>, <|email:1|>, <|email:b|>
Date: <|date|>
Subject: <|company|> matter updates
"""

print(formatter(template))

Output:

From: David Henry
To: Jean Vance, obryant@example.com, landrysamuel@example.com
Date: 2024-08-31
Subject: Dorsey Ltd matter updates
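
The example suggests that the suffix after the colon (:1, :2, :b) distinguishes separate values of the same type. One plausible way to implement that, sketched below under that assumption, is to cache one value per (tag, label) pair, so a repeated labeled placeholder reuses its value while a new label draws a fresh one. LabeledFiller is hypothetical, and whether the library actually reuses values for repeated labels is an assumption, not something stated above.

```python
import itertools
import re
from typing import Callable, Dict, Tuple

PLACEHOLDER = re.compile(r"<\|(\w+)(?::(\w+))?\|>")


class LabeledFiller:
    """Fill <|tag|> and <|tag:label|> placeholders, caching one value per (tag, label)."""

    def __init__(self, providers: Dict[str, Callable[[], str]]) -> None:
        self.providers = providers

    def __call__(self, template: str) -> str:
        cache: Dict[Tuple[str, str], str] = {}  # fresh cache for each template

        def substitute(match: re.Match) -> str:
            tag, label = match.group(1), match.group(2)
            if tag not in self.providers:
                return match.group(0)  # leave unknown tags as-is
            if label is None:
                return self.providers[tag]()  # assumption: unlabeled tags always draw a new value
            key = (tag, label)
            if key not in cache:
                cache[key] = self.providers[tag]()
            return cache[key]

        return PLACEHOLDER.sub(substitute, template)


# Deterministic stand-in provider yielding name-1, name-2, ... on successive calls.
counter = itertools.count(1)
filler = LabeledFiller({"name": lambda: f"name-{next(counter)}"})
print(filler("From: <|name:1|>\nTo: <|name:2|>\nCc: <|name:1|>"))
```

Here the Cc line repeats name-1 because it shares the :1 label with the From line.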

LLM-based Text Generation

from alea_llm_client import VLLMModel
from soli_data_generator.llm.text import TextGenerator

# Initialize the VLLM model
model = VLLMModel()

# Initialize the TextGenerator
generator = TextGenerator(model)

# Generate text
generated_text = generator()

print(generated_text)

Output with Llama 3.1 8B:

Be it known that White, Johnson and Morgan is in good standing, and I, the undersigned,
hereby attest to this fact. Were I to have knowledge of any reason why the said company
should not be considered in good standing, I would bring such to the attention of the
proper authorities.

Were the company not in good standing, I would not be able to issue this certificate. Were
there any outstanding matters or issues that would prevent the company from being
considered in good standing, I would be aware of them. Were this not the case, I would not
be able to provide this certification.

Were I to have knowledge of any reason why the said company should not be considered in
good standing, I would take immediate action to rectify the situation. Were this not
possible, I would report the matter to the relevant authorities. Were the company to be
found in bad standing, I would not be able to provide this certification.

It is hereby certified that White, Johnson and Morgan is in good standing as of the date
of this certificate. Were this certification to be found to be false or misleading, I
would be subject to penalties and consequences. Were I to have any knowledge that would
prevent the company from being considered in good standing, I would be obligated to report
such to the proper authorities.

The quality of generated text varies with the model and the generation parameters.
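
TextGenerator handles prompting internally. To show the general shape of LLM-based generation without a running model, the sketch below samples a document type and a party name, builds a prompt, and passes it to any callable that maps a prompt to text. build_prompt, generate, the prompt wording, and the fake_model stub are all hypothetical; a real setup would call a model client such as VLLMModel from alea_llm_client instead, and that client's methods are not shown here.

```python
import random
from typing import Callable, Sequence


def build_prompt(doc_type: str, company: str) -> str:
    """Compose a generation prompt from a sampled document type and party (hypothetical wording)."""
    return (
        f"Write a short {doc_type.lower()} involving {company}. "
        "Use formal legal language and return only the document text."
    )


def generate(
    model: Callable[[str], str],
    doc_types: Sequence[str],
    companies: Sequence[str],
    rng: random.Random,
) -> str:
    """Sample inputs, build a prompt, and delegate text generation to model."""
    prompt = build_prompt(rng.choice(doc_types), rng.choice(companies))
    return model(prompt)


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call: echoes the prompt so the flow is visible offline.
    return f"[generated from prompt] {prompt}"


rng = random.Random(0)  # seeded so repeated runs pick the same inputs
print(generate(fake_model, ["Certificate of Good Standing"], ["White, Johnson and Morgan"], rng))
```

Treating the model as a plain callable keeps the prompt logic testable without a GPU or network access; swapping in a real client only changes the function passed as model.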

Examples

For more detailed examples, please check the examples/ directory in this repository.

Contributing

We welcome contributions to all SOLI libraries!

If you'd like to contribute, please follow these steps:

  1. Fork the repository
  2. Create a new branch for your feature or bug fix
  3. Make your changes and write tests if applicable
  4. Run the test suite to ensure everything is working
  5. Submit a pull request with a clear description of your changes

SOLI Python library

This library relies on the SOLI Python library (imported as soli in the examples above) to interact with the SOLI knowledge graph. For more information, see the SOLI Python library repository.

SOLI API

A public, freely accessible API is available for the SOLI ontology.

The API is hosted at https://soli.openlegalstandard.org/.

The source code for the API is available on GitHub at https://github.com/alea-institute/soli-api.

License

The SOLI data generation library is released under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions about using SOLI Data Generator, please open an issue on GitHub.

Learn More

To learn more about SOLI, its development, and how you can get involved, visit the SOLI website or join the SOLI community forum.

Download files

Source Distribution

soli_data_generator-0.1.1.tar.gz (11.5 kB)

Uploaded Source

Built Distribution

soli_data_generator-0.1.1-py3-none-any.whl (10.9 kB)

Uploaded Python 3

File details

Details for the file soli_data_generator-0.1.1.tar.gz.

File metadata

  • Download URL: soli_data_generator-0.1.1.tar.gz
  • Upload date:
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Linux/6.8.0-41-generic

File hashes

Hashes for soli_data_generator-0.1.1.tar.gz
Algorithm Hash digest
SHA256 635d74c11103dc323f996f30a748278f283a47a9b39990279d13861827a85343
MD5 3f3b6cb2b9b05430e347856cc9b76d4c
BLAKE2b-256 cbf5cdb1110cabc677963a228db77e3a64a12efa61d605b1249c87eeea61687f

File details

Details for the file soli_data_generator-0.1.1-py3-none-any.whl.

File hashes

Hashes for soli_data_generator-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1f4ed0b075cee43e011a78a344203b9369c3254e3fdb984769220b36fd2c84d8
MD5 63da84e3a1595692e90afd927714b076
BLAKE2b-256 fe2a1ffc314bbc9e820ecc438234f7864a72d44dc7772701dbb21ac81bfb2509
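
To check a downloaded distribution against the SHA256 digests listed above, Python's standard hashlib module is enough. The helpers below are a generic sketch (the names sha256_of and verify are illustrative); the file is read in chunks so that large files need not fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published one, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    wheel = Path("soli_data_generator-0.1.1-py3-none-any.whl")
    expected = "1f4ed0b075cee43e011a78a344203b9369c3254e3fdb984769220b36fd2c84d8"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```

pip can also enforce digests itself through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).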
