A package purpose-built to simplify document processing for LLM-based application development

missing_text

Description

missing_text is an open-source project purpose-built to simplify document processing for LLM-based application development. It aims to make it easy to ingest documents, extract text and metadata, and prepare the data for training, inference, or storage in an LLM-based application, so that developers can focus on building the application itself.

Installation

This project uses the uv package manager. To install missing_text, follow these steps:

  1. Install uv if you haven't already:

    pip install uv
    
  2. Clone the repository:

    git clone https://github.com/yourusername/missing_text.git
    cd missing_text
    
  3. Create a virtual environment and install dependencies:

    uv venv
    source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
    uv sync  # Install primary dependencies
    uv sync --group dev  # Install development dependencies
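
Once the steps above finish, one quick way to confirm that the package is visible to the active virtual environment is to ask Python's import machinery. The sketch below uses only the standard library; the helper name `is_importable` is an illustrative assumption and is not part of missing_text.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    # find_spec returns None when no top-level module of that name exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    status = "found" if is_importable("missing_text") else "not found"
    print(f"missing_text: {status}")
```

If the check reports "not found", the virtual environment is probably not activated or the install step did not complete.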
    

Usage

Here's a quick example of how to use missing_text:

from missing_text.hello_missing import hello_missing
result = hello_missing()
print(result)

For more detailed usage instructions, please refer to the documentation.

CLI Usage

After installing the package, you can use the CLI as follows:

# Run the hello_missing function
missing run

# Run with a custom name
missing run --name Alice

# Show the version
missing version

# Start a FastAPI server
missing fastapi

# Start a FastAPI server with custom host and port
missing fastapi --host 0.0.0.0 --port 5000

# Show help
missing --help

The FastAPI server can be configured using environment variables or command-line arguments:

  • MISSING_FAST_API_HOST: Sets the host for the FastAPI server (default: 0.0.0.0)
  • MISSING_FAST_API_PORT: Sets the port for the FastAPI server (default: 8000)

You can set these in a .env file in your project root or as system environment variables.

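Command-line arguments override environment variables. For example, passing --port 5000 takes precedence over a MISSING_FAST_API_PORT value set in the environment.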
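
The precedence described above (command-line argument, then environment variable, then default) can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the function name `resolve_server_config` is hypothetical, and only the variable names and defaults come from the list above. It reads the process environment only and does not load a .env file.

```python
import argparse
import os

DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 8000


def resolve_server_config(argv, environ=None):
    """Resolve (host, port): CLI argument > environment variable > default."""
    env = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(prog="missing fastapi")
    # Defaults stay None so "not passed" can be told apart from a real value.
    parser.add_argument("--host", default=None)
    parser.add_argument("--port", type=int, default=None)
    args = parser.parse_args(argv)

    if args.host is not None:
        host = args.host
    else:
        host = env.get("MISSING_FAST_API_HOST", DEFAULT_HOST)

    if args.port is not None:
        port = args.port
    else:
        port = int(env.get("MISSING_FAST_API_PORT", DEFAULT_PORT))

    return host, port
```

Passing `environ` explicitly keeps the function easy to test without modifying the real environment.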

The FastAPI server exposes two endpoints:

  • /: Returns a welcome message
  • /hello/{name}: Returns the result of hello_missing(name)

You can access these endpoints in your browser or using tools like curl:

curl http://localhost:8000/
curl http://localhost:8000/hello/Alice
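
The same requests can be made from Python. The sketch below uses only the standard library and assumes the server is running on localhost:8000; the helper names `endpoint_urls` and `fetch` are illustrative, and the response format is whatever the server returns.

```python
from urllib.parse import quote
from urllib.request import urlopen


def endpoint_urls(name, host="localhost", port=8000):
    """Build the URLs for the two endpoints described above."""
    base = f"http://{host}:{port}"
    # quote() percent-encodes characters that are not safe in a URL path.
    return {
        "root": f"{base}/",
        "hello": f"{base}/hello/{quote(name, safe='')}",
    }


def fetch(url, timeout=5.0):
    """Return the response body as text; needs the server to be running."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (only works while the server is up):
#     print(fetch(endpoint_urls("Alice")["hello"]))
```

Percent-encoding the name matters for values containing spaces or slashes, which would otherwise produce a different path than intended.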

Development

To set up the development environment:

  1. Follow the installation steps above.
  2. Install development dependencies:
    uv pip install -r requirements-dev.txt
    
  3. Install pre-commit hooks:
    pre-commit install
    
  4. [Optional] Run pre-commit checks manually:
    pre-commit run --all-files
    
  5. Build the package locally:
    uv run python -m build
    
  6. Install the package in editable mode:
    uv pip install --editable .
    

Testing

We use pytest for automated testing. To run the tests:

pytest

To run the tests with coverage (coverage reporting is not yet implemented):

pytest --cov=missing_text

All new features should have corresponding test cases. Tests are located in the tests/ directory.
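
As a rough guide to the shape of a test case, here is a minimal sketch of a pytest-style test file. The file name, the stand-in function, and the assertions are all assumptions: the real tests would import `hello_missing` from the package, and its actual return value is not documented here.

```python
# tests/test_hello_missing.py (hypothetical file name)

# In the real project this would be:
#     from missing_text.hello_missing import hello_missing
# The stand-in below exists only so the sketch runs on its own; its
# greeting format is an assumption, not the package's actual output.
def hello_missing(name: str = "World") -> str:
    return f"Hello, {name}!"


def test_hello_missing_returns_text():
    # Assumes the function returns a string when called without arguments.
    assert isinstance(hello_missing(), str)


def test_hello_missing_uses_name():
    # Assumes the supplied name appears somewhere in the result.
    assert "Alice" in hello_missing("Alice")
```

pytest collects functions whose names start with `test_`, so no extra registration is needed.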

Contributing

We welcome contributions to missing_text! Here's how you can contribute:

  1. Check the Issues page for open issues or create a new one to discuss your ideas.
  2. Fork the repository and create a new branch for your feature or bug fix.
  3. Write code and tests for your changes.
  4. Ensure all tests pass and the code adheres to the project's style guide.
  5. Submit a pull request with a clear description of your changes.

Please read our Contributing Guidelines for more details.

For any other queries, please reach out to us at maruthi@typeless.io

License

This project is licensed under the Apache License 2.0.


For more information, please visit our GitHub repository.

Download files

Source Distribution

missing_text-0.0.1a5.tar.gz (9.2 kB)

Uploaded Source

Built Distribution

missing_text-0.0.1a5-py3-none-any.whl (12.9 kB)

Uploaded Python 3

File details

Details for the file missing_text-0.0.1a5.tar.gz.

File metadata

  • Download URL: missing_text-0.0.1a5.tar.gz
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.10

File hashes

Hashes for missing_text-0.0.1a5.tar.gz
Algorithm Hash digest
SHA256 a79798b7b27725eabd6443abfad0066683ca0d2fc6a7cb4dd69db7f939cc4a7c
MD5 1bfd109003eaf7ff795994f9e6b5a609
BLAKE2b-256 f3e235843e840d92e076cb0b10fa08a5902c7fa051a0d1b52cc32f80ffc79375

File details

Details for the file missing_text-0.0.1a5-py3-none-any.whl.

File hashes

Hashes for missing_text-0.0.1a5-py3-none-any.whl
Algorithm Hash digest
SHA256 193e99a0f1d63db68c274edcb2f44dc7e65a6136067c167ebfeb76219fead4b6
MD5 57d00ff38fe889f5069b347a3d1bf75f
BLAKE2b-256 c38771bc2f5824c292c6ffcbb8c1227cf9a8a3a331b646850ac25fcaf6b56947
