A package purpose-built to simplify document processing for LLM-based application development
missing_text
Description
missing_text is an open-source project purpose-built to simplify document processing for LLM-based application development. It aims to make it easy to ingest documents, extract text and metadata, and prepare the data for training, inference, or storage in an LLM-based application, so that developers can focus on building their application.
Installation
This project uses the UV package manager. To install missing_text, follow these steps:
- Install UV if you haven't already:
pip install uv
- Clone the repository:
git clone https://github.com/yourusername/missing_text.git
cd missing_text
- Create a virtual environment and install dependencies:
uv venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
uv sync  # Install primary dependencies
uv sync --group dev  # Install development dependencies
Usage
Here's a quick example of how to use missing_text:
from missing_text.hello_missing import hello_missing
result = hello_missing()
print(result)
For more detailed usage instructions, please refer to the documentation.
CLI Usage
After installing the package, you can use the CLI as follows:
# Run the hello_missing function
missing run
# Run with a custom name
missing run --name Alice
# Show the version
missing version
# Start a FastAPI server
missing fastapi
# Start a FastAPI server with custom host and port
missing fastapi --host 0.0.0.0 --port 5000
# Show help
missing --help
The FastAPI server can be configured using environment variables or command-line arguments:
- MISSING_FAST_API_HOST: Sets the host for the FastAPI server (default: 0.0.0.0)
- MISSING_FAST_API_PORT: Sets the port for the FastAPI server (default: 8000)
You can set these in a .env file in your project root or as system environment variables.
Command-line arguments override environment variables.
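The precedence described above (command-line argument, then environment variable, then default) can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the function name `resolve_server_config` is hypothetical, and the sketch reads only from the environment mapping it is given, so it does not load a .env file by itself.

```python
import argparse
import os

# Defaults as documented above.
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 8000


def resolve_server_config(argv=None, environ=None):
    """Return (host, port): CLI flags win over environment variables, which win over defaults."""
    environ = os.environ if environ is None else environ

    # Flags default to None so that "not passed" can be told apart from a real value.
    parser = argparse.ArgumentParser(prog="missing fastapi")
    parser.add_argument("--host", default=None)
    parser.add_argument("--port", type=int, default=None)
    args = parser.parse_args(argv)

    host = args.host or environ.get("MISSING_FAST_API_HOST", DEFAULT_HOST)
    if args.port is not None:
        port = args.port
    else:
        port = int(environ.get("MISSING_FAST_API_PORT", DEFAULT_PORT))
    return host, port
```

For example, `resolve_server_config(["--port", "5000"], {"MISSING_FAST_API_PORT": "9000"})` takes the port from the command line and falls back to the default host.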
The FastAPI server exposes two endpoints:
- /: Returns a welcome message
- /hello/{name}: Returns the result of hello_missing(name)
You can access these endpoints in your browser or using tools like curl:
curl http://localhost:8000/
curl http://localhost:8000/hello/Alice
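The same two requests can be made from Python using only the standard library. The sketch below assumes the server is running on localhost:8000 as in the curl examples; the helper names `endpoint_url` and `fetch` are hypothetical and not part of missing_text, and the response body is returned as raw text because its exact format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def endpoint_url(name=None, host="localhost", port=8000):
    """Build the URL for the welcome endpoint, or for /hello/{name} when a name is given."""
    base = f"http://{host}:{port}"
    if name is None:
        return f"{base}/"
    # Percent-encode the name so spaces and slashes stay inside one path segment.
    return f"{base}/hello/{quote(name, safe='')}"


def fetch(url, timeout=5.0):
    """Send a GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server started via `missing fastapi`, calling `fetch(endpoint_url("Alice"))` requests the same resource as the second curl command.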
Development
To set up the development environment:
- Follow the installation steps above.
- Install development dependencies:
uv pip install -r requirements-dev.txt
- Install pre-commit hooks:
pre-commit install
- [Optional] Run pre-commit checks manually:
pre-commit run --all-files
- Build the package locally:
uv run python -m build
- Install the package in editable mode:
uv pip install --editable .
Testing
We use pytest for automated testing. To run the tests:
pytest
To run the tests with coverage (not yet implemented):
pytest --cov=missing_text
All new features should have corresponding test cases. Tests are located in the tests/ directory.
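As a rough illustration of what a test case for a new feature might look like, the sketch below follows pytest's naming conventions (`test_*.py` files containing `test_*` functions). The `greet` function is a hypothetical stand-in defined inline so the example is self-contained; a real test would import the feature from missing_text instead, and the greeting format shown is an assumption, not the documented output of hello_missing.

```python
# tests/test_greeting.py (hypothetical file name)


def greet(name="World"):
    """Stand-in for a new feature; a real test would import it from the package."""
    return f"Hello, {name}!"


def test_greet_uses_default_name():
    # Cover the default path.
    assert greet() == "Hello, World!"


def test_greet_uses_custom_name():
    # Cover the path where the caller supplies a value.
    assert greet("Alice") == "Hello, Alice!"
```

Running `pytest` from the project root collects and runs both functions.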
Contributing
We welcome contributions to missing_text! Here's how you can contribute:
- Check the Issues page for open issues or create a new one to discuss your ideas.
- Fork the repository and create a new branch for your feature or bug fix.
- Write code and tests for your changes.
- Ensure all tests pass and the code adheres to the project's style guide.
- Submit a pull request with a clear description of your changes.
Please read our Contributing Guidelines for more details.
For any other queries, please reach out to us at maruthi@typeless.io.
License
This project is licensed under the Apache License 2.0.
For more information, please visit our GitHub repository.
File details
Details for the file missing_text-0.0.1a5.tar.gz.
File metadata
- Download URL: missing_text-0.0.1a5.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | a79798b7b27725eabd6443abfad0066683ca0d2fc6a7cb4dd69db7f939cc4a7c
MD5 | 1bfd109003eaf7ff795994f9e6b5a609
BLAKE2b-256 | f3e235843e840d92e076cb0b10fa08a5902c7fa051a0d1b52cc32f80ffc79375
File details
Details for the file missing_text-0.0.1a5-py3-none-any.whl.
File metadata
- Download URL: missing_text-0.0.1a5-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 193e99a0f1d63db68c274edcb2f44dc7e65a6136067c167ebfeb76219fead4b6
MD5 | 57d00ff38fe889f5069b347a3d1bf75f
BLAKE2b-256 | c38771bc2f5824c292c6ffcbb8c1227cf9a8a3a331b646850ac25fcaf6b56947
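The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard hashlib module; the helper names `sha256_of_file` and `matches_sha256` are hypothetical, and the path passed in is assumed to be wherever the file was saved locally.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected value, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For instance, `matches_sha256("missing_text-0.0.1a5.tar.gz", "a79798b7b27725eabd6443abfad0066683ca0d2fc6a7cb4dd69db7f939cc4a7c")` should return True when the source distribution downloaded intact into the current directory.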