Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.
DataFog Instructor SDK
DataFog Instructor is a Python SDK for named entity recognition (NER) using the Ollama Instructor client. It provides an easy-to-use interface for detecting and classifying entities in text.
Installation
Install the DataFog Instructor SDK with pip:
pip install datafog-instructor
Quick Start
Here's a simple example to get you started with DataFog Instructor:
from datafog_instructor import DataFog
# Initialize DataFog with default settings
datafog = DataFog()
# Detect entities in text
text = "Cisco acquires Hess for $20 billion"
result = datafog.detect_entities(text)
# Print results
for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type}")
Configuration
You can customize the DataFog instance with the following parameters:
- host: The host URL for the Ollama service (default: "http://localhost:11434")
- model: The model to use for entity detection (default: "phi3")
- entity_types: A dictionary of custom entity types (optional)
Example with custom settings:
datafog = DataFog(
    host="http://custom-host:11434",
    model="custom-model",
    entity_types={"CUSTOM_TYPE": "Custom Entity"},
)
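If the host and model differ between environments, one option is to read them from environment variables and fall back to the defaults listed above. The sketch below is only an illustration: the variable names DATAFOG_HOST and DATAFOG_MODEL and the helper load_settings are hypothetical and are not part of the SDK.

```python
import os

# Defaults documented in the Configuration section above.
DEFAULT_HOST = "http://localhost:11434"
DEFAULT_MODEL = "phi3"


def load_settings(env=None):
    """Build keyword arguments for DataFog from environment variables.

    DATAFOG_HOST and DATAFOG_MODEL are hypothetical names chosen for this
    sketch; the SDK itself does not read them.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("DATAFOG_HOST", DEFAULT_HOST),
        "model": env.get("DATAFOG_MODEL", DEFAULT_MODEL),
    }


settings = load_settings()
print(settings)
```

The resulting dictionary could then be passed on as DataFog(**settings), since host and model are the parameter names documented above.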
Features
Detect Entities
Use the detect_entities method to identify and classify named entities in a given text:
text = "Apple Inc. reported $100 billion in revenue for Q4 2023"
result = datafog.detect_entities(text)
for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type}")
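When a text yields many entities, it can help to group them by type before further processing. The sketch below assumes only what the example above shows, namely that each entity exposes text and type attributes. The helper group_by_type is hypothetical, and the sample entities are hand-written stand-ins, not actual model output.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_type(entities):
    """Map each entity type to the list of texts detected with that type."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.type].append(entity.text)
    return dict(grouped)


# Stand-in entities for illustration; real ones would come from result.entities.
sample = [
    SimpleNamespace(text="Apple Inc.", type="ORG"),
    SimpleNamespace(text="$100 billion", type="MONEY"),
    SimpleNamespace(text="Q4 2023", type="DATE"),
]
print(group_by_type(sample))
```

With a live DataFog instance, the same helper could be called as group_by_type(result.entities).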
Manage Entity Types
You can add, remove, and list entity types at runtime:
# Add a new entity type
datafog.add_entity_type("CUSTOM", "Custom Entity")
# Remove an entity type
datafog.remove_entity_type("CUSTOM")
# Get all entity types
entity_types = datafog.get_entity_types()
print(entity_types)
Default Entity Types
The SDK comes with predefined entity types, including:
- ORG (Organization)
- PERSON
- TRANSACTION_TYPE
- DEAL_STRUCTURE
- FINANCIAL_INFO
- PRODUCT
- LOCATION
- DATE
- INDUSTRY
- ROLE
- REGULATORY
- SENSITIVE_INFO
- CONTACT
- ID (Identifier)
- STRATEGY
- COMPANY
- MONEY
Error Handling
The SDK includes basic error handling. If a response cannot be processed or arrives in an unexpected format, the SDK raises a ValueError with details about the error.
Contributing
Contributions to the DataFog Instructor SDK are welcome! Please feel free to submit a Pull Request.
License
MIT
Support
If you encounter any problems or have any questions, please open an issue on the GitHub repository.