FlexiData is an open-source Python package designed for processing unstructured data.
flexidata
FlexiData currently supports PDF extraction, with plans to expand to other file types in the future.
Features
- Document Parser: Parses multiple file types and extracts text blocks together with their metadata, so the same extraction workflow applies across document formats.
Supported File Formats:
- PDF: For detailed extraction of text and metadata.
- Images (JPEG, PNG, BMP): Enables image data and metadata processing.
- EPUB: Converts and extracts content from EPUB files.
- HTML: Parses HTML content for data extraction.
- reStructuredText (RST): Handles conversion and parsing of RST files.
- Rich Text Format (RTF): Facilitates conversion and content extraction from RTF documents.
- DOCX: Allows extraction from DOCX documents, providing access to structured content and metadata.
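As a rough illustration of what "text blocks and their metadata" can look like for one of the formats above, the sketch below parses HTML with Python's standard-library `html.parser`. It does not use FlexiData's API, which is not shown in this description; `TextBlock`, `BlockExtractor`, and `extract_blocks` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class TextBlock:
    """One extracted piece of text plus metadata (hypothetical shape)."""
    text: str
    metadata: dict = field(default_factory=dict)


class BlockExtractor(HTMLParser):
    """Collects the text found inside a small set of block-level tags."""

    BLOCK_TAGS = {"h1", "h2", "h3", "p", "li"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._tag = None   # block tag currently open, if any
        self._parts = []   # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Start collecting only when no other block is already open.
        if tag in self.BLOCK_TAGS and self._tag is None:
            self._tag = tag
            self._parts = []

    def handle_data(self, data):
        if self._tag is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Join fragments, then collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            if text:
                meta = {"tag": tag, "index": len(self.blocks)}
                self.blocks.append(TextBlock(text, meta))
            self._tag = None


def extract_blocks(html: str) -> list[TextBlock]:
    """Return the text blocks found in an HTML string, in document order."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Each block keeps the tag it came from and its position in the document, which is the kind of metadata a downstream step could use for filtering or ordering.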
Upcoming Features:
- Content Chunking: This will enable dividing text into meaningful and manageable pieces for better processing and analysis.
- Data Embedding: Planned to support embedding textual data into vector spaces for advanced data analysis and machine learning applications.
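Since both features are still planned, the following is only a minimal sketch of the two ideas, not FlexiData's implementation. `chunk_words` splits text into overlapping word windows, and `hash_embed` maps text to a fixed-length vector with a simple hashing trick, used here as a stand-in for a learned embedding model. Both function names and all parameters are assumptions made for the example.

```python
import hashlib
import math


def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that context is not
    lost at chunk boundaries.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a unit-length vector by hashing each word to a bucket.

    This is a bag-of-words hashing trick, not a semantic embedding.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

A typical pipeline would chunk the extracted text first and then embed each chunk, for example `[hash_embed(c) for c in chunk_words(text)]`.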
Contributing
Contributions to FlexiData are welcome! If you're interested in contributing, please read our contributing guidelines.
Development Status
FlexiData is in active development, and we are working towards releasing our first version soon.
File details
Details for the file flexidata-0.0.19.tar.gz.
File metadata
- Download URL: flexidata-0.0.19.tar.gz
- Upload date:
- Size: 61.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc50385c66f85a0d89e1b818c0cc8133b1c4bb8b8ea0aa37c518d84764a4a975
MD5 | c33d2d6691e584c2bd296b308556225c
BLAKE2b-256 | 0c38b5b82bc8c21abaea5f68b5277ba33ab3c26b5591753382f7aacc274dede5
File details
Details for the file flexidata-0.0.19-py3-none-any.whl.
File metadata
- Download URL: flexidata-0.0.19-py3-none-any.whl
- Upload date:
- Size: 80.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3da22c3d0c99487ac88bf3bc8f34b7ada1b9c2e476b310353391e782b5bc5fff
MD5 | 46e7e6bbf57853ad52d7bee8a20bb205
BLAKE2b-256 | eee6e91e2122624e2458f7cad35d1d2e05e06d79804341da16d11b764874ee48
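To check a downloaded file against the digests listed above, one option is to recompute the hash locally with Python's standard `hashlib` module. The helper names `file_digest` and `matches` are illustrative, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# SHA256 digest listed above for flexidata-0.0.19-py3-none-any.whl.
WHEEL_SHA256 = "3da22c3d0c99487ac88bf3bc8f34b7ada1b9c2e476b310353391e782b5bc5fff"
```

With the wheel in the current directory, `matches("flexidata-0.0.19-py3-none-any.whl", WHEEL_SHA256)` should return `True` for an intact download. Passing `algorithm="md5"` checks the MD5 row in the same way.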