Haystack custom components for your favourite dataframe library.
Dataframes Haystack
📃 Description
dataframes-haystack is an extension for Haystack 2 that enables integration with dataframe libraries.
The dataframe libraries currently supported are pandas and Polars.
The library offers custom Converter components for reading files into dataframes and for transforming dataframes into Haystack Document objects:
- FileToPandasDataFrame and FileToPolarsDataFrame read files and convert them into dataframes.
- PandasDataFrameConverter and PolarsDataFrameConverter convert data stored in dataframes into Haystack Document objects.
🛠️ Installation
# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack
# for polars
pip install "dataframes-haystack[polars]"
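To confirm which dataframe backends are importable after installation, a small standard-library check can help. This is only a sketch: the helper name `available_backends` is hypothetical and is not part of dataframes-haystack.

```python
from importlib.util import find_spec


def available_backends(candidates=("pandas", "polars")):
    """Map each candidate module name to True if it can be imported here."""
    # find_spec returns None when a top-level module is not installed.
    return {name: find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, installed in available_backends().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```

If polars is reported as missing, installing the `[polars]` extra shown above should make it available.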
💻 Usage
Tip: See the Example Notebooks for complete examples.
Pandas
FileToPandasDataFrame
from dataframes_haystack.components.converters.pandas import FileToPandasDataFrame
converter = FileToPandasDataFrame(file_format="csv")
output_dataframe = converter.run(
file_paths=["data/doc1.csv", "data/doc2.csv"]
)
Result:
>>> output_dataframe
{'dataframe': <pandas.DataFrame>}
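The example above expects data/doc1.csv and data/doc2.csv to exist. For trying it out, a sketch like the following writes two small CSV files using only the standard library. The column names and row contents are made up for illustration (they mirror the converter example further down in this README), and the helper `write_sample_csvs` is hypothetical.

```python
import csv
from pathlib import Path


def write_sample_csvs(directory="data"):
    """Write two tiny CSV files and return their paths as strings."""
    # Hypothetical sample rows; replace them with your own data.
    samples = {
        "doc1.csv": [{"text": "Hello world", "filename": "doc1.txt"}],
        "doc2.csv": [{"text": "Hello everyone", "filename": "doc2.txt"}],
    }
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, rows in samples.items():
        path = out_dir / name
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=["text", "filename"])
            writer.writeheader()
            writer.writerows(rows)
        paths.append(str(path))
    return paths
```

The returned list can then be passed as file_paths to converter.run.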
PandasDataFrameConverter
import pandas as pd
from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter
df = pd.DataFrame({
"text": ["Hello world", "Hello everyone"],
"filename": ["doc1.txt", "doc2.txt"],
})
converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
Result:
>>> documents
{'documents': [
Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
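Conceptually, the converter maps each row to one document: the value in content_column becomes the content, and the meta_columns values become the metadata. The sketch below illustrates that mapping with plain Python dictionaries. It is not the library's implementation; `SimpleDocument` and `rows_to_documents` are hypothetical stand-ins for Haystack's Document and the converter, and using the row position as the id is an assumption based on the output shown above.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Haystack Document."""
    id: str
    content: str
    meta: dict = field(default_factory=dict)


def rows_to_documents(rows, content_column, meta_columns=()):
    """Turn a list of row dictionaries into SimpleDocument objects."""
    documents = []
    for index, row in enumerate(rows):
        documents.append(
            SimpleDocument(
                id=str(index),  # assumption: the row position is used as the id
                content=row[content_column],
                meta={column: row[column] for column in meta_columns},
            )
        )
    return documents


rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]
docs = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
print(docs[0])
```

Columns that are neither the content column nor listed in meta_columns are simply ignored in this sketch.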
Polars
FileToPolarsDataFrame
from dataframes_haystack.components.converters.polars import FileToPolarsDataFrame
converter = FileToPolarsDataFrame(file_format="csv")
output_dataframe = converter.run(
file_paths=["data/doc1.csv", "data/doc2.csv"]
)
Result:
>>> output_dataframe
{'dataframe': <polars.DataFrame>}
PolarsDataFrameConverter
import polars as pl
from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter
df = pl.DataFrame({
"text": ["Hello world", "Hello everyone"],
"filename": ["doc1.txt", "doc2.txt"],
})
converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
Result:
>>> documents
{'documents': [
Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
🤝 Contributing
Do you have an idea for a new feature? Did you find a bug that needs fixing?
Feel free to open an issue or submit a PR!
Setup development environment
Requirements: hatch, pre-commit
- Clone the repository
- Run hatch shell to create and activate a virtual environment
- Run pre-commit install to install the pre-commit hooks. This enforces the linting and formatting checks.
Run tests
- Linting and formatting checks:
hatch run lint:fmt
- Unit tests:
hatch run test-cov-all
✍️ License
dataframes-haystack is distributed under the terms of the MIT license.