
llama-index readers wordpress integration


WordPress Loader

pip install llama-index-readers-wordpress

This loader fetches the text of WordPress posts and pages through the WordPress REST API, then uses the BeautifulSoup library to parse the returned HTML and extract the article text.
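The loader hands HTML parsing to BeautifulSoup. To illustrate the same step (turning rendered post HTML into plain text) without extra dependencies, the sketch below swaps in the standard library's html.parser.HTMLParser instead. TextExtractor, html_to_text, and the block-tag rules are hypothetical illustrations, not the reader's actual code.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical choice: tags whose end should start a new line in the output.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose content is never visible text.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect visible text nodes from an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.chunks: List[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text, one line per block, with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


post_html = "<h2>Title</h2><p>Hello <strong>WordPress</strong>!</p><script>track()</script>"
print(html_to_text(post_html))
```

Ending each block-level element with a newline keeps paragraphs apart, while inline tags such as <strong> merge into the surrounding sentence. BeautifulSoup's get_text(separator=...) offers a comparable knob.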

Usage

To use this loader, pass the base URL of the WordPress installation (e.g. https://www.mysite.com). Optionally, also pass a username and an application password for that user (see the WordPress documentation on application passwords).

from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",  # base URL of the WordPress installation
    username="my_username",  # optional
    password="my_password",  # optional: the user's application password
)
documents = loader.load_data()

This loader is designed to load data into LlamaIndex.
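WordPress accepts application passwords over HTTP Basic authentication on its REST API. The sketch below shows what the username and application password turn into on the wire, assuming the reader authenticates this way. basic_auth_header is a hypothetical helper; an HTTP client such as requests builds an equivalent header from an auth=(username, password) tuple.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, app_password: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header (RFC 7617) from WordPress credentials."""
    # Basic auth is simply "username:password", base64-encoded.
    credentials = f"{username}:{app_password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


print(basic_auth_header("a", "b"))  # {'Authorization': 'Basic YTpi'}
```

Because Basic credentials are only base64-encoded, not encrypted, the site URL should use HTTPS.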

Pages and Posts

By default, the loader retrieves both WordPress pages (static content) and posts (blog entries) from the target site. To skip either one, set get_pages=False or get_posts=False when initializing the WordpressReader object.
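Assuming the reader queries the core REST routes /wp-json/wp/v2/pages and /wp-json/wp/v2/posts (the standard WordPress routes for these two content types), the effect of the two flags can be sketched with a hypothetical helper, content_endpoints, that lists which collections would be fetched:

```python
from typing import List


def content_endpoints(base_url: str, get_pages: bool = True, get_posts: bool = True) -> List[str]:
    """Return the REST collection URLs for the enabled content types.

    Hypothetical helper: assumes the standard /wp-json/wp/v2 routes.
    """
    api_root = base_url.rstrip("/") + "/wp-json/wp/v2"
    endpoints = []
    if get_pages:
        endpoints.append(f"{api_root}/pages")  # static pages
    if get_posts:
        endpoints.append(f"{api_root}/posts")  # blog entries
    return endpoints


# Analogous to WordpressReader(..., get_pages=False): posts only.
print(content_endpoints("https://www.mysite.com", get_pages=False))
# ['https://www.mysite.com/wp-json/wp/v2/posts']
```

Sites that use plain permalinks expose the same routes through ?rest_route=/wp/v2/posts instead, which this sketch does not handle.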

Additional Custom Post Types

To fetch custom post types in addition to posts and pages, pass additional_post_types as a comma-separated string (e.g., additional_post_types="custom-pages,custom-posts") when initializing the WordpressReader object.

from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",
    additional_post_types="webinars,podcasts",
)
documents = loader.load_data()
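A custom post type is reachable through the REST API only if it was registered with show_in_rest enabled, and by default it is then served at /wp-json/wp/v2/<rest_base>. Assuming the reader maps each comma-separated entry to such a route, the parsing step might look like the hypothetical custom_type_endpoints below. Trimming whitespace and ignoring empty entries is this sketch's own choice, not documented reader behavior.

```python
from typing import List, Optional


def custom_type_endpoints(base_url: str, additional_post_types: Optional[str]) -> List[str]:
    """Turn a comma-separated string such as "webinars,podcasts" into REST URLs.

    Hypothetical helper: assumes each entry is the post type's REST base and
    that it is served under the default /wp-json/wp/v2 namespace.
    """
    if not additional_post_types:
        return []
    api_root = base_url.rstrip("/") + "/wp-json/wp/v2"
    # Trim whitespace and drop empty entries so "webinars, podcasts," also works.
    slugs = [s.strip() for s in additional_post_types.split(",") if s.strip()]
    return [f"{api_root}/{slug}" for slug in slugs]


print(custom_type_endpoints("https://www.mysite.com", "webinars,podcasts"))
```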

Download files

Source Distribution

llama_index_readers_wordpress-0.3.0.tar.gz (3.6 kB)

Built Distribution

llama_index_readers_wordpress-0.3.0-py3-none-any.whl

File hashes

Hashes for llama_index_readers_wordpress-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       5dd529e2918e6303779eefd8a9ab94570bcb4a06c4615937438686c534ca222a
MD5          e8409364c5249d17d1571cd4b81001c2
BLAKE2b-256  4aaf2f6734bead507ce91244f6ed567c704f807bb935a0caf5ea12302a913624

Hashes for llama_index_readers_wordpress-0.3.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       92d44cce032a115ec3b7356f2bc6632660156f7d59df61a25350b0fe96c0fbb0
MD5          c61abb255bbfe5c8bffda4bf61dbcf72
BLAKE2b-256  51a0b6f1c315a1fc00786af9a768b5f980b57bbe787cad9fd99ad0747491f9bd
