
llama-index readers wordpress integration

Project description

WordPress Loader

pip install llama-index-readers-wordpress

This loader fetches the text of WordPress posts and pages through the WordPress API, then uses the BeautifulSoup library to parse the HTML and extract the text of each article.
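The extraction step can be illustrated without the reader itself. The snippet below is a minimal sketch, not WordpressReader's code: it assumes the standard shape of a WordPress REST API post or page object (title.rendered and content.rendered hold HTML), and it swaps BeautifulSoup for Python's built-in html.parser so it runs with no extra dependencies. The names html_to_text and post_to_text are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents.

    Hypothetical helper: stands in for BeautifulSoup in this sketch.
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so &amp; becomes &
        self._parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>.
        if not self._skip_depth and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return " ".join(self._parts)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


def post_to_text(post: dict) -> str:
    """Turn one WordPress REST API post/page object into plain text (assumed shape)."""
    title = html_to_text(post["title"]["rendered"])
    body = html_to_text(post["content"]["rendered"])
    return f"{title}\n\n{body}"
```

For example, a post whose content is "<p>First paragraph.</p><p>Second one.</p>" comes out as its title, a blank line, and "First paragraph. Second one.".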

Usage

To use this loader, pass the base URL of the WordPress installation (e.g. https://www.mysite.com) and, optionally, a username and an application password for that user (see the WordPress documentation on application passwords).

from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",  # an application password, not the account password
)
documents = loader.load_data()
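WordPress application passwords are sent with HTTP Basic authentication. As a rough sketch of what an authenticated request looks like, assuming the standard /wp-json/wp/v2/ REST routes, the hypothetical helper wp_api_request below builds (but does not send) such a request using only the standard library. It illustrates the protocol and is not WordpressReader's internals.

```python
import base64
from typing import Optional
from urllib.request import Request


def wp_api_request(base_url: str, post_type: str,
                   username: Optional[str] = None,
                   app_password: Optional[str] = None) -> Request:
    """Build a GET request for a WordPress REST API collection (hypothetical helper).

    Assumes the standard route layout <base_url>/wp-json/wp/v2/<post_type>.
    """
    url = f"{base_url.rstrip('/')}/wp-json/wp/v2/{post_type}"
    headers = {"Accept": "application/json"}
    if username and app_password:
        # HTTP Basic auth: base64("username:application password").
        raw = f"{username}:{app_password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return Request(url, headers=headers)
```

Passing the result to urllib.request.urlopen would perform the request. Since Basic credentials are only base64-encoded, not encrypted, the site should be served over HTTPS.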

This loader is designed to load data into LlamaIndex.

Pages and Posts

By default, the loader retrieves both WordPress pages (static content) and posts (blog entries) from the target site. To skip either kind, set get_pages=False or get_posts=False when initializing the WordpressReader object.
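The two flags decide which REST collections are requested. A minimal sketch, assuming pages and posts map to the standard /wp-json/wp/v2/pages and /wp-json/wp/v2/posts routes; endpoints_for is a hypothetical name, not part of the reader.

```python
from typing import List


def endpoints_for(base_url: str, get_pages: bool = True,
                  get_posts: bool = True) -> List[str]:
    """Map get_pages/get_posts to REST collection URLs (hypothetical helper)."""
    root = f"{base_url.rstrip('/')}/wp-json/wp/v2"
    endpoints = []
    if get_pages:
        endpoints.append(f"{root}/pages")  # static content
    if get_posts:
        endpoints.append(f"{root}/posts")  # blog entries
    return endpoints
```

With both flags left at their defaults, both collections are fetched; setting one flag to False drops that collection.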

Additional Custom Post Types

To fetch additional custom post types besides posts and pages, pass additional_post_types as a comma-separated string (e.g. additional_post_types="custom-pages,custom-posts") when initializing the WordpressReader object.

from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",  # an application password, not the account password
    additional_post_types="webinars,podcasts",  # comma-separated custom post types
)
documents = loader.load_data()
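A sketch of how a comma-separated additional_post_types string could be split into route names. The helper parse_post_types is hypothetical: it trims whitespace and drops duplicates, but whether WordpressReader does the same is an assumption, so writing the list without spaces is the safer choice. Also note that a custom post type is only reachable through the REST API if it is registered with show_in_rest enabled; its route is then /wp-json/wp/v2/ followed by the type's rest_base, which defaults to the post type name.

```python
from typing import List, Optional


def parse_post_types(additional_post_types: Optional[str]) -> List[str]:
    """Split e.g. "webinars,podcasts" into route names (hypothetical helper).

    Trims whitespace, skips empty entries, and keeps the first occurrence
    of each name in order.
    """
    if not additional_post_types:
        return []
    names: List[str] = []
    for name in additional_post_types.split(","):
        name = name.strip()
        if name and name not in names:
            names.append(name)
    return names
```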


Download files

Source Distribution

llama_index_readers_wordpress-0.4.0.tar.gz (5.0 kB)

Built Distribution

llama_index_readers_wordpress-0.4.0-py3-none-any.whl (4.7 kB)

File details

Hashes for llama_index_readers_wordpress-0.4.0.tar.gz

Algorithm     Hash digest
SHA256        83952fb91ee0d6eff0a9875d8bd71bd414397937bd813d631e986f3abefe5ec3
MD5           c830fcf88b8c0f6a94361e616b674553
BLAKE2b-256   75f908f92984ed0938f1fe087c558d8ffcf5b4a7bef77f49c5065901be553695

File details

Hashes for llama_index_readers_wordpress-0.4.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        6b3802e3c11bdd6cff978b325e36bb7c849166af304aa36d15d6d9df35ea1b56
MD5           f90c31beaeba33a0a64e3fa87ee826ea
BLAKE2b-256   1250eca99f0396b09617fcdf3dd9a3730b874e2e1c73c28a1e3634336ff266a0
