canonicalwebteam.directory-parser
Flask extension to parse websites and extract structured data to build sitemaps.
Install
Install the project with pip: pip install canonicalwebteam.directory-parser
Using the directory parser
Sitemap templates
Include the sitemap templates in your Flask app. Copy the following code block to where your application is instantiated, e.g. app.py. The template loader should be set up right after the app is instantiated.
from pathlib import Path

from jinja2 import ChoiceLoader, FileSystemLoader

import canonicalwebteam.directory_parser as directory_parser

# Set up Flask application
app = FlaskBase(...)

# Include directory parser templates
directory_parser_templates = (
    Path(directory_parser.__file__).parent / "templates"
)
loader = ChoiceLoader(
    [
        FileSystemLoader(str(directory_parser_templates)),
    ]
)
app.jinja_loader = loader
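jinja2's ChoiceLoader tries each loader in its list in order and uses the first one that finds the requested template. The following is a minimal sketch of that lookup order using plain directory checks rather than jinja2 itself; `find_template` and the directory names are hypothetical and only illustrate why the order of the list matters.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_template(name: str, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first existing file called `name`, searching dirs in order.

    Illustrative only: this mimics the "first match wins" behaviour of a
    loader list, it is not how jinja2 is implemented.
    """
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

With a list such as `[app_templates, directory_parser_templates]`, a file present in both directories resolves to the application's copy, while a file that exists only in the parser's directory is still found.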
Generate sitemaps
The generate_sitemap function generates a sitemap from a directory path and a base URL, using the sitemap templates.
import os

import flask

# Dynamic sitemaps that do not need to be included in the sitemap tree.
# These differ from project to project and can be checked at /sitemap.xml.
DYNAMIC_SITEMAPS = [
    "tutorials",
    "engage",
    "ceph/docs",
    "blog",
    "security/notices",
    "security/cves",
    "security/livepatch/docs",
    "robotics/docs",
]

directory_path = os.getcwd() + "/templates"
base_url = "https://ubuntu.com"

# Generate the sitemap and write it to sitemap_path.
# sitemap_path is not defined in this snippet; set it to wherever your
# application stores the generated sitemap file.
xml_sitemap = directory_parser.generate_sitemap(
    directory_path,
    base_url,
    exclude_paths=DYNAMIC_SITEMAPS,
)
if xml_sitemap:
    with open(sitemap_path, "w") as f:
        f.write(xml_sitemap)

# Serve the existing sitemap (these lines belong inside a Flask view function)
with open(sitemap_path, "r") as f:
    xml_sitemap = f.read()

response = flask.make_response(xml_sitemap)
response.headers["Content-Type"] = "application/xml"
return response
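The snippet above mixes two steps: writing a freshly generated sitemap to disk and serving the file that is already there. One way to combine them is sketched below, assuming you want to regenerate only when the file is missing or when explicitly asked. `load_or_build_sitemap` and its `regenerate` flag are hypothetical names, and `build` stands in for a call to generate_sitemap.

```python
from pathlib import Path
from typing import Callable, Optional


def load_or_build_sitemap(
    sitemap_path: Path,
    build: Callable[[], Optional[str]],
    regenerate: bool = False,
) -> str:
    """Return the sitemap XML, rebuilding the cached file only when needed.

    `build` is any zero-argument callable returning the XML string (or a
    falsy value on failure). If nothing could be built and no file exists,
    the final read raises FileNotFoundError.
    """
    if regenerate or not sitemap_path.exists():
        xml_sitemap = build()
        if xml_sitemap:
            sitemap_path.write_text(xml_sitemap, encoding="utf-8")
    return sitemap_path.read_text(encoding="utf-8")
```

In a view function, `build` could be `lambda: directory_parser.generate_sitemap(directory_path, base_url, exclude_paths=DYNAMIC_SITEMAPS)`, with the returned string passed to `flask.make_response` as shown earlier.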
Parse project directory tree
If you'd like to get the parsed tree of a given directory, you can use the scan_directory function.
directory_path = os.getcwd() + "/templates"
tree = directory_parser.scan_directory(
    directory_path, exclude_paths=DYNAMIC_SITEMAPS
)
tree holds the parsed tree of all the templates found under directory_path.
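To give a feel for what scanning a template directory with excluded paths involves, here is a rough standard-library sketch. It is not the implementation of scan_directory, and the dictionary shape (`name`, `templates`, `children`) is an assumption made for illustration; the structure the library actually returns may differ.

```python
import os
from typing import Dict, Iterable


def build_tree(root: str, exclude_paths: Iterable[str] = ()) -> Dict:
    """Return a nested dict describing the .html files under `root`.

    Directories whose path relative to `root` appears in `exclude_paths`
    (for example "security/notices") are skipped entirely.
    """
    excluded = {path.strip("/") for path in exclude_paths}

    def walk(directory: str, rel: str) -> Dict:
        node = {"name": rel or "/", "templates": [], "children": []}
        for entry in sorted(os.listdir(directory)):
            full = os.path.join(directory, entry)
            child_rel = f"{rel}/{entry}" if rel else entry
            if os.path.isdir(full):
                if child_rel in excluded:
                    continue  # excluded subtree, e.g. a dynamic sitemap
                node["children"].append(walk(full, child_rel))
            elif entry.endswith(".html"):
                node["templates"].append(entry)
        return node

    return walk(root, "")
```

Calling `build_tree(directory_path, exclude_paths=DYNAMIC_SITEMAPS)` would then leave out sections such as "blog" while still descending into their non-excluded siblings.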
Local development
Running the project
This guide assumes that you are using dotrun to run your Flask app.
Include a relative path to the project
This example assumes both projects exist in the same directory.
In requirements.txt:
# Comment out package import
# canonicalwebteam.directory-parser==1.2.6
-e ../directory-parser
Run the project with the parser directory mounted
dotrun -m /path/to/canonicalwebteam.directory-parser:../directory-parser
Linting and formatting
This project uses Tox to apply its standard linting and formatting rules:
pip3 install tox # Install tox
tox -e lint # Check the format of Python code
tox -e format # Reformat the Python code