A practical Markdown image url converter
Project description
imarkdown
imarkdown is a lightweight markdown image link converter that allows you to easily convert image links between local and image server, as well as between different image servers.
When converting markdown from Yuque, the images are protected against external linking. If you want to publish the converted markdown on other platforms, you need to change all the image addresses in the markdown to local image addresses or custom image server addresses, so that others can view them correctly. This project aims to solve this problem by providing a converter that can batch convert image links in markdown and supports customized conversion in complex scenarios.
Features
- Batch Download: imarkdown can batch download all the images in the markdown that are referenced using image links.
- Multiple Conversion Methods: imarkdown supports various conversion methods for image links in markdown, such as converting to local image, converting web URL to local image, and converting web URL to image server URL.
- Batch Conversion: It supports batch conversion of single or multiple files, as well as formatting and renaming of generated files.
- Highly Customizable: By inheriting the
MdAdapter
class, you can easily implement custom URL conversion for different image servers. - Image Server Adapters: Currently, only Aliyun OSS is supported as an image server. Contributions are welcome to add support for more types of image servers.
- Custom recognition format: With ElementFinder, users can customize the search method for elements (such as image addresses) to meet the needs of special element recognition.
Target Audience
- People who need to batch convert image links in markdown.
- Users who export markdown from Yuque and need to convert image links.
- People who need to develop third-party extensions.
Technical Architecture
imarkdown
is designed with a modular architecture, allowing easy extension of each component. The following diagram provides a simplified overview of the technical architecture of imarkdown, which consists of the following components:
MdImageConverter
: The image converter responsible for converting the image addresses in markdown and generating new markdown files.MdAdapter
: The adapter for convertingImage
objects to different types, such asLocalFileAdapter
for local conversion,AliyunAdapter
for Aliyun OSS conversion, andCustomAdapter
for custom conversion. By injecting the adapter intoMdImageConverter
, you can define the type of address to convertImage
objects to.MdMedium
: IncludesMdFile
andMdFolder
, encapsulating some features used for data conversion inMdImageConverter
.
The execution process of imarkdown
is roughly as follows: after the convert
method is called on MdImageConverter
, imarkdown
builds a virtual MdTree
based on the provided MdMedium
, and performs batch image URL conversion on the files according to this tree.
Quick Start
This project is developed in Python and distributed on PyPI. Users can easily use imarkdown by installing it with pip. The following sections will explain several usage scenarios and methods of imarkdown.
- Third-Party Package Installation: Open a terminal or command prompt and run the following command.
pip install -U imarkdown
Examples
- Convert Web URLs to Local Addresses
- Convert Web URLs to Image Server URLs
- Batch Conversion: Web URLs to Local
- Batch Conversion: Local to Image Server
- Custom Image Hosting Service
Converting Web URLs to Local File Paths
If you have web URL links for images in your Markdown file and you want to download them in bulk to your local machine while converting the image addresses in Markdown to local file paths, the following example will solve your problem.
Assuming the file you want to convert is test.md
with the following content:
The examples below will be based on the initial Markdown file if it is a web URL. They are not repeated here.
## 6.3 Converting MD Image Addresses
Only supports uploading to a local image hosting service.
- [https://github.com/JyHu/useful_script.git](https://github.com/JyHu/useful_script.git)
- [https://github.com/JyHu/useful_script/blob/](https://github.com/JyHu/useful_script/blob/master/Scripts/md%E6%96%87%E4%BB%B6%E5%9B%BE%E7%89%87%E5%9B%BE%E5%BA%8A%E8%BD%AC%E6%8D%A2/%E8%87%AA%E5%8A%A8%E8%BD%AC%E6%8D%A2markdown%E6%96%87%E4%BB%B6%E4%B8%AD%E5%9B%BE%E7%89%87%E5%88%B0%E5%9B%BE%E5%BA%8A.md/)
After all the hassle and trying this, I found it doesn't work either.
![image.png](https://cdn.nlark.com/yuque/0/2022/png/26910220/1670091709979-52f8c3c4-a00f-4668-a236-29ad2c09d0da.png#averageHue=%23272c34&clientId=ubb991e0d-3414-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=928&id=u4a7a8376&margin=%5Bobject%20Object%5D&name=image.png&originHeight=928&originWidth=1050&originalType=binary&ratio=1&rotation=0&showTitle=false&size=201083&status=done&style=none&taskId=u27493dc0-9d78-4c07-929c-cc946d41409&title=&width=1050)
In the end, PigGo is still the best. It provides a shortcut for uploading, and after uploading, you can directly xxxTODO.
## 6.4 Pycasbin
In pycasbin, I saw a colleague who frequently contributes to pycasbin. You can refer to some of his contributions:
- [https://github.com/Nekotoxin/nekotoxin.github.io/blob/gsoc_2022_summary/GSoC2022-summary.md](https://github.com/Nekotoxin/nekotoxin.github.io/blob/gsoc_2022_summary/GSoC2022-summary.md)
![image.png](https://cdn.nlark.com/yuque/0/2022/png/26910220/1670150012015-3a93ec6b-bb27-4ed3-b42f-252a0f70b65c.png#averageHue=%23fcfbf5&clientId=u86ce0a81-ec80-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=936&id=ube9c482c&margin=%5Bobject%20Object%5D&name=image.png&originHeight=936&originWidth=1920&originalType=binary&ratio=1&rotation=0&showTitle=false&size=205691&status=done&style=none&taskId=u6a6825da-aaf4-471c-ad0e-2280c325c66&title=&width=1920)
This content is the exported Markdown file from Yuque, and its image links are protected against external access. They need to be downloaded and replaced with local file paths in the Markdown.
from imarkdown import MdFile, LocalFileAdapter, MdImageConverter
def main():
adapter = LocalFileAdapter()
converter = MdImageConverter(adapter=adapter)
md_file = MdFile(name="test.md")
converter.convert(md_file)
if __name__ == '__main__':
main()
In imarkdown, there are many places where you need to provide paths. You can use relative paths or absolute paths. It is recommended to use
/
as the path separator instead of\\
.
The converted result will generate a new Markdown file named test_converted.md
, with the following content:
## 6.3 Converting MD Image Addresses
Only supports uploading to a local image hosting service.
- [https://github.com/JyHu/useful_script.git](https://github.com/JyHu/useful_script.git)
- [https://github.com/JyHu/useful_script/blob/](https://github.com/JyHu/useful_script/blob/master/Scripts/md%E6%96%87%E4%BB%B6%E5%9B%BE%E7%89%87%E5%9B%BE%E5%BA%8A%E8%BD%AC%E6%8D%A2/%E8%87%AA%E5%8A%A8%E8%BD%AC%E6%8D%A2markdown%E6%96%87%E4%BB%B6%E4%B8%AD%E5%9B%BE%E7%89%87%E5%88%B0%E5%9B%BE%E5%BA%8A.md/)
After all the hassle and trying this, I found it doesn't work either.
![image.png](./images/20230713_1356451324.png)
In the end, PigGo is still the best. It provides a shortcut for uploading, and after uploading, you can directly xxxTODO.
## 6.4 Pycasbin
In pycasbin, I saw a colleague who frequently contributes to pycasbin. You can refer to some of his contributions:
- [https://github.com/Nekotoxin/nekotoxin.github.io/blob/gsoc_2022_summary/GSoC2022-summary.md](https://github.com/Nekotoxin/nekotoxin.github.io/blob/gsoc_2022_summary/GSoC2022-summary.md)
![image.png](./images/20230713_1356469646.png)
- Customizing the Output File Name
When using imarkdown for conversion, the default name for the converted Markdown file is
{markdown_file_name}_converted.md
. If you want to customize the output file name, you can use the following configuration:
md_converter.convert(md_file, name_prefix="new_", name_suffix="_converted")
Using this method of conversion will result in a file named new_test_converted.md
. If you want to completely customize the converted name, you can use the following method:
md_converter.convert(md_file, new_name="A new markdown.md")
With the above method, a file named A new markdown.md
will be generated.
Converting Web URLs to Image Hosting Service URLs
In the following example, we will use the test.md
file provided earlier to demonstrate the conversion of image addresses from web URLs to URLs on the Alibaba Cloud OSS server, showcasing the functionality of converting web URLs to image hosting service URLs.
from imarkdown import MdImageConverter, AliyunAdapter, MdFile
def main():
aliyun_config = {
"access_key_id": "key_id",
"access_key_secret": "key_secret",
"bucket_name": "bucket_name",
"place": "bucket_place",
"storage_path_prefix": "prefix",
}
adapter = AliyunAdapter(**aliyun_config)
md_converter = MdImageConverter(adapter=adapter)
md_file = MdFile(name="markdown.md")
md_converter.convert(md_file)
if __name__ == "__main__":
main()
In the example above, storage_path_prefix
represents the file prefix in the bucket where the image is uploaded to OSS. If you want to store the image in the /imarkdown
directory, you need to set storage_path_prefix="/imarkdown"
. The remaining configuration details are specific to Alibaba Cloud OSS, so please fill them in accordingly.
Batch Conversion of Multiple Files: Web URL to Local
The following example demonstrates a solution for batch conversion of multiple Markdown files, converting web image URLs to local image addresses.
from imarkdown import LocalFileAdapter, MdFolder, MdImageConverter
def main():
adapter = LocalFileAdapter()
converter = MdImageConverter(adapter=adapter)
# Folder name is "mds"
md_folder = MdFolder(name="mds")
# Output files to "converted_mds"
converter.convert(md_folder, output_directory="converted_mds")
Using the above code snippet will create a folder named "converted_mds" where the converted files will be saved. The images will be saved in the converted_mds/images
directory. If you want to output the images to a specific folder, you can set it as follows:
md_folder = MdFolder(name="mds", image_directory="mds/my_images")
Batch Conversion of Multiple Files: Local to Image Hosting Service
The following example demonstrates a solution for batch conversion of multiple Markdown files, converting local image addresses to image hosting service URLs.
from imarkdown import LocalFileAdapter, MdFolder, MdImageConverter
def main():
adapter = LocalFileAdapter()
converter = MdImageConverter(adapter=adapter)
# Folder name is "local_mds", images are of local type, and image URLs are saved in "local_mds/images"
md_folder = MdFolder(name="local_mds", image_type="local", image_directory="local_mds/images")
# Output files to "converted_mds"
converter.convert(md_folder, output_directory="converted_mds")
Custom Image Hosting Service
The following example demonstrates how to use imarkdown to upload images to a custom file server and retrieve the URL.
First, you need to create a custom adapter by inheriting from BaseMdAdapter
and implementing the upload
and get_replaced_url
methods. Then, you can inject the custom adapter into MdImageConverter
.
import os
import json
import requests
from imarkdown import BaseMdAdapter, MdImageConverter, MdFolder
class CustomMdAdapter(BaseMdAdapter):
name = "custom"
url = "https://server/upload/file/batch"
headers = {
"X-Upload-Token": "my_token"
}
cur_key = ""
def upload(self, key: str, file):
files = {"file": file}
response = requests.post(self.url, headers=self.headers, files=files)
if response.status_code == 200:
res_data = json.loads(response.content.decode("utf-8"))
self.cur_key = res_data["data"]["images"]["url"]
else:
raise Exception(response.content)
def get_replaced_url(self, key):
return self.cur_key
def get_all_folder_names():
return os.listdir("my-blog")
def main():
adapter = CustomMdAdapter()
md_converter = MdImageConverter(adapter=adapter)
md_folders = []
for folder_name in get_all_folder_names():
md_folders.append(
MdFolder(
name=f"my-blog/{folder_name}",
image_directory=f"my-blog/{folder_name}/images",
image_type="local",
)
)
md_converter.convert(md_folders, output_directory="converted")
if __name__ == "__main__":
main()
Custom Regular Expression
imarkdown
uses the regular expression element finder ReElementFinder
to recognize the URL of an image, the finder currently supports ![](image_url)
and <img src="image_url"/>
are two types of image URL format recognition. Of course, if your image URL is strange, sometimes the default regular expression for imarkdown
cannot be recognized.
At this point, you can customize an element finder called CustomElementFinder
, which can recognize the content you need to recognize through custom regular expressions or other recognition methods, and use it to pass it to MdImageConverter for element replacement. The following example shows how to use a custom ElementFinder
to identify image links.
import re
from typing import List
from imarkdown import BaseElementFinder, MdFile, MdImageConverter, LocalFileAdapter
class CustomElementFinder(BaseElementFinder):
def find_all_elements(self, md_str) -> List[str]:
re_rule: str = r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>"
images = re.findall(re_rule, md_str)
return list(map(lambda item: item[1], images))
def main():
adapter = LocalFileAdapter()
converter = MdImageConverter(adapter=adapter)
element_finder = CustomElementFinder()
md_file = MdFile(name="test.md")
converter.convert(md_file, element_finder=element_finder)
if __name__ == "__main__":
main()
In this example, CustomElementFinder
needs to inherit from BaseElementFinder
and implement find_all_elements()
function and implements specific search logic to construct an array of all elements found in the markdown (such as the urls of all images) and return it to MdImageConverter
.
Roadmap
- Add client-side support
- Support Tencent Cloud, Qiniu Cloud, and other image hosting services
- Support batch file modification
- Custom adapters
- Support compression of large images
- Support command-line interface
- Support PyPI for simplified operations
- Provide file custom naming
- Provide custom formatting for image names
FAQ
1. How can I extend to other image hosting services?
If you want to develop support for other image hosting services, all you need to do is implement an adapter similar to AliyunAdapter
and inject it into MdImageConverter
when using it. Refer to Custom Image Hosting Service. That's all you need to do. In fact, I've wrapped it up nicely, so extending it is straightforward.
2. Usage of file addresses
In Python, both /
and \\
can be used as file path separators. However, there are some differences in their usage.
/
(forward slash): In most operating systems, including Windows, Linux, and Mac,/
is used as the file path separator. Using/
as the separator can make your code portable across different operating systems. For example:
path = "folder/file.txt"
\\
(backslash): In the Windows operating system,\\
is used as the file path separator. This is because\
is used as an escape character in Windows, so to represent a plain backslash, you need to use two consecutive backslashes. For example:
path = "folder\\file.txt"
When using \\
as the separator, keep the following points in mind:
- When using
\\
in a string, you need to escape it as\\\\
because the first\
will be interpreted as an escape character. - You can use raw strings to avoid the hassle of escaping. In a raw string,
\
is not interpreted as an escape character. For example:path = r"folder\file.txt"
In summary, if your code needs to run on different operating systems, it is recommended to use /
as the file path separator to maintain code portability. If you are only running the code on Windows, using \\
is also acceptable. However, in imarkdown
, all \\
paths will be converted to /
. In this project, all \\
paths will be converted to /
.
Contribution
Contributions are welcome! If you would like to contribute to this project, you can submit a pull request or an issue. There are some extensible features listed in the Development Roadmap, and there are still many third-party image hosting services that need to be adapted. If you are working on these aspects, feel free to submit a pull request! I'm excited to see more people involved in improving and optimizing it.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file imarkdown-1.2.2.tar.gz
.
File metadata
- Download URL: imarkdown-1.2.2.tar.gz
- Upload date:
- Size: 24.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a4565c22fd85fda7daa7dd606fe46eee9a2e47c8be1113ec9ff7ef8017d16fa6 |
|
MD5 | 431cc263fd270e75b213bdadc3944078 |
|
BLAKE2b-256 | 8bf1a464a7316f1b7ae27c6db26c8e900d94ee0e9d5a141943520f6a92d6d811 |