A tool for generating Anki cards by web scraping
cardscraper
Webscraping tool for generating Anki packages.
Installation
From PyPI:
pip install cardscraper
From git:
pip install git+https://github.com/sakhezech/cardscraper
Usage
cardscraper ...
or python -m cardscraper ...
Generate a skeleton input file:
cardscraper init filename.yaml
Edit it with your favorite text editor:
nvim filename.yaml
Generate the package:
cardscraper gen filename.yaml
For more info, use cardscraper -h.
Input files
You can generate a skeleton input file with cardscraper init filename.yaml.
Here is a large, self-explanatory example input file:
# here you can specify which function to use for each step
# (every one defaults to 'default')
meta:
  # controls package details and package dumping
  package: default
  # controls deck creation
  deck: default
  # controls model creation
  model: default
  # controls scraping and note creation
  scraping: default
# anki package info
package:
  # package name
  name: package_name
  # output folder (defaults to '.')
  output: ./out/
  # media folder (defaults to null)
  # the directory will be walked recursively
  # every file matching the pattern will be added to the package as media
  media: ./media/
  # pattern to match files against for media (defaults to **/*.*)
  pattern: "**/*.png"
# anki deck info
deck:
  # deck name
  name: Deck
  # deck id
  # don't forget to make this value unique
  id: 987
# anki model info
model:
  # model name
  name: Model
  # model id
  # don't forget to make this value unique
  id: 321
  # card styling (defaults to '')
  css: |
    .question, .answer {
      text-align: center;
    }
    .question {
      font-size: 5rem;
      font-weight: 700;
    }
    .answer {
      font-size: 3rem;
    }
  # list of cards
  templates:
    # card name
    - name: Front
      # front side
      qfmt: |
        <div class='question'>
          {{Question}}
        </div>
      # back side
      afmt: |
        {{FrontSide}}
        <hr id='answer'>
        <div class='answer'>
          {{Answer}}
        </div>
    # same here
    - name: Back
      qfmt: |
        <div class='question'>
          {{Answer}}
        </div>
      afmt: |
        {{FrontSide}}
        <hr id='answer'>
        <div class='answer'>
          {{Question}}
        </div>
# scraping info
scraping:
  # list of urls to scrape
  urls:
    - https://www.scrapethissite.com/pages/simple/
  # you can set your own custom user agent (defaults to null)
  agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0
  # list of queries
  # each query selects an html element and lets you use its text in the templates
  # each child query runs inside the parent one
  queries:
    # query name which you can use in the templates like {{Country}}
    - name: Country
      # css selector
      query: .country
      # you can select something specific from the query by providing a regex
      # this is a python regex with re.DOTALL enabled i.e. '.' also matches '\n'
      # uses the first captured group
      # (defaults to null)
      regex: null
      # if true: we select every instance and iterate over them
      # if false: we only select the first one
      # basically it's querySelector() vs querySelectorAll()
      # (defaults to false)
      many: true
      children:
        - name: Question
          query: .country-info
          many: false
          regex: (Area .*)$
          children: null
        - name: Answer
          query: .country-name
          many: false
          regex: null
          children: null
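To make the regex option in the example above more concrete, here is a minimal sketch of how a pattern with re.DOTALL and "first captured group" semantics behaves. The helper name apply_regex, the use of re.search, the fallback to the full text when nothing matches, and the sample text are all assumptions made for illustration; cardscraper's own implementation may differ.

```python
import re
from typing import Optional


def apply_regex(text: str, pattern: Optional[str]) -> str:
    """Return the first captured group, or the whole text if no pattern is given.

    Hypothetical helper: it only mirrors the behaviour described in the
    input file comments (Python regex, re.DOTALL, first captured group).
    """
    if pattern is None:
        return text
    match = re.search(pattern, text, flags=re.DOTALL)
    # Falling back to the unmodified text on a failed match is an assumption.
    return match.group(1) if match else text


# Made-up element text, shaped like the text of a '.country-info' element.
country_info = 'Capital: Andorra la Vella\nPopulation: 84000\nArea (km2): 468.0'

print(apply_regex(country_info, r'(Area .*)$'))  # Area (km2): 468.0
print(apply_regex(country_info, None) == country_info)  # True
```

Because re.DOTALL is enabled, '.' also crosses line breaks, so a pattern such as (Capital.*) would capture everything from "Capital" to the end of the text, not just the first line.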
Usage in code
cardscraper can be used programmatically, but it is designed primarily as a CLI application.
import yaml
from cardscraper import (
    Config,
    generate_anki_package,
    select_function_by_step_and_name,
    write_package,
)
from genanki import Model, Note

if __name__ == '__main__':
    with open('/path/to/config.yaml', 'r') as f:
        config: Config = yaml.load(f, yaml.Loader)
    # or you can make a config manually

    get_model = select_function_by_step_and_name('model', 'default')
    get_deck = select_function_by_step_and_name('deck', 'default')
    get_package = select_function_by_step_and_name('package', 'default')

    def get_notes(config: Config, model: Model) -> list[Note]:
        notes = []
        ...
        return notes

    package, path = generate_anki_package(
        config, get_model, get_notes, get_deck, get_package
    )
    write_package(package, path)
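As a rough illustration of "making a config manually", the sketch below builds a plain dictionary that mirrors the keys of the example input file. It assumes that Config is compatible with the dictionary produced by loading the YAML file, which is not confirmed here, and it spells out every key instead of relying on defaults.

```python
# A hand-written config mirroring the structure of the YAML input file.
# Assumption: cardscraper accepts a plain dict shaped like the loaded YAML.
config = {
    'meta': {
        'package': 'default',
        'deck': 'default',
        'model': 'default',
        'scraping': 'default',
    },
    'package': {
        'name': 'package_name',
        'output': './out/',
        'media': None,
        'pattern': '**/*.*',
    },
    'deck': {'name': 'Deck', 'id': 987},
    'model': {
        'name': 'Model',
        'id': 321,
        'css': '',
        'templates': [
            {
                'name': 'Front',
                'qfmt': "<div class='question'>{{Question}}</div>",
                'afmt': "{{FrontSide}}<hr id='answer'>"
                "<div class='answer'>{{Answer}}</div>",
            },
        ],
    },
    'scraping': {
        'urls': ['https://www.scrapethissite.com/pages/simple/'],
        'agent': None,
        'queries': [
            {
                'name': 'Country',
                'query': '.country',
                'regex': None,
                'many': True,
                'children': [
                    {
                        'name': 'Question',
                        'query': '.country-info',
                        'regex': r'(Area .*)$',
                        'many': False,
                        'children': None,
                    },
                    {
                        'name': 'Answer',
                        'query': '.country-name',
                        'regex': None,
                        'many': False,
                        'children': None,
                    },
                ],
            },
        ],
    },
}
```

A dictionary like this could then stand in for the value read with yaml.load in the snippet above, provided the assumption about Config holds.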
Plugin system
cardscraper has a plugin system. To make your functions available to cardscraper, register them under an entry point group named cardscraper.STEPNAME, where STEPNAME is one of model, scraping, deck, or package.
This is how the default functions are exposed:
[project.entry-points.'cardscraper.model']
default = 'cardscraper.default:get_model'
[project.entry-points.'cardscraper.scraping']
default = 'cardscraper.default:get_notes'
[project.entry-points.'cardscraper.deck']
default = 'cardscraper.default:get_deck'
[project.entry-points.'cardscraper.package']
default = 'cardscraper.default:get_package'
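To check which functions are registered for a step, the entry point groups can be inspected with the standard library. This is a minimal sketch, not cardscraper's own lookup code: the helper name list_step_plugins is hypothetical, and only importlib.metadata is used.

```python
from importlib.metadata import entry_points


def list_step_plugins(step: str) -> dict:
    """Map plugin names to 'module:function' strings for one cardscraper step.

    Hypothetical helper for inspection only; it does not import the plugins.
    """
    group = f'cardscraper.{step}'
    eps = entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10+: entry points are filtered with select()
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: entry_points() returns a dict keyed by group
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == '__main__':
    for step in ('model', 'scraping', 'deck', 'package'):
        print(step, list_step_plugins(step))
```

With cardscraper installed, each step would be expected to list at least the default entry shown above; a step with no registered plugins yields an empty dictionary.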