A tool for generating Anki cards by web scraping
Project description
cardscraper
A tool for generating Anki packages by webscraping.
Installation
install from PyPI
pip install cardscraper
Basic usage
cardscraper ...
or python -m cardscraper ...
cardscraper has 3 main subcommands:
cardscraper gen
- takes in input files and generates Anki packagescardscraper init
- generates skeleton input filescardscraper list
- lists all available functions for each step (cardscraper has a plugin system)
and you can always use cardscraper <subcommand> -h
I recommend doing something like:
cardscraper init hello.yaml
- edit the file to suit your needs
cardscraper gen hello.yaml
Input file format
cardscraper takes in YAML files as input
# here you can specify which function to use for each step
# (every one defaults to 'default')
meta:
# controls package details and package dumping
package: default
# controls deck creation
deck: default
# controls model creation
model: default
# controls scraping and note creation
scraping: default
# anki package info
package:
# package name
name: package_name
# output folder (defaults to '.')
output: ./out/
# media folder (defaults to null)
# every file in the directory will be added to the package as media
media: ./media/
# anki deck info
deck:
# deck name
name: Deck
# deck id
# don't forget to make this value unique
id: 987
# anki model info
model:
# model name
name: Model
# model id
# don't forget to make this value unique
id: 321
# card styling (defaults to '')
css: |
.question, .answer {
text-align: center;
}
.question {
font-size: 5rem;
font-weight: 700;
}
.answer {
font-size: 3rem;
}
# list of cards
templates:
# card name
- name: Front
# front side
qfmt: |
<div class='question'>
{{Question}}
</div>
# back side
afmt: |
{{FrontSide}}
<hr id='answer'>
<div class='answer'>
{{Answer}}
</div>
# same here
- name: Back
qfmt: |
<div class='question'>
{{Answer}}
</div>
afmt: |
{{FrontSide}}
<hr id='answer'>
<div class='answer'>
{{Question}}
</div>
# scraping info
scraping:
# list of urls to scrape
urls:
- https://www.scrapethissite.com/pages/simple/
# you can set your own custom user agent (defaults to the one here)
agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0
# list of queries
# each query selects an html element and lets you use its text in the templates
# each child query runs inside the parent one
queries:
# query name which you can use in the templates like {{Country}}
- name: Country
# css selector
query: .country
# you can select something specific from the query by providing a regex
# this is a python regex with re.DOTALL enabled i.e. '.' captures '\n'
# uses the first captured group
# (defaults to null)
regex: null
# if true: we select every instance and iterate over them
# if false: we only select the first one
# basically it's querySelector() vs querySelectorAll()
# (defaults to false)
many: true
children:
- name: Question
query: .country-info
many: false
regex: (Area .*)$
children: null
- name: Answer
query: .country-name
many: false
regex: null
children: null
Using in code
import yaml
from cardscraper import (
Config,
Step,
generate_anki_package,
get_function_by_group_and_name,
)
from genanki import Model, Note
if __name__ == '__main__':
with open('/path/to/config.yaml', 'r') as f:
config: Config = yaml.load(f, yaml.Loader)
# or you can make a config manually
get_model = get_function_by_group_and_name(Step.MODEL, 'default')
get_deck = get_function_by_group_and_name(Step.DECK, 'default')
get_package = get_function_by_group_and_name(Step.PACKAGE, 'default')
def get_notes(config: Config, model: Model) -> list[Note]:
notes = []
...
return notes
generate_anki_package(config, get_model, get_notes, get_deck, get_package)
Plugin system
you can add custom functions by exposing cardscraper.step
entry point in your package
[project.entry-points.'cardscraper.model']
my_impl = 'mypackage:gen_model'
[project.entry-points.'cardscraper.scraping']
my_impl = 'mypackage:gen_notes'
[project.entry-points.'cardscraper.deck']
my_impl = 'mypackage:gen_deck'
[project.entry-points.'cardscraper.package']
my_impl = 'mypackage:gen_package'
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
cardscraper-0.3.0.tar.gz
(9.5 kB
view hashes)
Built Distribution
Close
Hashes for cardscraper-0.3.0-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1958c5504745dc3aff35d530e8b2fa503d10c669d28392638426578c94f069ce |
|
MD5 | f6107da9306c8faa2b151021cf02cbb2 |
|
BLAKE2b-256 | fd719c7f4bec77b3bc54579904ec79ed89f31c18d6c608db45d9b8a4a4f65ad3 |