A tool for generating Anki cards by web scraping
cardscraper
cardscraper is a tool for generating Anki packages by web scraping.
use me
install cardscraper from PyPI
pip install cardscraper
generate a skeleton input file
cardscraper init hello.yaml
edit the file and generate the Anki package
cardscraper gen hello.yaml
editing the file
cardscraper takes in YAML files as input
quick example
(see the full explanation below for an in-depth one)
scraping:
  urls:
    - https://www.scrapethissite.com/pages/simple/
  queries:
    - name: Entry
      query: .country
      many: true
      children:
        - name: Info
          query: .country-info
        - name: Name
          query: .country-name
model:
  name: My Model
  id: 123 # make this unique
  templates:
    - name: My Card Template
      qfmt: |
        {{Info}}
      afmt: |
        {{FrontSide}}
        <hr id='answer'>
        {{Name}}
deck:
  name: My Deck
  id: 987 # make this unique
package:
  name: sample_package.apkg
  output: ./output/
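The model and deck ids above only need to be unique. One way to pick them, sketched here, is to generate a random integer once and hardcode it in the YAML file. The range used is the one the genanki README suggests for model and deck ids; the helper name new_anki_id is made up for this example and is not part of cardscraper.

```python
import random


def new_anki_id() -> int:
    """Return a random integer in [2**30, 2**31).

    Hypothetical helper: run it once, then paste the result into the
    'id' field of the model or deck and keep it fixed afterwards.
    """
    return random.randrange(1 << 30, 1 << 31)


print('model id:', new_anki_id())
print('deck id:', new_anki_id())
```

Keeping the ids fixed between runs matters because Anki uses them to recognise an existing model or deck when a package is imported again.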
full explanation
# here you can specify which module to use for each step
# (every one defaults to 'default')
meta:
  # controls package details and package dumping
  package: default
  # controls deck creation
  deck: default
  # controls model creation
  model: default
  # controls scraping and note creation
  scraping: default
# anki package info
package:
  # package name
  name: package_name
  # output folder (defaults to '.')
  output: ./out/
  # media folder (defaults to null)
  # every file in the directory will be added to the package as media
  media: ./media/
# anki deck info
deck:
  # deck name
  name: Deck
  # deck id
  # don't forget to make this value unique
  id: 987
# anki model info
model:
  # model name
  name: Model
  # model id
  # don't forget to make this value unique
  id: 321
  # card styling (defaults to '')
  css: |
    .question, .answer {
      text-align: center;
    }
    .question {
      font-size: 5rem;
      font-weight: 700;
    }
    .answer {
      font-size: 3rem;
    }
  # list of cards
  templates:
    # card name
    - name: Front
      # front side
      qfmt: |
        <div class='question'>
          {{Question}}
        </div>
      # back side
      afmt: |
        {{FrontSide}}
        <hr id='answer'>
        <div class='answer'>
          {{Answer}}
        </div>
    # same here
    - name: Back
      qfmt: |
        <div class='question'>
          {{Answer}}
        </div>
      afmt: |
        {{FrontSide}}
        <hr id='answer'>
        <div class='answer'>
          {{Question}}
        </div>
# scraping info
scraping:
  # list of urls to scrape
  urls:
    - https://www.scrapethissite.com/pages/simple/
  # you can set your own custom user agent (defaults to the one here)
  agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0
  # list of queries
  # each query selects an html element and lets you use its text in the templates
  # each child query runs inside the parent one
  queries:
    # query name which you can use in the templates like {{Country}}
    - name: Country
      # css selector
      query: .country
      # you can select something specific from the query by providing a regex
      # this is a python regex with re.DOTALL enabled, i.e. '.' also matches '\n'
      # uses the first captured group
      # (defaults to null)
      regex: null
      # if true: we select every instance and iterate over them
      # if false: we only select the first one
      # basically it's querySelectorAll() vs querySelector()
      # (defaults to false)
      many: true
      children:
        - name: Question
          query: .country-info
          many: false
          regex: (Area .*)$
          children: null
        - name: Answer
          query: .country-name
          many: false
          regex: null
          children: null
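To make the regex option more concrete, here is a small sketch of what "python regex with re.DOTALL, first captured group" means in practice. It uses only the standard re module. The helper first_group and the sample text are made up for this example, and the use of re.search is an assumption: the README does not say which matching function cardscraper calls internally.

```python
import re


def first_group(pattern: str, text: str):
    """Return the first captured group of pattern in text, or None.

    Hypothetical helper; re.DOTALL makes '.' match newlines too.
    """
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


# made-up text, shaped like the content of a .country-info element
info = 'Capital: Andorra la Vella\nPopulation: 84000\nArea (km2): 468.0'

# same pattern as in the config above: keeps only the last line
print(first_group(r'(Area .*)$', info))

# with re.DOTALL the group can span several lines
print(repr(first_group(r'(Population.*)$', info)))

# no match gives None
print(first_group(r'(Currency .*)$', info))
```

Without re.DOTALL the second pattern would not match at all, because '.' would stop at the newline before '$' could reach the end of the text.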
using in code
from cardscraper import generate_anki_package, get_plugin_by_group_and_name
from cardscraper.__main__ import read_yaml_file
from cardscraper.generate import Config, Module
from genanki import Model, Note

if __name__ == '__main__':
    config = read_yaml_file('/path/to/config.yaml')
    # or
    # config: Config = {...}

    get_model = get_plugin_by_group_and_name(Module.MODEL, 'default')
    get_deck = get_plugin_by_group_and_name(Module.DECK, 'default')
    get_package = get_plugin_by_group_and_name(Module.PACKAGE, 'default')

    def get_notes(config: Config, model: Model) -> list[Note]:
        notes = []
        ...
        return notes

    generate_anki_package(config, get_model, get_notes, get_deck, get_package)
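The "config: Config = {...}" comment above suggests the config can also be written directly in Python. The sketch below assumes Config is a plain mapping with the same layout as the YAML file, which this README does not confirm, and simply rebuilds the quick example as a dict using only built-in types.

```python
# assumption: Config mirrors the YAML structure one to one
config = {
    'scraping': {
        'urls': ['https://www.scrapethissite.com/pages/simple/'],
        'queries': [
            {
                'name': 'Entry',
                'query': '.country',
                'many': True,
                'children': [
                    {'name': 'Info', 'query': '.country-info'},
                    {'name': 'Name', 'query': '.country-name'},
                ],
            },
        ],
    },
    'model': {
        'name': 'My Model',
        'id': 123,  # make this unique
        'templates': [
            {
                'name': 'My Card Template',
                'qfmt': '{{Info}}\n',
                'afmt': "{{FrontSide}}\n<hr id='answer'>\n{{Name}}\n",
            },
        ],
    },
    'deck': {'name': 'My Deck', 'id': 987},  # make this unique
    'package': {'name': 'sample_package.apkg', 'output': './output/'},
}

# field names used in the templates come from the child queries
child_names = [c['name'] for c in config['scraping']['queries'][0]['children']]
print(child_names)
```

A dict like this would take the place of the read_yaml_file call, with the rest of the snippet unchanged.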
plugin system
you can add custom modules by exposing a cardscraper.x entry point in your package, where x is one of model, scraping, deck or package
[project.entry-points.'cardscraper.model']
my_impl = 'mypackage:gen_model'
[project.entry-points.'cardscraper.scraping']
my_impl = 'mypackage:gen_notes'
[project.entry-points.'cardscraper.deck']
my_impl = 'mypackage:gen_deck'
[project.entry-points.'cardscraper.package']
my_impl = 'mypackage:gen_package'