cardscraper
A tool for generating Anki cards by web scraping
Installation
with pip
python3 -m pip install cardscraper
with pipx
pipx install --include-deps cardscraper
Playwright
cardscraper uses Playwright for scraping by default, so you will need to install Chromium:
playwright install chromium
Usage
cardscraper ...
or python3 -m cardscraper ...
cardscraper has 3 main subcommands:
cardscraper gen
- takes in YAML instruction files and generates Anki packages
cardscraper init
- generates YAML instruction file templates
cardscraper list
- lists all available module implementations
You can always run cardscraper <subcommand> -h for help on a specific subcommand.
I recommend doing something like:
cardscraper init hello.yaml
- edit the file to suit your needs
cardscraper gen hello.yaml
YAML instruction file
# Meta defines which functions handle each step
# Get the list of available implementations from 'cardscraper list'
meta:
  package: default
  deck: default
  model: default
  scraping: default
package:
  # Output package name
  name: sample_package
  # Output folder
  output_path: ./out/
  # Path to the folder containing the media files to include
  # Defaults to null
  media: null
deck:
  # Deck name
  name: Countries
  # Deck ID
  id: 84269713
model:
  # Model name
  name: Countries Model
  # Model ID
  id: 97138426
  # CSS styling
  css: |
    * {
      color: #333;
      background-color: #fffffa;
    }
    .q, .a {
      text-align: center;
    }
    .q {
      font-size: 5rem;
      font-weight: 700;
    }
    .a {
      font-size: 3rem;
    }
  # Templates
  templates:
    CountryToInfo: # Template name
      # Front template
      # Query output can be referenced as {{QueryName}}
      qfmt: |
        <div class='q'>
          {{Country}}
        </div>
      # Back template
      afmt: |
        {{FrontSide}}
        <hr id=answer>
        <div class='a'>
          {{Info}}
          <a href="https://en.wikipedia.org/w/index.php?search={{Country}}">
            more info
          </a>
        </div>
    CapitalToCountry:
      qfmt: |
        <div class='q'>
          {{Capital}}
        </div>
      afmt: |
        {{FrontSide}}
        <hr id=answer>
        <div class='a'>
          {{Country}}<br>
          <a href="https://en.wikipedia.org/w/index.php?search={{Capital}}">
            more info
          </a>
        </div>
scraping:
  # List of URLs to scrape
  urls:
    - https://www.scrapethissite.com/pages/simple/
  # Queries to run
  # Each child query runs inside the element(s) selected by its parent
  queries:
    CountryEntry: # Query name
      # CSS selector to query for
      query: .country
      # Should we select all matching elements?
      # (querySelectorAll if true, querySelector if false)
      # Defaults to false
      all: true
      # JS function to evaluate the selected element(s)
      # Defaults to (e) => e.innerText
      eval: (e) => e.innerText
      # Python regex
      # If set, extracts the first capture group, with re.DOTALL enabled
      # Defaults to null
      regex: null
      # Result formatting, as in 'Hello my name is {}'.format(...)
      # Defaults to '{}'
      format: "{}"
      # Queries to run inside the selected element(s)
      # Defaults to null
      children:
        Country:
          query: .country-name
          # all: false
          # eval: (e) => e.innerText
          # regex: null
          # format: '{}'
          # children: null
        Info:
          query: .country-info
          # all: false
          eval: (e) => e.innerHTML
          # regex: null
          # format: '{}'
          # children: null
        Capital:
          query: .country-capital
          # all: false
          # eval: (e) => e.innerText
          # regex: null
          # format: '{}'
          # children: null
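The comments above describe a per-query pipeline: select element(s) with a CSS selector, evaluate them, optionally keep the first regex capture group (with re.DOTALL), then apply the format string, with child queries running inside each parent match. The sketch below models that flow in plain Python so the options are easier to reason about. It is not cardscraper's code: the order eval, then regex, then format is inferred from the comments; `Element`, `postprocess`, `leaf_value` and `collect_notes` are hypothetical names; the tiny `Element` tree and its text are made up and stand in for the page Playwright would load; element text stands in for the default `(e) => e.innerText`; and the regex on Capital is added only for illustration (the sample file uses regex: null).

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element: one CSS class, its text, its children."""

    cls: str
    text: str = ""
    children: list[Element] = field(default_factory=list)

    def select_all(self, selector: str) -> list[Element]:
        """Tiny querySelectorAll: supports only '.class' selectors, searched depth-first."""
        wanted = selector.lstrip(".")
        found: list[Element] = []
        for child in self.children:
            if child.cls == wanted:
                found.append(child)
            found.extend(child.select_all(selector))
        return found


def postprocess(raw: str, regex: Optional[str], fmt: str = "{}") -> str:
    """Apply the optional regex (first capture group, re.DOTALL), then the format string."""
    if regex is not None:
        match = re.search(regex, raw, re.DOTALL)
        # Assumption: a regex that does not match yields an empty field.
        raw = match.group(1) if match else ""
    return fmt.format(raw)


def leaf_value(scope: Element, spec: dict) -> str:
    """Evaluate a child query (treated as all: false) inside `scope`."""
    hits = scope.select_all(spec["query"])
    if not hits:
        return ""
    # `text` stands in for the default eval, (e) => e.innerText.
    return postprocess(hits[0].text, spec.get("regex"), spec.get("format", "{}"))


def collect_notes(root: Element, spec: dict) -> list[dict[str, str]]:
    """Run a parent query and build one field dict per selected element."""
    matches = root.select_all(spec["query"])
    if not spec.get("all", False):
        matches = matches[:1]  # querySelector: first match only
    return [
        {name: leaf_value(element, child) for name, child in spec["children"].items()}
        for element in matches
    ]


# Made-up page with two .country entries.
page = Element("body", children=[
    Element("country", children=[
        Element("country-name", "Andorra"),
        Element("country-capital", "Capital: Andorra la Vella"),
    ]),
    Element("country", children=[
        Element("country-name", "Austria"),
        Element("country-capital", "Capital: Vienna"),
    ]),
])

# Loosely mirrors CountryEntry, with an illustrative regex on Capital.
spec = {
    "query": ".country",
    "all": True,
    "children": {
        "Country": {"query": ".country-name"},
        "Capital": {"query": ".country-capital", "regex": r"Capital:\s*(.+)"},
    },
}

for note in collect_notes(page, spec):
    print(note)
```

Under these assumptions each element matched by an `all: true` parent becomes one note whose fields are named after the child queries, which is what lets the templates refer to {{Country}} and {{Capital}}.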
Plugin system
You can add custom implementations by exposing a 'cardscraper.<step>' entry point in your package, where <step> is model, scraping, deck, or package. For example, in pyproject.toml:
[project.entry-points.'cardscraper.model']
my_impl = 'mypackage:gen_model'
[project.entry-points.'cardscraper.scraping']
my_impl = 'mypackage:gen_notes'
[project.entry-points.'cardscraper.deck']
my_impl = 'mypackage:gen_deck'
[project.entry-points.'cardscraper.package']
my_impl = 'mypackage:gen_package'
Download files
Source Distribution
cardscraper-0.0.1.tar.gz (8.3 kB)
Built Distribution
cardscraper-0.0.1-py2.py3-none-any.whl
Hashes for cardscraper-0.0.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ed060feb80ffdcd5034eea1a89c2f9a5df86a7aa627e497de06034e432257676
MD5 | c9bf9d4df3f2d4816e59fbcf7bbada01
BLAKE2b-256 | 913db60ba5fde90b1d93e487e4d368865a158e6f231ffba61df15deee25db9eb