Mathematics Genealogy Project Spider
mathgenproject
A web spider for the Mathematics Genealogy Project.
Installation
Use the package manager pip to install the spider.
pip install mathgenproject
Usage
Define a pipeline through which to process each mathematician returned from the spider.
class MyPipeline(object):
    def open_spider(self, spider):
        ...

    def process_item(self, item, spider):
        print(item['name'])
        return item

    def close_spider(self, spider):
        ...
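As a slightly fuller illustration of the pipeline hooks, the following is a minimal sketch that collects each mathematician's name and writes the unique names to a file when the crawl ends. The class name `NameCollectorPipeline` and the output path `names.json` are hypothetical, and the sketch relies only on the `name` field shown above; other item fields are not assumed.

```python
import json


class NameCollectorPipeline:
    """Hypothetical pipeline: gather item['name'] values during the crawl."""

    def __init__(self, path="names.json"):
        # Output path is an assumption made for this sketch.
        self.path = path
        self.names = []

    def open_spider(self, spider):
        # Start each crawl with an empty list.
        self.names = []

    def process_item(self, item, spider):
        # Only the 'name' field is used; items without it are passed through.
        name = item.get("name")
        if name:
            self.names.append(name)
        return item

    def close_spider(self, spider):
        # Write the sorted, de-duplicated names once the crawl has finished.
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(sorted(set(self.names)), fh, ensure_ascii=False, indent=2)
```

Returning the item from `process_item` keeps it available to any later pipeline and to the feed export.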
Run the spider using Scrapy's CrawlerProcess, passing in the mathematician's MGP ID.
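from scrapy.crawler import CrawlerProcess
from mathgenproject.spiders import MathGenProjectSpider

process = CrawlerProcess(settings={
    # Export every scraped item to items.json.
    'FEED_FORMAT': 'json',
    'FEED_URI': 'items.json',
    'ITEM_PIPELINES': {
        # Scrapy expects a full import path here; '__main__.MyPipeline'
        # works when MyPipeline is defined in the script being run.
        '__main__.MyPipeline': 300,
    },
})

# mgp_id is the mathematician's Mathematics Genealogy Project ID.
process.crawl(MathGenProjectSpider, mgp_id='216087')
process.start()  # blocks until the crawl finishes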
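Once the crawl has finished, the JSON feed can be read back with the standard library. This is a minimal sketch, assuming the feed was written to `items.json` as configured above and that each item carries the `name` field used in the pipeline example; the helper name `load_names` is hypothetical.

```python
import json


def load_names(path="items.json"):
    """Return the 'name' of every item in a JSON feed written by the crawl."""
    with open(path, encoding="utf-8") as fh:
        items = json.load(fh)  # the JSON feed is a single array of items
    # Skip any item that lacks a 'name' field rather than failing.
    return [item["name"] for item in items if "name" in item]
```

Calling `load_names()` after `process.start()` returns gives the names in the order the feed recorded them.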
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License
Download files
Source Distribution
Hashes for mathgenproject-jgolden17-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 84d85c6dd74adb1ad6646f3fc8ede4d06a012adc87247af4527c31e3ccfd7610
MD5 | 82c35cbcc20f8a9bf219d97a115a2337
BLAKE2b-256 | 8feed465fb2d72a4f1984609ebbdb238736533aa3c3750cfe23e371aa9849849

Built Distribution
Hashes for mathgenproject_jgolden17-0.1.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a706dd17329530377e3286cb8a6cb176084fa968c5dfc5e9bc873c3e04f87c9f
MD5 | 94eeebe4e4666d585045b3fbbd7494d6
BLAKE2b-256 | 338ffdbfbcb43a68f0a4b2683f4cf22d7f57809338bc7395bdb1798bea4f68c3