Mathematics Genealogy Project Spider

mathgenproject

A Scrapy-based web spider for the Mathematics Genealogy Project (MGP).

Installation

Use the package manager pip to install the spider.

pip install mathgenproject-jgolden17

Usage

Define an item pipeline to process each mathematician the spider returns.

class MyPipeline(object):
    def open_spider(self, spider):
        ...

    def process_item(self, item, spider):
        print(item['name'])
        return item

    def close_spider(self, spider):
        ...
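
As a slightly fuller illustration of the three hooks, the sketch below collects mathematician names in memory and reports a count when the crawl ends. The class name NameCollectorPipeline and its names attribute are hypothetical. Only the 'name' field is taken from the example above; any other item fields would be assumptions.

```python
class NameCollectorPipeline(object):
    """Hypothetical pipeline: gathers the 'name' field of every item."""

    def open_spider(self, spider):
        # Called once when the spider starts; set up per-crawl state here.
        self.names = []

    def process_item(self, item, spider):
        # Called for each scraped mathematician. Items are accessed like
        # dicts; 'name' is the only field this sketch relies on.
        self.names.append(item['name'])
        return item  # pass the item on to later pipelines and the feed

    def close_spider(self, spider):
        # Called once when the spider finishes.
        print('Collected %d mathematicians' % len(self.names))
```

Returning the item from process_item keeps it flowing to any later pipelines and to the feed export.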

Run the spider with Scrapy's CrawlerProcess, passing in the MGP ID of the mathematician to crawl.

from scrapy.crawler import CrawlerProcess
from mathgenproject.spiders import MathGenProjectSpider

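process = CrawlerProcess(settings={
    # Export every scraped item to items.json as a JSON array.
    'FEED_FORMAT': 'json',
    'FEED_URI': 'items.json',
    'ITEM_PIPELINES': {
        # Scrapy loads pipelines by import path. '__main__' assumes
        # MyPipeline is defined in the script being run; otherwise use
        # the full path of the module that defines it.
        '__main__.MyPipeline': 300,
    },
})

# mgp_id is the mathematician's ID on the Mathematics Genealogy Project.
process.crawl(MathGenProjectSpider, mgp_id='216087')
process.start()  # blocks until the crawl finishes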
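
Once the crawl has finished, the exported feed can be read back with the standard library. The helper below is a minimal sketch, not part of the package: load_names is a hypothetical name, and it assumes the feed was written to items.json in the 'json' format and that every item carries the 'name' field used in the pipeline example.

```python
import json


def load_names(path='items.json'):
    """Return the 'name' of every item in a Scrapy JSON feed file."""
    with open(path, encoding='utf-8') as f:
        items = json.load(f)  # the 'json' feed format writes one JSON array
    return [item['name'] for item in items]


if __name__ == '__main__':
    for name in load_names():
        print(name)
```

Some Scrapy versions append to an existing feed file rather than overwrite it, which leaves invalid JSON behind, so it may be safest to delete a stale items.json before re-running the crawl.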

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT


Files for mathgenproject-jgolden17, version 0.1.0

Filename                                             Size    File type  Python version
mathgenproject_jgolden17-0.1.0-py2.py3-none-any.whl  7.9 kB  Wheel      py2.py3
mathgenproject-jgolden17-0.1.0.tar.gz                5.1 kB  Source     None
