Mathematics Genealogy Project Spider

mathgenproject

A web spider for the Mathematics Genealogy Project, built on Scrapy.

Installation

Use the package manager pip to install the spider.

pip install mathgenproject

Usage

Define an item pipeline to process each mathematician the spider returns.

class MyPipeline(object):
    def open_spider(self, spider):
        ...

    def process_item(self, item, spider):
        print(item['name'])
        return item

    def close_spider(self, spider):
        ...
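
As a slightly fuller illustration of the same three hooks, here is a minimal sketch of a pipeline that gathers names during the crawl and writes them out at the end. The class name NameCollectorPipeline, the output_path parameter, and the names.json file are hypothetical and not part of mathgenproject; 'name' is the only item field assumed, because it is the only one shown above.

```python
import json


class NameCollectorPipeline:
    """Collect each mathematician's name and write the list when the crawl ends."""

    def __init__(self, output_path='names.json'):
        # output_path is a hypothetical parameter for this sketch; Scrapy
        # instantiates the pipeline with no arguments, so the default is used.
        self.output_path = output_path
        self.names = []

    def open_spider(self, spider):
        # Start every crawl with an empty list.
        self.names = []

    def process_item(self, item, spider):
        # Only the 'name' field is assumed to exist on each item.
        self.names.append(item['name'])
        # Return the item so later pipelines and feed exports still receive it.
        return item

    def close_spider(self, spider):
        # Write the collected names as a JSON array.
        with open(self.output_path, 'w', encoding='utf-8') as f:
            json.dump(self.names, f, ensure_ascii=False, indent=2)
```

Collecting in memory and writing once in close_spider keeps process_item cheap; for very large crawls, writing incrementally would likely be a better fit.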

Run the spider with Scrapy's CrawlerProcess, passing the mathematician's MGP ID as the mgp_id argument.

from scrapy.crawler import CrawlerProcess
from mathgenproject.spiders import MathGenProjectSpider

process = CrawlerProcess(settings={
    'FEED_FORMAT': 'json',
    'FEED_URI': 'items.json',
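    'ITEM_PIPELINES': {
        # Scrapy loads pipelines by full import path; '__main__' assumes
        # MyPipeline is defined in the script being run.
        '__main__.MyPipeline': 300,
    },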
})

process.crawl(MathGenProjectSpider, mgp_id='216087')
process.start()
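
Once the crawl finishes, the feed settings above should leave the scraped items in items.json. The following is a minimal sketch of reading that file back, assuming the 'json' feed format produced a single JSON array and that items carry the 'name' field used in the pipeline example. The helper load_names is hypothetical and not part of mathgenproject.

```python
import json


def load_names(feed_path='items.json'):
    """Return the 'name' field of every item in a JSON feed export."""
    with open(feed_path, encoding='utf-8') as f:
        # Assumes the feed is one JSON array of item objects.
        items = json.load(f)
    # Skip any item without a 'name' key rather than raising KeyError.
    return [item['name'] for item in items if 'name' in item]
```

For example, calling load_names() after process.start() returns would give the list of names that were exported.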

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
