A lightweight tool for crawling stock data from Yahoo Finance.

Project description

py-stock-crawler

A multithreaded web crawler that retrieves stock data from Yahoo Finance.

Usage

Extremely easy to install! No extra C or binary libraries required!

Only Python 3 is supported.

pip install rookie-stock-crawler

Four lines are enough to create a multithreaded crawler and save the data locally as JSON files.

from rookie_stock_crawler import StockCrawler

sc = StockCrawler(['MSFT', 'GOOG', 'AMZN', 'AAPL'])
sc.start()
sc.save_all()

The crawler object is iterable:

for item in sc:
    print(item.get())

Settings

class StockCrawler(symbol_list, concurrent=5, auto_save=False, auto_sleep=None)

symbol_list

A list of the symbols you want to crawl. Each symbol must exist on Yahoo Finance; if it does not, the crawler prints a message to notify you (it does not raise an exception).

concurrent

An integer. The number of workers the program starts at one time to crawl data. Each worker retrieves one stock.

auto_save

True or False. If set to True, each stock's data is saved immediately after it is crawled.

auto_sleep

None, True, or any positive integer. This parameter decides how many seconds the crawler sleeps after Yahoo starts returning 404 responses (which means your client has reached its access limit).

If set to True, it sleeps for 600 seconds by default.
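The three accepted forms of auto_sleep can be summarized as a small normalization function. This is only a sketch of the documented behavior, not the package's actual code; the names resolve_auto_sleep and DEFAULT_SLEEP_SECONDS are made up for illustration, and rejecting other values with a ValueError is an assumption.

```python
DEFAULT_SLEEP_SECONDS = 600  # documented default when auto_sleep is True


def resolve_auto_sleep(auto_sleep):
    """Return the number of seconds to sleep, or None for no sleeping.

    Hypothetical helper: it mirrors the documented meaning of auto_sleep,
    not the package's internal implementation.
    """
    if auto_sleep is None:
        return None
    # bool is a subclass of int, so handle True before the integer case.
    if auto_sleep is True:
        return DEFAULT_SLEEP_SECONDS
    if isinstance(auto_sleep, int) and not isinstance(auto_sleep, bool) and auto_sleep > 0:
        return auto_sleep
    # Assumption: anything else (False, 0, negatives, strings) is rejected.
    raise ValueError("auto_sleep must be None, True, or a positive integer")


print(resolve_auto_sleep(None), resolve_auto_sleep(True), resolve_auto_sleep(30))
```

Checking for True before the integer case matters because Python treats True as the integer 1.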

Utilities

The whole package is designed to be detachable. All methods and objects can be imported and used independently.

class Stock

The main object stored in the crawler. The variable item in the code above is a Stock object, which means you can access its attributes directly or call its other methods.

from rookie_stock_crawler.stock import Stock

st = Stock('AAPL')
st.retrieve()
st.save()
print(st.get())

This is an example of creating a single small crawler without multithreading.
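The single-stock example above runs sequentially. For contrast, here is a rough sketch of how several symbols could be fetched with a bounded number of threads. It uses the standard library's ThreadPoolExecutor and a stand-in fetch_stock function that makes no network requests; both are assumptions for illustration and do not show how StockCrawler is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_stock(symbol):
    """Stand-in for a real download; returns placeholder data, no network access."""
    return {"symbol": symbol, "data": None}


def crawl(symbols, concurrent=5):
    """Fetch every symbol using at most `concurrent` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        # pool.map yields results in the same order as the input symbols.
        return list(pool.map(fetch_stock, symbols))


results = crawl(["MSFT", "GOOG", "AMZN", "AAPL"], concurrent=2)
print([item["symbol"] for item in results])
```

Here the hypothetical concurrent argument caps the number of threads, in the same spirit as the concurrent setting described earlier.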

rookie_stock_crawler.utils

Breaking the modules down further, you can import the functions that Stock uses to retrieve data. They work almost like pure functions (as long as your home router doesn't explode).

from rookie_stock_crawler.utils import get_financial, get_statistic, get_historical

symbol = 'AAPL'
print(get_financial(symbol))
print(get_statistic(symbol))
print(get_historical(symbol))

All of these functions return a tuple of length 2. The first element is the stock data (a list or dict), and the second is the latest date of the data (e.g. the latest financials of Apple Inc. were released on 2018-9-29).
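Because each call returns a pair, the result can be unpacked directly. The sketch below uses a hypothetical stand-in, get_financial_stub, so that it runs without network access. Whether the real functions return the date as a string or as a date object is not stated here, so the use of datetime.date is an assumption.

```python
from datetime import date


def get_financial_stub(symbol):
    """Hypothetical stand-in that mimics the documented (data, latest_date) shape."""
    data = {"symbol": symbol, "reports": []}  # placeholder, not real figures
    latest = date(2018, 9, 29)  # date borrowed from the Apple example above
    return data, latest


# Unpack the two-element tuple into the data and its latest date.
data, latest_date = get_financial_stub("AAPL")
print(data["symbol"], latest_date.isoformat())
```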

Exceptions

This package does not offer any custom exceptions. However, all exceptions raised during crawling are caught and printed with a prefix tag such as "[Error]". This is designed so that a single failure does not interrupt the crawl; an interruption would lose all the data collected so far unless the auto_save option is set to True.

Exceptions raised outside crawling still interrupt the program.
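The catch-and-print behavior can be pictured with a small loop. This is a sketch of the general pattern only, not the package's code: crawl_safely, flaky_fetch and the save callback are hypothetical names, and the exact message format after the "[Error]" tag is an assumption.

```python
def crawl_safely(symbols, fetch, save=None):
    """Fetch each symbol; print failures instead of raising.

    `fetch` and `save` are caller-supplied callables (hypothetical names).
    If `save` is given, each result is stored as soon as it arrives, so a
    later interruption does not lose it.
    """
    results = {}
    for symbol in symbols:
        try:
            results[symbol] = fetch(symbol)
        except Exception as exc:
            # Report the failure and move on to the next symbol.
            print(f"[Error] {symbol}: {exc}")
            continue
        if save is not None:
            save(symbol, results[symbol])
    return results


def flaky_fetch(symbol):
    """Fake fetcher that fails for one symbol to demonstrate the behavior."""
    if symbol == "BAD":
        raise ValueError("symbol not found")
    return {"symbol": symbol}


saved = {}
results = crawl_safely(["AAPL", "BAD", "MSFT"], flaky_fetch, save=saved.__setitem__)
print(sorted(results))
```

The failing symbol is reported and skipped, while the other two are both returned and saved as they arrive.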

Special Instructions

Yahoo Finance no longer maintains its API or YQL queries. As a result, the exact limit on access frequency is unknown. The crawler gets the data by sending requests directly to the server, exactly as if you opened a browser and visited the Yahoo website.

In other words, you cannot crawl a huge amount of data in a short time. The speed is already enough for individual developers and crawler fans. If you need to go faster, the package also provides a distributed version that runs on different servers. The whole solution includes a Django server and a front end.

If you are interested, you can visit my other repos, or you can try other approaches such as fake-useragent (🤣 that won't work) or global proxies (🤪 may also not work).

Download files

Source Distribution

rookie-stock-crawler-0.0.3.tar.gz (7.5 kB)

Built Distribution

rookie_stock_crawler-0.0.3-py3-none-any.whl (10.3 kB, Python 3)
