SEC Web Scraper for the EDGAR API
sec-web-scraper
A Python-based web scraper for the SEC EDGAR database.
Overview
This library is for scraping certain financial documents from the EDGAR database, such as the 10-K (and its variants, such as 10-K405 and 10-KSB), 20-F, and 40-F.
The two main features of the library will be:
- A document downloader portion that will fetch documents from the EDGAR database based on parameters such as a text query, time period, company ticker, and file type.
- A scraper that will parse sections and information from the retrieved files.
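As a rough illustration of the downloader idea, the sketch below parses an EDGAR quarterly `master.idx` file, which is a pipe-delimited listing of CIK, company name, form type, filing date, and file path, and filters it by form type. This is not the library's implementation: the names `IndexEntry`, `quarterly_index_url`, `parse_master_index`, and `filter_by_form` are hypothetical, and the sample rows are made up for the example. A real download would also need to send a descriptive `User-Agent` header, as the SEC asks of automated clients.

```python
from dataclasses import dataclass

ARCHIVES_ROOT = "https://www.sec.gov/Archives/"


@dataclass
class IndexEntry:
    """One row of a quarterly master index (hypothetical helper type)."""
    cik: str
    company: str
    form_type: str
    date_filed: str
    url: str


def quarterly_index_url(year: int, quarter: int) -> str:
    """Build the URL of the master index for a given year and quarter."""
    return f"{ARCHIVES_ROOT}edgar/full-index/{year}/QTR{quarter}/master.idx"


def parse_master_index(text: str) -> list[IndexEntry]:
    """Parse the pipe-delimited body that follows the dashed separator line."""
    entries = []
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Everything up to and including the dashed line is header.
            if line.startswith("---"):
                in_body = True
            continue
        parts = line.strip().split("|")
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        cik, company, form_type, date_filed, filename = parts
        entries.append(
            IndexEntry(cik, company, form_type, date_filed, ARCHIVES_ROOT + filename)
        )
    return entries


def filter_by_form(entries: list[IndexEntry], form_type: str) -> list[IndexEntry]:
    """Keep only the entries whose form type matches exactly."""
    return [e for e in entries if e.form_type == form_type]


# Made-up sample in the master.idx layout; not real index contents.
SAMPLE_INDEX = """\
Description:           Master Index of EDGAR Dissemination Feed
Cloud HTTP:            https://www.sec.gov/Archives/

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1111|EXAMPLE CO A|10-K|2000-03-30|edgar/data/1111/0000000000-00-000001.txt
2222|EXAMPLE CO B|8-K|2000-02-15|edgar/data/2222/0000000000-00-000002.txt
1111|EXAMPLE CO A|8-K|2000-01-20|edgar/data/1111/0000000000-00-000003.txt
"""

if __name__ == "__main__":
    for entry in filter_by_form(parse_master_index(SAMPLE_INDEX), "8-K"):
        print(entry.cik, entry.date_filed, entry.url)
```

Matching on the exact form type keeps variants such as 10-K405 separate from 10-K; a prefix match would be one way to group them if that is preferred.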
Installation
Run the command below!
pip install sec-web-scraper
Usage
# Downloader
from sec_web_scraper.Downloader import Downloader
# Create new downloader object
d = Downloader()
# Input the year range for filing data
d.build_index_sec(2000, 2002)
# After you've built the index, list all form types filed in that period
d.get_forms()
# To find the CIK of a company, provide its name (fuzzy match). Returns a list
d.get_company_info('apple')
# Find all 8-Ks filed in the range above. Returns a DataFrame
res = d.find_files_by_type('8-K')
# More features to be added!
# Scraper
# With a particular filing
sample_10k = "https://www.sec.gov/Archives/edgar/data/20/0000893220-96-000500.txt"
# Get the raw text
raw_txt = get_document_given_link(sample_10k)
# Get the sections in the document
doc_tags = get_document_tags(raw_txt)
# More features to be added!
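To give a sense of what parsing a raw filing can involve, here is a minimal sketch that splits a full-submission `.txt` file into its component documents using the `<DOCUMENT>`, `<TYPE>`, and `<TEXT>` tags EDGAR uses in those files. It is only an illustration under stated assumptions and does not claim to reproduce what `get_document_tags` does: the function name `split_documents` and the sample text are hypothetical.

```python
import re

# Each filing document is wrapped in <DOCUMENT>...</DOCUMENT>.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
# <TYPE> has no closing tag; its value runs to the end of the line.
TYPE_RE = re.compile(r"<TYPE>([^\r\n<]+)", re.IGNORECASE)
# The document body sits inside <TEXT>...</TEXT>.
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def split_documents(raw_text: str) -> list[tuple[str, str]]:
    """Return (document type, body text) pairs found in a raw submission."""
    documents = []
    for doc_match in DOCUMENT_RE.finditer(raw_text):
        block = doc_match.group(1)
        type_match = TYPE_RE.search(block)
        text_match = TEXT_RE.search(block)
        doc_type = type_match.group(1).strip() if type_match else ""
        body = text_match.group(1).strip() if text_match else ""
        documents.append((doc_type, body))
    return documents


# Made-up sample in the submission layout; not taken from a real filing.
SAMPLE_SUBMISSION = """\
<DOCUMENT>
<TYPE>10-K
<SEQUENCE>1
<TEXT>
Item 1. Business
Example business description.
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-27
<SEQUENCE>2
<TEXT>
Example financial data schedule.
</TEXT>
</DOCUMENT>
"""

if __name__ == "__main__":
    for doc_type, body in split_documents(SAMPLE_SUBMISSION):
        print(doc_type, len(body))
```

Regular expressions are enough for this outer structure because the wrapper tags are not nested; the body of each document may itself be HTML or plain text and would need its own parsing step.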
References
- Python project template from https://github.com/ColumbiaOSS/example-project-python
Download files
Source distribution: sec-web-scraper-0.1.0.tar.gz (12.4 kB)