Scrape data from SEC's EDGAR
EDGAR
A small library to access files from SEC's EDGAR.
Installation
pip install edgar
Example
To get a company's five most recent 10-Ks, run
from edgar import Company
company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type="10-K")
docs = Company.get_documents(tree, no_of_documents=5)
Alternatively, to get the full text of a 10-K, run
from edgar import Company, TXTML
company = Company("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143")
doc = company.get_10K()
text = TXTML.parse_full_10K(doc)
To get all companies and find a specific one, run
from edgar import Edgar
edgar = Edgar()
possible_companies = edgar.find_company_name("Cisco System")
To avoid pulling all company data from sec.gov when Edgar is initialized, pass in a local path to the data:
from edgar import Edgar
edgar = Edgar("/path/to/cik-lookup-data.txt")
possible_companies = edgar.find_company_name("Cisco System")
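The local file passed to Edgar is presumably the SEC's cik-lookup-data.txt, which we assume has one company per line in the form NAME:CIK:. The sketch below is not the library's implementation. It is a stdlib-only illustration, under that format assumption, of loading such a file into a name-to-CIK mapping and searching it by exact word matching in the spirit of find_company_name. The helpers load_cik_lookup and find_company_names are hypothetical.

```python
from typing import Dict, Iterable, List

def load_cik_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Map company name -> CIK from lines assumed to look like 'NAME:CIK:'."""
    lookup: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split from the right so a company name containing ':' stays intact.
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        name, cik, _ = parts
        lookup[name] = cik
    return lookup

def find_company_names(lookup: Dict[str, str], words: str) -> List[str]:
    """Return names that contain every query word as a whole word (case-insensitive)."""
    wanted = words.upper().split()
    matches = []
    for name in lookup:
        name_words = name.upper().split()
        if all(word in name_words for word in wanted):
            matches.append(name)
    return matches

# Sample lines in the assumed format, using the two companies from the examples above.
sample = [
    "ORACLE CORP:0001341439:",
    "INTERNATIONAL BUSINESS MACHINES CORP:0000051143:",
]
lookup = load_cik_lookup(sample)
print(find_company_names(lookup, "Business Machines"))
# prints ['INTERNATIONAL BUSINESS MACHINES CORP']
```

An open file object iterates over its lines, so it can be passed to load_cik_lookup directly in place of the sample list.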
To get XBRL data, run
from edgar import Company, XBRL, XBRLElement
company = Company("Oracle Corp", "0001341439")
results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
xbrl = XBRL(results[0])
XBRLElement(xbrl.relevant_children_parsed[15]).to_dict()  # returns a dictionary of name, value, and schemaRef
API
Company
Company(name, cik, timeout=10)
- name (company name)
- cik (company CIK number)
- timeout (optional, default: 10)
Methods
get_filings_url(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> str
Returns a url to fetch filings data
- filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, filings of all types are returned.
- prior_to: Only filings made before this date are retrieved. If not specified, filings of all dates are returned.
- ownership: Controls whether ownership filings are included. Options are include, exclude, and only. Defaults to include.
- no_of_entries: The number of entries to return. Defaults to 100, which is also the maximum.
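The parameters of get_filings_url line up with the query string of EDGAR's company browse page. The sketch below assumes that mapping (action=getcompany plus the CIK, type, dateb, owner, and count fields on https://www.sec.gov/cgi-bin/browse-edgar). It is not taken from the library's source, and build_filings_url is a hypothetical helper. EDGAR's dateb field takes a YYYYMMDD date, so prior_to would presumably be passed in that form.

```python
from urllib.parse import urlencode

# Assumed base URL of EDGAR's company browse page (not read from the library).
BROWSE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"

def build_filings_url(cik: str, filing_type: str = "", prior_to: str = "",
                      ownership: str = "include", no_of_entries: int = 100) -> str:
    """Hypothetical re-creation of get_filings_url's parameters as EDGAR query fields."""
    if ownership not in ("include", "exclude", "only"):
        raise ValueError("ownership must be 'include', 'exclude', or 'only'")
    if not 1 <= no_of_entries <= 100:
        raise ValueError("no_of_entries must be between 1 and 100")
    params = {
        "action": "getcompany",
        "CIK": cik,
        "type": filing_type,   # empty string: all filing types
        "dateb": prior_to,     # empty string: no date cutoff
        "owner": ownership,
        "count": no_of_entries,
    }
    return f"{BROWSE_URL}?{urlencode(params)}"

print(build_filings_url("0001341439", filing_type="10-K"))
# prints https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001341439&type=10-K&dateb=&owner=include&count=100
```

Validating ownership and no_of_entries up front mirrors the documented options and the 100-entry maximum, so a bad argument fails before any request is made.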
get_all_filings(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> lxml.html.HtmlElement
Returns the HTML of the filings listing as an lxml.html element
- filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, filings of all types are returned.
- prior_to: Only filings made before this date are retrieved. If not specified, filings of all dates are returned.
- ownership: Controls whether ownership filings are included. Options are include, exclude, and only. Defaults to include.
- no_of_entries: The number of entries to return. Defaults to 100, which is also the maximum.
get_10Ks(self, no_of_documents=1, as_documents=False) -> List[lxml.html.HtmlElement]
Returns, as lxml.html elements, the concatenation of all the documents in each 10-K
- no_of_documents (default: 1): number of documents to be retrieved
- When as_documents is set to True, it returns List[edgar.document.Documents], a list of Documents
get_10Ks_metadata(self) -> List[dict]
Returns the metadata of the documents in the 10-K as a list of dictionaries
get_document_type_from_10K(self, document_type, no_of_documents=1) -> List[lxml.html.HtmlElement]
Returns, as lxml.html elements, the documents of the given type within the 10-K
- document_type: The type of document you want, e.g. 10-K, EX-3.2
- no_of_documents (default: 1): number of documents to be retrieved
get_data_files_from_10K(self, document_type, no_of_documents=1, isxml=False) -> List[lxml.html.HtmlElement]
Returns, as lxml.html elements, the data files of the given type within the 10-K
- document_type: The type of document you want, e.g. EX-101.INS
- no_of_documents (default: 1): number of documents to be retrieved
- isxml (default: False): by default, tag names are not case sensitive and the file is parsed with lxml.html. If True, it is parsed with lxml.etree, which is case sensitive
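The difference isxml controls can be seen with the standard library alone. The sketch below is not how the library is implemented (it uses html.parser and xml.etree instead of lxml), and the XBRL-style fragment and its namespace URI are made up. An HTML parser lowercases tag names, so a tag such as us-gaap:Revenues comes back as us-gaap:revenues, while an XML parser keeps the original case and expands the namespace prefix.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

# Made-up fragment in the style of an XBRL instance document (EX-101.INS).
FRAGMENT = (
    '<xbrl xmlns:us-gaap="http://example.com/us-gaap">'
    '<us-gaap:Revenues contextRef="C1">1000</us-gaap:Revenues>'
    '</xbrl>'
)

class TagCollector(HTMLParser):
    """Records start-tag names exactly as the HTML parser reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# HTML parsing: tag names are lowercased.
collector = TagCollector()
collector.feed(FRAGMENT)
html_tags = collector.tags

# XML parsing: case is preserved and the prefix becomes '{namespace-uri}'.
root = ET.fromstring(FRAGMENT)
xml_tags = [child.tag for child in root]

print(html_tags)  # prints ['xbrl', 'us-gaap:revenues']
print(xml_tags)   # prints ['{http://example.com/us-gaap}Revenues']
```

Since XBRL element names are case sensitive (Revenues and revenues would be different concepts), parsing XBRL data files as XML is the safer choice.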
Class Method
get_documents(self, tree: lxml.html.HtmlElement, no_of_documents=1, debug=False, as_documents=False) -> List[lxml.html.HtmlElement]
Returns a list of strings, each containing the body of one of the documents in the input tree
- tree: the lxml.html element returned by Company.get_all_filings
- no_of_documents: number of documents to return. If it is 1, the result is a single string instead of a list of strings. Defaults to 1.
- debug (default: False): if True, displays the URL and form
- When as_documents is set to True, it returns List[edgar.document.Documents], a list of Documents
Edgar
Gets all companies from EDGAR
get_cik_by_company_name(company_name: str) -> str
Returns the CIK given the exact name of the company
get_company_name_by_cik(cik: str) -> str
Returns the company name given the CIK (including the leading 000s)
find_company_name(words: str) -> List[str]
Returns a list of company names found by exact word matching
find_company_name_cik(words: str) -> List[tuple[str, str]]
Returns a list of company names and their CIK values
match_company_by_company_name(self, name, top=5) -> List[Dict[str, Any]]
Returns a list of dictionaries with company names, CIKs, and their fuzzy match scores
- top (default: 5): the number of top fuzzy matches to return. If set to None, the whole list is returned (which is very long)
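Two details above can be sketched independently of the library: get_company_name_by_cik expects the CIK with its leading zeros (EDGAR CIKs are zero-padded to 10 digits), and match_company_by_company_name ranks names by a fuzzy score. The sketch below uses difflib.SequenceMatcher as a stand-in scorer; the library's actual scoring method is not documented here and may differ. normalize_cik, fuzzy_match, and the dictionary keys company_name, cik, and score are hypothetical.

```python
import difflib
from typing import Any, Dict, List, Optional

def normalize_cik(cik: str) -> str:
    """Pad a CIK to EDGAR's 10-digit form, e.g. '51143' -> '0000051143'."""
    digits = cik.strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)

def fuzzy_match(companies: Dict[str, str], name: str,
                top: Optional[int] = 5) -> List[Dict[str, Any]]:
    """Score every company name against `name`, best matches first.

    difflib's ratio (0.0 to 1.0) is a stand-in for the library's own score.
    """
    query = name.upper()
    scored = [
        {"company_name": company, "cik": cik,
         "score": difflib.SequenceMatcher(None, query, company.upper()).ratio()}
        for company, cik in companies.items()
    ]
    scored.sort(key=lambda row: row["score"], reverse=True)
    # top=None returns the whole ranked list, as described for match_company_by_company_name.
    return scored if top is None else scored[:top]

companies = {
    "ORACLE CORP": "0001341439",
    "INTERNATIONAL BUSINESS MACHINES CORP": "0000051143",
}
print(normalize_cik("51143"))  # prints 0000051143
print(fuzzy_match(companies, "Oracle Corporation", top=1)[0]["company_name"])  # prints ORACLE CORP
```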
XBRL
Parses data from XBRL
Properties
relevant_children
- gets children that are not context elements
relevant_children_parsed
- gets children that are not context, unit, or schemaRef elements
- cleans tags
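The filtering described for relevant_children_parsed can be sketched with xml.etree: skip the bookkeeping elements (context, unit, schemaRef) and strip namespaces from the remaining tag names. This is an illustration under that reading of the properties, not the library's code. The trimmed instance document is made up, and the dictionary shape (name, value, contextRef) is hypothetical and differs from what XBRLElement.to_dict returns.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up, heavily trimmed XBRL instance document.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:us-gaap="http://example.com/us-gaap">
  <link:schemaRef/>
  <xbrli:context id="C1"/>
  <xbrli:unit id="USD"/>
  <us-gaap:Revenues contextRef="C1" unitRef="USD">1000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="C1" unitRef="USD">250</us-gaap:NetIncomeLoss>
</xbrli:xbrl>"""

SKIPPED = {"context", "unit", "schemaRef"}

def local_name(tag: str) -> str:
    """Strip the '{namespace-uri}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def relevant_facts(xml_text: str) -> List[Dict[str, str]]:
    """Return the reported facts, skipping context, unit, and schemaRef children."""
    root = ET.fromstring(xml_text)
    facts = []
    for child in root:
        name = local_name(child.tag)
        if name in SKIPPED:
            continue
        facts.append({"name": name,
                      "value": (child.text or "").strip(),
                      "contextRef": child.get("contextRef", "")})
    return facts

for fact in relevant_facts(INSTANCE):
    print(fact)
```

Keeping contextRef on each fact matters in practice: the same concept usually appears several times in one filing (for different periods), and the context is what tells those values apart.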
Documents
Filing and document details for an SEC EDGAR form (such as a 10-K)
Documents(url, timeout=10)
Properties
- url (str): URL of the document
- content (dict): dictionary of metadata of the document
- content['Filing Date'] (str): filing date of the document
- content['Accepted'] (str): date and time the document was accepted
- content['Period of Report'] (str): the period the document reports on
- element (lxml.html.HtmlElement): the HTML element for the document (fetched from the url), so it can be further parsed
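The keys of content ('Filing Date', 'Accepted', 'Period of Report') match the labels shown on an EDGAR filing index page. Assuming each label sits in a div with class infoHead and is followed by its value in a div with class info, that metadata could be collected with the standard library's html.parser as sketched below. This is not the library's implementation. FilingInfoParser is a hypothetical class, the sample markup and dates are made up, and nested divs inside a value are not handled.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class FilingInfoParser(HTMLParser):
    """Pairs each <div class="infoHead"> label with the next <div class="info"> value."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Dict[str, str] = {}
        self._pending_label: Optional[str] = None
        self._capture: Optional[str] = None  # "head" or "info" while inside such a div
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class in ("infoHead", "info"):
            self._capture = "head" if css_class == "infoHead" else "info"
            self._buffer = []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or not self._capture:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if self._capture == "head":
            self._pending_label = text
        elif self._pending_label is not None:
            self.content[self._pending_label] = text
            self._pending_label = None
        self._capture = None

# Made-up sample in the assumed markup.
SAMPLE = """
<div class="formGrouping">
  <div class="infoHead">Filing Date</div>
  <div class="info">2020-01-02</div>
  <div class="infoHead">Accepted</div>
  <div class="info">2020-01-02 16:05:00</div>
</div>
<div class="formGrouping">
  <div class="infoHead">Period of Report</div>
  <div class="info">2019-12-31</div>
</div>
"""
parser = FilingInfoParser()
parser.feed(SAMPLE)
parser.close()
print(parser.content)
```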
Contribution
Download files
File details
Details for the file edgar-5.6.3.tar.gz.
File metadata
- Download URL: edgar-5.6.3.tar.gz
- Upload date:
- Size: 24.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/7.1.0 keyring/24.3.1 pkginfo/1.10.0 readme-renderer/34.0 requests-toolbelt/1.0.0 requests/2.31.0 rfc3986/1.5.0 tqdm/4.66.2 urllib3/1.26.5 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6e8ca83809a72ee3872736fac211b01fada7d3228052a088437717147f445a07
MD5 | dfdd5f639ac0b9f567abad987371ac9d
BLAKE2b-256 | fc626aad0e8c98c11935b90e4c256bdc209fd38ebfe4c4dfd4ffe7a4f14cad07
File details
Details for the file edgar-5.6.3-py3-none-any.whl.
File metadata
- Download URL: edgar-5.6.3-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/7.1.0 keyring/24.3.1 pkginfo/1.10.0 readme-renderer/34.0 requests-toolbelt/1.0.0 requests/2.31.0 rfc3986/1.5.0 tqdm/4.66.2 urllib3/1.26.5 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4f4d1c299c8ef61e4be40812d24588ff6d230ed16e884083204a5601f6a465ba
MD5 | b95e999d6713697f85d1c445197e8a31
BLAKE2b-256 | d0b436966cabbaddc606111c59adb6b0fa25b615679089fcbc9bbfc167e2f582