Web scraper that takes a company's tax code and returns the company's PEC address
iniscrapec
iniscrapec is a simple scraper that takes the tax code of a company and returns its PEC (certified email) address.
Tech
iniscrapec uses a number of open source projects to work properly:
- pip==20.2.2
- beautifulsoup4~=4.9.1
- mechanize~=0.4.5
- pymongo~=3.11.0
- dnspython~=2.0.0
- python_dotenv~=0.14.0
- setuptools~=50.3.0
And iniscrapec itself is open source with a public repository on GitHub.
It also uses a third-party service to solve the reCAPTCHA "I am not a robot" challenge.
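As a rough illustration of how a captcha-solving service of this kind is usually called, the sketch below builds the two request URLs for the 2captcha HTTP API (the service named next to CAP_KEY in the .env file): one to submit the reCAPTCHA and one to poll for the answer. The function names `build_submit_url` and `build_result_url` are hypothetical and are not taken from the iniscrapec source. The sketch only builds URLs and sends nothing, and the way iniscrapec itself talks to the service may differ.

```python
from urllib.parse import urlencode

# Endpoints of the 2captcha HTTP API.
SUBMIT_ENDPOINT = "https://2captcha.com/in.php"
RESULT_ENDPOINT = "https://2captcha.com/res.php"


def build_submit_url(api_key, site_key, page_url):
    """Build the URL that submits a reCAPTCHA v2 task (hypothetical helper)."""
    params = {
        "key": api_key,            # CAP_KEY from the .env file
        "method": "userrecaptcha",
        "googlekey": site_key,     # DATA_SITEKEY from the .env file
        "pageurl": page_url,       # URL from the .env file
        "json": 1,
    }
    return SUBMIT_ENDPOINT + "?" + urlencode(params)


def build_result_url(api_key, request_id):
    """Build the URL that polls for the solved token (hypothetical helper)."""
    params = {"key": api_key, "action": "get", "id": request_id, "json": 1}
    return RESULT_ENDPOINT + "?" + urlencode(params)
```

A caller would typically request the first URL, wait a few seconds, then poll the second URL until the service returns the token to place in the form.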
Installation
iniscrapec requires Python 3.7 to run.
How to get it from git
$ git clone https://github.com/riccardopaltrinieri/iniscrapec.git
You can also install it from PyPI with
$ pip install iniscrapec
but the pip package does not currently work well.
After installation
You need to fill in the environment variables in the .env file:
CAP_KEY = "" # The API key provided by 2captcha.com
DB_USER = "" # The MongoDB user
DB_PWD = "" # The MongoDB password
DATA_SITEKEY = "" # The captcha site key, as described in step 2 of the link below
URL = "https://www.inipec.gov.it/cerca-pec/-/pecs/companies" # The government website used to search for the PEC
TAX_EXAMPLE = "" # Variable used for testing and debugging
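A minimal sketch of how these variables could be read and checked at startup, assuming they are loaded into the process environment (for example by python_dotenv's `load_dotenv()`). The helpers `load_settings` and `mongo_uri` are hypothetical names, not part of iniscrapec. The `mongodb+srv://` scheme and the host argument are also assumptions: the dnspython dependency suggests an SRV connection string, but the README does not state the database host.

```python
import os
from urllib.parse import quote_plus

try:
    # python_dotenv copies the entries of a .env file into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Without python_dotenv, the variables must already be in the environment.
    pass

# Variables that must be non-empty (TAX_EXAMPLE is only used for testing).
REQUIRED = ("CAP_KEY", "DB_USER", "DB_PWD", "DATA_SITEKEY", "URL")


def load_settings(env=None):
    """Return the required settings, failing early if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


def mongo_uri(user, password, host):
    """Build a MongoDB URI; the host is a placeholder supplied by the caller."""
    # Escape the credentials so characters such as '@' or ':' stay valid.
    return "mongodb+srv://{}:{}@{}/".format(
        quote_plus(user), quote_plus(password), host
    )
```

Checking the variables up front reports an incomplete .env file as a single clear error instead of a failure later, in the middle of a scrape.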
How to run it with a simple tkinter GUI
$ cd .\path\of\repo\iniscrapec
$ python3 iniscrapec.py
You can also run only the scraper code with
$ cd .\iniscrapec\modules
$ python3 scraper.py
License
MIT