Web scraper that takes a company's tax code and returns its PEC address

Project description

iniscrapec

iniscrapec is a simple scraper that takes the tax code of a company and returns that company's PEC (Italian certified email) address.

Tech

iniscrapec uses a number of open source projects to work properly:

  • pip==20.2.2
  • beautifulsoup4~=4.9.1
  • mechanize~=0.4.5
  • pymongo~=3.11.0
  • dnspython~=2.0.0
  • python_dotenv~=0.14.0
  • setuptools~=50.3.0

And iniscrapec itself is open source with a public repository on GitHub.

It also uses a third-party service (2captcha) to solve the reCAPTCHA "I'm not a robot" challenge.

Installation

iniscrapec requires Python 3.7 to run.

How to get it from git

$ git clone https://github.com/riccardopaltrinieri/iniscrapec.git

How to get it from pip

$ pip install iniscrapec

After installation

You need to fill in the environment variables in the .env file:

CAP_KEY = "" # The API key provided by 2captcha.com
DB_USER = "" # The MongoDB user
DB_PWD = "" # The MongoDB password
DATA_SITEKEY = "" # The captcha site key, as described in step 2 of the 2captcha guide linked below
URL = "https://www.inipec.gov.it/cerca-pec/-/pecs/companies" # The government website used to look up the PEC
TAX_EXAMPLE = "" # A tax code used for testing and debugging
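
A missing or empty variable is easier to diagnose if it is caught before the scraper starts. The following is a minimal sketch of such a check, not the project's actual code: `load_settings` and `REQUIRED_KEYS` are hypothetical names, and treating `TAX_EXAMPLE` as optional is an assumption based on its description as a testing aid. It reads from `os.environ` by default, which is where python_dotenv's `load_dotenv()` places the values from the .env file.

```python
import os
from typing import Mapping, Optional

# Keys from the .env file. TAX_EXAMPLE is treated as optional here because
# it is only described as a testing/debugging aid (assumption).
REQUIRED_KEYS = ("CAP_KEY", "DB_USER", "DB_PWD", "DATA_SITEKEY", "URL")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the settings as a dict, raising if a required one is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED_KEYS}
    settings["TAX_EXAMPLE"] = env.get("TAX_EXAMPLE", "")
    return settings
```

Passing a plain dict instead of relying on `os.environ` makes it possible to try the check without touching the real environment.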

link on how to use 2captcha
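
To show where `CAP_KEY`, `DATA_SITEKEY` and `URL` end up, here is a rough sketch of the two requests the 2captcha HTTP API expects for a reCAPTCHA: one to submit the task and one to poll for the answer. It only builds the request URLs and sends nothing. The endpoint and parameter names reflect the 2captcha API as commonly documented and should be checked against the guide; the helper names `build_submit_url` and `build_result_url` are hypothetical and are not taken from iniscrapec.

```python
from urllib.parse import urlencode

# Assumed 2captcha endpoints: submit a task, then poll for its result.
SUBMIT_URL = "http://2captcha.com/in.php"
RESULT_URL = "http://2captcha.com/res.php"


def build_submit_url(api_key: str, sitekey: str, page_url: str) -> str:
    """URL that asks 2captcha to solve the reCAPTCHA found on page_url."""
    params = {
        "key": api_key,             # CAP_KEY from the .env file
        "method": "userrecaptcha",  # task type for reCAPTCHA
        "googlekey": sitekey,       # DATA_SITEKEY from the .env file
        "pageurl": page_url,        # URL from the .env file
        "json": 1,                  # ask for a JSON response
    }
    return SUBMIT_URL + "?" + urlencode(params)


def build_result_url(api_key: str, task_id: str) -> str:
    """URL that polls 2captcha for the answer to a submitted task."""
    params = {"key": api_key, "action": "get", "id": task_id, "json": 1}
    return RESULT_URL + "?" + urlencode(params)
```

In this flow the first request returns a task id, and the second is repeated every few seconds until the service returns the solved token, which is then submitted along with the search form.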

How to run it with a simple tkinter GUI

$ cd .\path\of\repo\iniscrapec
$ python3 iniscrapec.py

You can also run only the scraper code:

$ cd .\iniscrapec\modules
$ python3 scraper.py

License

MIT

Download files

Source Distribution

iniscrapec-0.0.4.tar.gz (5.8 kB)