
Package to crawl country information from the German Foreign Office


pyreiseamt

The German Foreign Office publishes travel information for nearly all countries in the world. This includes information on security and medical matters. The information is kept up to date and lets you assess whether a certain country is safe to visit.

pyreiseamt is designed to crawl the information on https://www.auswaertiges-amt.de/de/ReiseUndSicherheit/reise-und-sicherheitshinweise and present the data in structured JSON format. pyreiseamt can also deliver a sentiment analysis for every country and every category, which allows you to mine the available data more efficiently.

Installation

You can install pyreiseamt via pip.

pip install pyreiseamt

After installation has finished, you can use pyreiseamt as a CLI tool or import its scraper into one of your own scripts.

Usage

pyreiseamt offers a unified point of entry for two basic tasks: listing the available countries and extracting information on one or more countries.

List Available Countries

If you want to get a list of available country names (or want to make sure that you use the correct one), you can use the list command. It needs no additional arguments:

pyreiseamt list

After fetching the newest data from the Foreign Office's website, pyreiseamt prints a list of all available countries to your screen. Each row holds four countries, and the individual countries are separated by ' | '.
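If you capture this output in a script, the rows can be turned back into a flat list of names. The following is only a sketch: parse_country_list is a hypothetical helper that is not part of pyreiseamt, and the sample text is made up to match the layout described above. The real output may contain additional lines (for example status messages) that you would need to filter out.

```python
from typing import List


def parse_country_list(output: str) -> List[str]:
    """Split rows of ' | '-separated country names into a flat list."""
    names = []
    for line in output.splitlines():
        for cell in line.split("|"):
            cell = cell.strip()
            if cell:  # skip empty cells and blank lines
                names.append(cell)
    return names


# Made-up sample in the described layout (four countries per row).
sample_output = (
    "Frankreich | Georgien | Griechenland | Australien\n"
    "Italien | Japan | Kanada | Norwegen\n"
)

countries = parse_country_list(sample_output)
print(len(countries))
print("Georgien" in countries)
```

A list like this makes it easy to check a country name before passing it to the extract command.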

Extract Information

Assume that you want to crawl information on all available countries. You can use the extract command with the -o (output path) argument. -o should point to a JSON file that the results will be written to. Note that the name of your output file should always end with '.json'.

pyreiseamt extract -o ~/all_countries.json
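If you call the CLI from a script, you may want to check the output path before starting a long crawl. The sketch below uses only the standard library. checked_output_path is a hypothetical helper, not something pyreiseamt provides, and how the CLI itself reacts to a different file extension is not covered here.

```python
from pathlib import Path


def checked_output_path(raw_path: str) -> Path:
    """Expand '~' and make sure the file name ends with '.json'."""
    path = Path(raw_path).expanduser()
    if path.suffix.lower() != ".json":
        raise ValueError(f"output file should end with '.json': {path}")
    return path


print(checked_output_path("~/all_countries.json").name)
```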

If you want to limit the crawl job to certain countries, you can use the -c argument. Pass a single string that lists all countries you want to extract, separated by semicolons.

pyreiseamt extract -o ~/select_countries.json -c "Frankreich;Georgien;Griechenland"

This will limit the extraction to France, Georgia, and Greece.

The extract command has two remaining options. If -s is present, the sentiment is calculated for every top category of every country. If -n is present, the top category names are made consistent across all countries. This is necessary because a category may carry a different name for one country than for another, despite having the same content. If you want to extract information on all countries and also include sentiment and consistent category names, you could use pyreiseamt like so:

pyreiseamt extract -o ~/all_countries.json -s -n
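When the command is assembled from a script, the options described above can be put together programmatically. This is a minimal sketch: build_extract_command is a hypothetical helper, and it only uses the flags documented in this README (-o, -c, -s, -n).

```python
import shlex
from typing import List, Optional, Sequence


def build_extract_command(
    output: str,
    countries: Optional[Sequence[str]] = None,
    sentiment: bool = False,
    normalize: bool = False,
) -> List[str]:
    """Build the argument list for 'pyreiseamt extract'."""
    cmd = ["pyreiseamt", "extract", "-o", output]
    if countries:
        # -c expects one string with semicolon-separated country names.
        cmd += ["-c", ";".join(countries)]
    if sentiment:
        cmd.append("-s")
    if normalize:
        cmd.append("-n")
    return cmd


cmd = build_extract_command(
    "select_countries.json",
    countries=["Frankreich", "Georgien", "Griechenland"],
    sentiment=True,
    normalize=True,
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Note that '~' is expanded by the shell, so when you run the command without a shell, expand the path yourself, for example with os.path.expanduser.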

Use the Scraper in Your Own Scripts

If you prefer to use the built-in crawler in your scripts, you can do so by importing the scraper from the package (assuming that you've installed the package).

from pyreiseamt.scraper import extract_country

url = "https://www.auswaertiges-amt.de/de/ReiseUndSicherheit/australiensicherheit/213920"

australia = extract_country(url)

australia will be a dictionary holding the relevant texts for every top and sub category.
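To work with such a dictionary, you might want to walk over all texts in one loop. The sketch below assumes a nesting of top category to sub category to text. The exact structure returned by extract_country is not documented here, so the sample data is made up and iter_texts is a hypothetical helper that you should adapt after inspecting a real result.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_texts(country: Dict[str, Any]) -> Iterator[Tuple[str, str, str]]:
    """Yield (top category, sub category, text) for every entry."""
    for top, content in country.items():
        if isinstance(content, dict):
            for sub, text in content.items():
                yield top, sub, str(text)
        else:
            # Top category that holds text directly, without sub categories.
            yield top, "", str(content)


# Made-up stand-in for the result of extract_country(url).
sample = {
    "Sicherheit": {"Kriminalität": "placeholder text"},
    "Gesundheit": "placeholder text",
}

for top, sub, text in iter_texts(sample):
    print(top, sub, len(text))
```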

Links

pyreiseamt is a crawler with a CLI for https://www.auswaertiges-amt.de/de/ReiseUndSicherheit/reise-und-sicherheitshinweise. Right now, however, it only covers country-specific security issues, general travel guidance, and medical matters. You can use the website to find more information if needed.
