
A data scraping tool for collection and storage of the railway codes used in the UK rail industry


pyrcs

Author: Qian Fu


A small web scraper for collecting railway codes and other data used in the UK rail industry.

Installation

pip3 install pyrcs

Note:

  • Make sure you have the most up-to-date version of pip installed.

    python -m pip install --upgrade pip
    
  • On Windows, installing one of the dependencies, Python-Levenshtein, requires Microsoft Visual C++ 2015 (or later) build tools. A workaround is to download a prebuilt .whl file and install it instead. For example, if you're using Python 3.8 on 64-bit Windows, you can download "python_Levenshtein-0.12.0-cp38-cp38-win_amd64.whl" and install it manually:

    pip3 install \path\to\python_Levenshtein-0.12.0-cp38-cp38-win_amd64.whl
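
To work out which wheel file matches your interpreter, the tags in the filename (such as "cp38-cp38-win_amd64") can be derived from the running Python. The sketch below uses only the standard library and assumes CPython 3.8 or later (older versions add an "m" to the ABI tag, e.g. "cp37m"); the function name is hypothetical, and for an authoritative list of supported tags the packaging.tags module is the more robust choice.

```python
import sys
import sysconfig


def expected_wheel_suffix() -> str:
    """Build the '<python>-<abi>-<platform>' part of a wheel filename.

    Assumes CPython 3.8+, where the interpreter and ABI tags are both
    'cpXY' (e.g. 'cp38'). The platform tag is the interpreter's platform
    string with '-' and '.' replaced by '_', e.g. 'win-amd64' -> 'win_amd64'.
    """
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return f"{py_tag}-{py_tag}-{platform_tag}"


if __name__ == "__main__":
    # With 64-bit Python 3.8 on Windows this prints 'cp38-cp38-win_amd64'.
    print(expected_wheel_suffix())
```

Pick the .whl file whose name ends with this suffix (plus ".whl").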
    

Quick start (Examples)

The following examples provide a quick guide to the use of the package.

1. Get "CRS, NLC, TIPLOC and STANOX Codes"

There are several ways of importing the module/class.

Alternative 1:

If your preferred import style is from <module> import <name>:

from pyrcs.line_data_cls import crs_nlc_tiploc_stanox as ldlc

Or

from pyrcs.line_data import crs_nlc_tiploc_stanox as ldlc

If you prefer import <module>.<name>:

import pyrcs.line_data_cls.crs_nlc_tiploc_stanox as ldlc

After importing the module, you can create an instance of the LocationIdentifiers class for the location codes (including all CRS, NLC, TIPLOC, STANME and STANOX codes):

location_codes = ldlc.LocationIdentifiers()

Alternative 2 (Used for the following examples):

from pyrcs.line_data import LineData
line_data = LineData()

line_data gives access to all classes under the "Line data" category. For example, line_data.LocationIdentifiers is equivalent to location_codes in Alternative 1, so it can be assigned in the same way:

location_codes = line_data.LocationIdentifiers

1.1 Locations beginning with a given letter

Using the method collect_location_codes_by_initial, you can get the location codes that start with a specific letter, say 'A' or 'a':

# The input is case-insensitive
location_codes_a = line_data.LocationIdentifiers.collect_location_codes_by_initial('A')

location_codes_a is a dict, with the keys being:

  • 'A'
  • 'Additional_note'
  • 'Last_updated_date'

Their corresponding values are:

  • location_codes_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page.
  • location_codes_a['Additional_note'] is some additional information on the web page (if available).
  • location_codes_a['Last_updated_date'] is the date (str) when the web page was last updated.
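
If you want the three parts of such a result in separate variables, a small helper can unpack them. This is a minimal sketch, not part of pyrcs: the function name is hypothetical, and it assumes the key layout described above (the table stored under the upper-case initial, next to 'Additional_note' and 'Last_updated_date'), returning None for any missing entry.

```python
from typing import Any, Optional, Tuple


def unpack_by_initial(result: dict, initial: str) -> Tuple[Any, Optional[str], Optional[str]]:
    """Split a by-initial result into (table, additional_note, last_updated_date).

    Assumes the table is stored under the upper-case initial; the note and
    date are looked up with .get(), so missing entries come back as None.
    """
    table = result[initial.upper()]  # the query is case-insensitive; the key is upper-case
    note = result.get("Additional_note")  # may be absent, e.g. for results without a note
    updated = result.get("Last_updated_date")
    return table, note, updated
```

For example, table, note, updated = unpack_by_initial(location_codes_a, 'a') would give the DataFrame, the note and the date string.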

1.2 All available location codes in this category

You can also get all available location codes in this category as a whole using the method fetch_location_codes, which also returns a dict:

location_codes_data = line_data.LocationIdentifiers.fetch_location_codes()

The keys of location_codes_data include:

  • 'Location_codes'
  • 'Latest_updated_date'
  • 'Additional_note'
  • 'Other_systems'

Their corresponding values are:

  • location_codes_data['Location_codes'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z').
  • location_codes_data['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data.
  • location_codes_data['Additional_note'] is some important additional information on the web page (if available).
  • location_codes_data['Other_systems'] is a dict for "Other systems".
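
The idea behind the combined result (tables from 'A' to 'Z' stacked together, with the latest of the per-initial dates kept) can be sketched as follows. This is a hypothetical illustration, not pyrcs's own implementation: collect stands for any callable returning a dict shaped like the by-initial results, tables are treated as plain lists of rows (pyrcs itself uses pandas.DataFrame), and dates are assumed to be ISO-format strings such as '2020-01-31'.

```python
import string
from datetime import date
from typing import Callable, Dict, List, Optional


def combine_initials(collect: Callable[[str], dict]) -> Dict[str, object]:
    """Query every initial from 'A' to 'Z' and merge the results.

    Assumes each result holds its table (a list of rows) under the
    upper-case letter and an optional ISO-format 'Last_updated_date'.
    """
    rows: List[object] = []
    dates: List[date] = []
    for letter in string.ascii_uppercase:
        result = collect(letter)
        rows.extend(result[letter])  # stack the tables in A-Z order
        updated: Optional[str] = result.get("Last_updated_date")
        if updated:  # skip pages without a date
            dates.append(date.fromisoformat(updated))
    latest = max(dates).isoformat() if dates else None
    return {"rows": rows, "latest_updated_date": latest}
```

It only illustrates the aggregation step; in practice, fetch_location_codes already does this work and returns DataFrames.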

2. Get "Engineer's Line References (ELRs)"

Now you need the class line_data.ELRMileages, which can be assigned to a shorter variable name, e.g. em:

em = line_data.ELRMileages

2.1 ELR codes

To get ELR codes starting with a specific letter, say 'A', you can use the method collect_elr_by_initial, which returns a dict.

elr_a = em.collect_elr_by_initial('A')  
# Or elr_a = line_data.ELRMileages.collect_elr_by_initial('a')

The keys of elr_a include:

  • 'A'
  • 'Last_updated_date'

Their corresponding values are:

  • elr_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page.
  • elr_a['Last_updated_date'] is the date (in str) when the web page was last updated.

To get all available ELR codes, use the method fetch_elr, which also returns a dict:

elr_codes = em.fetch_elr()

The keys of elr_codes include:

  • 'ELRs_mileages'
  • 'Latest_update_date'

Their corresponding values are:

  • elr_codes['ELRs_mileages'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z').
  • elr_codes['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data.
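
When ELR codes come from user input before being passed to methods such as collect_elr_by_initial or fetch_mileage_file, a quick sanity check can catch typos early. The sketch below is not part of pyrcs; the names ELR_PATTERN and normalise_elr are hypothetical, and it assumes the common ELR pattern of three letters optionally followed by a single digit (e.g. 'AAM', 'ECM1'), so some codes listed on the site may not fit it.

```python
import re

# Assumption: an ELR is three letters, optionally followed by one digit
# (e.g. 'AAM', 'ECM1'). Codes outside this pattern may still exist.
ELR_PATTERN = re.compile(r"[A-Z]{3}[0-9]?")


def normalise_elr(code: str) -> str:
    """Strip whitespace, upper-case the code, and check it against ELR_PATTERN."""
    elr = code.strip().upper()
    if not ELR_PATTERN.fullmatch(elr):
        raise ValueError(f"unexpected ELR format: {code!r}")
    return elr
```

For example, normalise_elr(' aam ') returns 'AAM', while normalise_elr('A1') raises ValueError.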

2.2 Mileage files

To collect more detailed mileage data for a given ELR, for example, 'AAM', you can use the method fetch_mileage_file, which returns a dict:

em_amm = em.fetch_mileage_file('AAM')

The keys of em_amm include:

  • 'ELR'
  • 'Line'
  • 'Sub-Line'
  • 'AAM'
  • 'Note'

Their corresponding values are:

  • em_amm['ELR'] is the name (in str) of the given ELR
  • em_amm['Line'] is the associated line name (in str)
  • em_amm['Sub-Line'] is the associated sub-line name (in str), if available
  • em_amm['AAM'] is a pandas.DataFrame of the mileage file data
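
Mileages on the Railway Codes site are commonly written in miles and chains (for example "10.62", i.e. 10 miles 62 chains). Assuming the mileage values in the DataFrame are strings in that format (check your own data first), the hypothetical helpers below convert them to yards and to a "miles.yards" string with four-digit yards, using 1 mile = 80 chains = 1,760 yards and 1 chain = 22 yards.

```python
YARDS_PER_MILE = 1760
YARDS_PER_CHAIN = 22
CHAINS_PER_MILE = 80


def miles_chains_to_yards(mileage: str) -> int:
    """Convert a 'miles.chains' string such as '10.62' to whole yards.

    Assumes the part after the dot is a two-digit chain count (00-79);
    a bare number such as '12' is read as 12 miles 0 chains.
    """
    miles_str, _, chains_str = mileage.strip().partition(".")
    chains_str = chains_str or "00"
    if not (miles_str.isdigit() and chains_str.isdigit() and len(chains_str) == 2):
        raise ValueError(f"not a miles.chains mileage: {mileage!r}")
    miles, chains = int(miles_str), int(chains_str)
    if chains >= CHAINS_PER_MILE:
        raise ValueError(f"chains must be below {CHAINS_PER_MILE}: {mileage!r}")
    return miles * YARDS_PER_MILE + chains * YARDS_PER_CHAIN


def yards_to_miles_yards(yards: int) -> str:
    """Format a yard count as 'miles.yards' with four-digit yards."""
    miles, rest = divmod(yards, YARDS_PER_MILE)
    return f"{miles}.{rest:04d}"
```

For example, 10 miles 62 chains is 10 × 1760 + 62 × 22 = 18964 yards, so yards_to_miles_yards(miles_chains_to_yards('10.62')) returns '10.1364'.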

3. Get "Railway stations data"

Railway station data belongs to another category, "Other assets":

from pyrcs.other_assets import OtherAssets
other_assets = OtherAssets()

As in Sections 1.1 and 2.1, to get station data for a given initial letter (say 'A'):

stations_a = other_assets.Stations.collect_station_locations('A')

To get all available stations data:

stations_data = other_assets.Stations.fetch_station_locations()

Both stations_a and stations_data are dicts.
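
Since the keys of the station dicts are not listed here, a small generic helper can summarise whatever a returned dict contains. This is a hypothetical sketch, not part of pyrcs: it reports each value's type name and, where the value has a length (a str, a list, or a DataFrame's row count), that length as well.

```python
from typing import Any, Dict


def describe_result(result: Dict[str, Any]) -> Dict[str, str]:
    """Summarise each entry of a returned dict as 'type' or 'type[len]'."""
    summary = {}
    for key, value in result.items():
        name = type(value).__name__
        try:
            summary[key] = f"{name}[{len(value)}]"
        except TypeError:  # value has no length, e.g. None or a number
            summary[key] = name
    return summary
```

For example, print(describe_result(stations_a)) would list each key of stations_a together with the type and size of its value.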



Download files


Source Distribution

pyrcs-0.2.6.tar.gz (41.2 kB)

Built Distribution

pyrcs-0.2.6-py3-none-any.whl (1.3 MB), Python 3
