
A small web scraper for collecting the railway codes used in the UK rail industry


pyrcs

Author: Qian Fu


A small web scraper for collecting railway codes and other data used in the UK rail industry.




Installation

pip install --upgrade pyrcs
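To confirm which version actually ended up installed (the files listed at the end of this page are for 0.2.2), a minimal check with the standard library's importlib.metadata (Python 3.8 or later) could look like this. The helper name installed_version is hypothetical and not part of pyrcs.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pyrcs") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper; it only reads installed package metadata.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


if __name__ == "__main__":
    print(installed_version() or "pyrcs is not installed")
```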

Note:

  • Make sure you have the most up-to-date version of pip installed.

    python -m pip install --upgrade pip
    
  • Python-Levenshtein, one of the dependencies of this package, may fail to install on Windows without Microsoft Visual C++ 2015 (VC2015) or later. A workaround is to download and install a pre-built wheel (.whl) file for it. For example, if you're using 64-bit Python 3.7 on Windows, choose python_Levenshtein-0.12.0-cp37-cp37m-win_amd64.whl:

    pip install --upgrade \path\to\python_Levenshtein-0.12.0-cp37-cp37m-win_amd64.whl
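The wheel must match the Python interpreter (its version and whether it is 32- or 64-bit), not just the operating system. As a sketch, the expected filename can be derived from the standard wheel tag convention: CPython 3.7 and earlier add an "m" suffix to the ABI tag (cp37m), while 3.8 dropped it (cp38). The helper levenshtein_wheel_name is hypothetical, and whether a wheel with that name is actually published for your version is not guaranteed.

```python
import sys


def levenshtein_wheel_name(pkg_version: str, major: int, minor: int, is_64bit: bool) -> str:
    """Build the expected Windows wheel filename for python-Levenshtein.

    Hypothetical helper: CPython <= 3.7 uses an 'm' ABI suffix (e.g. cp37m),
    CPython 3.8+ does not (e.g. cp38).
    """
    py_tag = f"cp{major}{minor}"
    abi_tag = py_tag + ("m" if (major, minor) <= (3, 7) else "")
    platform_tag = "win_amd64" if is_64bit else "win32"
    return f"python_Levenshtein-{pkg_version}-{py_tag}-{abi_tag}-{platform_tag}.whl"


if __name__ == "__main__":
    # A 64-bit interpreter has sys.maxsize larger than 2**32
    is_64bit = sys.maxsize > 2**32
    info = sys.version_info
    print(levenshtein_wheel_name("0.12.0", info.major, info.minor, is_64bit))
```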
    

Quick start (Examples)

The following examples give a quick guide to using the package.

1. CRS, NLC, TIPLOC and STANOX Codes

If your preferred import style is from <module> import <name>:

from pyrcs.line_data_cls import crs_nlc_tiploc_stanox as ldlc

If your preferred import style is import <module>.<name>:

import pyrcs.line_data_cls.crs_nlc_tiploc_stanox as ldlc

After importing the module, you can create an instance of the LocationIdentifiers class for the location codes (covering CRS, NLC, TIPLOC, STANME and STANOX):

location_codes = ldlc.LocationIdentifiers()

Depending on your preference, there are several alternative ways of importing the module.

Alternative 1:

from pyrcs.line_data import crs_nlc_tiploc_stanox as ldlc
location_codes = ldlc.LocationIdentifiers()

Alternative 2 (Preferred and used for the following examples):

from pyrcs.line_data import LineData
line_data_cls = LineData()  # contains all classes under the category of 'Line data'
location_codes = line_data_cls.LocationIdentifiers

1.1 Locations beginning with a given letter

You can get the location codes starting with a specific letter, say 'A' or 'a', by using the method collect_location_codes_by_initial, which returns a dict.

# The input is case-insensitive
location_codes_a = location_codes.collect_location_codes_by_initial('A')

The keys of location_codes_a include:

  • 'A'
  • 'Last_updated_date'
  • 'Additional_note'

The corresponding values are:

  • location_codes_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page: http://www.railwaycodes.org.uk/crs/CRSa.shtm
  • location_codes_a['Last_updated_date'] is the date (in str) when the web page was last updated
  • location_codes_a['Additional_note'] is any important additional information given on the web page (if available)
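If you only need a few letters rather than the whole alphabet, you can call the per-initial method in a loop. The sketch below takes the collector as a parameter, so it works with any method that accepts an initial and returns a dict; the helper name collect_for_initials is hypothetical and not part of pyrcs.

```python
import string
from typing import Any, Callable, Dict, Iterable


def collect_for_initials(
    collect: Callable[[str], Dict[str, Any]],
    initials: Iterable[str] = string.ascii_uppercase,
) -> Dict[str, Dict[str, Any]]:
    """Call a per-initial collector for each letter.

    Returns the collector's results keyed by the upper-case initial.
    Hypothetical helper; `collect` is e.g. a bound method such as
    location_codes.collect_location_codes_by_initial.
    """
    results = {}
    for initial in initials:
        results[initial.upper()] = collect(initial)
    return results
```

For example, collect_for_initials(location_codes.collect_location_codes_by_initial, "ABC") would return a dict keyed by 'A', 'B' and 'C', each value being the dict described above. Each call presumably requests a separate web page, so for all letters the fetch_location_codes method is likely the simpler choice.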

1.2 All available location codes in this category

You can also get all available location codes in this category as a whole, using the method fetch_location_codes, which also returns a dict:

location_codes_data = location_codes.fetch_location_codes()

The keys of location_codes_data include:

  • 'Location_codes'
  • 'Latest_updated_date'
  • 'Additional_note'
  • 'Other_systems'

The corresponding values are:

  • location_codes_data['Location_codes'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z')
  • location_codes_data['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data
  • location_codes_data['Additional_note'] is any important additional information given on the web page (if available)
  • location_codes_data['Other_systems'] is a dict containing the data for 'Other systems'
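'Latest_updated_date' is described as the latest of the per-initial 'Last_updated_date' values. If you collect some initials yourself, the same reduction can be sketched as below. This is not necessarily how pyrcs computes it, and the date format (ISO YYYY-MM-DD by default) is an assumption; pass a different date_format if the strings look different. The helper name latest_updated_date is hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_updated_date(
    dates: Iterable[Optional[str]], date_format: str = "%Y-%m-%d"
) -> Optional[str]:
    """Return the most recent date string, skipping missing values.

    Hypothetical helper; `date_format` is an assumed format for the
    'Last_updated_date' strings. The original string is returned unchanged.
    """
    valid = [d for d in dates if d]
    if not valid:
        return None
    # Compare parsed dates, but return the string as it was given
    return max(valid, key=lambda d: datetime.strptime(d, date_format))
```

For per-initial results held in a dict such as results, this could be called as latest_updated_date(r["Last_updated_date"] for r in results.values()).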

2. Engineer's Line References (ELRs)

em = line_data_cls.ELRMileages

2.1 ELR codes

To get ELR codes starting with a specific letter, say 'A', use the method collect_elr_by_initial, which returns a dict:

elr_a = em.collect_elr_by_initial('A')  # em.collect_elr_by_initial('a')

The keys of elr_a include:

  • 'A'
  • 'Last_updated_date'

The corresponding values are:

  • elr_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page: http://www.railwaycodes.org.uk/elrs/elra.shtm
  • elr_a['Last_updated_date'] is the date (in str) when the web page was last updated
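Because the input letter is case-insensitive while the table is stored under a single key such as 'A', a small lookup helper can normalise the letter before indexing. This sketch assumes the result is keyed by the upper-case initial, as elr_a['A'] suggests; the name table_for_initial is hypothetical.

```python
from typing import Any, Dict


def table_for_initial(result: Dict[str, Any], initial: str) -> Any:
    """Return the table stored under the upper-case form of `initial`.

    Hypothetical helper; assumes per-initial results are keyed by the
    upper-case letter (e.g. result['A']).
    """
    if len(initial) != 1 or not initial.isalpha():
        raise ValueError(f"expected a single letter, got {initial!r}")
    key = initial.upper()
    if key not in result:
        raise KeyError(f"no table for initial {key!r}; keys are {sorted(result)}")
    return result[key]
```

For example, table_for_initial(elr_a, 'a') would return the same object as elr_a['A'] under that assumption.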

To get all available ELR codes, use the method fetch_elr, which returns a dict:

elr_codes = em.fetch_elr()

The keys of elr_codes include:

  • 'ELRs_mileages'
  • 'Latest_updated_date'

The corresponding values are:

  • elr_codes['ELRs_mileages'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z')
  • elr_codes['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data

2.2 Mileage files

To collect more detailed mileage data for a given ELR, say 'AAM', use the method fetch_mileage_file, which returns a dict:

em_aam = em.fetch_mileage_file('AAM')

The keys of em_aam include:

  • 'ELR'
  • 'Line'
  • 'Sub-Line'
  • 'AAM'

The corresponding values are:

  • em_aam['ELR'] is the name (in str) of the given ELR
  • em_aam['Line'] is the associated line name (in str)
  • em_aam['Sub-Line'] is the associated sub-line name (in str), if available
  • em_aam['AAM'] is a pandas.DataFrame of the mileage file data
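Railway mileages are conventionally written as miles and chains, so a value such as '1.23' means 1 mile 23 chains rather than 1.23 miles. Assuming the mileage strings in the mileage file data follow that convention with two-digit chains, a conversion to yards (1 mile = 80 chains = 1760 yards, so 1 chain = 22 yards) could be sketched as below. The function name mileage_to_yards is hypothetical, and the name of the mileage column in the DataFrame is not documented here.

```python
def mileage_to_yards(mileage: str) -> int:
    """Convert a 'miles.chains' mileage string (e.g. '1.23') to yards.

    Hypothetical helper; assumes the part after the dot is a chain count
    (0-79). A leading '-' is kept as a sign.
    """
    text = mileage.strip()
    sign = -1 if text.startswith("-") else 1
    miles_str, _, chains_str = text.lstrip("+-").partition(".")
    miles = int(miles_str) if miles_str else 0
    chains = int(chains_str) if chains_str else 0
    if not 0 <= chains < 80:
        raise ValueError(f"chains must be in 0..79, got {chains}")
    # 1 mile = 1760 yards; 1 chain = 22 yards
    return sign * (miles * 1760 + chains * 22)
```

Converting to a single integer unit makes differences between mileages (for example, the distance between two points on the same ELR) simple subtractions.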

3. Railway stations data

Railway station data belongs to another category, 'Other assets':

from pyrcs.other_assets import OtherAssets
other_assets_cls = OtherAssets()

As with the CRS, NLC, TIPLOC and STANOX codes above, you can get station data for a given initial letter (say 'A'):

stations_a = other_assets_cls.Stations.collect_station_locations('A')

To get all available station data:

stations = other_assets_cls.Stations.fetch_station_locations()

Both stations_a and stations are of type dict.
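The keys of the station dicts are not listed here, so a quick way to see what came back is to print each key with the type of its value, recursing into nested dicts. A minimal sketch with a hypothetical helper name, describe_result:

```python
from typing import Any, Dict, List


def describe_result(result: Dict[str, Any], indent: int = 0) -> List[str]:
    """List each key with the type name of its value, recursing into dicts.

    Hypothetical helper for inspecting the dicts returned by pyrcs methods.
    """
    lines = []
    for key, value in result.items():
        lines.append(f"{'  ' * indent}{key}: {type(value).__name__}")
        if isinstance(value, dict):
            lines.extend(describe_result(value, indent + 1))
    return lines
```

For example, print("\n".join(describe_result(stations_a))) shows the top-level keys and whether each value is a DataFrame, a str or a nested dict.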



Download files

Source distribution: pyrcs-0.2.2.tar.gz (40.3 kB)

Built distribution: pyrcs-0.2.2-py3-none-any.whl (1.7 MB, Python 3)
