A small web scraper for collecting the railway codes used in the UK rail industry
pyrcs
A small web scraper for collecting railway codes and other data used in the UK rail industry.
Installation
pip install --upgrade pyrcs
Note:
- Make sure you have the most up-to-date version of pip installed.
python -m pip install --upgrade pip
- Python-Levenshtein, one of the dependencies of this package, may fail to be installed on a Windows OS without VC2015 (or above). A workaround is to download and install its .whl file. In this case, you should go for python_Levenshtein-0.12.0-cp37-cp37m-win_amd64.whl if you're using Python 3.7 on 64-bit OS:
pip install --upgrade \path\to\python_Levenshtein-0.12.0-cp37-cp37m-win_amd64.whl
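To pick a wheel that matches your interpreter, it can help to check the Python version and bitness first. The following is a minimal sketch using only the standard library. The helper name describe_interpreter is hypothetical (it is not part of pyrcs), and the 'cp' prefix assumes you are running CPython.

```python
import struct
import sys


def describe_interpreter():
    """Return (tag, bits) for the running interpreter, assuming CPython."""
    # 'cp37' in a wheel file name corresponds to CPython 3.7
    tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bytes times 8 gives the interpreter's bitness
    bits = struct.calcsize("P") * 8
    return tag, bits


tag, bits = describe_interpreter()
print("Python tag: {}, {}-bit interpreter".format(tag, bits))
```

Compare the printed tag and bitness with the wheel file name before installing it.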
Quick start (Examples)
The following examples may provide a quick guide to the usage of the package.
1. CRS, NLC, TIPLOC and STANOX Codes
If your preferred import style is from <module> import <name>:
from pyrcs.line_data_cls import crs_nlc_tiploc_stanox as ldlc
If your preferred import style is import <module>.<name>:
import pyrcs.line_data_cls.crs_nlc_tiploc_stanox as ldlc
After importing the module, you can create an instance of the class for the location codes (including all CRS, NLC, TIPLOC, STANME and STANOX):
location_codes = ldlc.LocationIdentifiers()
Given different preferences, there are several alternative ways of importing the module.
Alternative 1:
from pyrcs.line_data import crs_nlc_tiploc_stanox as ldlc
location_codes = ldlc.LocationIdentifiers()
Alternative 2 (Preferred and used for the following examples):
from pyrcs.line_data import LineData
line_data_cls = LineData() # contains all classes under the category of 'Line data'
location_codes = line_data_cls.LocationIdentifiers
1.1 Locations beginning with a given letter
You can get the location codes starting with a specific letter, say 'A' or 'a', by using the method collect_location_codes_by_initial, which returns a dict.
# The input is case-insensitive
location_codes_a = location_codes.collect_location_codes_by_initial('A')
The keys of location_codes_a include:
'A'
'Last_updated_date'
'Additional_note'
The corresponding values are:
location_codes_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page: http://www.railwaycodes.org.uk/crs/CRSa.shtm
location_codes_a['Last_updated_date'] is the date (in str) when the web page was last updated
location_codes_a['Additional_note'] is some important additional information on the web page (if available)
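The returned dict can be unpacked as in the sketch below. It uses a stand-in dict rather than a live call: the column names, row values and date are placeholders, and a list of rows stands in for the pandas.DataFrame. The helper split_result is hypothetical, and it assumes the table is always stored under the upper-case initial and that 'Additional_note' is None when no note is available.

```python
def split_result(result, initial):
    """Separate the table from the metadata in a per-initial result dict."""
    # Assumption: the table is stored under the upper-case letter
    key = initial.upper()
    table = result[key]
    metadata = {k: v for k, v in result.items() if k != key}
    return table, metadata


# Stand-in for the dict returned by collect_location_codes_by_initial('A');
# every value below is a placeholder, not real data
example_result = {
    'A': [{'Location': 'placeholder location', 'CRS': 'XXX'}],
    'Last_updated_date': '2000-01-01',
    'Additional_note': None,
}

table_a, metadata_a = split_result(example_result, 'a')
print(sorted(metadata_a))  # ['Additional_note', 'Last_updated_date']
```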
1.2 All available location codes in this category
You can also get all available location codes in this category as a whole, using the method fetch_location_codes, which also returns a dict:
location_codes_data = location_codes.fetch_location_codes()
The keys of location_codes_data include:
'Location_codes'
'Latest_updated_date'
'Additional_note'
'Other_systems'
The corresponding values are:
location_codes_data['Location_codes'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z')
location_codes_data['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data
location_codes_data['Additional_note'] is some important additional information on the web page (if available)
location_codes_data['Other_systems'] is a dict for Other systems
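To make the relationship between 'Last_updated_date' and 'Latest_updated_date' concrete, the sketch below picks the latest date from several per-initial results. It is an illustration only, not the package's implementation: the helper latest_update is hypothetical, the dates are invented, and it assumes the date strings are in ISO 'YYYY-MM-DD' form so that the standard library can parse them.

```python
from datetime import date


def latest_update(results):
    """Return the most recent 'Last_updated_date' among per-initial result dicts."""
    # Assumption: dates are ISO-formatted strings such as '2019-05-01'
    dates = [date.fromisoformat(r['Last_updated_date']) for r in results]
    return max(dates).isoformat()


# Stand-ins for per-initial results; the dates are made up
per_initial = [
    {'A': [], 'Last_updated_date': '2019-03-12'},
    {'B': [], 'Last_updated_date': '2019-05-01'},
    {'C': [], 'Last_updated_date': '2018-11-30'},
]

print(latest_update(per_initial))  # 2019-05-01
```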
2. Engineer's Line References (ELRs)
em = line_data_cls.ELRMileages
2.1 ELR codes
To get ELR codes starting with a specific letter, say 'A', use the method collect_elr_by_initial, which returns a dict.
elr_a = em.collect_elr_by_initial('A') # em.collect_elr_by_initial('a')
The keys of elr_a include:
'A'
'Last_updated_date'
The corresponding values are:
elr_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page: http://www.railwaycodes.org.uk/elrs/elra.shtm
elr_a['Last_updated_date'] is the date (in str) when the web page was last updated
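For comparing a table with its source page, the sketch below builds the page address from an initial. It extrapolates from the single address cited above for 'A'. That other letters follow the same pattern is an assumption, and elr_page_url is a hypothetical helper, not part of pyrcs.

```python
import string


def elr_page_url(initial):
    """Build the ELR page address for a given initial letter."""
    if len(initial) != 1 or initial not in string.ascii_letters:
        raise ValueError("initial must be a single ASCII letter")
    # Assumption: every letter follows the 'elra.shtm' pattern cited above
    return "http://www.railwaycodes.org.uk/elrs/elr{}.shtm".format(initial.lower())


print(elr_page_url('A'))  # http://www.railwaycodes.org.uk/elrs/elra.shtm
```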
To get all available ELR codes, use the method fetch_elr, which returns a dict:
elr_codes = em.fetch_elr()
The keys of elr_codes include:
'ELRs_mileages'
'Latest_updated_date'
The corresponding values are:
elr_codes['ELRs_mileages'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z')
elr_codes['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data
2.2 Mileage files
To collect more detailed mileage data for a given ELR, say 'AAM', use the method fetch_mileage_file, which returns a dict:
em_amm = em.fetch_mileage_file('AAM')
The keys of em_amm include:
'ELR'
'Line'
'Sub-Line'
'AAM'
The corresponding values are:
em_amm['ELR'] is the name (in str) of the given ELR
em_amm['Line'] is the associated line name (in str)
em_amm['Sub-Line'] is the associated sub-line name (in str), if available
em_amm['AAM'] is a pandas.DataFrame of the mileage file data
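Because the table is stored under the ELR's own name, it can be looked up through the 'ELR' value rather than a hard-coded key. The sketch below uses a stand-in dict with placeholder values and column names, and a list of rows stands in for the pandas.DataFrame. The helper mileage_table is hypothetical, and treating a missing or empty 'Sub-Line' as "not available" is an assumption.

```python
def mileage_table(result):
    """Return (title, table) from a mileage-file result dict."""
    elr = result['ELR']
    # The table key matches the ELR name, e.g. 'AAM'
    table = result[elr]
    # Assumption: 'Sub-Line' is absent or empty when there is no sub-line
    sub_line = result.get('Sub-Line')
    if sub_line:
        title = "{} ({})".format(result['Line'], sub_line)
    else:
        title = result['Line']
    return title, table


# Stand-in for the dict returned by fetch_mileage_file('AAM');
# every value below is a placeholder, not real data
example = {
    'ELR': 'AAM',
    'Line': 'placeholder line name',
    'Sub-Line': '',
    'AAM': [{'Mileage': '0.0000', 'Node': 'placeholder node'}],
}

title, table = mileage_table(example)
print(title)  # placeholder line name
```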
3. Railway stations data
Railway stations data belongs to another category, 'Other assets':
from pyrcs.other_assets import OtherAssets
other_assets_cls = OtherAssets()
Similar to getting 'CRS, NLC, TIPLOC and STANOX Codes' above, to get stations data by a given initial letter (say 'A'):
stations_a = other_assets_cls.Stations.collect_station_locations('A')
To get all available stations data:
stations = other_assets_cls.Stations.fetch_station_locations()
Both stations_a and stations are of type dict.