A data scraping tool for collection and storage of the railway codes used in the UK rail industry
pyrcs
A small web scraper for collecting railway codes and other data used in the UK rail industry.
Installation
pip3 install pyrcs
Note:
- Make sure you have the most up-to-date version of pip installed:
python -m pip install --upgrade pip
- Installing one of the dependencies, Python-Levenshtein, requires VC2015 (or above). A workaround is to download and install its .whl file manually. For example, if you're using Python 3.8 on a 64-bit OS, you can download "python_Levenshtein-0.12.0-cp38-cp38-win_amd64.whl" and install it:
pip3 install \path\to\python_Levenshtein-0.12.0-cp38-cp38-win_amd64.whl
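The cp38 and win_amd64 parts of the .whl file name have to match the interpreter that will install it. The sketch below derives them with the standard library only. expected_wheel_name is a hypothetical helper, not part of pyrcs; it assumes a CPython interpreter on Windows and the 0.12.0 release named above, and it only formats a name without checking that such a file has been published.

```python
import struct
import sys


def expected_wheel_name(version_info=None, bits=None, release="0.12.0"):
    """Build the Windows wheel file name that matches an interpreter.

    Hypothetical helper for illustration; it only formats a name and
    does not check that such a file exists.
    """
    if version_info is None:
        version_info = sys.version_info
    if bits is None:
        # Pointer size in bits: 64 on a 64-bit interpreter, 32 otherwise.
        bits = struct.calcsize("P") * 8
    major, minor = version_info[0], version_info[1]
    py_tag = f"cp{major}{minor}"
    # CPython wheels before 3.8 typically carry an "m" suffix in the ABI tag.
    abi_tag = py_tag + ("m" if (major, minor) < (3, 8) else "")
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return f"python_Levenshtein-{release}-{py_tag}-{abi_tag}-{platform_tag}.whl"


# Python 3.8 on a 64-bit OS, as in the note above.
print(expected_wheel_name((3, 8), 64))
```

Calling expected_wheel_name() with no arguments uses the running interpreter, which is the case that matters when choosing a file to download.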
Quick start (Examples)
The following examples provide a quick guide to the use of the package.
1. Get "CRS, NLC, TIPLOC and STANOX Codes"
There are several ways of importing the module/class.
Alternative 1:
If your preferred import style is from <module> import <name>:
from pyrcs.line_data_cls import crs_nlc_tiploc_stanox as ldlc
Or
from pyrcs.line_data import crs_nlc_tiploc_stanox as ldlc
If you prefer import <module>.<name>:
import pyrcs.line_data_cls.crs_nlc_tiploc_stanox as ldlc
After importing the module, you can create an instance of the class for the location codes (including all CRS, NLC, TIPLOC, STANME and STANOX):
location_codes = ldlc.LocationIdentifiers()
Alternative 2 (Used for the following examples):
from pyrcs.line_data import LineData
line_data = LineData()
line_data contains all classes under the category of "Line data". That way, location_codes is equivalent to line_data.LocationIdentifiers:
location_codes = line_data.LocationIdentifiers
1.1 Locations beginning with a given letter
Using the method collect_location_codes_by_initial, you can get the location codes that start with a specific letter, say 'A' or 'a':
# The input is case-insensitive
location_codes_a = line_data.LocationIdentifiers.collect_location_codes_by_initial('A')
location_codes_a is a dict, with the keys being:
'A'
'Additional_note'
'Last_updated_date'
Their corresponding values are:
location_codes_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page.
location_codes_a['Additional_note'] is some additional information on the web page (if available).
location_codes_a['Last_updated_date'] is the date (str) when the web page was last updated.
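A quick way to check what came back is to list each key with the type of its value. The sketch below runs on a hand-made stand-in dict so that it works without network access. summarise_result is a hypothetical helper, not part of pyrcs; in the stand-in a list of dicts takes the place of the pandas.DataFrame, all values are placeholders, and using None for a missing note is an assumption.

```python
def summarise_result(result):
    """Map each key of a result dict to the type name of its value."""
    return {key: type(value).__name__ for key, value in result.items()}


# Stand-in for the dict described above (placeholder values only).
# With pyrcs, the value under 'A' would be a pandas.DataFrame.
location_codes_a = {
    "A": [{"Location": "placeholder"}],
    "Additional_note": None,  # assumed to be None when no note is available
    "Last_updated_date": "2000-01-01",  # made-up date
}

for key, type_name in summarise_result(location_codes_a).items():
    print(f"{key}: {type_name}")
```

The same helper can be pointed at any of the dicts returned in the later sections.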
1.2 All available location codes in this category
You can also get all available location codes in this category as a whole, using the method fetch_location_codes, which also returns a dict:
location_codes_data = line_data.LocationIdentifiers.fetch_location_codes()
The keys of location_codes_data include:
'Location_codes'
'Latest_updated_date'
'Additional_note'
'Other_systems'
Their corresponding values are:
location_codes_data['Location_codes'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z').
location_codes_data['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data.
location_codes_data['Additional_note'] is some important additional information on the web page (if available).
location_codes_data['Other_systems'] is a dict for "Other systems".
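The 'Latest_updated_date' value can be pictured as the maximum over the per-letter 'Last_updated_date' values. The following is a minimal sketch of that idea, not the package's implementation. latest_update is a hypothetical helper, and it assumes the dates are ISO-formatted strings (YYYY-MM-DD), which the description above does not confirm. The per-letter dicts are placeholders with made-up dates.

```python
from datetime import date


def latest_update(per_letter_results):
    """Return the most recent 'Last_updated_date' among per-letter dicts.

    Entries without a date are skipped; None is returned if none has one.
    """
    dates = [
        date.fromisoformat(result["Last_updated_date"])
        for result in per_letter_results
        if result.get("Last_updated_date")
    ]
    return max(dates).isoformat() if dates else None


# Placeholder per-letter results (tables omitted, dates made up).
results = [
    {"A": [], "Last_updated_date": "2020-01-15"},
    {"B": [], "Last_updated_date": "2020-03-02"},
    {"C": [], "Last_updated_date": None},
]

print(latest_update(results))
```

Parsing the strings into date objects before comparing avoids relying on string ordering, which would only be correct for a zero-padded year-first format.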
2. Get "Engineer's Line References (ELRs)"
Now you need to use the class line_data.ELRMileages, which can be assigned to a simpler variable, e.g. em:
em = line_data.ELRMileages
2.1 ELR codes
To get ELR codes starting with a specific letter, say 'A', you can use the method collect_elr_by_initial, which returns a dict:
elr_a = em.collect_elr_by_initial('A')
# Or elr_a = line_data.ELRMileages.collect_elr_by_initial('a')
The keys of elr_a include:
'A'
'Last_updated_date'
Their corresponding values are:
elr_a['A'] is a pandas.DataFrame that contains the table data. You may compare it with the table on the web page.
elr_a['Last_updated_date'] is the date (in str) when the web page was last updated.
To get all available ELR codes, use the method fetch_elr, which also returns a dict:
elr_codes = em.fetch_elr()
The keys of elr_codes include:
'ELRs_mileages'
'Latest_update_date'
Their corresponding values are:
elr_codes['ELRs_mileages'] is a pandas.DataFrame that contains all table data (from 'A' to 'Z').
elr_codes['Latest_updated_date'] is the latest 'Last_updated_date' (in str) among all initial-specific table data.
2.2 Mileage files
To collect more detailed mileage data for a given ELR, for example 'AAM', you can use the method fetch_mileage_file, which returns a dict:
em_amm = em.fetch_mileage_file('AAM')
The keys of em_amm include:
'ELR'
'Line'
'Sub-Line'
'AAM'
'Note'
Their corresponding values are:
em_amm['ELR'] is the name (in str) of the given ELR.
em_amm['Line'] is the associated line name (in str).
em_amm['Sub-Line'] is the associated sub-line name (in str), if available.
em_amm['AAM'] is a pandas.DataFrame of the mileage file data.
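Because the table is stored under a key equal to the ELR itself, generic code can look it up through the 'ELR' entry instead of hard-coding 'AAM'. The sketch below shows this on a stand-in dict. mileage_table is a hypothetical helper, not part of pyrcs; the values are placeholders, a list of dicts stands in for the pandas.DataFrame, and treating 'Sub-Line' and 'Note' as None when unavailable is an assumption.

```python
def mileage_table(mileage_file):
    """Return the table stored under the key named by the 'ELR' entry."""
    return mileage_file[mileage_file["ELR"]]


# Stand-in for the dict described above (placeholder values only).
# With pyrcs, the value under 'AAM' would be a pandas.DataFrame.
em_amm = {
    "ELR": "AAM",
    "Line": "placeholder line name",
    "Sub-Line": None,  # assumed to be None when there is no sub-line
    "AAM": [{"Mileage": "placeholder"}],
    "Note": None,  # assumed to be None when there is no note
}

table = mileage_table(em_amm)
# Fall back to a readable marker when the optional entry is missing or empty.
sub_line = em_amm.get("Sub-Line") or "(none)"
print(len(table), sub_line)
```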
3. Get "Railway stations data"
Railway station data belongs to another category, "Other assets":
from pyrcs.other_assets import OtherAssets
other_assets = OtherAssets()
As in Sections 1.1 and 2.1, to get stations data by a given initial letter (say 'A'):
stations_a = other_assets.Stations.collect_station_locations('A')
To get all available stations data:
stations_data = other_assets.Stations.fetch_station_locations()
Both stations_a and stations_data are of type dict.