# pymtattl

## Introduction

Download and store MTA Turnstile Data in text files or a SQLite database

Automates downloading of turnstile entry/exit data from the MTA. Data within a requested time frame can be saved into a SQLite database, and data from before 10/18/2014 is reshaped into a more “relational” format. Support for more database types is under consideration.
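
The reshaping step can be pictured with a minimal sketch. It assumes that each line in the older files holds three key fields followed by repeating groups of five fields (date, time, description, entries, exits). The function name, the field layout, and the sample values are illustrative assumptions, not pymtattl's actual code or real MTA data.

```python
def reshape_old_line(line, key_len=3, group_len=5):
    """Split one wide old-format line into one row per reading."""
    fields = [f.strip() for f in line.strip().split(",")]
    keys, rest = fields[:key_len], fields[key_len:]
    rows = []
    # Walk the remaining fields in fixed-size groups; a trailing
    # partial group is ignored.
    for i in range(0, len(rest) - group_len + 1, group_len):
        rows.append(keys + rest[i:i + group_len])
    return rows


# Invented sample line with two readings for the same turnstile.
sample = ("A002,R051,02-00-00,"
          "04-21-12,00:00:00,REGULAR,003579866,001234037,"
          "04-21-12,04:00:00,REGULAR,003579901,001234041")
for row in reshape_old_line(sample):
    print(row)
```

Each output row repeats the key fields, so every reading can be stored as its own record.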

MTA Turnstile Data:

## Table of Contents

  • [Installation](#installation)
  • [Download](#download)
    • [Urls](#urls)
    • [Text Files](#text-files)
    • [SQLite Database](#sqlite-database)
  • [Caveats](#caveats)
  • [To-Do](#to-do)

## Installation

pip install pymtattl

## Requirements

  • Written for Python 3! Feel free to test and contribute using Python 2!
  • Requires bs4 and pandas

## Download

### Initiate MTADownloader

from … import MTADownloader
mta_downloader = MTADownloader(work_dir='Current', start=141018, end=None)


  • work_dir
    • Type: String
    • Full path of the folder in which to store downloaded data
    • ‘Current’ (default): uses the current working directory, os.getcwd()
    • Or any valid directory path
  • start/end
    • Type: Integer or None
    • Define the date range of the data files to pull (testing with a small date range is recommended, as downloading all files might be slow)
    • Example (yymmdd) for 2014-10-18: 141018
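
To illustrate the yymmdd convention, here is a minimal sketch that converts such integers into dates and lists the dates in a range. It assumes the data files are published at weekly intervals; the helper names `parse_yymmdd` and `weekly_dates` are hypothetical and are not part of pymtattl.

```python
from datetime import datetime, timedelta


def parse_yymmdd(value):
    """Convert an integer such as 141018 into a date (2014-10-18)."""
    # Zero-pad so that early-2000s values such as 50101 still parse.
    return datetime.strptime(f"{value:06d}", "%y%m%d").date()


def weekly_dates(start, end):
    """List dates from start to end (inclusive) in 7-day steps."""
    current, last = parse_yymmdd(start), parse_yymmdd(end)
    dates = []
    while current <= last:
        dates.append(current)
        current += timedelta(days=7)
    return dates


print([d.strftime("%y%m%d") for d in weekly_dates(141018, 141101)])
# → ['141018', '141025', '141101']
```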

### Urls

Get the URLs of the data files within the date range and of the resource files (description, name key)

urls = mta_downloader.get_urls(keep_urls=True)
  • Set keep_urls=True to save the returned URLs in a text file, data_urls.txt.
  • Returns a list of URL strings
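
The date filtering could look roughly like the sketch below. It assumes each data file name ends in an underscore, a six-digit yymmdd stamp, and `.txt`, and that a bound of None means "no limit". The example links, the function name `filter_by_date`, and the treatment of None are all assumptions for illustration, not the package's confirmed behavior.

```python
import re

# Hypothetical file links; the real URLs come from the MTA website.
links = [
    "https://example.com/turnstile/turnstile_141011.txt",
    "https://example.com/turnstile/turnstile_141018.txt",
    "https://example.com/turnstile/turnstile_141025.txt",
    "https://example.com/turnstile/about.html",
]


def filter_by_date(urls, start=None, end=None):
    """Keep URLs whose yymmdd stamp lies within [start, end]."""
    kept = []
    for url in urls:
        match = re.search(r"_(\d{6})\.txt$", url)
        if match is None:
            continue  # not a dated data file
        stamp = int(match.group(1))
        # yymmdd integers sort chronologically within one century.
        if (start is None or stamp >= start) and (end is None or stamp <= end):
            kept.append(url)
    return kept


selected = filter_by_date(links, start=141018)
print(selected)
```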

### Text Files

Download requested data as separate text files

dat_dir = mta_downloader.download_to_txt()
  • By default, creates a new folder named data under the working directory and stores the files there
  • data_folder (optional): a custom folder name
  • Returns the data folder path
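
As a rough picture of this step, the sketch below saves one text file per URL into a data folder and returns the folder path. It is written with the standard library only; `fetch_text` and `save_files` are hypothetical names, and skipping files that already exist is an assumed design choice, not something the package documents.

```python
import os
import tempfile
from urllib.request import urlopen


def fetch_text(url):
    """Download one URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def save_files(urls, work_dir, data_folder="data", fetch=fetch_text):
    """Save each URL as a text file under work_dir/data_folder."""
    target = os.path.join(work_dir, data_folder)
    os.makedirs(target, exist_ok=True)
    for url in urls:
        path = os.path.join(target, url.rsplit("/", 1)[-1])
        if os.path.exists(path):
            continue  # already downloaded
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(fetch(url))
    return target


# Offline demonstration with a stand-in fetcher and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = save_files(
        ["https://example.com/turnstile_141018.txt"],
        tmp,
        fetch=lambda url: "placeholder contents\n",
    )
    saved = sorted(os.listdir(folder))
    print(saved)
```

Passing the fetcher as a parameter keeps the file handling testable without network access.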

### SQLite Database

Reformat all local data, or download the requested data, and store it in a SQLite database

db_path = mta_downloader.download_to_db()
  • Creates a SQLite database, data.db, under the working directory, with 3 tables
    • turnstile: holds turnstile data
    • name_keys: a lookup table that matches remote and booth to a station name
    • file_names: names of data files that are already in the turnstile table
  • data_path:
    • ‘’ | None: download_to_txt() must be run first; the function then uses the text files in the data folder (stored as an instance attribute)
    • Otherwise, specify the full path of a data folder to search for existing data text files
  • If no local text files are found, you can choose to download the text files first or to store the data directly in the database (use with caution, this could be very slow!)
  • Returns the database path
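
The three-table layout can be sketched with the standard sqlite3 module. The table names come from the list above; the column names and types are assumptions, as is the helper `load_file`, which shows how a file_names table might be used to avoid loading the same file twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for data.db
conn.executescript("""
CREATE TABLE turnstile (
    booth TEXT, remote TEXT, scp TEXT,
    date TEXT, time TEXT, description TEXT,
    entries INTEGER, exits INTEGER
);
CREATE TABLE name_keys (
    remote TEXT, booth TEXT, station TEXT
);
CREATE TABLE file_names (
    file_name TEXT PRIMARY KEY
);
""")


def load_file(conn, file_name, rows):
    """Insert rows from one file unless that file was already loaded."""
    seen = conn.execute(
        "SELECT 1 FROM file_names WHERE file_name = ?", (file_name,)
    ).fetchone()
    if seen:
        return False
    with conn:  # commit both inserts together, or roll back on error
        conn.executemany(
            "INSERT INTO turnstile VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
        )
        conn.execute("INSERT INTO file_names VALUES (?)", (file_name,))
    return True


# Invented sample row.
rows = [("A002", "R051", "02-00-00", "2014-10-18", "00:00:00", "REGULAR", 100, 50)]
first = load_file(conn, "turnstile_141018.txt", rows)
second = load_file(conn, "turnstile_141018.txt", rows)  # skipped: already recorded
count = conn.execute("SELECT COUNT(*) FROM turnstile").fetchone()[0]
print(first, second, count)
# → True False 1
```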

## Caveats

  • Some files have known data issues; the affected rows are skipped while building the database
    • In Turnstile_120428.txt, one line has an empty (‘’) exit number
    • In Turnstile_120714.txt, the first few lines could not be parsed
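
Skipping such rows might work along the lines of the sketch below, which rejects rows with the wrong number of fields or with an empty or non-numeric counter. The eight-field layout, the function name `parse_row`, and the sample rows are assumptions made for illustration.

```python
def parse_row(fields):
    """Return a cleaned row, or None when the row cannot be used."""
    if len(fields) != 8:
        return None  # wrong number of columns
    *keys, entries, exits = [f.strip() for f in fields]
    if not (entries.isdigit() and exits.isdigit()):
        return None  # empty or non-numeric counter
    return keys + [int(entries), int(exits)]


# Invented rows: one valid, one with an empty exit number, one unparseable.
raw = [
    ["A002", "R051", "02-00-00", "04-28-12", "00:00:00", "REGULAR", "100", "50"],
    ["A002", "R051", "02-00-00", "04-28-12", "04:00:00", "REGULAR", "120", ""],
    ["garbled line"],
]
parsed = [parse_row(r) for r in raw]
good = [row for row in parsed if row is not None]
print(len(good), "kept,", len(raw) - len(good), "skipped")
# → 1 kept, 2 skipped
```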

## To-Do

  • De-cumulate entry and exit numbers, and store data within a selected date range in a new table
  • A summary table (e.g. number of booths per station, average daily station entries/exits, …) for the “cleaned” data table above
  • More to come…
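
For the first item, one possible approach is to take differences between consecutive cumulative readings of the same turnstile. The sketch below is only an illustration of that idea, not a planned implementation: the function name `decumulate` is hypothetical, the readings are invented, and marking negative differences as None (a likely counter reset) is an assumed policy.

```python
def decumulate(readings):
    """Turn cumulative counter readings into per-interval counts."""
    deltas = []
    for previous, current in zip(readings, readings[1:]):
        diff = current - previous
        # A negative difference suggests a counter reset; record None
        # rather than guess the true count.
        deltas.append(diff if diff >= 0 else None)
    return deltas


entries = [1000, 1040, 1100, 20, 65]  # invented cumulative readings
print(decumulate(entries))
# → [40, 60, None, 45]
```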
