
Django app for Texas higher education data


The Texas Higher Education Data Project
---------------------------------------
[![Build Status](https://travis-ci.org/texastribune/the-dp.svg)](https://travis-ci.org/texastribune/the-dp)

## A very rough guide to starting development

### Example `.env` file for environment variables:

```
DJANGO_SETTINGS_MODULE=exampleproject.settings.dev
DATABASE_URL=postgis:///tx_highered
```
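The README doesn't say how `.env` gets loaded. Virtualenvwrapper hooks, honcho/foreman, or a helper in the settings module are all common. Here is a minimal sketch that assumes plain `KEY=VALUE` lines. The hypothetical `load_env` helper exports the file's variables before Django reads its settings:

```python
import os


def read_env(text):
    """Parse KEY=VALUE lines from .env text into a dict.

    Blank lines, '#' comments and lines without '=' are skipped; matching
    single or double quotes around a value are stripped.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def load_env(path=".env"):
    """Hypothetical helper: copy .env values into os.environ without overriding existing ones."""
    with open(path, encoding="utf-8") as fh:
        for key, value in read_env(fh.read()).items():
            os.environ.setdefault(key, value)
```

One place to hook this in would be the top of `manage.py`, before `DJANGO_SETTINGS_MODULE` is read. Because the helper uses `setdefault`, variables already set in your shell take precedence over the file.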

Complete guide to getting started (skip any steps that don't apply to you):

```bash
# prerequisites: install PostgreSQL, PostGIS and libpq-dev

git clone $REPOSITORY && cd $PROJECT_DIR
mkvirtualenv tx_higher_ed
setvirtualenvproject
add2virtualenv .
pip install -r requirements.txt

# if you need to create a database:
# `postdoc` greatly simplifies connecting to Docker databases
pip install postdoc
phd createdb --encoding=UTF8 -T template0
echo "CREATE EXTENSION postgis;" | phd psql
echo "CREATE EXTENSION postgis_topology;" | phd psql

# or if you need to reset your database:
make resetdb

# syncdb and load fixtures
make syncdb

#######################################################################
# You can stop at this point if you're just playing with the project. #
#######################################################################

# if using 2012 data, bump it up to 2014 standards
python tx_highered/scripts/2014_update.py

# get IPEDS data (requires https://github.com/texastribune/ipeds_reporter)
../ipeds_reporter/csv_downloader/csv_downloader.py \
--uid data/ipeds/ipeds_institutions.uid --mvl data/ipeds
mv ~/Downloads/Data_*.csv data/ipeds
# get THECB data (the subshell keeps you in the project root for the steps below)
(cd data && make all)
# load data
# timing: 10m25.069s
make load
# post-process the data
python exampleproject/manage.py tx_highered_process


####################################
# placeholder for post-2014 update #
####################################
# the 2012->2014-specific steps can be removed and the import
# instructions above updated
```
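You may prefer to script the IPEDS hand-off in Python, for example to confirm that the browser actually produced any files. The sketch below is equivalent to the `mv ~/Downloads/Data_*.csv data/ipeds` line. `collect_ipeds_csvs` is a hypothetical name, and its default paths are copied from that command:

```python
import glob
import os
import shutil


def collect_ipeds_csvs(download_dir="~/Downloads", dest_dir="data/ipeds"):
    """Move Data_*.csv exports from download_dir into dest_dir.

    Returns the sorted list of destination paths; an empty list means the
    download step produced nothing (where `mv` would instead fail on the
    unmatched glob).
    """
    src = os.path.expanduser(download_dir)
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for path in sorted(glob.glob(os.path.join(src, "Data_*.csv"))):
        target = os.path.join(dest_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved
```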

### Database

This project currently requires a PostGIS database (hopefully not for long):

```bash
$ phd createdb
$ phd psql

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;
```
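The README doesn't show how the settings module turns `DATABASE_URL` into Django's `DATABASES` setting. Libraries such as dj-database-url are the usual choice. The key detail is that the `postgis` scheme selects GeoDjango's PostGIS backend, `django.contrib.gis.db.backends.postgis`, and that backend needs the extensions created above. A stdlib-only sketch, using a hypothetical `database_config` helper:

```python
from urllib.parse import unquote, urlparse

# Assumption: only the PostGIS scheme is mapped, since that is what this project uses.
ENGINES = {"postgis": "django.contrib.gis.db.backends.postgis"}


def database_config(url):
    """Turn a DATABASE_URL such as postgis:///tx_highered into a DATABASES entry."""
    parts = urlparse(url)
    try:
        engine = ENGINES[parts.scheme]
    except KeyError:
        raise ValueError("unsupported database scheme: %r" % parts.scheme) from None
    return {
        "ENGINE": engine,
        "NAME": unquote(parts.path.lstrip("/")),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        # An empty host (postgis:///name) means "connect over the local Unix socket".
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }
```

With the example `.env`, `DATABASES = {"default": database_config(os.environ["DATABASE_URL"])}` would point Django at a local database named `tx_highered`.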

#### Moving data between databases

You can use a SQL dump to move data (excluding geo info) from one Postgres
database to another:

```bash
$ phd SOURCE_DATABASE_URL pg_dump --no-owner --no-acl --table='tx_highered*' --clean > tx_highered.sql
$ phd DEST_DATABASE_URL psql -f tx_highered.sql
```
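`--table` takes a pattern, not a literal table name. `tx_highered*` therefore selects every table whose name starts with `tx_highered`, and tables without that prefix stay out of the dump, including PostGIS's `spatial_ref_sys`. Quoting the pattern stops the shell from trying to glob it; zsh, for example, aborts with "no matches found". The sketch below approximates the selection with `fnmatch`. The table names in the usage example are hypothetical apart from the prefix:

```python
from fnmatch import fnmatchcase


def tables_in_dump(table_names, pattern="tx_highered*"):
    """Approximate which tables `pg_dump --table=PATTERN` selects.

    pg_dump follows psql's pattern rules (* = any run of characters,
    ? = one character). fnmatchcase agrees for a simple prefix pattern
    like this one, but it does not model schema-qualified patterns,
    double-quoted identifiers, or psql's lower-casing of unquoted names.
    """
    return [name for name in table_names if fnmatchcase(name, pattern)]


tables = ["tx_highered_institution", "tx_highered_enrollment", "spatial_ref_sys", "django_migrations"]
print(tables_in_dump(tables))
```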

#### After deploy

1. Freeze the current data in a fixture:
   1. Edit the `tx_highered_YYYY.json.gz` make task
   2. Run the task to save the data
2. Adjust the loading scripts to reference the new fixture
3. Deprecate (or delete) any one-time data migration scripts; e.g.
   `2014_update.py` won't be necessary after 2015
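The make task most likely pipes `manage.py dumpdata` through gzip, though that is an assumption, so check the Makefile. The sketch below shows the kind of file the task should produce: records in Django's fixture format, gzip-compressed, with the year in the filename. `manage.py loaddata` can read gzip-compressed JSON fixtures directly. `freeze_fixture`, `read_fixture`, and the model label in the test are hypothetical:

```python
import gzip
import json
import os


def freeze_fixture(records, year, directory="."):
    """Write fixture records to <directory>/tx_highered_<year>.json.gz and return the path.

    Each record uses Django's fixture layout:
    {"model": "app_label.modelname", "pk": ..., "fields": {...}}.
    """
    path = os.path.join(directory, "tx_highered_%d.json.gz" % year)
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(records, fh, indent=1, sort_keys=True)
    return path


def read_fixture(path):
    """Load the records back, e.g. to sanity-check a frozen fixture before committing it."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```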


Getting Data from the IPEDS Data Center
-----------------
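When the IPEDS Data Center asks you for institutions, enter the list of UnitIDs generated by running this in a Django shell (`python exampleproject/manage.py shell`):

```python
from tx_highered.models import Institution

list(Institution.objects.filter(ipeds_id__isnull=False).values_list('ipeds_id', flat=True))
```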
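The query above prints a Python list. This sketch assumes the Data Center accepts UnitIDs as a comma-separated list, so check the form before relying on it. Under that assumption, a small hypothetical helper can turn the IDs into a string ready to paste:

```python
def format_unitids(unitids, sep=", "):
    """Return sorted, de-duplicated IPEDS UnitIDs joined by `sep`.

    IDs may arrive as ints or strings, so they are normalised to ints and
    sorted numerically.
    """
    return sep.join(str(uid) for uid in sorted({int(uid) for uid in unitids}))


# Usage in a Django shell, with Institution already imported:
# ids = Institution.objects.filter(ipeds_id__isnull=False).values_list('ipeds_id', flat=True)
# print(format_unitids(ids))
```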

Getting Data from the Texas Higher Education Coordinating Board
------------------
If you want to re-download data from THECB's website, first find the data file you want to refresh. It will be named something like `top_10_percent.html`. There will also be a matching file called `top_10_percent.POST` that holds the form data used to generate the report. To recreate the report from that file, run:

```bash
curl -X POST -d @top_10_percent.POST http://www.txhighereddata.org/interactive/accountability/InteractiveGenerate.cfm -s -v > top_10_percent.html
```
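The same request can be replayed from Python, for example to refresh several reports in a loop. This is a minimal stdlib sketch; `build_report_request` and `fetch_report` are hypothetical names. Like curl's `-d @file`, it strips carriage returns and newlines from the file and sends the rest as `application/x-www-form-urlencoded`:

```python
import urllib.request

REPORT_URL = "http://www.txhighereddata.org/interactive/accountability/InteractiveGenerate.cfm"


def build_report_request(post_body, url=REPORT_URL):
    """Build the equivalent of `curl -X POST -d @file URL` from a saved .POST body."""
    body = post_body.replace("\r", "").replace("\n", "")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_report(post_path, out_path):
    """Replay a saved .POST file and write the HTML response to out_path."""
    with open(post_path, encoding="utf-8") as fh:
        request = build_report_request(fh.read())
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

For example, `fetch_report("top_10_percent.POST", "top_10_percent.html")` mirrors the curl command, provided the endpoint still responds.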

If you need to modify the report, you can reverse engineer it from the POST data and the form markup.
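A `.POST` file is just a URL-encoded form body. The hypothetical helpers below decode it into field/value pairs, change one field, and re-encode it for curl. The field names in the test are invented; the real ones come from the report's form markup.

```python
from urllib.parse import parse_qsl, urlencode


def decode_post(post_body):
    """Split a URL-encoded .POST body into an ordered list of (field, value) pairs.

    A list (not a dict) preserves repeated fields, such as multi-select inputs.
    """
    body = post_body.replace("\r", "").replace("\n", "")
    return parse_qsl(body, keep_blank_values=True)


def set_field(pairs, name, value):
    """Give `name` a single new value, keeping it where it first appeared (or appending it)."""
    out, replaced = [], False
    for key, old in pairs:
        if key != name:
            out.append((key, old))
        elif not replaced:
            out.append((key, value))
            replaced = True
    if not replaced:
        out.append((name, value))
    return out


def encode_post(pairs):
    """Re-encode pairs into a body that `curl -d @file` can send."""
    return urlencode(pairs)
```

Save the output of `encode_post` to a new `.POST` file and point the curl command at it.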




(c) 2012 The Texas Tribune
