Python code to help read ASEG GDF2 packages

aseg_gdf2

Python code to help read ASEG GDF2 packages. See the ASEG technical standards page for more information about the file format.

Still very much a work in progress.

Usage

In [1]: import aseg_gdf2

In [2]: gdf = aseg_gdf2.read(r'tests/example_datasets/3bcfc711/GA1286_Waveforms')

In [3]: gdf.field_names()
Out[3]: ['FLTNUM', 'Rx_Voltage', 'Flight', 'Time', 'Tx_Current']

In [4]: for row in gdf.iterrows():
   ...:     print(row)
   ...:
OrderedDict([('Index', 0), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0052), ('Tx_Current', 0.00176)])
OrderedDict([('Index', 1), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0104), ('Tx_Current', 0.00176)])
OrderedDict([('Index', 2), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0156), ('Tx_Current', 0.00176)])
...

For .dat files that will fit in memory, you can read them into a pandas.DataFrame:

In [5]: gdf.df()
Out[5]:
       FLTNUM  Rx_Voltage  Flight     Time  Tx_Current
0         1.0        -0.0       1   0.0052     0.00176
1         1.0        -0.0       1   0.0104     0.00176
2         1.0        -0.0       1   0.0156     0.00176
3         1.0        -0.0       1   0.0208     0.00176
4         1.0        -0.0       1   0.0260     0.00176
5         1.0        -0.0       1   0.0312     0.00176
...       ...         ...     ...      ...         ...
23034     2.0         0.0       2  59.9687    -0.00170
23035     2.0        -0.0       2  59.9740    -0.00170
23036     2.0        -0.0       2  59.9792    -0.00170
23037     2.0        -0.0       2  59.9844    -0.00170
23038     2.0        -0.0       2  59.9896    -0.00170
23039     2.0        -0.0       2  59.9948    -0.00170

[23040 rows x 5 columns]

For .dat files that are too big for memory, use df_chunked() with the chunksize= keyword argument to set the number of rows per chunk. A few hundred thousand rows per chunk is normally fine, but this example uses a smaller value:

In [6]: for chunk in gdf.df_chunked(chunksize=10000):
   ...:     print('{} length = {}'.format(type(chunk), len(chunk)))
   ...:
<class 'pandas.core.frame.DataFrame'> length = 10000
<class 'pandas.core.frame.DataFrame'> length = 10000
<class 'pandas.core.frame.DataFrame'> length = 3040
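
As a minimal sketch of why chunked reading is useful, the function below accumulates a mean across chunks while holding only one chunk at a time. It is not part of aseg_gdf2: chunked_mean is a hypothetical helper, and the plain lists stand in for the per-chunk column values you would take from each DataFrame.

```python
def chunked_mean(chunks):
    """Mean of all values across an iterable of chunks (hypothetical helper).

    Each chunk is any iterable of numbers; only one chunk is needed at a time.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        for value in chunk:
            total += value
            count += 1
    if count == 0:
        raise ValueError("no values in any chunk")
    return total / count


# Stand-in data: three chunks of a 'Time'-like column.
example_chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean_time = chunked_mean(example_chunks)
print(mean_time)
```

With the real reader, something like chunked_mean(chunk['Time'] for chunk in gdf.df_chunked(chunksize=10000)) should behave the same way, assuming each chunk is an ordinary pandas.DataFrame as the output above suggests.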

The metadata from the .dfn file is there too:

In [7]: gdf.record_types
Out[7]:
{'': {'fields': [{'cols': 1,
    'comment': '',
    'format': 'F10.1',
    'long_name': 'FlightNumber',
    'name': 'FLTNUM',
    'null': None,
    'unit': '',
    'width': 10},
   {'cols': 1,
    'comment': '',
    'format': 'F10.5',
    'long_name': 'Rx_Voltage',
    'name': 'Rx_Voltage',
    'null': '-99.99999',
    'unit': 'Volt',
    'width': 10},
   {'cols': 1,
    'comment': '',
    'format': 'I6',
    'long_name': 'Flight',
    'name': 'Flight',
    'null': '-9999',
    'unit': '',
    'width': 6},
   {'cols': 1,
    'comment': '',
    'format': 'F10.4',
    'long_name': 'Time',
    'name': 'Time',
    'null': '-999.9999',
    'unit': 'msec',
    'width': 10},
   {'cols': 1,
    'comment': '',
    'format': 'F13.5',
    'long_name': 'Tx_Current',
    'name': 'Tx_Current',
    'null': '-99999.99999',
    'unit': 'Amp',
    'width': 13}],
  'format': None}}
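
The 'null' entries above can be used to mask missing values. The following is a minimal sketch, not the library's own behaviour: mask_nulls is a hypothetical helper, and whether aseg_gdf2 already substitutes nulls when reading is not stated here. It assumes the sentinel string parses to exactly the same float as the value read from the data file.

```python
import math


def mask_nulls(values, field):
    """Replace a field's null sentinel with NaN (hypothetical helper).

    `field` is one of the dicts found under record_types[...]['fields'];
    its 'null' entry is either None or the sentinel as a string.
    """
    null = field.get('null')
    if null is None:
        # No sentinel defined for this field, so nothing to mask.
        return list(values)
    sentinel = float(null)
    return [math.nan if v == sentinel else v for v in values]


# Field metadata copied from the output above; the values are made up.
time_field = {'name': 'Time', 'format': 'F10.4', 'null': '-999.9999'}
masked = mask_nulls([0.0052, -999.9999, 0.0156], time_field)
print(masked)
```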

Get the data just for one field/column:

In [8]: gdf.get_field('Time')
Out[8]:
array([  5.20000000e-03,   1.04000000e-02,   1.56000000e-02, ...,
         5.99844000e+01,   5.99896000e+01,   5.99948000e+01])

What about fields that are 2D arrays? Some GDF2 data files have fields with more than one value per row/record. For example, in this dataset each of the last four fields takes up 30 columns:

In [9]: gdf = aseg_gdf2.read(r'tests/example_datasets/9a13704a/Mugrave_WB_MGA52.dfn')

In [10]: print(gdf.dfn_contents)
DEFN   ST=RECD,RT=COMM;RT:A4;COMMENTS:A76
DEFN 1 ST=RECD,RT=;GA_Project:I10:Geoscience Australia airborne survey project number
DEFN 2 ST=RECD,RT=;Job_No:I10:SkyTEM Australia Job Number
DEFN 3 ST=RECD,RT=;Fiducial:F15.2:Fiducial
DEFN 4 ST=RECD,RT=;DATETIME:F18.10:UNIT=days,Decimal days since midnight December 31st 1899
DEFN 5 ST=RECD,RT=;LINE:I10:Line number
DEFN 6 ST=RECD,RT=;Easting:F12.2:NULL=-9999999.99,UNIT=m,Easting (GDA94 MGA Zone 52)
DEFN 7 ST=RECD,RT=;NORTH:F15.2:NULL=-9999999999.99,UNIT=m,Northing (GDA 94 MGA Zone 52)
DEFN 8 ST=RECD,RT=;DTM_AHD:F10.2:NULL=-99999.99,Digital terrain model (AUSGeoid09 datum)
DEFN 9  ST=RECD,RT=;RESI1:F10.3:NULL=-9999.999,Residual of data
DEFN 10 ST=RECD,RT=;HEIGHT:F10.2:NULL=-99999.99,UNIT=m,Laser altimeter measured height of Tx loop centre above ground
DEFN 11 ST=RECD,RT=;INVHEI:F10.2:NULL=-99999.99,UNIT=m,Calulated inversion height
DEFN 12 ST=RECD,RT=;DOI:F10.2:NULL=-99999.99,UNIT=m,Calculated depth of investigation
DEFN 13 ST=RECD,RT=;Elev:30F12.2:NULL=-9999999.99,UNIT=m,Elevation to the top of each layer
DEFN 14 ST=RECD,RT=;Con:30F15.5:NULL=-9999999.99999,UNIT=mS/m,Inverted Conductivity for each layer
DEFN 15 ST=RECD,RT=;Con_doi:30F15.5:NULL=-9999999.99999,UNIT=mS/m, Inverted conductivity for each layer, masked to the depth of investigation
DEFN 16 ST=RECD,RT=;RUnc:30F12.3:NULL=-999999.999,Relative uncertainty of conductivity layer;END DEFN
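
The leading number in a format code such as 30F12.2 is what makes a field span several columns. As a rough illustration, the sketch below splits the simple Fortran-style codes seen in this listing into their parts. parse_format is a hypothetical helper, not the parser aseg_gdf2 uses, and it only handles codes of the form [repeat]LETTERwidth[.decimals]; the E and D letters are included as an assumption.

```python
import re

# Optional repeat count, a type letter, a width, and optional decimals.
FORMAT_RE = re.compile(r'^(\d*)([AIFED])(\d+)(?:\.(\d+))?$', re.IGNORECASE)


def parse_format(code):
    """Split a format code such as '30F12.2' into its parts (hypothetical helper)."""
    match = FORMAT_RE.match(code.strip())
    if match is None:
        raise ValueError('unrecognised format code: {!r}'.format(code))
    repeat, kind, width, decimals = match.groups()
    return {
        'cols': int(repeat) if repeat else 1,  # no repeat count means one column
        'type': kind.upper(),
        'width': int(width),
        'decimals': int(decimals) if decimals else None,
    }


print(parse_format('30F12.2'))
print(parse_format('I10'))
```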

You can see the field names in the normal manner:

In [11]: gdf.field_names()
Out[11]:
['GA_Project',
 'Job_No',
 'Fiducial',
 'DATETIME',
 'LINE',
 'Easting',
 'NORTH',
 'DTM_AHD',
 'RESI1',
 'HEIGHT',
 'INVHEI',
 'DOI',
 'Elev',
 'Con',
 'Con_doi',
 'RUnc']

Or you can see an "expanded" version of the fields, which is used for the column headings of the data table:

In [12]: gdf.column_names()
Out[12]:
['GA_Project', 'Job_No', 'Fiducial', 'DATETIME', 'LINE', 'Easting', 'NORTH', 'DTM_AHD', 'RESI1',
'HEIGHT', 'INVHEI', 'DOI', 'Elev[0]', 'Elev[1]', 'Elev[2]', 'Elev[3]', 'Elev[4]', 'Elev[5]',
'Elev[6]', 'Elev[7]', 'Elev[8]', 'Elev[9]', 'Elev[10]', 'Elev[11]', 'Elev[12]', 'Elev[13]',
'Elev[14]', 'Elev[15]', 'Elev[16]', 'Elev[17]', 'Elev[18]', 'Elev[19]', 'Elev[20]', 'Elev[21]',
'Elev[22]', 'Elev[23]', 'Elev[24]', 'Elev[25]', 'Elev[26]', 'Elev[27]', 'Elev[28]', 'Elev[29]',
'Con[0]', 'Con[1]', 'Con[2]', 'Con[3]', 'Con[4]', 'Con[5]', 'Con[6]', 'Con[7]', 'Con[8]', 'Con[9]',
'Con[10]', 'Con[11]', 'Con[12]', 'Con[13]', 'Con[14]', 'Con[15]', 'Con[16]', 'Con[17]', 'Con[18]',
'Con[19]', 'Con[20]', 'Con[21]', 'Con[22]', 'Con[23]', 'Con[24]', 'Con[25]', 'Con[26]', 'Con[27]',
'Con[28]', 'Con[29]', 'Con_doi[0]', 'Con_doi[1]', 'Con_doi[2]', 'Con_doi[3]', 'Con_doi[4]',
'Con_doi[5]', 'Con_doi[6]', 'Con_doi[7]', 'Con_doi[8]', 'Con_doi[9]', 'Con_doi[10]', 'Con_doi[11]',
'Con_doi[12]', 'Con_doi[13]', 'Con_doi[14]', 'Con_doi[15]', 'Con_doi[16]', 'Con_doi[17]',
'Con_doi[18]', 'Con_doi[19]', 'Con_doi[20]', 'Con_doi[21]', 'Con_doi[22]', 'Con_doi[23]',
'Con_doi[24]', 'Con_doi[25]', 'Con_doi[26]', 'Con_doi[27]', 'Con_doi[28]', 'Con_doi[29]', 'RUnc[0]',
'RUnc[1]', 'RUnc[2]', 'RUnc[3]', 'RUnc[4]', 'RUnc[5]', 'RUnc[6]', 'RUnc[7]', 'RUnc[8]', 'RUnc[9]',
'RUnc[10]', 'RUnc[11]', 'RUnc[12]', 'RUnc[13]', 'RUnc[14]', 'RUnc[15]', 'RUnc[16]', 'RUnc[17]',
'RUnc[18]', 'RUnc[19]', 'RUnc[20]', 'RUnc[21]', 'RUnc[22]', 'RUnc[23]', 'RUnc[24]', 'RUnc[25]',
'RUnc[26]', 'RUnc[27]', 'RUnc[28]', 'RUnc[29]']
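
The expansion follows a simple pattern: a field with one column keeps its name, and a field with N columns becomes name[0] to name[N-1]. The sketch below reproduces that pattern; expand_columns is a hypothetical helper written for illustration, not the library's implementation.

```python
def expand_columns(fields):
    """Expand (name, cols) pairs into flat column names (hypothetical helper)."""
    names = []
    for name, cols in fields:
        if cols == 1:
            names.append(name)
        else:
            names.extend('{}[{}]'.format(name, i) for i in range(cols))
    return names


# Field names from the listing above; column counts from the .dfn format codes.
single = ['GA_Project', 'Job_No', 'Fiducial', 'DATETIME', 'LINE', 'Easting',
          'NORTH', 'DTM_AHD', 'RESI1', 'HEIGHT', 'INVHEI', 'DOI']
fields = [(name, 1) for name in single]
fields += [(name, 30) for name in ['Elev', 'Con', 'Con_doi', 'RUnc']]
columns = expand_columns(fields)
print(len(columns), columns[12], columns[-1])
```

Twelve single-column fields plus four 30-column fields give 132 column names in total.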

In [13]: gdf.df().head()
Out[13]:
  GA_Project  Job_No   Fiducial      DATETIME    LINE   Easting      NORTH  \
0        1288   10013  3621109.0  42655.910984  112601  948001.6  7035223.1
1        1288   10013  3621110.0  42655.910995  112601  948001.9  7035196.8
2        1288   10013  3621111.0  42655.911007  112601  948001.5  7035169.5
3        1288   10013  3621112.0  42655.911019  112601  948000.6  7035141.6
4        1288   10013  3621113.0  42655.911030  112601  947999.1  7035113.6

  DTM_AHD  RESI1  HEIGHT    ...     RUnc[20]  RUnc[21]  RUnc[22]  RUnc[23]  \
0    354.1  1.091   40.98    ...         1.39      1.76      2.35      3.26
1    353.8  1.101   41.08    ...         1.43      1.84      2.47      3.41
2    353.7  0.813   41.03    ...         1.45      1.88      2.53      3.48
3    353.9  0.567   40.79    ...         1.45      1.87      2.53      3.49
4    354.2  0.522   40.37    ...         1.45      1.88      2.54      3.52

  RUnc[24]  RUnc[25]  RUnc[26]  RUnc[27]  RUnc[28]  RUnc[29]
0      4.45      5.74      6.94      8.00      8.99      98.0
1      4.62      5.90      7.09      8.15      9.15      98.0
2      4.70      5.97      7.16      8.22      9.21      98.0
3      4.71      5.98      7.16      8.21      9.20      98.0
4      4.74      6.01      7.18      8.23      9.22      98.0

[5 rows x 132 columns]
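
The DATETIME field is described in the .dfn file as decimal days since midnight December 31st 1899. A minimal sketch of converting such a value to a Python datetime follows. decimal_days_to_datetime is a hypothetical helper, and the epoch is taken literally from that description; some software counts serial dates from a slightly different origin, so it is worth checking the result against a known acquisition date.

```python
from datetime import datetime, timedelta

# Epoch taken literally from the DATETIME description in the .dfn file.
EPOCH = datetime(1899, 12, 31)


def decimal_days_to_datetime(days):
    """Convert decimal days since EPOCH to a datetime (hypothetical helper)."""
    return EPOCH + timedelta(days=days)


# First DATETIME value from the table above.
first = decimal_days_to_datetime(42655.910984)
print(first.date())
```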

You can retrieve one of the original field arrays using get_field():

In [14]: gdf.get_field('Elev')
Out[14]:
array([[ 354.1,  352.1,  349.8, ..., -105.8, -171.2, -245.7],
       [ 353.8,  351.8,  349.5, ..., -106.1, -171.5, -246. ],
       [ 353.7,  351.7,  349.4, ..., -106.2, -171.6, -246.1],
       ...,
       [ 510.5,  508.5,  506.2, ...,   50.6,  -14.8,  -89.3],
       [ 510.5,  508.5,  506.2, ...,   50.6,  -14.8,  -89.3],
       [ 510.6,  508.6,  506.3, ...,   50.7,  -14.7,  -89.2]])

Or one of the columns:

In [15]: gdf.get_field('Elev[0]')
Out[15]:
array([ 354.1,  353.8,  353.7,  353.9,  354.2,  354.5,  354.6,  354.7,
        354.6,  354.5,  354.3,  354.1,  353.9,  353.8,  353.9,  354. ,
        512.8,  512.6,  512.4,  512.3,  512.3,  512.5,  512.7,  512.9,
        512.9,  512.8,  512.6,  512.4,  512. ,  511.7,  511.4,  511.2,
        511. ,  510.6,  510.5,  510.5,  510.5,  510.6])

You can also retrieve a subset of fields and column names as a pandas.DataFrame using the usecols keyword argument -- you don't necessarily need to retrieve the whole file at once. Note that the multidimensional 'Con' field is expanded into the column names:

In [16]: gdf.df(usecols=['Easting', 'NORTH', 'Con']).head()
Out[16]:
    Easting      NORTH    Con[0]    Con[1]    Con[2]     Con[3]     Con[4]  \
0  948001.6  7035223.1  28.76870  31.88776  46.04052   83.68201  157.48031
1  948001.9  7035196.8  31.06555  35.47357  51.17707   92.08103  165.37126
2  948001.5  7035169.5  38.18251  42.48088  59.91612  103.59474  174.79462
3  948000.6  7035141.6  47.61905  51.84033  70.17544  114.31184  178.79492
4  947999.1  7035113.6  58.58231  61.12469  77.45933  118.04982  173.64126

      Con[5]     Con[6]     Con[7]    ...        Con[20]    Con[21]  \
0  231.53508  242.01355  198.84669    ...      108.63661  145.39110
1  235.73786  237.41690  190.65777    ...      108.95620  144.84357
2  232.07241  225.88660  178.98693    ...      110.29006  146.09204
3  219.92523  212.44954  170.64846    ...      112.81588  148.58841
4  209.51184  204.49898  168.60563    ...      114.48197  150.03751

     Con[22]    Con[23]    Con[24]    Con[25]    Con[26]    Con[27]  \
0  181.29079  191.60759  178.44397  162.31131  152.43902  148.38997
1  179.79144  190.36741  177.99929  162.33766  152.55530  148.47810
2  179.88847  189.35808  177.11654  161.89089  152.46227  148.54427
3  180.31013  187.68769  175.37706  161.00467  152.23017  148.65468
4  180.21265  186.35855  174.27675  160.56519  152.23017  148.83167

     Con[28]    Con[29]
0  147.49263  147.42739
1  147.53615  147.47087
2  147.66686  147.62327
3  147.88524  147.86337
4  148.10427  148.08233

[5 rows x 32 columns]
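
If you need to go the other way and work out which expanded columns belong to which field, the name[i] convention can be matched with a regular expression. This is a sketch only: group_columns is a hypothetical helper, and it assumes that field names themselves never end in a bracketed number.

```python
import re
from collections import OrderedDict

# Matches expanded names such as 'Con[12]' and captures the field part.
COLUMN_RE = re.compile(r'^(?P<field>.+)\[(?P<index>\d+)\]$')


def group_columns(column_names):
    """Map each field name to the list of its column names (hypothetical helper)."""
    groups = OrderedDict()
    for column in column_names:
        match = COLUMN_RE.match(column)
        field = match.group('field') if match else column
        groups.setdefault(field, []).append(column)
    return groups


# Column names as they appear in the usecols example above.
columns = ['Easting', 'NORTH'] + ['Con[{}]'.format(i) for i in range(30)]
groups = group_columns(columns)
print(list(groups))
print(len(groups['Con']))
```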

