aseg_gdf2
Python code to help read ASEG GDF2 packages. See the ASEG technical standards page for more information about the file format.
Still very much a work in progress.
Usage
In [1]: import aseg_gdf2
In [2]: gdf = aseg_gdf2.read(r'tests/example_datasets/3bcfc711/GA1286_Waveforms')
In [3]: gdf.field_names()
Out[3]: ['FLTNUM', 'Rx_Voltage', 'Flight', 'Time', 'Tx_Current']
In [4]: for row in gdf.iterrows():
...: print(row)
...:
OrderedDict([('Index', 0), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0052), ('Tx_Current', 0.00176)])
OrderedDict([('Index', 1), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0104), ('Tx_Current', 0.00176)])
OrderedDict([('Index', 2), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0156), ('Tx_Current', 0.00176)])
...
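Because iterrows() yields one record at a time, it is handy for accumulating simple statistics as you scan a file. A minimal sketch, continuing the session above (the running-maximum logic is just an illustration, not part of the library):

# Each row is an OrderedDict keyed by field name, so values can be
# picked out directly while streaming through the file.
max_current = None
for row in gdf.iterrows():
    value = row['Tx_Current']
    if max_current is None or value > max_current:
        max_current = value
print('Largest Tx_Current:', max_current)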
For .dat files that will fit in memory, you can read them into a pandas.DataFrame:
In [5]: gdf.df()
Out[5]:
FLTNUM Rx_Voltage Flight Time Tx_Current
0 1.0 -0.0 1 0.0052 0.00176
1 1.0 -0.0 1 0.0104 0.00176
2 1.0 -0.0 1 0.0156 0.00176
3 1.0 -0.0 1 0.0208 0.00176
4 1.0 -0.0 1 0.0260 0.00176
5 1.0 -0.0 1 0.0312 0.00176
... ... ... ... ... ...
23034 2.0 0.0 2 59.9687 -0.00170
23035 2.0 -0.0 2 59.9740 -0.00170
23036 2.0 -0.0 2 59.9792 -0.00170
23037 2.0 -0.0 2 59.9844 -0.00170
23038 2.0 -0.0 2 59.9896 -0.00170
23039 2.0 -0.0 2 59.9948 -0.00170
[23040 rows x 5 columns]
For .dat files that are too big for memory, you can use the chunksize=
keyword argument to specify the number of rows per chunk. Normally a few hundred thousand rows per chunk is fine, but for this example we'll use something smaller:
In [6]: for chunk in gdf.df_chunked(chunksize=10000):
...: print('{} length = {}'.format(type(chunk), len(chunk)))
...:
<class 'pandas.core.frame.DataFrame'> length = 10000
<class 'pandas.core.frame.DataFrame'> length = 10000
<class 'pandas.core.frame.DataFrame'> length = 3040
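Each chunk is an ordinary pandas.DataFrame, so you can accumulate results chunk by chunk instead of holding the whole file in memory. A rough sketch (the accumulation logic here is illustrative only):

total = 0.0
count = 0
for chunk in gdf.df_chunked(chunksize=10000):
    # Keep a running sum and row count per chunk so the complete
    # dataset never has to be in memory at once.
    total += chunk['Tx_Current'].sum()
    count += len(chunk)
print('Mean Tx_Current:', total / count)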
The metadata from the .dfn file is there too:
In [7]: gdf.record_types
Out[7]:
{'': {'fields': [{'cols': 1,
'comment': '',
'format': 'F10.1',
'long_name': 'FlightNumber',
'name': 'FLTNUM',
'null': None,
'unit': '',
'width': 10},
{'cols': 1,
'comment': '',
'format': 'F10.5',
'long_name': 'Rx_Voltage',
'name': 'Rx_Voltage',
'null': '-99.99999',
'unit': 'Volt',
'width': 10},
{'cols': 1,
'comment': '',
'format': 'I6',
'long_name': 'Flight',
'name': 'Flight',
'null': '-9999',
'unit': '',
'width': 6},
{'cols': 1,
'comment': '',
'format': 'F10.4',
'long_name': 'Time',
'name': 'Time',
'null': '-999.9999',
'unit': 'msec',
'width': 10},
{'cols': 1,
'comment': '',
'format': 'F13.5',
'long_name': 'Tx_Current',
'name': 'Tx_Current',
'null': '-99999.99999',
'unit': 'Amp',
'width': 13}],
'format': None}}
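Since record_types is a plain nested dictionary, you can walk it directly, for example to print a quick summary of each field's format, unit and null value (the loop below only illustrates the structure shown above):

for rt_name, rt in gdf.record_types.items():
    for field in rt['fields']:
        # Each field entry carries the name, Fortran-style format string,
        # unit and null value parsed from the .dfn file.
        print('{name:12s} {format:8s} unit={unit!r} null={null!r}'.format(**field))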
Get the data just for one field/column:
In [8]: gdf.get_field('Time')
Out[8]:
array([ 5.20000000e-03, 1.04000000e-02, 1.56000000e-02, ...,
5.99844000e+01, 5.99896000e+01, 5.99948000e+01])
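get_field() returns a plain numpy array, so you can mask out null values yourself using the null value recorded in the .dfn metadata. A sketch, assuming the null sentinels are still present in the array (the masking is not something the library does for you):

import numpy as np

# Find the metadata entry for the 'Time' field in record_types (structure shown above).
meta = next(f for rt in gdf.record_types.values()
            for f in rt['fields'] if f['name'] == 'Time')

time = gdf.get_field('Time')
if meta['null'] is not None:
    # Null values are stored as strings in the .dfn (e.g. '-999.9999'),
    # so convert before comparing, then swap the sentinel for NaN.
    time = np.where(np.isclose(time, float(meta['null'])), np.nan, time)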
What about fields which are 2D arrays? Some GDF2 data files have fields with more than one value per row/record. For example, in this file the last four fields each take up 30 columns:
In [9]: gdf = aseg_gdf2.read(r'tests/example_datasets/9a13704a/Mugrave_WB_MGA52.dfn')
In [10]: print(gdf.dfn_contents)
DEFN ST=RECD,RT=COMM;RT:A4;COMMENTS:A76
DEFN 1 ST=RECD,RT=;GA_Project:I10:Geoscience Australia airborne survey project number
DEFN 2 ST=RECD,RT=;Job_No:I10:SkyTEM Australia Job Number
DEFN 3 ST=RECD,RT=;Fiducial:F15.2:Fiducial
DEFN 4 ST=RECD,RT=;DATETIME:F18.10:UNIT=days,Decimal days since midnight December 31st 1899
DEFN 5 ST=RECD,RT=;LINE:I10:Line number
DEFN 6 ST=RECD,RT=;Easting:F12.2:NULL=-9999999.99,UNIT=m,Easting (GDA94 MGA Zone 52)
DEFN 7 ST=RECD,RT=;NORTH:F15.2:NULL=-9999999999.99,UNIT=m,Northing (GDA 94 MGA Zone 52)
DEFN 8 ST=RECD,RT=;DTM_AHD:F10.2:NULL=-99999.99,Digital terrain model (AUSGeoid09 datum)
DEFN 9 ST=RECD,RT=;RESI1:F10.3:NULL=-9999.999,Residual of data
DEFN 10 ST=RECD,RT=;HEIGHT:F10.2:NULL=-99999.99,UNIT=m,Laser altimeter measured height of Tx loop centre above ground
DEFN 11 ST=RECD,RT=;INVHEI:F10.2:NULL=-99999.99,UNIT=m,Calulated inversion height
DEFN 12 ST=RECD,RT=;DOI:F10.2:NULL=-99999.99,UNIT=m,Calculated depth of investigation
DEFN 13 ST=RECD,RT=;Elev:30F12.2:NULL=-9999999.99,UNIT=m,Elevation to the top of each layer
DEFN 14 ST=RECD,RT=;Con:30F15.5:NULL=-9999999.99999,UNIT=mS/m,Inverted Conductivity for each layer
DEFN 15 ST=RECD,RT=;Con_doi:30F15.5:NULL=-9999999.99999,UNIT=mS/m, Inverted conductivity for each layer, masked to the depth of investigation
DEFN 16 ST=RECD,RT=;RUnc:30F12.3:NULL=-999999.999,Relative uncertainty of conductivity layer;END DEFN
You can see the field names in the normal manner:
In [11]: gdf.field_names()
Out[11]:
['GA_Project',
'Job_No',
'Fiducial',
'DATETIME',
'LINE',
'Easting',
'NORTH',
'DTM_AHD',
'RESI1',
'HEIGHT',
'INVHEI',
'DOI',
'Elev',
'Con',
'Con_doi',
'RUnc']
Or you can see an "expanded" version of the fields, which is used for the column headings of the data table:
In [12]: gdf.column_names()
Out[12]:
['GA_Project', 'Job_No', 'Fiducial', 'DATETIME', 'LINE', 'Easting', 'NORTH', 'DTM_AHD', 'RESI1',
'HEIGHT', 'INVHEI', 'DOI', 'Elev[0]', 'Elev[1]', 'Elev[2]', 'Elev[3]', 'Elev[4]', 'Elev[5]',
'Elev[6]', 'Elev[7]', 'Elev[8]', 'Elev[9]', 'Elev[10]', 'Elev[11]', 'Elev[12]', 'Elev[13]',
'Elev[14]', 'Elev[15]', 'Elev[16]', 'Elev[17]', 'Elev[18]', 'Elev[19]', 'Elev[20]', 'Elev[21]',
'Elev[22]', 'Elev[23]', 'Elev[24]', 'Elev[25]', 'Elev[26]', 'Elev[27]', 'Elev[28]', 'Elev[29]',
'Con[0]', 'Con[1]', 'Con[2]', 'Con[3]', 'Con[4]', 'Con[5]', 'Con[6]', 'Con[7]', 'Con[8]', 'Con[9]',
'Con[10]', 'Con[11]', 'Con[12]', 'Con[13]', 'Con[14]', 'Con[15]', 'Con[16]', 'Con[17]', 'Con[18]',
'Con[19]', 'Con[20]', 'Con[21]', 'Con[22]', 'Con[23]', 'Con[24]', 'Con[25]', 'Con[26]', 'Con[27]',
'Con[28]', 'Con[29]', 'Con_doi[0]', 'Con_doi[1]', 'Con_doi[2]', 'Con_doi[3]', 'Con_doi[4]',
'Con_doi[5]', 'Con_doi[6]', 'Con_doi[7]', 'Con_doi[8]', 'Con_doi[9]', 'Con_doi[10]', 'Con_doi[11]',
'Con_doi[12]', 'Con_doi[13]', 'Con_doi[14]', 'Con_doi[15]', 'Con_doi[16]', 'Con_doi[17]',
'Con_doi[18]', 'Con_doi[19]', 'Con_doi[20]', 'Con_doi[21]', 'Con_doi[22]', 'Con_doi[23]',
'Con_doi[24]', 'Con_doi[25]', 'Con_doi[26]', 'Con_doi[27]', 'Con_doi[28]', 'Con_doi[29]', 'RUnc[0]',
'RUnc[1]', 'RUnc[2]', 'RUnc[3]', 'RUnc[4]', 'RUnc[5]', 'RUnc[6]', 'RUnc[7]', 'RUnc[8]', 'RUnc[9]',
'RUnc[10]', 'RUnc[11]', 'RUnc[12]', 'RUnc[13]', 'RUnc[14]', 'RUnc[15]', 'RUnc[16]', 'RUnc[17]',
'RUnc[18]', 'RUnc[19]', 'RUnc[20]', 'RUnc[21]', 'RUnc[22]', 'RUnc[23]', 'RUnc[24]', 'RUnc[25]',
'RUnc[26]', 'RUnc[27]', 'RUnc[28]', 'RUnc[29]']
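The expanded names follow the pattern field[i], so plain string handling is enough to pick out the columns that belong to one multi-valued field (nothing library-specific here):

# 12 scalar fields plus four 30-value fields gives 132 column names,
# matching the shape of the DataFrame shown below.
print(len(gdf.column_names()))

# Just the expanded columns belonging to the 'Con' field.
con_columns = [name for name in gdf.column_names() if name.startswith('Con[')]
print(len(con_columns))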
In [13]: gdf.df().head()
Out[13]:
GA_Project Job_No Fiducial DATETIME LINE Easting NORTH \
0 1288 10013 3621109.0 42655.910984 112601 948001.6 7035223.1
1 1288 10013 3621110.0 42655.910995 112601 948001.9 7035196.8
2 1288 10013 3621111.0 42655.911007 112601 948001.5 7035169.5
3 1288 10013 3621112.0 42655.911019 112601 948000.6 7035141.6
4 1288 10013 3621113.0 42655.911030 112601 947999.1 7035113.6
DTM_AHD RESI1 HEIGHT ... RUnc[20] RUnc[21] RUnc[22] RUnc[23] \
0 354.1 1.091 40.98 ... 1.39 1.76 2.35 3.26
1 353.8 1.101 41.08 ... 1.43 1.84 2.47 3.41
2 353.7 0.813 41.03 ... 1.45 1.88 2.53 3.48
3 353.9 0.567 40.79 ... 1.45 1.87 2.53 3.49
4 354.2 0.522 40.37 ... 1.45 1.88 2.54 3.52
RUnc[24] RUnc[25] RUnc[26] RUnc[27] RUnc[28] RUnc[29]
0 4.45 5.74 6.94 8.00 8.99 98.0
1 4.62 5.90 7.09 8.15 9.15 98.0
2 4.70 5.97 7.16 8.22 9.21 98.0
3 4.71 5.98 7.16 8.21 9.20 98.0
4 4.74 6.01 7.18 8.23 9.22 98.0
[5 rows x 132 columns]
You can retrieve one of the original field arrays using get_field():
In [14]: gdf.get_field('Elev')
Out[14]:
array([[ 354.1, 352.1, 349.8, ..., -105.8, -171.2, -245.7],
[ 353.8, 351.8, 349.5, ..., -106.1, -171.5, -246. ],
[ 353.7, 351.7, 349.4, ..., -106.2, -171.6, -246.1],
...,
[ 510.5, 508.5, 506.2, ..., 50.6, -14.8, -89.3],
[ 510.5, 508.5, 506.2, ..., 50.6, -14.8, -89.3],
[ 510.6, 508.6, 506.3, ..., 50.7, -14.7, -89.2]])
Or one of the columns:
In [15]: gdf.get_field('Elev[0]')
Out[15]:
array([ 354.1, 353.8, 353.7, 353.9, 354.2, 354.5, 354.6, 354.7,
354.6, 354.5, 354.3, 354.1, 353.9, 353.8, 353.9, 354. ,
512.8, 512.6, 512.4, 512.3, 512.3, 512.5, 512.7, 512.9,
512.9, 512.8, 512.6, 512.4, 512. , 511.7, 511.4, 511.2,
511. , 510.6, 510.5, 510.5, 510.5, 510.6])
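Because 'Elev' is the elevation of the top of each of the 30 layers (see the .dfn above), the 2D array returned by get_field('Elev') is easy to work with in numpy. For example, a sketch of per-layer thicknesses as the drop between consecutive layer tops (this calculation is illustrative, not something aseg_gdf2 provides):

import numpy as np

elev = gdf.get_field('Elev')   # shape (n_records, 30)

# Layer tops decrease with depth, so the negated difference between
# consecutive tops gives the thickness of every layer except the last.
thickness = -np.diff(elev, axis=1)
print(elev.shape, thickness.shape)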
You can also retrieve a subset of fields and column names as a pandas.DataFrame using the usecols keyword argument -- you don't necessarily need to retrieve the whole file at once. Note that the multidimensional 'Con' field is expanded into the column names:
In [16]: gdf.df(usecols=['Easting', 'NORTH', 'Con']).head()
Out[16]:
Easting NORTH Con[0] Con[1] Con[2] Con[3] Con[4] \
0 948001.6 7035223.1 28.76870 31.88776 46.04052 83.68201 157.48031
1 948001.9 7035196.8 31.06555 35.47357 51.17707 92.08103 165.37126
2 948001.5 7035169.5 38.18251 42.48088 59.91612 103.59474 174.79462
3 948000.6 7035141.6 47.61905 51.84033 70.17544 114.31184 178.79492
4 947999.1 7035113.6 58.58231 61.12469 77.45933 118.04982 173.64126
Con[5] Con[6] Con[7] ... Con[20] Con[21] \
0 231.53508 242.01355 198.84669 ... 108.63661 145.39110
1 235.73786 237.41690 190.65777 ... 108.95620 144.84357
2 232.07241 225.88660 178.98693 ... 110.29006 146.09204
3 219.92523 212.44954 170.64846 ... 112.81588 148.58841
4 209.51184 204.49898 168.60563 ... 114.48197 150.03751
Con[22] Con[23] Con[24] Con[25] Con[26] Con[27] \
0 181.29079 191.60759 178.44397 162.31131 152.43902 148.38997
1 179.79144 190.36741 177.99929 162.33766 152.55530 148.47810
2 179.88847 189.35808 177.11654 161.89089 152.46227 148.54427
3 180.31013 187.68769 175.37706 161.00467 152.23017 148.65468
4 180.21265 186.35855 174.27675 160.56519 152.23017 148.83167
Con[28] Con[29]
0 147.49263 147.42739
1 147.53615 147.47087
2 147.66686 147.62327
3 147.88524 147.86337
4 148.10427 148.08233
[5 rows x 32 columns]
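Since the expanded Con[0] to Con[29] columns are ordinary DataFrame columns, standard pandas reshaping applies. For example, a sketch that melts them into a long table with one conductivity value per layer per record (plain pandas, not part of aseg_gdf2):

df = gdf.df(usecols=['Easting', 'NORTH', 'Con'])

# Melt the 30 Con[i] columns into a single 'conductivity' column with a
# numeric 'layer' label, which is often easier to group or plot.
long_df = df.melt(id_vars=['Easting', 'NORTH'],
                  var_name='layer', value_name='conductivity')
long_df['layer'] = long_df['layer'].str.extract(r'Con\[(\d+)\]', expand=False).astype(int)
print(long_df.head())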
Project details
Download files
Source Distribution
File details
Details for the file aseg_gdf2-0.1.tar.gz.
File metadata
- Download URL: aseg_gdf2-0.1.tar.gz
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6995e0faf4babfe5060e85b753c39abfc28cb177980d49b194bcea34922f5a49
MD5 | b3e796f67a3da8b28477cdc0c7535ba2
BLAKE2b-256 | 804c2aac42416761562f9342ede1f32a858e2652f1bdb9f8c1fa1cf253809467