A pathlib.Path class for dapla
Project description
dapla-path
pathlib.Path for dapla
Opprettet av: ort ort@ssb.no
Path (dapla)
import dapla as dp
import pandas as pd
from daplapath.path import Path
folder = Path('ssb-kart-data-delt-prod/analyse_data/klargjorte-data/2024')
folder
'ssb-kart-data-delt-prod/analyse_data/klargjorte-data/2024'
Fungerer som tekst
folder.startswith("ssb")
True
dp.FileClient.get_gcs_file_system().exists(folder)
True
Med metoder og attributter ala pathlib.Path
folder.exists()
True
folder.is_dir()
True
file = folder / "ABAS_kommune_utenhav_p2024_v1.parquet"
file
'ssb-kart-data-delt-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
file.parent
'ssb-kart-data-delt-prod/analyse_data/klargjorte-data/2024'
Og noen pandas attributter
Uten å lese filen
file.columns
Index(['OBJTYPE', 'NAVN', 'KOMMUNENR', 'FYLKE', 'AREAL_GDB', 'SHAPE_Length',
'SHAPE_Area', 'geometry'],
dtype='object')
file.dtypes
OBJTYPE string
NAVN string
KOMMUNENR string
FYLKE string
AREAL_GDB double
SHAPE_Length double
SHAPE_Area double
geometry binary
dtype: object
file.shape
(481, 8)
Versjonering
file.version_number
1
print(file.versions())
timestamp mb (int)
2024-05-19 12:31:02 941 .../ABAS_kommune_utenhav_p2024.parquet
2024-08-16 16:15:10 941 .../ABAS_kommune_utenhav_p2024_v1.parquet
Name: path, dtype: object
file.latest_version()
'ssb-kart-data-delt-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
file.highest_numbered_version()
'ssb-kart-data-delt-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
# highest_numbered_version + 1
file.new_version()
'ssb-kart-data-delt-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v2.parquet'
# alltid False
file.new_version().exists()
False
# finner/fjerner versjonsnummer med regex-søk
file._version_pattern
'_v(\\d+)'
Branch tree
Filtre med hyperlenke. Gjør at man kopierer stien når man klikker på den.
print(
Path("ssb-kart-data-delt-prod/analyse_data/klargjorte-data").tree()
)
ssb-kart-data-delt-prod/analyse_data/klargjorte-data /
└──2000 /
└──SSB_tettsted_flate_p2000.parquet
└──SSB_tettsted_flate_p2000_v1.parquet
└──2002 /
└──SSB_tettsted_flate_p2002.parquet
└──SSB_tettsted_flate_p2002_v1.parquet
└──2003 /
└──SSB_tettsted_flate_p2003.parquet
└──SSB_tettsted_flate_p2003_v1.parquet
└──2004 /
└──SSB_tettsted_flate_p2004.parquet
└──SSB_tettsted_flate_p2004_v1.parquet
└──2005 /
└──SSB_tettsted_flate_p2005.parquet
└──SSB_tettsted_flate_p2005_v1.parquet
└──2006 /
└──SSB_tettsted_flate_p2006.parquet
└──SSB_tettsted_flate_p2006_v1.parquet
└──2007 /
└──SSB_tettsted_flate_p2007.parquet
└──SSB_tettsted_flate_p2007_v1.parquet
└──2008 /
└──SSB_tettsted_flate_p2008.parquet
└──SSB_tettsted_flate_p2008_v1.parquet
└──SSB_tettsted_ringbuffer_p2008.parquet
└──(...)
└──2009 /
└──SSB_tettsted_flate_p2009.parquet
└──SSB_tettsted_flate_p2009_v1.parquet
└──2010 /
└──SOL_arealressurs_flate_p2010.parquet
└──SOL_arealressurs_flate_p2010_v1.parquet
└──2011 /
└──SOL_Arstat_flate_p2011.parquet
└──SOL_Arstat_flate_p2011_v1.parquet
└──SSB_tettsted_flate_p2011.parquet
└──(...)
└──2012 /
└──ABAS_fylke_flate_p2012_v1.parquet
└──ABAS_fylke_linje_p2012_v1.parquet
└──ABAS_grunnkrets_flate_p2012_v1.parquet
└──(...)
└──2013 /
└──ABAS_fylke_flate_p2013_v1.parquet
└──ABAS_kommune_flate_p2013_v1.parquet
└──DEK_eiendom_flate_p2013_v1.parquet
└──(...)
└──2014 /
└──DEK_eiendom_flate_p2014_v1.parquet
└──FKB_anlegg_flate_p2014_v1.parquet
└──FKB_anlegg_linje_p2014_v1.parquet
└──(...)
└──2015 /
└──ABAS_grunnkrets_flate_p2015_v1.parquet
└──ABAS_grunnkrets_utenhav_p2015_v1.parquet
└──ABAS_kommune_flate_p2015_v1.parquet
└──(...)
└──2016 /
└──ABAS_fylke_flate_p2016_v1.parquet
└──ABAS_grunnkrets_flate_p2016_v1.parquet
└──ABAS_grunnkrets_utenhav_p2016_v1.parquet
└──(...)
└──2017 /
└──ABAS_fylke_flate_p2017_v1.parquet
└──ABAS_grunnkrets_flate_p2017_v1.parquet
└──ABAS_grunnkrets_utenhav_p2017_v1.parquet
└──(...)
└──2018 /
└──ABAS_fylke_flate_p2018_v1.parquet
└──ABAS_grunnkrets_flate_p2018_v1.parquet
└──ABAS_grunnkrets_utenhav_p2018_v1.parquet
└──(...)
└──2019 /
└──ABAS_fylke_flate_p2019_v1.parquet
└──ABAS_grunnkrets_flate_p2019_v1.parquet
└──ABAS_grunnkrets_utenhav_p2019_v1.parquet
└──(...)
└──2020 /
└──ABAS_fylke_flate_p2020_v1.parquet
└──ABAS_grunnkrets_flate_p2020_v1.parquet
└──ABAS_grunnkrets_utenhav_p2020_v1.parquet
└──(...)
└──2021 /
└──ABAS_fylke_flate_p2021_v1.parquet
└──ABAS_grunnkrets_flate_p2021_v1.parquet
└──ABAS_grunnkrets_utenhav_p2021_v1.parquet
└──(...)
└──2022 /
└──ABAS_fylke_flate_p2022_v1.parquet
└──ABAS_grunnkrets_flate_p2022_v1.parquet
└──ABAS_grunnkrets_utenhav_p2022_v1.parquet
└──(...)
└──2023 /
└──ABAS_KnrGamle_p2023_v1.parquet
└──ABAS_fylke_flate_p2023_v1.parquet
└──ABAS_grunnkrets_flate_p2023_v1.parquet
└──(...)
└──2024 /
└──ABAS_fylke_flate_p2024_v1.parquet
└──ABAS_grunnkrets_flate_p2024_v1.parquet
└──ABAS_grunnkrets_utenhav_p2024_v1.parquet
└──(...)
ls - få filstier, timestamp og størrelse
Med stier som kopieres (som ctrl + c) når man klipper på stien.
files_in_dir = file.parent.ls()
print(files_in_dir)
timestamp mb (int)
2024-04-19 11:44:12 11 .../ABAS_kommune_flate_p2024_v1.parquet
2024-04-19 11:45:47 0 .../N50_JernbaneStasjon_punkt_p2024.parquet
0 .../N50_JernbaneStasjon_punkt_p2024_v1.parquet
0 .../N50_lufthavn_punkt_p2024.parquet
0 .../N50_lufthavn_punkt_p2024_v1.parquet
...
2024-08-21 14:47:12 861 .../SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30 152 .../SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36 152 .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21 1122 .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32 740 .../NVDB_veg_linje_p2024_v1.parquet
Name: path, Length: 127, dtype: object
# subclass av pandas.Series
type(files_in_dir)
daplapath.path.PathSeries
print(files_in_dir.loc[lambda x: x.gb > 10].keep_latest_versions())
timestamp mb (int)
2024-07-18 00:13:09 17646 .../FKB_arealressurs_flate_p2024_v1.parquet
2024-08-20 14:03:16 19717 .../FKB_gronnstruktur_flate_p2024_v1.parquet
Name: path, dtype: object
# stiene er fortsatt Path
type(files_in_dir.iloc[0])
daplapath.path.Path
# velg ut filene
print(folder.ls().files)
timestamp mb (int)
2024-04-19 11:44:12 11 .../ABAS_kommune_flate_p2024_v1.parquet
2024-04-19 11:45:47 0 .../N50_JernbaneStasjon_punkt_p2024.parquet
0 .../N50_JernbaneStasjon_punkt_p2024_v1.parquet
0 .../N50_lufthavn_punkt_p2024.parquet
0 .../N50_lufthavn_punkt_p2024_v1.parquet
...
2024-08-21 14:47:12 861 .../SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30 152 .../SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36 152 .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21 1122 .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32 740 .../NVDB_veg_linje_p2024_v1.parquet
Name: path, Length: 127, dtype: object
print(folder.ls().dirs)
Series([], Name: path, dtype: object)
# samme som .loc med x.str.contains
print(folder.ls().containing("kommune"))
timestamp mb (int)
2024-04-19 11:44:12 11 .../ABAS_kommune_flate_p2024_v1.parquet
2024-05-19 12:31:02 941 .../ABAS_kommune_utenhav_p2024.parquet
2024-06-24 14:25:14 11 .../ABAS_kommune_flate_p2024.parquet
2024-08-16 16:15:10 941 .../ABAS_kommune_utenhav_p2024_v1.parquet
2024-08-23 14:59:36 152 .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21 1122 .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
Name: path, dtype: object
print(file.parent.parent.ls(recursive=True).files)
timestamp mb (int)
2024-04-19 11:43:21 0 .../2022/N50_JernbaneStasjon_punkt_p2022_v1.parquet
2024-04-19 11:43:22 0 .../2022/N50_lufthavn_punkt_p2022_v1.parquet
2024-04-19 11:43:23 0 .../2022/NVE_Vindturbin_punkt_p2022_v1.parquet
0 .../2022/NVE_Trafostasjon_punkt_p2022_v1.parquet
2024-04-19 11:43:24 0 .../2022/S100_TekniskSit_flate_p2022_v1.parquet
...
2024-08-21 14:47:12 861 .../2024/SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30 152 .../2024/SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36 152 .../2024/SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21 1122 .../2024/SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32 740 .../2024/NVDB_veg_linje_p2024_v1.parquet
Length: 1323, dtype: object
Write to testpath
testpath = Path('ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet')
# delete files first
for version in testpath.versions():
version.rm_file()
testpath.exists()
False
df = pd.DataFrame({"x": [1,2,3], "y": [*"abc"]})
dp.write_pandas(df, testpath)
testpath.exists()
True
testpath.latest_version()
'ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet'
# highest_numbered_version + 1
testpath.new_version()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[31], line 2
1 # highest_numbered_version + 1
----> 2 testpath.new_version()
File ~/daplapath/daplapath/path.py:805, in Path.new_version(self, timeout)
803 time_should_be_at_least = pd.Timestamp.now() - pd.Timedelta(minutes=timeout)
804 if timestamp[0] > time_should_be_at_least:
--> 805 raise ValueError(
806 f"Latest version of the file was updated {timestamp[0]}, which "
807 f"is less than the timeout period of {timeout} minutes. "
808 "Change the timeout argument, but be sure to not save new "
809 "versions in a loop."
810 )
812 return highest_numbered.add_to_version_number(1)
ValueError: Latest version of the file was updated 2024-08-28 15:09:47, which is less than the timeout period of 30 minutes. Change the timeout argument, but be sure to not save new versions in a loop.
dp.write_pandas(df, testpath.new_version(timeout=0.01))
print(testpath.versions())
timestamp mb (int)
2024-08-28 15:09:47 0 ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet
2024-08-28 15:09:52 0 ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v2.parquet
dtype: object
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
daplapath-1.1.0.tar.gz
(17.1 kB
view hashes)
Built Distribution
daplapath-1.1.0-py3-none-any.whl
(15.6 kB
view hashes)
Close
Hashes for daplapath-1.1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d3dbf1a0cc7735d3d18ac065dfd5405defe4b31cdc2e941e36b292345f0d5688 |
|
MD5 | 26628301a8d4020a3070acb42ec1cead |
|
BLAKE2b-256 | d202d612549e49804f64ef81c85df86394ac9363b6049247bd2658f33ddd268f |