Skip to main content

A pathlib.Path class for dapla

Project description

dapla-path

pathlib.Path for dapla

Opprettet av: ort ort@ssb.no


Path (dapla)

import dapla as dp
import pandas as pd

from daplapath.path import Path
folder = Path('ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024')
folder
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024'

Fungerer som tekst

folder.startswith("ssb")
True
dp.FileClient.get_gcs_file_system().exists(folder)
True

Med metoder og attributter ala pathlib.Path

folder.exists()
True
folder.is_dir()
True
file = folder / "ABAS_kommune_utenhav_p2024_v1.parquet"
file
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
file.parent
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024'

Og noen pandas attributter

Uten å lese filen

file.columns
Index(['objtype', 'navn', "komm_nr", "fylke_nr", 'areal_gdb', 'geometry'],
      dtype='object')
file.dtypes
objtype         string
navn            string
komm_nr       string
fylke_nr           string
areal_gdb       double
geometry        binary
dtype: object
file.shape
(481, 8)

Versjonering

file.version_number
1
print(file.versions())
timestamp            mb (int)
2024-05-19 12:31:02  941            .../ABAS_kommune_utenhav_p2024.parquet
2024-08-16 16:15:10  941         .../ABAS_kommune_utenhav_p2024_v1.parquet
Name: path, dtype: object
file.latest_version()
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
file.highest_numbered_version()
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
# highest_numbered_version + 1
file.new_version()
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v2.parquet'
# alltid False
file.new_version().exists()
False
# finner/fjerner versjonsnummer med regex-søk
file._version_pattern
'_v(\\d+)'

Branch tree

Filtre med hyperlenke. Gjør at man kopierer stien når man klikker på den.

print(
    Path("ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data").tree()
)
ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data /
    └──2000 /
        └──SSB_tettsted_flate_p2000.parquet
        └──SSB_tettsted_flate_p2000_v1.parquet
    └──2002 /
        └──SSB_tettsted_flate_p2002.parquet
        └──SSB_tettsted_flate_p2002_v1.parquet
    └──2003 /
        └──SSB_tettsted_flate_p2003.parquet
        └──SSB_tettsted_flate_p2003_v1.parquet
    └──2004 /
        └──SSB_tettsted_flate_p2004.parquet
        └──SSB_tettsted_flate_p2004_v1.parquet
    └──2005 /
        └──SSB_tettsted_flate_p2005.parquet
        └──SSB_tettsted_flate_p2005_v1.parquet
    └──2006 /
        └──SSB_tettsted_flate_p2006.parquet
        └──SSB_tettsted_flate_p2006_v1.parquet
    └──2007 /
        └──SSB_tettsted_flate_p2007.parquet
        └──SSB_tettsted_flate_p2007_v1.parquet
    └──2008 /
        └──SSB_tettsted_flate_p2008.parquet
        └──SSB_tettsted_flate_p2008_v1.parquet
        └──SSB_tettsted_ringbuffer_p2008.parquet
        └──(...)
    └──2009 /
        └──SSB_tettsted_flate_p2009.parquet
        └──SSB_tettsted_flate_p2009_v1.parquet
    └──2010 /
        └──SOL_arealressurs_flate_p2010.parquet
        └──SOL_arealressurs_flate_p2010_v1.parquet
    └──2011 /
        └──SOL_Arstat_flate_p2011.parquet
        └──SOL_Arstat_flate_p2011_v1.parquet
        └──SSB_tettsted_flate_p2011.parquet
        └──(...)
    └──2012 /
        └──ABAS_fylke_flate_p2012_v1.parquet
        └──ABAS_fylke_linje_p2012_v1.parquet
        └──ABAS_grunnkrets_flate_p2012_v1.parquet
        └──(...)
    └──2013 /
        └──ABAS_fylke_flate_p2013_v1.parquet
        └──ABAS_kommune_flate_p2013_v1.parquet
        └──DEK_eiendom_flate_p2013_v1.parquet
        └──(...)
    └──2014 /
        └──DEK_eiendom_flate_p2014_v1.parquet
        └──FKB_anlegg_flate_p2014_v1.parquet
        └──FKB_anlegg_linje_p2014_v1.parquet
        └──(...)
    └──2015 /
        └──ABAS_grunnkrets_flate_p2015_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2015_v1.parquet
        └──ABAS_kommune_flate_p2015_v1.parquet
        └──(...)
    └──2016 /
        └──ABAS_fylke_flate_p2016_v1.parquet
        └──ABAS_grunnkrets_flate_p2016_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2016_v1.parquet
        └──(...)
    └──2017 /
        └──ABAS_fylke_flate_p2017_v1.parquet
        └──ABAS_grunnkrets_flate_p2017_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2017_v1.parquet
        └──(...)
    └──2018 /
        └──ABAS_fylke_flate_p2018_v1.parquet
        └──ABAS_grunnkrets_flate_p2018_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2018_v1.parquet
        └──(...)
    └──2019 /
        └──ABAS_fylke_flate_p2019_v1.parquet
        └──ABAS_grunnkrets_flate_p2019_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2019_v1.parquet
        └──(...)
    └──2020 /
        └──ABAS_fylke_flate_p2020_v1.parquet
        └──ABAS_grunnkrets_flate_p2020_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2020_v1.parquet
        └──(...)
    └──2021 /
        └──ABAS_fylke_flate_p2021_v1.parquet
        └──ABAS_grunnkrets_flate_p2021_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2021_v1.parquet
        └──(...)
    └──2022 /
        └──ABAS_fylke_flate_p2022_v1.parquet
        └──ABAS_grunnkrets_flate_p2022_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2022_v1.parquet
        └──(...)
    └──2023 /
        └──ABAS_KnrGamle_p2023_v1.parquet
        └──ABAS_fylke_flate_p2023_v1.parquet
        └──ABAS_grunnkrets_flate_p2023_v1.parquet
        └──(...)
    └──2024 /
        └──ABAS_fylke_flate_p2024_v1.parquet
        └──ABAS_grunnkrets_flate_p2024_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2024_v1.parquet
        └──(...)

ls - få filstier, timestamp og størrelse

Med stier som kopieres (som ctrl + c) når man klipper på stien.

files_in_dir = file.parent.ls()
print(files_in_dir)
timestamp            mb (int)
2024-04-19 11:44:12  11                       .../ABAS_kommune_flate_p2024_v1.parquet
2024-04-19 11:45:47  0                    .../N50_JernbaneStasjon_punkt_p2024.parquet
                     0                 .../N50_JernbaneStasjon_punkt_p2024_v1.parquet
                     0                           .../N50_lufthavn_punkt_p2024.parquet
                     0                        .../N50_lufthavn_punkt_p2024_v1.parquet
                                                         ...                         
2024-08-21 14:47:12  861                              .../SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30  152                      .../SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36  152              .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32  740                          .../NVDB_veg_linje_p2024_v1.parquet
Name: path, Length: 127, dtype: object
# subclass av pandas.Series
type(files_in_dir)
daplapath.path.PathSeries
print(files_in_dir.loc[lambda x: x.gb > 10].keep_latest_versions())
timestamp            mb (int)
2024-07-18 00:13:09  17646        .../FKB_arealressurs_flate_p2024_v1.parquet
2024-08-20 14:03:16  19717       .../FKB_gronnstruktur_flate_p2024_v1.parquet
Name: path, dtype: object
# stiene er fortsatt Path
type(files_in_dir.iloc[0])
daplapath.path.Path
# velg ut filene
print(folder.ls().files)
timestamp            mb (int)
2024-04-19 11:44:12  11                       .../ABAS_kommune_flate_p2024_v1.parquet
2024-04-19 11:45:47  0                    .../N50_JernbaneStasjon_punkt_p2024.parquet
                     0                 .../N50_JernbaneStasjon_punkt_p2024_v1.parquet
                     0                           .../N50_lufthavn_punkt_p2024.parquet
                     0                        .../N50_lufthavn_punkt_p2024_v1.parquet
                                                         ...                         
2024-08-21 14:47:12  861                              .../SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30  152                      .../SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36  152              .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32  740                          .../NVDB_veg_linje_p2024_v1.parquet
Name: path, Length: 127, dtype: object
print(folder.ls().dirs)
Series([], Name: path, dtype: object)
# samme som .loc med x.str.contains
print(folder.ls().containing("kommune"))
timestamp            mb (int)
2024-04-19 11:44:12  11                       .../ABAS_kommune_flate_p2024_v1.parquet
2024-05-19 12:31:02  941                       .../ABAS_kommune_utenhav_p2024.parquet
2024-06-24 14:25:14  11                          .../ABAS_kommune_flate_p2024.parquet
2024-08-16 16:15:10  941                    .../ABAS_kommune_utenhav_p2024_v1.parquet
2024-08-23 14:59:36  152              .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
Name: path, dtype: object
print(file.parent.parent.ls(recursive=True).files)
timestamp            mb (int)
2024-04-19 11:43:21  0                 .../2022/N50_JernbaneStasjon_punkt_p2022_v1.parquet
2024-04-19 11:43:22  0                        .../2022/N50_lufthavn_punkt_p2022_v1.parquet
2024-04-19 11:43:23  0                      .../2022/NVE_Vindturbin_punkt_p2022_v1.parquet
                     0                    .../2022/NVE_Trafostasjon_punkt_p2022_v1.parquet
2024-04-19 11:43:24  0                     .../2022/S100_TekniskSit_flate_p2022_v1.parquet
                                                           ...                            
2024-08-21 14:47:12  861                              .../2024/SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30  152                      .../2024/SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36  152              .../2024/SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../2024/SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32  740                          .../2024/NVDB_veg_linje_p2024_v1.parquet
Length: 1323, dtype: object

Write to testpath

testpath = Path('ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet')

# delete files first
for version in testpath.versions():
    version.rm_file()

testpath.exists()
False
df = pd.DataFrame({"x": [1,2,3], "y": [*"abc"]})

dp.write_pandas(df, testpath)

testpath.exists()
True
testpath.latest_version()
'ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet'
# highest_numbered_version + 1
testpath.new_version()
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[31], line 2
      1 # highest_numbered_version + 1
----> 2 testpath.new_version()


File ~/daplapath/daplapath/path.py:805, in Path.new_version(self, timeout)
    803     time_should_be_at_least = pd.Timestamp.now() - pd.Timedelta(minutes=timeout)
    804     if timestamp[0] > time_should_be_at_least:
--> 805         raise ValueError(
    806             f"Latest version of the file was updated {timestamp[0]}, which "
    807             f"is less than the timeout period of {timeout} minutes. "
    808             "Change the timeout argument, but be sure to not save new "
    809             "versions in a loop."
    810         )
    812 return highest_numbered.add_to_version_number(1)


ValueError: Latest version of the file was updated 2024-08-28 15:09:47, which is less than the timeout period of 30 minutes. Change the timeout argument, but be sure to not save new versions in a loop.
dp.write_pandas(df, testpath.new_version(timeout=0.01))
print(testpath.versions())
timestamp            mb (int)
2024-08-28 15:09:47  0           ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet
2024-08-28 15:09:52  0           ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v2.parquet
dtype: object

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

daplapath-2.1.6.tar.gz (20.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

daplapath-2.1.6-py3-none-any.whl (19.3 kB view details)

Uploaded Python 3

File details

Details for the file daplapath-2.1.6.tar.gz.

File metadata

  • Download URL: daplapath-2.1.6.tar.gz
  • Upload date:
  • Size: 20.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.13

File hashes

Hashes for daplapath-2.1.6.tar.gz
Algorithm Hash digest
SHA256 161236a01e95ed09e35bbd4ea1f486c3e34ac370d39d8f55ad75e8826168edd1
MD5 1c3790dad7449e13d0a0168c9a0e49c3
BLAKE2b-256 69f4dc808b62a4ba3cf30b595b3f1abaeaff9fa3e1a3e551448164068955d687

See more details on using hashes here.

File details

Details for the file daplapath-2.1.6-py3-none-any.whl.

File metadata

  • Download URL: daplapath-2.1.6-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.13

File hashes

Hashes for daplapath-2.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 6da8ddb589b212cfe7f1162d8c9df97dd36c6af1c71894b0f80eed0dae9f18d8
MD5 2f0b65a9fec2847216de3f10c711f8bf
BLAKE2b-256 25f2191dfc6d68ed11afb8a64e2e27a13794337c793faf8c59ac357290ebcab7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page