Skip to main content

A pathlib.Path class for dapla

Project description

dapla-path

pathlib.Path for dapla

Opprettet av: ort ort@ssb.no


Path (dapla)

import dapla as dp
import pandas as pd

from daplapath.path import Path
folder = Path('ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024')
folder
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024'

Fungerer som tekst

folder.startswith("ssb")
True
dp.FileClient.get_gcs_file_system().exists(folder)
True

Med metoder og attributter ala pathlib.Path

folder.exists()
True
folder.is_dir()
True
file = folder / "ABAS_kommune_utenhav_p2024_v1.parquet"
file
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
file.parent
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024'

Og noen pandas attributter

Uten å lese filen

file.columns
Index(['objtype', 'navn', "komm_nr", "fylke_nr", 'areal_gdb', 'geometry'],
      dtype='object')
file.dtypes
objtype         string
navn            string
komm_nr       string
fylke_nr           string
areal_gdb       double
geometry        binary
dtype: object
file.shape
(481, 8)

Versjonering

file.version_number
1
print(file.versions())
timestamp            mb (int)
2024-05-19 12:31:02  941            .../ABAS_kommune_utenhav_p2024.parquet
2024-08-16 16:15:10  941         .../ABAS_kommune_utenhav_p2024_v1.parquet
Name: path, dtype: object
file.latest_version()
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
file.highest_numbered_version()
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v1.parquet'
# highest_numbered_version + 1
file.new_version()
'ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data/2024/ABAS_kommune_utenhav_p2024_v2.parquet'
# alltid False
file.new_version().exists()
False
# finner/fjerner versjonsnummer med regex-søk
file._version_pattern
'_v(\\d+)'

Branch tree

Filtre med hyperlenke. Gjør at man kopierer stien når man klikker på den.

print(
    Path("ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data").tree()
)
ssb-areal-data-delt-kart-prod/analyse_data/klargjorte-data /
    └──2000 /
        └──SSB_tettsted_flate_p2000.parquet
        └──SSB_tettsted_flate_p2000_v1.parquet
    └──2002 /
        └──SSB_tettsted_flate_p2002.parquet
        └──SSB_tettsted_flate_p2002_v1.parquet
    └──2003 /
        └──SSB_tettsted_flate_p2003.parquet
        └──SSB_tettsted_flate_p2003_v1.parquet
    └──2004 /
        └──SSB_tettsted_flate_p2004.parquet
        └──SSB_tettsted_flate_p2004_v1.parquet
    └──2005 /
        └──SSB_tettsted_flate_p2005.parquet
        └──SSB_tettsted_flate_p2005_v1.parquet
    └──2006 /
        └──SSB_tettsted_flate_p2006.parquet
        └──SSB_tettsted_flate_p2006_v1.parquet
    └──2007 /
        └──SSB_tettsted_flate_p2007.parquet
        └──SSB_tettsted_flate_p2007_v1.parquet
    └──2008 /
        └──SSB_tettsted_flate_p2008.parquet
        └──SSB_tettsted_flate_p2008_v1.parquet
        └──SSB_tettsted_ringbuffer_p2008.parquet
        └──(...)
    └──2009 /
        └──SSB_tettsted_flate_p2009.parquet
        └──SSB_tettsted_flate_p2009_v1.parquet
    └──2010 /
        └──SOL_arealressurs_flate_p2010.parquet
        └──SOL_arealressurs_flate_p2010_v1.parquet
    └──2011 /
        └──SOL_Arstat_flate_p2011.parquet
        └──SOL_Arstat_flate_p2011_v1.parquet
        └──SSB_tettsted_flate_p2011.parquet
        └──(...)
    └──2012 /
        └──ABAS_fylke_flate_p2012_v1.parquet
        └──ABAS_fylke_linje_p2012_v1.parquet
        └──ABAS_grunnkrets_flate_p2012_v1.parquet
        └──(...)
    └──2013 /
        └──ABAS_fylke_flate_p2013_v1.parquet
        └──ABAS_kommune_flate_p2013_v1.parquet
        └──DEK_eiendom_flate_p2013_v1.parquet
        └──(...)
    └──2014 /
        └──DEK_eiendom_flate_p2014_v1.parquet
        └──FKB_anlegg_flate_p2014_v1.parquet
        └──FKB_anlegg_linje_p2014_v1.parquet
        └──(...)
    └──2015 /
        └──ABAS_grunnkrets_flate_p2015_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2015_v1.parquet
        └──ABAS_kommune_flate_p2015_v1.parquet
        └──(...)
    └──2016 /
        └──ABAS_fylke_flate_p2016_v1.parquet
        └──ABAS_grunnkrets_flate_p2016_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2016_v1.parquet
        └──(...)
    └──2017 /
        └──ABAS_fylke_flate_p2017_v1.parquet
        └──ABAS_grunnkrets_flate_p2017_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2017_v1.parquet
        └──(...)
    └──2018 /
        └──ABAS_fylke_flate_p2018_v1.parquet
        └──ABAS_grunnkrets_flate_p2018_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2018_v1.parquet
        └──(...)
    └──2019 /
        └──ABAS_fylke_flate_p2019_v1.parquet
        └──ABAS_grunnkrets_flate_p2019_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2019_v1.parquet
        └──(...)
    └──2020 /
        └──ABAS_fylke_flate_p2020_v1.parquet
        └──ABAS_grunnkrets_flate_p2020_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2020_v1.parquet
        └──(...)
    └──2021 /
        └──ABAS_fylke_flate_p2021_v1.parquet
        └──ABAS_grunnkrets_flate_p2021_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2021_v1.parquet
        └──(...)
    └──2022 /
        └──ABAS_fylke_flate_p2022_v1.parquet
        └──ABAS_grunnkrets_flate_p2022_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2022_v1.parquet
        └──(...)
    └──2023 /
        └──ABAS_KnrGamle_p2023_v1.parquet
        └──ABAS_fylke_flate_p2023_v1.parquet
        └──ABAS_grunnkrets_flate_p2023_v1.parquet
        └──(...)
    └──2024 /
        └──ABAS_fylke_flate_p2024_v1.parquet
        └──ABAS_grunnkrets_flate_p2024_v1.parquet
        └──ABAS_grunnkrets_utenhav_p2024_v1.parquet
        └──(...)

ls - få filstier, timestamp og størrelse

Med stier som kopieres (som ctrl + c) når man klipper på stien.

files_in_dir = file.parent.ls()
print(files_in_dir)
timestamp            mb (int)
2024-04-19 11:44:12  11                       .../ABAS_kommune_flate_p2024_v1.parquet
2024-04-19 11:45:47  0                    .../N50_JernbaneStasjon_punkt_p2024.parquet
                     0                 .../N50_JernbaneStasjon_punkt_p2024_v1.parquet
                     0                           .../N50_lufthavn_punkt_p2024.parquet
                     0                        .../N50_lufthavn_punkt_p2024_v1.parquet
                                                         ...                         
2024-08-21 14:47:12  861                              .../SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30  152                      .../SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36  152              .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32  740                          .../NVDB_veg_linje_p2024_v1.parquet
Name: path, Length: 127, dtype: object
# subclass av pandas.Series
type(files_in_dir)
daplapath.path.PathSeries
print(files_in_dir.loc[lambda x: x.gb > 10].keep_latest_versions())
timestamp            mb (int)
2024-07-18 00:13:09  17646        .../FKB_arealressurs_flate_p2024_v1.parquet
2024-08-20 14:03:16  19717       .../FKB_gronnstruktur_flate_p2024_v1.parquet
Name: path, dtype: object
# stiene er fortsatt Path
type(files_in_dir.iloc[0])
daplapath.path.Path
# velg ut filene
print(folder.ls().files)
timestamp            mb (int)
2024-04-19 11:44:12  11                       .../ABAS_kommune_flate_p2024_v1.parquet
2024-04-19 11:45:47  0                    .../N50_JernbaneStasjon_punkt_p2024.parquet
                     0                 .../N50_JernbaneStasjon_punkt_p2024_v1.parquet
                     0                           .../N50_lufthavn_punkt_p2024.parquet
                     0                        .../N50_lufthavn_punkt_p2024_v1.parquet
                                                         ...                         
2024-08-21 14:47:12  861                              .../SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30  152                      .../SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36  152              .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32  740                          .../NVDB_veg_linje_p2024_v1.parquet
Name: path, Length: 127, dtype: object
print(folder.ls().dirs)
Series([], Name: path, dtype: object)
# samme som .loc med x.str.contains
print(folder.ls().containing("kommune"))
timestamp            mb (int)
2024-04-19 11:44:12  11                       .../ABAS_kommune_flate_p2024_v1.parquet
2024-05-19 12:31:02  941                       .../ABAS_kommune_utenhav_p2024.parquet
2024-06-24 14:25:14  11                          .../ABAS_kommune_flate_p2024.parquet
2024-08-16 16:15:10  941                    .../ABAS_kommune_utenhav_p2024_v1.parquet
2024-08-23 14:59:36  152              .../SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
Name: path, dtype: object
print(file.parent.parent.ls(recursive=True).files)
timestamp            mb (int)
2024-04-19 11:43:21  0                 .../2022/N50_JernbaneStasjon_punkt_p2022_v1.parquet
2024-04-19 11:43:22  0                        .../2022/N50_lufthavn_punkt_p2022_v1.parquet
2024-04-19 11:43:23  0                      .../2022/NVE_Vindturbin_punkt_p2022_v1.parquet
                     0                    .../2022/NVE_Trafostasjon_punkt_p2022_v1.parquet
2024-04-19 11:43:24  0                     .../2022/S100_TekniskSit_flate_p2022_v1.parquet
                                                           ...                            
2024-08-21 14:47:12  861                              .../2024/SSB_hav_flate_p2024.parquet
2024-08-23 14:59:30  152                      .../2024/SSB_tettsted_flate_p2024_v1.parquet
2024-08-23 14:59:36  152              .../2024/SSB_tettsted_kommune_flate_p2024_v1.parquet
2024-08-23 15:34:21  1122        .../2024/SSB_tettsted_kommune_ringbuffer_p2024_v1.parquet
2024-08-23 17:11:32  740                          .../2024/NVDB_veg_linje_p2024_v1.parquet
Length: 1323, dtype: object

Write to testpath

testpath = Path('ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet')

# delete files first
for version in testpath.versions():
    version.rm_file()

testpath.exists()
False
df = pd.DataFrame({"x": [1,2,3], "y": [*"abc"]})

dp.write_pandas(df, testpath)

testpath.exists()
True
testpath.latest_version()
'ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet'
# highest_numbered_version + 1
testpath.new_version()
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[31], line 2
      1 # highest_numbered_version + 1
----> 2 testpath.new_version()


File ~/daplapath/daplapath/path.py:805, in Path.new_version(self, timeout)
    803     time_should_be_at_least = pd.Timestamp.now() - pd.Timedelta(minutes=timeout)
    804     if timestamp[0] > time_should_be_at_least:
--> 805         raise ValueError(
    806             f"Latest version of the file was updated {timestamp[0]}, which "
    807             f"is less than the timeout period of {timeout} minutes. "
    808             "Change the timeout argument, but be sure to not save new "
    809             "versions in a loop."
    810         )
    812 return highest_numbered.add_to_version_number(1)


ValueError: Latest version of the file was updated 2024-08-28 15:09:47, which is less than the timeout period of 30 minutes. Change the timeout argument, but be sure to not save new versions in a loop.
dp.write_pandas(df, testpath.new_version(timeout=0.01))
print(testpath.versions())
timestamp            mb (int)
2024-08-28 15:09:47  0           ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v1.parquet
2024-08-28 15:09:52  0           ssb-areal-data-produkt-prod/arealstat/temp/test_df_p2023_v2.parquet
dtype: object

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

daplapath-2.1.0.tar.gz (20.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

daplapath-2.1.0-py3-none-any.whl (18.6 kB view details)

Uploaded Python 3

File details

Details for the file daplapath-2.1.0.tar.gz.

File metadata

  • Download URL: daplapath-2.1.0.tar.gz
  • Upload date:
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.10

File hashes

Hashes for daplapath-2.1.0.tar.gz
Algorithm Hash digest
SHA256 4e443f0f5a0cf4032e541c6b591abb5231a589b294e010ca95537d4c313bbec8
MD5 7c530aa7bff615f62b3b02ccaf26650e
BLAKE2b-256 555c599f8c929e24dd2caf544d3cbe740383d55281eb3d29336303eb469fe51d

See more details on using hashes here.

File details

Details for the file daplapath-2.1.0-py3-none-any.whl.

File metadata

  • Download URL: daplapath-2.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.10

File hashes

Hashes for daplapath-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7d32aecd7779561421b8f1e4efc91e72e8a86bac6b27a65b8be98f9fa29a4a40
MD5 b06cc2e08a3687f92789af7a313da429
BLAKE2b-256 7d8bd0a4fd5d04ee2c5ccc8166c1a91c5e126e5f00b607540ea622e5eb266a36

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page