Package 'kwb.geoportal'

Title: R Package for Getting Spatial Data from Berlin Geoportal
Description: R Package for getting spatial data from Berlin Geoportal (https://gdi.berlin.de/geonetwork/srv/ger/catalog.search#/search).
Authors: Michael Rustler [aut, cre] (ORCID: <https://orcid.org/0000-0003-0647-7726>), AD4GD [fnd], Kompetenzzentrum Wasser Berlin gGmbH (KWB) [cph]
Maintainer: Michael Rustler <[email protected]>
License: MIT + file LICENSE
Version: 0.0.0.9000
Built: 2026-06-14 19:39:21 UTC
Source: https://github.com/KWB-R/kwb.geoportal

Help Index


Read GeoNetwork service metadata (one row per service)

Description

This function reads a GeoNetwork XML search response (as delivered by https://gdi.berlin.de/geonetwork/srv/ger/q?...) and converts it into a tidy tibble with one row per metadata record. All <link> elements of a record are kept together in a list column called links, where each entry is itself a tibble created by parse_gn_link().

Usage

read_metadata(path_xml)

Arguments

path_xml

Path or URL to the GeoNetwork XML document. This can be a local file (e.g. "geoportal_metadaten.xml") or a remote URL such as "https://gdi.berlin.de/geonetwork/srv/ger/q?...".

Details

This structure is convenient when you want to keep the dataset-level information (title, abstract, uuid, ...) together, but still be able to inspect or unnest all service/download/view links later on.

The function assumes GeoNetwork-style XML with <metadata> elements in which the geonet: namespace is available for the info block. It is tailored to the GDI Berlin instance but should also work for other GeoNetwork responses that use the same |-separated link encoding.

If a record has no <link> elements, the links column will contain a single-row tibble with all NA values. This preserves the 1:1 alignment between records and rows.
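The |-separated link encoding mentioned above can be illustrated with a minimal base R sketch. The link string, the field order, and the column names are assumptions made for illustration; they are not necessarily what parse_gn_link() uses.

```r
# Hypothetical link string; field order and content are assumptions.
link <- "Example WMS|Example description|https://example.org/wms?|OGC:WMS|application/xml"

# fixed = TRUE treats "|" as a literal character, not as regex alternation.
fields <- strsplit(link, "|", fixed = TRUE)[[1]]

# Hypothetical column names, chosen only for this sketch.
names(fields) <- c("name", "description", "url", "protocol", "mime_type")

# One link becomes one row; a record with several links would yield several rows.
parsed <- as.data.frame(as.list(fields), stringsAsFactors = FALSE)
parsed$protocol
```

Without fixed = TRUE, strsplit() would interpret "|" as a regular expression and split the string between every character.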

Value

A tibble with one row per metadata record and the columns:

geonet_uuid

UUID from <geonet:info><uuid> (character).

geonet_id

Internal GeoNetwork id from <geonet:info><id> (character).

title

Dataset/service title.

abstract

Dataset/service abstract/description.

serviceType

Service type, if present (e.g. WMS).

types

Semicolon-separated <type> elements.

source_logo

Logo path, if present.

links

List column; each element is a tibble with the parsed links.
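To show how the links list column might be expanded later on, here is a small sketch on a toy tibble that mimics the documented structure. The columns inside links (url, protocol) and all values are hypothetical; the real columns are whatever parse_gn_link() produces.

```r
library(tibble)
library(tidyr)

# Toy stand-in for the result of read_metadata(); values are made up.
md <- tibble(
  geonet_uuid = c("uuid-1", "uuid-2"),
  title = c("Service A", "Service B"),
  links = list(
    tibble(
      url = c("https://example.org/a/wms", "https://example.org/a/wfs"),
      protocol = c("OGC:WMS", "OGC:WFS")
    ),
    # A record without <link> elements: a single all-NA row.
    tibble(url = NA_character_, protocol = NA_character_)
  )
)

# One row per link; record-level columns are repeated for each link.
links_long <- unnest(md, links)
links_long
```

Because records without links still carry one all-NA row, they remain present after unnesting and can be filtered with is.na() if they are not wanted.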

Examples

## Not run: 
df <- read_metadata(
  "https://gdi.berlin.de/geonetwork/srv/ger/q?facet.q=type/service&resultType=details&sortBy=changeDate&from=1&to=100&fast=index"
)
dplyr::glimpse(df)
df$links[[1]]

## End(Not run)

Read all GeoNetwork / Geoportal metadata in chunks

Description

This function uses read_metadata() repeatedly to fetch all available metadata records from a GeoNetwork endpoint that supports from / to pagination (like the GDI Berlin instance).

Usage

read_metadata_all(
  base_url =
    "https://gdi.berlin.de/geonetwork/srv/ger/q?facet.q=type/service&resultType=details&sortBy=changeDate&fast=index",
  chunk_size = 100
)

Arguments

base_url

Base GeoNetwork query URL without the from and to parameters. Must return an XML document with a <summary count="..."> node. Defaults to the GDI Berlin service search.

chunk_size

Number of records per request. Default: 100.
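As a rough illustration of the <summary count="..."> requirement on base_url, the following sketch extracts the count from a small inline XML string using the xml2 package. The XML shown is a simplified, hypothetical stand-in for a real response, and this is not necessarily how the package reads the value internally.

```r
library(xml2)

# Simplified, hypothetical response; a real response contains full records.
response_xml <- '<response from="1" to="2"><summary count="250"/><metadata/><metadata/></response>'
doc <- read_xml(response_xml)

# xml_attr() returns a character value, so convert it before doing arithmetic.
summary_node <- xml_find_first(doc, "//summary")
n_total <- as.integer(xml_attr(summary_node, "count"))
n_total
```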

Details

It first downloads the initial XML and reads the count attribute of the <summary count="..."> node to determine how many records exist. It then requests the records in chunks of chunk_size (default: 100) until all records have been read.
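The chunking described above can be sketched in base R as follows. The total of 250 records is a hypothetical value, and appending from and to with "&" is an assumption based on the query URL used in the read_metadata() example; the actual implementation may differ.

```r
n_total <- 250L     # hypothetical value of the count attribute
chunk_size <- 100L

# Start index of each chunk, assuming n_total >= 1.
from <- seq(1L, n_total, by = chunk_size)

# End index of each chunk, capped at the total number of records.
to <- pmin(from + chunk_size - 1L, n_total)

base_url <- "https://gdi.berlin.de/geonetwork/srv/ger/q?facet.q=type/service&resultType=details&sortBy=changeDate&fast=index"

# One request URL per chunk.
urls <- paste0(base_url, "&from=", from, "&to=", to)
data.frame(from = from, to = to)
```

Each URL could then be passed to read_metadata() and the resulting tibbles combined row-wise, for example with dplyr::bind_rows().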

Value

A tibble with one row per metadata record, with the same structure as the return value of read_metadata(), but covering all pages.

See Also

read_metadata()

Examples

## Not run: 
all_md <- read_metadata_all()
nrow(all_md)

## End(Not run)