| Title: | R Package for Getting Spatial Data from Berlin Geoportal |
|---|---|
| Description: | R Package for getting spatial data from Berlin Geoportal (https://gdi.berlin.de/geonetwork/srv/ger/catalog.search#/search). |
| Authors: | Michael Rustler [aut, cre] (ORCID: <https://orcid.org/0000-0003-0647-7726>), AD4GD [fnd], Kompetenzzentrum Wasser Berlin gGmbH (KWB) [cph] |
| Maintainer: | Michael Rustler <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9000 |
| Built: | 2026-06-14 19:39:21 UTC |
| Source: | https://github.com/KWB-R/kwb.geoportal |
GeoNetwork often encodes links as a single string separated by |, e.g.:
|Darstellungsdienst (WMS)|https://...|OGC:WMS|||.
This helper splits such a string into named columns and pads missing parts
up to 6 elements.
parse_gn_link(x)
x |
Character string as found inside a |
The order used here is:
1. link name
2. link description
3. link URL
4. link protocol (e.g. "OGC:WMS")
5. MIME type
6. order
Note: In many Berlin GDI records the first field (link name) is empty, and the actual meaningful text is in the second field (description).
A one-row tibble with columns:
- link_name
- link_desc
- link_url
- link_protocol
- link_mime
- link_order
parse_gn_link("|Darstellungsdienst (WMS)|https://example.org/wms?|OGC:WMS|||")
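The splitting and padding described for parse_gn_link() can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the name split_link_sketch is hypothetical, it returns a plain data.frame rather than a tibble, and it assumes that missing trailing fields become NA while empty fields stay empty strings.

```r
# Hypothetical sketch, not the code used by kwb.geoportal.
split_link_sketch <- function(x, n = 6L) {
  cols <- c("link_name", "link_desc", "link_url",
            "link_protocol", "link_mime", "link_order")

  # Split on the literal "|" (fixed = TRUE avoids regex alternation).
  parts <- strsplit(x, "|", fixed = TRUE)[[1]]

  # Setting the length pads with NA (or truncates) to exactly n elements.
  length(parts) <- n

  # Build a one-row data.frame with one column per field.
  as.data.frame(as.list(stats::setNames(parts, cols)),
                stringsAsFactors = FALSE)
}

res <- split_link_sketch(
  "|Darstellungsdienst (WMS)|https://example.org/wms?|OGC:WMS|||"
)
res$link_desc      # the meaningful text sits in the second field
res$link_protocol  # "OGC:WMS"
```

Padding by assigning to length() keeps the sketch short; a string with fewer than six fields, such as "a|b", yields NA in the remaining columns.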
This function reads a GeoNetwork XML search response (as delivered by
https://gdi.berlin.de/geonetwork/srv/ger/q?...) and converts it into a
tidy tibble with one row per metadata record.
All <link> elements of a record are kept together in a list column
called links, where each entry is itself a tibble created by
parse_gn_link().
read_metadata(path_xml)
path_xml |
Path or URL to the GeoNetwork XML document. This can be a
local file (e.g. |
This structure is convenient when you want to keep the dataset-level information (title, abstract, uuid, ...) together, but still be able to inspect or unnest all service/download/view links later on.
The function assumes a GeoNetwork-style XML with <metadata> elements and
the namespace geonet: available for the info block.
It is tailored to the GDI Berlin instance but should work for other similar
GeoNetwork responses that use the same link encoding (| separated).
If a record has no <link> elements, the links column will contain
a single-row tibble with all NA values. This preserves the 1:1 alignment
between records and rows.
A tibble with one row per metadata record and the columns:
- UUID from <geonet:info><uuid> (character).
- Internal GeoNetwork id from <geonet:info><id> (character).
- Dataset/service title.
- Dataset/service abstract/description.
- Service type, if present (e.g. WMS).
- Semicolon-separated <type> elements.
- Logo path, if present.
- List column links; each element is a tibble with the parsed links.
## Not run:
df <- read_metadata(
  "https://gdi.berlin.de/geonetwork/srv/ger/q?facet.q=type/service&resultType=details&sortBy=changeDate&from=1&to=100&fast=index"
)
dplyr::glimpse(df)
df$links[[1]]
## End(Not run)
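To illustrate how a list column of links can be flattened later on, the following base R sketch builds a small mock result and expands it to one row per link. The mock data, the reduced set of columns and the use of a plain data.frame instead of a tibble are assumptions made for illustration; with a real result, a tool such as tidyr::unnest() would likely be the more convenient choice.

```r
# Mock data standing in for a read_metadata() result (hypothetical values).
records <- data.frame(
  uuid  = c("a", "b"),
  title = c("Service A", "Service B"),
  stringsAsFactors = FALSE
)

# One data.frame of links per record; record "b" has no links, so it
# carries a single all-NA row, mirroring the behaviour described above.
records$links <- list(
  data.frame(link_desc = c("WMS", "WFS"),
             link_protocol = c("OGC:WMS", "OGC:WFS"),
             stringsAsFactors = FALSE),
  data.frame(link_desc = NA_character_,
             link_protocol = NA_character_,
             stringsAsFactors = FALSE)
)

# Repeat the record-level fields for every link row, then stack the pieces.
flat <- do.call(rbind, lapply(seq_len(nrow(records)), function(i) {
  data.frame(uuid = records$uuid[i],
             title = records$title[i],
             records$links[[i]],
             stringsAsFactors = FALSE)
}))

flat  # three rows: two links for "a", one NA placeholder for "b"
```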
This function uses read_metadata() repeatedly to fetch all available
metadata records from a GeoNetwork endpoint that supports from / to
pagination (like the GDI Berlin instance).
read_metadata_all(
  base_url = "https://gdi.berlin.de/geonetwork/srv/ger/q?facet.q=type/service&resultType=details&sortBy=changeDate&fast=index",
  chunk_size = 100
)
base_url |
Base GeoNetwork query URL without |
chunk_size |
Number of records per request. Default: 100. |
It first downloads the initial XML, reads the <summary count="...">
attribute to know how many records exist, then iterates in chunks
(default: 100) until all records are read.
A tibble with one row per metadata record, identical in structure
to the return value of read_metadata(), but for all pages.
## Not run:
all_md <- read_metadata_all()
nrow(all_md)
## End(Not run)
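The chunked iteration described for read_metadata_all() can be sketched as follows. This is an assumption-laden illustration rather than the package's code: page_ranges and build_page_urls are hypothetical helpers, the total record count is taken as already known (for example from the count attribute of the summary element), and from / to are treated as 1-based and inclusive, as in the from=1&to=100 example URL.

```r
# Hypothetical helper: compute 1-based, inclusive from/to ranges.
page_ranges <- function(total, chunk_size = 100L) {
  total <- as.integer(total)
  chunk_size <- as.integer(chunk_size)
  if (total <= 0L) {
    return(data.frame(from = integer(0), to = integer(0)))
  }
  from <- seq(1L, total, by = chunk_size)
  to <- pmin(from + chunk_size - 1L, total)  # last chunk may be shorter
  data.frame(from = from, to = to)
}

# Hypothetical helper: append from/to to a base URL that already
# contains a query string (hence "&" rather than "?").
build_page_urls <- function(base_url, total, chunk_size = 100L) {
  r <- page_ranges(total, chunk_size)
  sprintf("%s&from=%d&to=%d", base_url, r$from, r$to)
}

page_ranges(250, 100)
build_page_urls("https://example.org/q?fast=index", total = 250)
```

Each resulting URL could then be passed to read_metadata() and the per-page results row-bound into one table.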