| Title: | R Package for Automated Monitoring of German Public Procurement Portals (Vergabeportale) for KWB-Relevant Tenders |
|---|---|
| Description: | Logs into public procurement portals (starting with Vergabemarktplatz Brandenburg), scrapes published tenders, scores them for relevance to KWB research topics (e.g. groundwater) and renders an overview report. |
| Authors: | Michael Rustler [aut, cre] (ORCID: <https://orcid.org/0000-0003-0647-7726>), Kompetenzzentrum Wasser Berlin gGmbH (KWB) [cph] |
| Maintainer: | Michael Rustler <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9000 |
| Built: | 2026-06-16 18:07:17 UTC |
| Source: | https://github.com/KWB-R/kwb.tenders |
Drops tenders that are not a fit for a research institute, in two ways:
1. Title: the title contains a building/maintenance term (see tender_excludes()) and no strong water keyword rescues it (so a "Grundwasser..." title is kept).
2. CPV: the tender carries a works / maintenance / cleaning code (45... Bau [construction], 50... Reparatur/Wartung [repair/maintenance], 9046/9047/9061/9064/9091... Reinigung [cleaning]) without an engineering-services code (71...). This is a hard veto, so even "Neubau Klaeranlage" or "Reinigung Faulbehaelter" is dropped, while "Ingenieurleistungen ..." stays.
Sets is_relevant = FALSE and records the reason in an excluded column.
Matching is case-insensitive and folds umlauts.
apply_title_excludes(df, title_cols = c("Kurzbezeichnung", "Bezeichnung", "Titel"), keywords = tender_keywords(), excludes = tender_excludes())
| Argument | Description |
|---|---|
| df | A scored tibble (must contain |
| title_cols | Candidate title columns (those present are used). |
| keywords | Keyword groups, for the strong-keyword rescue (default tender_keywords()). |
| excludes | Exclusion list (default tender_excludes()). |
df, with is_relevant set to FALSE for vetoed rows and an added excluded column.
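The two veto rules described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names (fold_text, has_any, title_veto, cpv_veto) and the short term lists are hypothetical, and only the CPV prefixes are taken from the description.

```r
# Hypothetical helper: replace German umlauts with ASCII digraphs, then lower-case.
fold_text <- function(x) {
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to   <- c("ae", "oe", "ue", "ae", "oe", "ue", "ss")
  for (i in seq_along(from)) x <- gsub(from[i], to[i], x, fixed = TRUE)
  tolower(x)
}

# TRUE for each text that contains at least one of the terms (substring match).
has_any <- function(text, terms) {
  vapply(text, function(t) any(vapply(terms, grepl, logical(1), x = t, fixed = TRUE)),
         logical(1), USE.NAMES = FALSE)
}

# Illustrative term lists (assumption); the package reads its own lists from YAML.
exclude_terms <- fold_text(c("Neubau", "Wartung", "Reinigung"))
strong_terms  <- fold_text(c("Grundwasser", "Kl\u00e4ranlage"))

# Title rule: an exclusion term vetoes unless a strong keyword rescues the title.
title_veto <- function(title) {
  t <- fold_text(title)
  has_any(t, exclude_terms) & !has_any(t, strong_terms)
}

# CPV rule (hard veto): a works/maintenance/cleaning code without a 71... code.
cpv_veto <- function(cpv) {
  vapply(strsplit(cpv, "[ ,]+"), function(codes) {
    any(grepl("^(45|50|9046|9047|9061|9064|9091)", codes)) && !any(grepl("^71", codes))
  }, logical(1))
}

titles <- c("Neubau Sporthalle", "Wartung Grundwassermessstellen", "Neubau Klaeranlage")
cpv    <- c("", "71351500-8", "45252100-9")
excluded <- title_veto(titles) | cpv_veto(cpv)
```

In this sketch the third title is rescued by its strong keyword under the title rule but is still excluded by the CPV rule, which is what "hard veto" means here.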
Reads the Berlin notices (berlin.de, iTWO tender backend) over HTTP and scores
them (score_layered()). The paginated HTML list (?start=N) is the primary
source: it covers the full look-back window and carries the iTWO detail link
per notice in a data-href attribute, with a date-based early stop. If the
HTML cannot be parsed it falls back to the RSS feed (latest ~50), which is also
used to backfill any missing links. No browser and no login required.
berlin_tenders(keywords = tender_keywords(), cpv_map = tender_cpv_map(), since_days = 30, max_pages = 60, relevant_only = TRUE, verbose = TRUE)
| Argument | Description |
|---|---|
| keywords | Keyword groups (default tender_keywords()). |
| cpv_map | CPV-to-group map (default tender_cpv_map()). |
| since_days | Stop paging once a page is entirely older than this many days (the list is newest-first; default 30). |
| max_pages | Safety cap on pages fetched (default 60). |
| relevant_only | Return only relevant tenders (default TRUE). |
| verbose | Print progress (default TRUE). |
A scored tibble with Plattform = "Vergabeplattform Berlin".
## Not run:
berlin_tenders(since_days = 30)
## End(Not run)
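The date-based early stop over a newest-first paginated list can be sketched as follows. The in-memory pages, fetch_page() and collect_recent() are hypothetical stand-ins for the HTTP requests; only the since_days / max_pages semantics follow the description above.

```r
# Hypothetical stand-in for the paginated list: each element is one page,
# newest notices first.
pages <- list(
  data.frame(title = c("A", "B"), date = as.Date(c("2024-05-30", "2024-05-20"))),
  data.frame(title = c("C", "D"), date = as.Date(c("2024-05-10", "2024-04-01"))),
  data.frame(title = c("E", "F"), date = as.Date(c("2024-03-15", "2024-03-01"))),
  data.frame(title = "G", date = as.Date("2024-02-01"))
)
fetch_page <- function(i) if (i <= length(pages)) pages[[i]] else NULL

collect_recent <- function(today, since_days = 30, max_pages = 60) {
  cutoff <- today - since_days
  kept <- list()
  fetched <- 0
  for (i in seq_len(max_pages)) {
    page <- fetch_page(i)
    if (is.null(page) || nrow(page) == 0) break
    fetched <- fetched + 1
    # The list is newest-first, so a page that is entirely too old ends the run.
    if (all(page$date < cutoff)) break
    kept[[length(kept) + 1]] <- page[page$date >= cutoff, , drop = FALSE]
  }
  list(tenders = do.call(rbind, kept), pages_fetched = fetched)
}

res <- collect_recent(today = as.Date("2024-06-01"), since_days = 30)
```

With these sample pages the loop reads three pages and never requests the fourth, because the third page is entirely older than the cutoff.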
Convenience wrapper around vmp_bb_tenders() that also writes the overview
report. For the combined multi-portal run see screen_all_portals().
check_tenders(dir = "reports", headless = TRUE, login = FALSE, max_pages = Inf, publication_types = c("ExAnte", "Tender"), contracting_rules = "VOL", screen_details = TRUE, max_detail = Inf, screen_notice = FALSE, max_notice = Inf, username = Sys.getenv("VMP_BB_USERNAME"), password = Sys.getenv("VMP_BB_PASSWORD"), keywords = tender_keywords())
| Argument | Description |
|---|---|
| dir | Output directory for the report and caches (default "reports"). |
| headless | Run chromote headless (default TRUE). |
| login | Log in before scraping (default FALSE). |
| max_pages | Maximum number of result pages to scrape (default Inf). |
| publication_types, contracting_rules | Search filter passed to |
| screen_details | Detail-page layer (default TRUE). |
| max_detail | Maximum number of detail pages to screen (default Inf). |
| screen_notice | Notice-PDF layer (default FALSE). |
| max_notice | Maximum number of new notice PDFs to read (default Inf). |
| username, password | Credentials when |
| keywords | Keyword list for relevance scoring (default tender_keywords()). |
Invisibly, the scored tibble of all tenders.
## Not run:
check_tenders()              # public search, all pages
check_tenders(max_pages = 2) # quick test
## End(Not run)
Row-binds the per-portal results, filling columns absent in some sources with
NA, and guarantees a Plattform column. Each input should be a scored tibble
(see score_relevance()) as returned by a portal connector.
combine_tenders(tenders_list)
| Argument | Description |
|---|---|
| tenders_list | A list of data frames (one per portal). |
One combined data frame (an empty data frame if all inputs are empty).
a <- data.frame(Plattform = "A", Kurzbezeichnung = "x", stringsAsFactors = FALSE)
b <- data.frame(Plattform = "B", cpv = "71351500-8", stringsAsFactors = FALSE)
combine_tenders(list(a, b))
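Row-binding with NA fill can be sketched in base R as below. bind_fill() is a hypothetical name and not the package's code; it only illustrates the behaviour described above (missing columns become NA, a Plattform column always exists, empty input gives an empty data frame).

```r
bind_fill <- function(dfs) {
  # Drop empty inputs; if nothing is left, return an empty data frame.
  dfs <- Filter(function(d) nrow(d) > 0, dfs)
  if (length(dfs) == 0) return(data.frame())
  cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # Add every column this source lacks, filled with NA.
    for (col in setdiff(cols, names(d))) d[[col]] <- NA
    d[, cols, drop = FALSE]
  })
  res <- do.call(rbind, padded)
  if (!"Plattform" %in% names(res)) res$Plattform <- NA_character_
  res
}

a <- data.frame(Plattform = "A", Kurzbezeichnung = "x", stringsAsFactors = FALSE)
b <- data.frame(Plattform = "B", cpv = "71351500-8", stringsAsFactors = FALSE)
out <- bind_fill(list(a, b))
```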
Shared engine behind vmp_bb_tenders(), vmp_nrw_tenders() and
dtvp_tenders(): opens a chromote session, optionally logs in, scrapes the
extended-search results, scores them (score_relevance()), enriches via the
detail (and optional notice) layers, applies the title/CPV exclusions
(apply_title_excludes()) and tags Plattform = plattform. The detail and
notice caches are namespaced by slug, so several portals can share one
cache_dir without clobbering each other.
cosinex_tenders(base_url, plattform, slug, mount = "VMPCenter", keywords = tender_keywords(), login = FALSE, max_pages = Inf, since_days = NULL, publication_types = c("ExAnte", "Tender"), contracting_rules = "VOL", screen_details = TRUE, max_detail = Inf, screen_notice = FALSE, max_notice = Inf, username = "", password = "", cache_dir = "reports", relevant_only = FALSE, headless = TRUE)
| Argument | Description |
|---|---|
| base_url | Portal host, e.g. "https://www.evergabe.nrw.de". |
| plattform | Display name written to the Plattform column. |
| slug | Short id used for the per-portal cache files (e.g. "vmp_nrw"). |
| mount | cosinex mount segment (default "VMPCenter"). |
| keywords | Keyword list for relevance scoring (default tender_keywords()). |
| login | Log in before scraping (default FALSE). |
| max_pages | Maximum number of result pages to scrape (default Inf). |
| since_days | If set, stop paging once a result page is entirely older than this many days (the search is sorted newest-first). Bounds the scrape for large portals/award histories (default NULL). |
| publication_types, contracting_rules | Search filter passed to |
| screen_details | Detail-page layer (default TRUE). |
| max_detail | Maximum number of detail pages to screen (default Inf). |
| screen_notice | Notice-PDF layer (default FALSE). |
| max_notice | Maximum number of new notice PDFs to read (default Inf). |
| username, password | Credentials when |
| cache_dir | Directory for the detail/notice caches (default "reports"). |
| relevant_only | Return only relevant tenders (default FALSE). |
| headless | Run chromote headless (default TRUE). |
A scored tibble with a Plattform column.
## Not run:
cosinex_tenders("https://www.evergabe.nrw.de", "Vergabemarktplatz NRW", slug = "vmp_nrw", max_pages = 2)
## End(Not run)
Reads the bundled CPV label table (inst/extdata/cpv_labels.csv, columns
code, name). Edit/extend that file (or drop in the full official CPV list)
to cover more codes.
cpv_labels(path = system.file("extdata", "cpv_labels.csv", package = "kwb.tenders"))
| Argument | Description |
|---|---|
| path | CSV file with columns code and name. |
A named character vector (names = CPV codes, values = German labels).
head(cpv_labels())
Aggregates the CPV codes collected by enrich_with_details() into a table:
one row per code (cpv_id) with its German label (cpv_name, via
cpv_labels()), the number of tenders it appears in (n_tenders) and the KWB
research group(s) it maps to (groups). Used as the "CPV" sheet of the report.
cpv_summary(tenders, cpv_map = tender_cpv_map(), keywords = tender_keywords(), labels = cpv_labels())
| Argument | Description |
|---|---|
| tenders | A tibble with a |
| cpv_map | CPV-to-group mapping (default tender_cpv_map()). |
| keywords | Keyword groups, for group display names (default tender_keywords()). |
| labels | CPV code -> name lookup (default cpv_labels()). |
A data.frame with columns cpv_id, cpv_name, n_tenders, groups,
sorted by descending frequency.
cpv_summary(data.frame(cpv = c("90700000-4, 90733000-4", "90700000-4")))
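The aggregation step (one row per code, counted once per tender, sorted by frequency) can be sketched in base R as follows. The label vector is a placeholder and the group mapping is left out; this is an illustration under those assumptions, not the function's source.

```r
cpv <- c("90700000-4, 90733000-4", "90700000-4")

# Placeholder label lookup (assumption); cpv_labels() would supply real names.
labels <- c("90700000-4" = "label for 90700000-4")

# Split each tender's codes and de-duplicate within the tender, so a code is
# counted at most once per tender.
per_tender <- lapply(strsplit(cpv, "[ ,]+"), unique)
counts <- sort(table(unlist(per_tender)), decreasing = TRUE)

cpv_tab <- data.frame(
  cpv_id    = names(counts),
  cpv_name  = unname(labels[names(counts)]),  # NA where no label is known
  n_tenders = as.integer(counts),
  stringsAsFactors = FALSE
)
```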
The same tender is often syndicated across sources (a federal tender in the
Datenservice and in TED, a Land tender on its cosinex marketplace and the
Datenservice, ...). Rows whose normalised title matches are collapsed to one,
keeping the highest-priority platform's record (Datenservice > TED > cosinex >
Berlin) and listing every source in Plattform; the relevance groups are
unioned. Only titles with >= 20 normalised characters are matched, so short
generic titles are never merged.
dedupe_tenders(tenders, verbose = TRUE)
| Argument | Description |
|---|---|
| tenders | A combined scored tibble (see |
| verbose | Print how many rows were merged (default TRUE). |
tenders with cross-portal duplicates merged (fewer or equal rows).
a <- data.frame(Kurzbezeichnung = "Erneuerung Schaltanlage Wasserwerk Lodmannshagen", Plattform = "TED (EU)", groups = "Grundwasser", stringsAsFactors = FALSE)
b <- data.frame(Kurzbezeichnung = "Erneuerung Schaltanlage Wasserwerk Lodmannshagen", Plattform = "Oeffentliche Vergabe (Bund)", groups = "Grundwasser", stringsAsFactors = FALSE)
dedupe_tenders(combine_tenders(list(a, b)))
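The merge rule can be sketched roughly as below. The normalisation (lower-case, keep only a-z and 0-9), the column names title/source, the short source labels and the "Energie" group are all assumptions made for the illustration; only the priority order and the 20-character threshold come from the description.

```r
# Hypothetical normalisation: lower-case and keep only letters and digits.
normalise_title <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Highest priority first (labels shortened for the sketch).
priority <- c("Datenservice", "TED", "cosinex", "Berlin")

dedupe_sketch <- function(df, min_chars = 20) {
  key <- normalise_title(df$title)
  # Short titles get a unique key, so they can never be merged.
  short <- nchar(key) < min_chars
  key[short] <- paste0("row-", seq_len(nrow(df)))[short]
  parts <- split(df, factor(key, levels = unique(key)))
  merged <- lapply(parts, function(g) {
    g <- g[order(match(g$source, priority)), , drop = FALSE]
    best <- g[1, , drop = FALSE]                      # highest-priority record
    best$source <- paste(g$source, collapse = ", ")   # list every source
    best$groups <- paste(unique(unlist(strsplit(g$groups, ", "))), collapse = ", ")
    best
  })
  out <- do.call(rbind, merged)
  rownames(out) <- NULL
  out
}

df <- data.frame(
  title  = c("Erneuerung Schaltanlage Wasserwerk Lodmannshagen",
             "Erneuerung  Schaltanlage Wasserwerk Lodmannshagen!",
             "Gutachten", "Gutachten"),
  source = c("TED", "Datenservice", "TED", "Berlin"),
  groups = c("Grundwasser", "Grundwasser, Energie", "Grundwasser", "Grundwasser"),
  stringsAsFactors = FALSE
)
out <- dedupe_sketch(df)
```

The two long titles collapse into one row that keeps the Datenservice record, while the two short "Gutachten" rows stay separate.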
Thin wrapper around cosinex_tenders() for the Deutsches Vergabeportal
(dtvp.de). DTVP uses the "Center" mount; its published search is
login-free (registration is only needed to submit bids).
dtvp_tenders(keywords = tender_keywords(), ...)
| Argument | Description |
|---|---|
| keywords | Keyword groups (default tender_keywords()). |
| ... | Further arguments passed to cosinex_tenders(). |
A scored tibble with Plattform = "Deutsches Vergabeportal (DTVP)".
## Not run:
dtvp_tenders(max_pages = 2)
## End(Not run)
For ongoing tenders that are not yet in cache, renders the public detail
page via session, matches the keyword groups against its full text and maps
its CPV codes to groups. Cached tenders are reused without re-fetching. The
matching group(s) are merged into groups/is_relevant; adds columns
detail_groups, cpv, cpv_groups, match_source. The updated cache is
returned as attr(result, "detail_cache").
enrich_with_details(session, tenders, keywords = tender_keywords(), cpv_map = tender_cpv_map(), ongoing_only = TRUE, max_detail = Inf, delay = 0.2, cache = NULL)
| Argument | Description |
|---|---|
| session | A session from |
| tenders | A scored tibble (see |
| keywords | Keyword groups (default tender_keywords()). |
| cpv_map | CPV-to-group mapping (default tender_cpv_map()). |
| ongoing_only | Only screen tenders whose deadline has not passed (default TRUE). |
| max_detail | Maximum number of new detail pages to render per call (default Inf). |
| delay | Seconds between detail pages (politeness; default 0.2). |
| cache | Detail cache from a previous run (see |
tenders with the detail layer merged in; the updated cache is in
attr(result, "detail_cache").
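The cache-reuse pattern (fetch only what is not cached yet, return the updated cache as an attribute) can be sketched as follows. screen_with_cache(), the url/text columns and the fake fetch() function are hypothetical; the real function renders pages through a chromote session.

```r
screen_with_cache <- function(urls, cache, fetch, max_new = Inf) {
  # Only URLs that are not cached yet are fetched, capped at max_new.
  todo <- setdiff(urls, cache$url)
  todo <- todo[seq_len(min(length(todo), max_new))]
  if (length(todo) > 0) {
    new_rows <- data.frame(
      url  = todo,
      text = vapply(todo, fetch, character(1), USE.NAMES = FALSE),
      stringsAsFactors = FALSE
    )
    cache <- rbind(cache, new_rows)
  }
  result <- cache[cache$url %in% urls, , drop = FALSE]
  rownames(result) <- NULL
  attr(result, "detail_cache") <- cache   # hand the updated cache back
  result
}

# Fake fetcher that records which URLs were actually requested.
fetched <- character()
fetch <- function(url) {
  fetched <<- c(fetched, url)
  paste("text of", url)
}

cache <- data.frame(url = "u1", text = "cached text", stringsAsFactors = FALSE)
res <- screen_with_cache(c("u1", "u2", "u3"), cache, fetch)
```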
For ongoing tenders not yet cached, reads the published announcement PDF(s)
via the logged-in session and matches the keyword groups against the text.
Adds a notice_groups column, merges it into groups/is_relevant and adds
the notice source to match_source. Requires a logged-in session. The
updated cache is returned as attr(result, "notice_cache").
enrich_with_notice(session, tenders, keywords = tender_keywords(), ongoing_only = TRUE, max_notice = Inf, delay = 0.3, cache = NULL)
| Argument | Description |
|---|---|
| session | A logged-in session from |
| tenders | A tibble (typically already passed through |
| keywords | Keyword groups (default tender_keywords()). |
| ongoing_only | Only screen ongoing tenders (default TRUE). |
| max_notice | Maximum number of new notice PDFs to read (default Inf). |
| delay | Seconds between tenders (default 0.3). |
| cache | Notice cache from a previous run (see |
tenders with the notice layer merged in.
Login-free connector: downloads the OCDS notice export for each of the last N days (N = days), parses each notice and scores it with score_layered() (title: full rule; description: strong-only; CPV: mapped to groups). Returns relevant tenders with a
Plattform column, ready for combine_tenders() / write_tender_report().
oeffentlichevergabe_tenders(keywords = tender_keywords(), cpv_map = tender_cpv_map(), days = 7, end = Sys.Date(), relevant_only = TRUE, verbose = TRUE)
| Argument | Description |
|---|---|
| keywords | Keyword groups (default tender_keywords()). |
| cpv_map | CPV-to-group map (default tender_cpv_map()). |
| days | Number of past days to fetch (default 7). |
| end | Most recent date to consider (default Sys.Date()). |
| relevant_only | Keep only relevant tenders (default TRUE). |
| verbose | Print per-day progress (default TRUE). |
A scored tibble of (relevant) tenders; empty data frame if none.
## Not run:
oeffentlichevergabe_tenders(days = 3)
## End(Not run)
The cache (one row per already-screened tender) lets the scheduled job screen only new tenders and reuse earlier results; persisted with the report so it survives across runs.
read_detail_cache(path)
write_detail_cache(cache, path)
| Argument | Description |
|---|---|
| path | Cache file path ( |
| cache | A cache data.frame (columns |
read_detail_cache() returns the cache data.frame (empty if absent);
write_detail_cache() returns path invisibly.
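The read/write contract stated above (empty data frame if the file is absent, path returned invisibly) can be sketched as below. The RDS format, the helper names and the per-slug file name pattern are assumptions for illustration; the actual cache file format is not stated here.

```r
# Hypothetical file naming: one cache file per portal slug in a shared directory.
cache_path <- function(cache_dir, slug) {
  file.path(cache_dir, paste0(slug, "-detail-cache.rds"))
}

read_cache_sketch <- function(path) {
  if (!file.exists(path)) return(data.frame())   # absent cache -> empty
  readRDS(path)
}

write_cache_sketch <- function(cache, path) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(cache, path)
  invisible(path)
}

cache_dir <- file.path(tempdir(), "reports-sketch")
p_bb  <- cache_path(cache_dir, "vmp_bb")
p_nrw <- cache_path(cache_dir, "vmp_nrw")

before <- read_cache_sketch(p_bb)
write_cache_sketch(data.frame(url = "u1", stringsAsFactors = FALSE), p_bb)
after_bb  <- read_cache_sketch(p_bb)
after_nrw <- read_cache_sketch(p_nrw)   # other portal's cache is untouched
```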
Read / write the notice-screening cache
read_notice_cache(path)
write_notice_cache(cache, path)
| Argument | Description |
|---|---|
| path | Cache file path ( |
| cache | A cache data.frame ( |
read_notice_cache() returns the cache data.frame (empty if absent);
write_notice_cache() returns path invisibly.
Scores a tender tibble the way the VMP-BB pipeline does, but in one call for
connectors that already ship a description and CPV codes (e.g. the API
portals): title_cols use the full rule (>=1 strong OR >=2 supporting),
text_cols (long free text) are matched STRONG-only (incidental supporting
hits in long text are noise), and cpv_col codes are mapped to groups. The
three group sets are merged into groups, with match_source
(title/detail/cpv), cpv_groups, score and is_relevant.
score_layered(df, title_cols, text_cols = character(), cpv_col = NULL, keywords = tender_keywords(), cpv_map = tender_cpv_map(), exclude = TRUE)
| Argument | Description |
|---|---|
| df | A data frame of tenders. |
| title_cols | Columns scored with the full rule (e.g. the title). |
| text_cols | Columns scored strong-only (e.g. description); default none. |
| cpv_col | Name of a comma/space-separated CPV column, or NULL (the default). |
| keywords | Keyword groups (default tender_keywords()). |
| cpv_map | CPV-to-group map (default tender_cpv_map()). |
| exclude | Apply |
df with groups, cpv_groups, match_source, score,
is_relevant added, sorted by descending score.
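For a single keyword group, the three layers can be sketched as follows. The group contents, the CPV prefix and the helper names are hypothetical; the sketch only shows how the title (full rule), free text (strong-only) and CPV layers each contribute a match source.

```r
# Illustrative group and CPV prefix (assumptions, not the shipped lists).
group <- list(strong = "grundwasser", supporting = c("brunnen", "pegel"))
cpv_prefix <- "9073"

# Number of terms found in the text (case-insensitive substring match).
hits <- function(text, terms) {
  sum(vapply(terms, grepl, logical(1), x = tolower(text), fixed = TRUE))
}

match_sources <- function(title, description, cpv) {
  sources <- c(
    # Title: full rule (>= 1 strong OR >= 2 supporting).
    title  = hits(title, group$strong) >= 1 || hits(title, group$supporting) >= 2,
    # Long free text: strong keywords only.
    detail = hits(description, group$strong) >= 1,
    # CPV: any code starting with the mapped prefix.
    cpv    = any(startsWith(strsplit(cpv, "[ ,]+")[[1]], cpv_prefix))
  )
  names(sources)[sources]
}

s1 <- match_sources("Sanierung Brunnen", "Messung des Grundwassers", "90733000-4")
s2 <- match_sources("Pegel und Brunnen", "Brunnen Pegel", "45000000-7")
```

In the second call the description contains two supporting keywords but no strong one, so only the title layer matches.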
Case-insensitive substring matching of each group's keywords against all
character columns. A tender matches a group if it contains at least one
strong keyword or at least two supporting keywords; it is relevant if it
matches at least one group.
score_relevance(tenders, keywords = tender_keywords())
| Argument | Description |
|---|---|
| tenders | A data frame / tibble of tenders (e.g. from |
| keywords | Keyword groups (default tender_keywords()). |
tenders with added columns groups (matching group names, comma
separated), matched_keywords, score and is_relevant, sorted by
descending score.
df <- data.frame(
  Bezeichnung = c("Grundwassermonitoring Brunnen", "Kanalsanierung Sensorik"),
  stringsAsFactors = FALSE
)
res <- score_relevance(df)
res[, c("Bezeichnung", "groups", "score")]
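The group-matching rule (at least one strong keyword, or at least two supporting keywords) can be sketched as below. The keyword list is illustrative and follows the name / strong / supporting structure; count_hits() and matching_groups() are hypothetical names, and the text is assumed to be one string per tender.

```r
# Illustrative keyword group (assumption), keyed by slug.
keywords <- list(
  grundwasser = list(
    name = "Grundwasser",
    strong = c("grundwasser"),
    supporting = c("brunnen", "pegel", "monitoring")
  )
)

# Number of terms that occur in the text (case-insensitive substring match).
count_hits <- function(text, terms) {
  sum(vapply(terms, grepl, logical(1), x = tolower(text), fixed = TRUE))
}

# Slugs of all groups the text matches.
matching_groups <- function(text, keywords) {
  hit <- vapply(keywords, function(g) {
    count_hits(text, g$strong) >= 1 || count_hits(text, g$supporting) >= 2
  }, logical(1))
  names(keywords)[hit]
}

g1 <- matching_groups("Grundwassermonitoring Brunnen", keywords)  # strong hit
g2 <- matching_groups("Brunnen und Pegel sanieren", keywords)     # two supporting
g3 <- matching_groups("Brunnen reinigen", keywords)               # one supporting
is_relevant <- c(length(g1), length(g2), length(g3)) > 0
```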
Convenience entry point (used by the scheduled GitHub Action): wires the
built-in connectors – the cosinex marketplaces Vergabemarktplatz Brandenburg
(vmp_bb_tenders()), Vergabemarktplatz NRW (vmp_nrw_tenders()) and DTVP
(dtvp_tenders()), Vergabeplattform Berlin (berlin_tenders()), the federal
Datenservice (oeffentlichevergabe_tenders()) and TED (ted_tenders()) –
and runs them through screen_portals(). The
searches are login-free (only VMP-BB optionally logs in for the notice layer),
and a portal that fails is skipped (the others still produce the report).
screen_all_portals(dir = "reports", vmp_bb = TRUE, nrw = TRUE, dtvp = TRUE, berlin = TRUE, oeffentlichevergabe = TRUE, ted = TRUE, vmp_bb_login = FALSE, vmp_bb_notice = FALSE, nrw_login = FALSE, nrw_notice = FALSE, since_days = 30, cosinex_contracting_rules = "VOL", keywords = tender_keywords(), verbose = TRUE)
| Argument | Description |
|---|---|
| dir | Output directory (default "reports"). |
| vmp_bb, nrw, dtvp, berlin, oeffentlichevergabe, ted | Enable each source (all TRUE). |
| vmp_bb_login, vmp_bb_notice | Log in / read notice PDFs for VMP-BB (default FALSE). |
| nrw_login, nrw_notice | Log in / read notice PDFs for Vergabemarktplatz NRW (default FALSE). |
| since_days | Unified look-back window in days, applied to every portal by publication date (default 30). |
| cosinex_contracting_rules | Procurement regulations (Vergabeart) for the cosinex portals (Brandenburg/NRW/DTVP), default "VOL". |
| keywords | Keyword groups (default tender_keywords()). |
| verbose | Print progress (default TRUE). |
Invisibly, the combined scored tibble.
## Not run:
screen_all_portals(vmp_bb_login = TRUE, vmp_bb_notice = TRUE)
## End(Not run)
Calls each source connector (a function returning a scored tender tibble),
tagging it with a Plattform, combines the results with combine_tenders()
and writes one report via write_tender_report(). A source that errors is
logged and skipped, so one portal failing does not abort the run.
screen_portals(sources, dir = "reports", portal = "tenders", keywords = tender_keywords(), keep_types = c("Ausschreibung", "Geplante Ausschreibung", "Vergebener Auftrag"), since_days = NULL, dedupe = TRUE, verbose = TRUE)
| Argument | Description |
|---|---|
| sources | A named list of functions, each returning a scored tibble (e.g. |
| dir | Output directory (default "reports"). |
| portal | File-name id for the combined report (default "tenders"). |
| keywords | Passed to connectors that take it (currently informational). |
| keep_types | Keep only these |
| since_days | If set, keep only notices whose |
| dedupe | Merge cross-portal duplicates with |
| verbose | Print progress (default TRUE). |
Invisibly, the combined scored tibble.
## Not run:
screen_portals(list(
  "Oeffentliche Vergabe" = function() oeffentlichevergabe_tenders(days = 7),
  "TED" = function() ted_tenders()
))
## End(Not run)
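The "log and skip a failing source" behaviour can be sketched with tryCatch(). run_sources() and the two toy connectors are hypothetical; combining and report writing are left out.

```r
run_sources <- function(sources, verbose = TRUE) {
  results <- list()
  for (name in names(sources)) {
    res <- tryCatch(
      sources[[name]](),
      error = function(e) {
        # A failing source is reported and skipped; the loop continues.
        if (verbose) message("Skipping ", name, ": ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(res)) next
    res$Plattform <- rep(name, nrow(res))   # tag rows with their source
    results[[name]] <- res
  }
  results
}

sources <- list(
  "Portal A" = function() data.frame(Kurzbezeichnung = "x", stringsAsFactors = FALSE),
  "Portal B" = function() stop("portal unreachable")
)
out <- run_sources(sources, verbose = FALSE)
```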
Login-free EU connector. Runs a full-text query for terms (German water terms),
restricted to countries, fetches the matching notices and scores them with
score_layered(). Returns relevant tenders with a Plattform column.
ted_tenders(keywords = tender_keywords(), cpv_map = tender_cpv_map(), terms = ted_default_terms(), countries = "DEU", since_days = 90, scope = "ACTIVE", max_pages = 5, page_size = 100, relevant_only = TRUE, verbose = TRUE)
| Argument | Description |
|---|---|
| keywords | Keyword groups (default tender_keywords()). |
| cpv_map | CPV-to-group map (default tender_cpv_map()). |
| terms | Full-text query terms (default: a built-in water-term set). |
| countries | Place-of-performance country codes (default "DEU"). |
| since_days | Only notices published within the last N days (default 90). |
| scope | Notice scope (default "ACTIVE"). |
| max_pages, page_size | Pagination caps (default 5 and 100). |
| relevant_only | Keep only relevant tenders (default TRUE). |
| verbose | Print progress (default TRUE). |
A scored tibble of (relevant) tenders; empty data frame if none.
## Not run:
ted_tenders(max_pages = 1)
## End(Not run)
CPV-code to research-group mapping
tender_cpv_map(path = system.file("extdata", "cpv_groups.yml", package = "kwb.tenders"))
| Argument | Description |
|---|---|
| path | YAML file mapping CPV prefixes to group slugs ( |
A list of entries, each a list with prefix and groups.
str(tender_cpv_map())
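Given the returned structure (a list of entries with prefix and groups), a prefix lookup could look like the sketch below. The two entries and the group slugs are made up for illustration and do not reflect the shipped mapping; groups_for_cpv() is a hypothetical helper.

```r
# Illustrative mapping (assumption), same shape as the documented return value.
cpv_map <- list(
  list(prefix = "9073", groups = c("grundwasser")),
  list(prefix = "71",   groups = c("engineering"))
)

# Groups of every map entry whose prefix starts at least one of the codes.
groups_for_cpv <- function(codes, cpv_map) {
  hit <- Filter(function(e) any(startsWith(codes, e$prefix)), cpv_map)
  unique(as.character(unlist(lapply(hit, `[[`, "groups"))))
}

g_a <- groups_for_cpv(c("90733000-4", "45000000-7"), cpv_map)
g_b <- groups_for_cpv("45000000-7", cpv_map)
```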
Navigates the (JavaScript-rendered) public detail page via the chromote session and reads the rendered text. No login required.
tender_detail_text(session, url, wait = 10)
| Argument | Description |
|---|---|
| session | A session from |
| url | Project detail URL (the |
| wait | Maximum seconds to wait for the page to render (default 10). |
A list with text (rendered page text) and cpv (character vector).
## Not run:
session <- vmp_bb_session()
tender_detail_text(session, tenders$Aktion[1])
## End(Not run)
Reads inst/extdata/keywords_exclude.yml – terms that mark a tender as not
relevant when they appear in its title (and no strong water keyword does). Used
by apply_title_excludes(). This file is deliberately ignored by
tender_keywords(); it is not a research group.
tender_excludes(path = system.file("extdata", "keywords_exclude.yml", package = "kwb.tenders"))
| Argument | Description |
|---|---|
| path | YAML file with a |
A list with name and terms (character vector).
Reads the keyword lists for all KWB research groups shipped with the package
(inst/extdata/keywords.yml). Each group has a display name and strong /
supporting keyword vectors.
tender_keywords(dir = system.file("extdata", package = "kwb.tenders"))
| Argument | Description |
|---|---|
| dir | Directory holding the per-group keyword files ( |
A named list of groups (named by slug), each a list with name,
strong, supporting.
names(tender_keywords())
Opens the (logged-in) detail page, finds the published Bekanntmachung PDF
link(s) and returns their combined extracted text. Requires a logged-in
session (see vmp_bb_login()); no bidder registration needed.
tender_notice_text(session, detail_url, max_pdfs = 3)
| Argument | Description |
|---|---|
| session | A logged-in session from |
| detail_url | The tender's detail URL (the |
| max_pdfs | Maximum number of PDFs to read per tender (default 3). |
The combined PDF text (an empty string if no PDF is found or accessible).
## Not run:
s <- vmp_bb_session(); vmp_bb_login(s)
tender_notice_text(s, tenders$Aktion[1])
## End(Not run)
Logs in via the Keycloak SSO form. Note: the public tender search works
without login (see vmp_bb_scrape_tenders()), so logging in is optional.
vmp_bb_login(session, username = Sys.getenv("VMP_BB_USERNAME"), password = Sys.getenv("VMP_BB_PASSWORD"), auth_url = VMP_BB_AUTH_URL)
| Argument | Description |
|---|---|
| session | A session from |
| username, password | Credentials (default env vars VMP_BB_USERNAME / VMP_BB_PASSWORD). |
| auth_url | Login (Keycloak SSO) URL (default the Brandenburg one; other cosinex portals pass their own via |
The session, invisibly. Errors if the login is rejected.
## Not run:
session <- vmp_bb_session()
vmp_bb_login(session)
## End(Not run)
Applies a filter via the portal's deep-link (the search state is a base64 JSON in the URL hash) and scrapes the result table across pages. Works without login (the search is public).
vmp_bb_scrape_tenders(session, publication_types = c("ExAnte", "Tender"), contracting_rules = "VOL", max_pages = Inf, search_url = VMP_BB_SEARCH_URL, stop_before = NULL)
| Argument | Description |
|---|---|
| session | A session from |
| publication_types | Publication types to include. Default c("ExAnte", "Tender"). |
| contracting_rules | Procurement regulations to include. Default "VOL". |
| max_pages | Maximum number of result pages to scrape (default Inf). |
| search_url | Extended-search URL (default the Brandenburg one; other cosinex portals pass their own via |
| stop_before | Optional |
A tibble with one row per tender (all pages combined). The Aktion
column holds the project detail URL; the Veroeffentlichungstyp column
labels each row ("Ausschreibung" / "Geplante Ausschreibung").
## Not run:
session <- vmp_bb_session()
tenders <- vmp_bb_scrape_tenders(session, max_pages = 2)
## End(Not run)
Creates a headless Chrome session via chromote. The portal performs
cross-origin SSO redirects, which a direct chromote session handles reliably.
vmp_bb_session(headless = TRUE)
| Argument | Description |
|---|---|
| headless | Kept for API compatibility. The chromote backend always runs headless; |
A chromote::ChromoteSession object.
## Not run:
session <- vmp_bb_session()
## End(Not run)
The VMP-BB connector for screen_portals() / screen_all_portals(): a thin
wrapper around cosinex_tenders() pinned to Vergabemarktplatz Brandenburg. It
opens a chromote session, optionally logs in, scrapes tenders, scores them
(score_relevance()), enriches via the detail and (optional) notice layers,
applies the title exclusions (apply_title_excludes()) and tags
Plattform = "Vergabemarktplatz Brandenburg". Returns the scored tibble (it writes no
report); the detail/notice screening caches are read/written under cache_dir.
vmp_bb_tenders(keywords = tender_keywords(), login = FALSE, max_pages = Inf, since_days = NULL, publication_types = c("ExAnte", "Tender"), contracting_rules = "VOL", screen_details = TRUE, max_detail = Inf, screen_notice = FALSE, max_notice = Inf, username = Sys.getenv("VMP_BB_USERNAME"), password = Sys.getenv("VMP_BB_PASSWORD"), cache_dir = "reports", relevant_only = FALSE, headless = TRUE)
| Argument | Description |
|---|---|
| keywords | Keyword list for relevance scoring (default tender_keywords()). |
| login | Log in before scraping (default FALSE). |
| max_pages | Maximum number of result pages to scrape (default Inf). |
| since_days | If set, stop scraping pages older than this many days (results are newest-first); default NULL. |
| publication_types, contracting_rules | Search filter passed to |
| screen_details | Detail-page layer (default TRUE). |
| max_detail | Maximum number of detail pages to screen (default Inf). |
| screen_notice | Notice-PDF layer (default FALSE). |
| max_notice | Maximum number of new notice PDFs to read (default Inf). |
| username, password | Credentials when |
| cache_dir | Directory for the detail/notice caches (default "reports"). |
| relevant_only | Return only relevant tenders (default FALSE). |
| headless | Run chromote headless (default TRUE). |
A scored tibble with a Plattform column.
## Not run:
vmp_bb_tenders(max_pages = 2)
## End(Not run)
Thin wrapper around cosinex_tenders() for Vergabemarktplatz NRW
(evergabe.nrw.de). The published search is login-free; an optional login
(login = TRUE, or screen_notice = TRUE for the Bekanntmachung-PDF layer)
uses the same cosinex Keycloak flow as Brandenburg and needs an NRW account.
vmp_nrw_tenders(keywords = tender_keywords(), username = Sys.getenv("VMP_NRW_USERNAME"), password = Sys.getenv("VMP_NRW_PASSWORD"), ...)
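| Argument | Description |
|---|---|
| keywords | Keyword groups (default tender_keywords()). |
| username, password | NRW credentials for the optional login (default env vars VMP_NRW_USERNAME / VMP_NRW_PASSWORD). |
| ... | Further arguments passed to cosinex_tenders(). |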
A scored tibble with Plattform = "Vergabemarktplatz NRW".
## Not run:
vmp_nrw_tenders(max_pages = 2)
## End(Not run)
Writes a dated Excel workbook (sheets "Relevant", "Alle", "Neu"), a
latest.md summary, a browsable latest.html (for GitHub Pages) and a small
state file used to flag tenders that are new since the previous run.
write_tender_report(tenders, dir = "reports", portal = "vmp-bb", date = Sys.time())
| Argument | Description |
|---|---|
| tenders | A scored tibble (see |
| dir | Output directory (created if needed). Default "reports". |
| portal | Short portal id used in file names. Default "vmp-bb". |
| date | Report timestamp (default Sys.time()). |
Invisibly, a list with the written file paths and counts.
## Not run:
session <- vmp_bb_session()
tenders <- score_relevance(vmp_bb_scrape_tenders(session))
write_tender_report(tenders)
## End(Not run)
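How a small state file can flag tenders that are new since the previous run is sketched below. The plain-text format (one tender id per line), the ids and flag_new() are assumptions for illustration; the package's actual state file layout is not described here.

```r
flag_new <- function(ids, state_file) {
  # Ids seen in earlier runs; none if the state file does not exist yet.
  seen <- if (file.exists(state_file)) readLines(state_file) else character()
  is_new <- !(ids %in% seen)
  # Persist the union so the next run knows about these ids too.
  writeLines(union(seen, ids), state_file)
  is_new
}

state <- tempfile(fileext = ".txt")
first  <- flag_new(c("t1", "t2"), state)   # first run: everything is new
second <- flag_new(c("t2", "t3"), state)   # second run: only t3 is new
```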