---
title: "tutorial"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```
```{r setup}
library(kwb.tenders)
```
## Overview
`kwb.tenders` automates checking German public procurement portals
("Vergabeportale") for tenders relevant to KWB research topics. The first
supported portal is **Vergabemarktplatz Brandenburg** (VMP-BB).
Pipeline: open a browser → scrape all published tenders → score them
for relevance (keyword lists of the KWB research groups) → write an Excel +
Markdown report that flags what is new since the previous run. The browser is
driven directly via `chromote` (headless), which works locally and on headless
CI runners.
## One-shot run
```{r}
check_tenders() # public search, all pages
check_tenders(max_pages = 2) # quick test (first 2 pages)
```
Each run writes `reports/vmp-bb_.xlsx` (sheets *Relevant* / *Alle* / *Neu*,
i.e. relevant, all and new tenders) and `reports/latest.md`.
## Login is optional
The public tender search returns results **without** logging in, so
`check_tenders()` does not log in by default. To log in anyway, store valid
credentials in the environment variables `VMP_BB_USERNAME` and
`VMP_BB_PASSWORD` (e.g. in `~/.Renviron`) and pass `login = TRUE`:
```{r}
check_tenders(login = TRUE)
```
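As a hedged sketch, a small guard can check that both environment variables are set before a login is requested. The helper `has_vmp_credentials()` is hypothetical and not part of `kwb.tenders`; only the variable names and the `login` argument come from the text above.

```{r}
# Hypothetical helper (not part of kwb.tenders): TRUE only if both
# credential variables are set to non-empty values.
has_vmp_credentials <- function() {
  creds <- Sys.getenv(c("VMP_BB_USERNAME", "VMP_BB_PASSWORD"), unset = "")
  all(nzchar(creds))
}

# Log in only when credentials are available, else use the public search:
# check_tenders(login = has_vmp_credentials())
```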
## Step by step
```{r}
session <- vmp_bb_session()   # start a headless browser session
# session$view()              # optional: open a live view of the headless session in your browser
# vmp_bb_login(session)       # optional: log in (see "Login is optional")
tenders <- vmp_bb_scrape_tenders(session, max_pages = 2)
scored <- score_relevance(tenders)
write_tender_report(scored)
session$close()               # always close the session when done
```
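If a step between opening and closing the session fails, `session$close()` is never reached. The following is a minimal sketch of one way to guard against that. It assumes only that the session object has a `close()` method, as used above; `with_session()` is a hypothetical helper, not a package function.

```{r}
# Hypothetical helper (not part of kwb.tenders): run `fn` on a freshly opened
# session and always close the session, even if `fn` throws an error.
with_session <- function(open_session, fn) {
  session <- open_session()
  on.exit(session$close(), add = TRUE)
  fn(session)
}

# Intended use with the functions shown above:
# tenders <- with_session(vmp_bb_session, function(session) {
#   vmp_bb_scrape_tenders(session, max_pages = 2)
# })
```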
## Research groups & keywords
Tenders are scored against **all KWB research groups**, and each relevant tender
is tagged (column `groups`) with the matching group(s). The keyword lists live
in `inst/extdata/keywords_.yml` -- one file per group, each with a display
`name` and two keyword vectors, `strong` and `supporting`. A tender matches a
group if it contains at least one `strong` keyword or at least two `supporting`
keywords, and it is relevant if it matches at least one group. Matching is
case-insensitive and folds umlauts (so "Klärschlamm" and "Klaerschlamm" both
match).
```{r}
kw <- tender_keywords()
names(kw) # the research-group slugs
str(kw$groundwater)
# Score against a custom subset (e.g. only two groups):
scored <- score_relevance(tenders, keywords = kw[c("groundwater", "water-risk")])
```
Edit the `inst/extdata/keywords_.yml` files to tune the keywords, or add a
new file to add a group -- no code change needed.
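The matching rule described in this section can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the helpers `fold_text()`, `count_hits()` and `matches_group()` and the demo group are hypothetical, keywords are matched as plain substrings, and "at least two `supporting` keywords" is read as two *distinct* keywords.

```{r}
# Hypothetical helpers (not part of kwb.tenders) illustrating the matching rule.

# Spell out umlauts and lower-case the text, so that "Klärschlamm" and
# "Klaerschlamm" compare equal.
fold_text <- function(x) {
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to <- c("ae", "oe", "ue", "ae", "oe", "ue", "ss")
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  tolower(x)
}

# Number of distinct keywords that occur in an already folded text.
count_hits <- function(folded_text, keywords) {
  if (length(keywords) == 0L) {
    return(0L)
  }
  hits <- vapply(
    fold_text(keywords),
    function(keyword) grepl(keyword, folded_text, fixed = TRUE),
    logical(1)
  )
  sum(hits)
}

# A single text matches a group with >= 1 strong or >= 2 supporting keywords.
matches_group <- function(text, group) {
  folded <- fold_text(text)
  count_hits(folded, group$strong) >= 1L ||
    count_hits(folded, group$supporting) >= 2L
}

# Made-up group in the shape described above (name, strong, supporting)
demo_group <- list(
  name = "Demo group",
  strong = "Kl\u00e4rschlamm",
  supporting = c("Brunnen", "Pumpversuch")
)

matches_group("Entsorgung von KLAERSCHLAMM", demo_group) # one strong keyword
#> [1] TRUE
matches_group("Brunnenbau mit Pumpversuch", demo_group)  # two supporting keywords
#> [1] TRUE
matches_group("Sanierung eines Brunnens", demo_group)    # only one supporting keyword
#> [1] FALSE
```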
## Two relevance layers
Layer 1 matches the title shown in the result table. Layer 2, enabled by
`check_tenders(screen_details = TRUE)` (the default), opens each *ongoing*
tender's public detail page and matches the full description text **plus** the
CPV procurement codes (mapped to groups via `inst/extdata/cpv_groups.yml`). The
report's `match_source` column shows what flagged each tender (`title` /
`detail` / `cpv`). Because the detail page is public, this needs no login. The
scheduled job caches detail results (on `gh-pages`) and deep-screens only
**new** tenders, so daily runs stay cheap while coverage grows. Use
`max_detail` to cap the number of detail pages fetched per run.
```{r}
check_tenders(screen_details = TRUE, max_detail = 50)
```
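The caching idea can be illustrated with a small sketch: only tenders whose identifier is not yet in the cache are deep-screened, up to `max_detail` per run. The function `select_for_detail()` and the identifier vectors are hypothetical; how the package stores and keys its cache is not described here.

```{r}
# Hypothetical helper (not part of kwb.tenders): choose which tenders to
# deep-screen in this run.
select_for_detail <- function(tender_ids, cached_ids, max_detail = Inf) {
  # Tenders without a cached detail result are the only candidates
  new_ids <- setdiff(tender_ids, cached_ids)
  # Cap the number of detail pages fetched in one run
  new_ids[seq_len(min(length(new_ids), max_detail))]
}

select_for_detail(
  tender_ids = c("A1", "A2", "A3", "A4"),
  cached_ids = c("A2"),
  max_detail = 2
)
#> [1] "A1" "A3"
```

In this sketch, tenders that exceed the cap (here `"A4"`) stay uncached and would be picked up by a later run.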
## Automation (GitHub Actions)
The workflow `.github/workflows/check-tenders.yaml` runs `check_tenders()` on a
schedule (weekdays, 05:00 UTC by default), commits the updated report to the
repository and uploads the Excel file as a build artifact. Change the `cron:`
expression to adjust the frequency.