| Title: | Join Gridded Weather Data to Event Tables |
|---|---|
| Description: | High-level tools to attach gridded weather data from the NASA POWER Project to event-based datasets. The package plans efficient spatio-temporal API calls via the 'nasapower' R package, caches downloaded segments locally, and joins weather variables back to the input table using exact or rolling joins. This package is not affiliated with or endorsed by NASA. |
| Authors: | Przemek Dolowy [aut, cre] (affiliation: Harper Adams University) |
| Maintainer: | Przemek Dolowy <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.3 |
| Built: | 2026-06-04 06:50:41 UTC |
| Source: | https://github.com/hauae/weatherjoin |

Attach gridded weather variables from NASA POWER to rows of an event table. The function:

- standardizes and validates time input (a single timestamp column or multiple time columns),
- plans efficient provider calls by clustering locations (default) and splitting sparse time ranges,
- caches downloaded weather segments locally and reuses them,
- joins weather back to events using exact or rolling joins.

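
To make the rolling-join idea concrete, here is a minimal base R sketch of a nearest-timestamp match with a maximum allowed distance, which is the behaviour that roll = "nearest" and roll_max_hours describe. The helper nearest_within() and the sample timestamps are hypothetical and are not part of weatherjoin; the package's own join may be implemented quite differently.

```r
# Illustrative helper (not part of weatherjoin): for each event time, return the
# index of the closest weather timestamp, or NA when the gap exceeds max_hours.
nearest_within <- function(event_time, weather_time, max_hours = 1) {
  vapply(seq_along(event_time), function(i) {
    gap <- abs(as.numeric(difftime(weather_time, event_time[i], units = "hours")))
    j <- which.min(gap)
    if (length(j) == 1L && gap[j] <= max_hours) j else NA_integer_
  }, integer(1))
}

# Four hourly weather records starting at midnight UTC (made-up data).
weather_time <- as.POSIXct("2024-05-01 00:00:00", tz = "UTC") + 3600 * 0:3

# Three events: two close to a weather record, one far outside the window.
event_time <- as.POSIXct(
  c("2024-05-01 00:20:00", "2024-05-01 01:40:00", "2024-05-01 09:00:00"),
  tz = "UTC"
)

idx <- nearest_within(event_time, weather_time, max_hours = 1)
```

In this sketch the third event has no weather record within one hour, so it gets NA rather than a distant match, which mirrors the purpose of capping the rolling distance.
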
join_weather( x, params, time, lat_col = "lat", lon_col = "lon", time_api = c("guess", "hourly", "daily"), tz = "UTC", roll = c("nearest", "last", "none"), roll_max_hours = NULL, spatial_mode = c("cluster", "exact", "by_group"), group_col = NULL, cluster_radius_m = 250, site_elevation = c("constant", "auto"), elev_constant = 100, elev_fun = NULL, community = "ag", cache_scope = c("user", "project"), cache_dir = NULL, verbose = FALSE, ... )

| Argument | Description |
|---|---|
| x | A data.frame/data.table with event rows. |
| params | Character vector of NASA POWER parameter codes (e.g. |
| time | A single column name containing time (POSIXct/Date/character/numeric) OR a character vector of column names used to assemble a timestamp (e.g. |
| lat_col, lon_col | Column names for latitude and longitude (decimal degrees). |
| time_api | One of |
| tz | Time zone used to interpret/construct input timestamps (default |
| roll | Join behaviour when matching timestamps: |
| roll_max_hours | Maximum allowed time distance (hours) for a rolling match. If NULL, a safe default is used: 1 hour for hourly joins and 24 hours for daily joins. |
| spatial_mode | How to reduce many points to representative locations before calling POWER: |
| group_col | Grouping column used when |
| cluster_radius_m | Clustering radius in meters when |
| site_elevation | Elevation strategy for POWER calls: |
| elev_constant | Constant elevation (meters) used when |
| elev_fun | Optional function |
| community | Passed to |
| cache_scope | Where to store cache by default: |
| cache_dir | Optional explicit cache directory. If NULL, determined by |
| verbose | If TRUE, print progress messages. |
| ... | Passed through to |

A data.table with weather columns appended. Rows with missing/invalid inputs keep their original values and receive NA weather.
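
A minimal usage sketch follows, built only from the arguments documented above. The event table, its column name when, and the parameter code "T2M" (assumed here to be the NASA POWER code for air temperature at 2 m) are illustrative assumptions. The call is guarded so that it only runs in an interactive session with the package installed, since it needs network access to NASA POWER.

```r
# Hypothetical event table; lat/lon use the documented default column names.
events <- data.frame(
  id   = 1:3,
  lat  = c(52.78, 52.78, 52.80),
  lon  = c(-2.43, -2.43, -2.40),
  when = as.POSIXct(
    c("2024-05-01 09:30:00", "2024-05-01 14:10:00", "2024-05-03 08:00:00"),
    tz = "UTC"
  )
)

# Guarded: requires the weatherjoin package and network access.
if (interactive() && requireNamespace("weatherjoin", quietly = TRUE)) {
  out <- weatherjoin::join_weather(
    x = events,
    params = "T2M",        # assumed parameter code, see lead-in
    time = "when",         # single timestamp column
    time_api = "hourly",
    roll = "nearest",
    roll_max_hours = 1,
    spatial_mode = "cluster",
    cluster_radius_m = 250
  )
  print(out)
}
```
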
See also: wj_cache_list, wj_cache_clear, weatherjoin_options
Most users will not need to change package options. Advanced configuration can be
controlled via options().

| Option | Description |
|---|---|
| weatherjoin.cache_max_age_days | Cache entries older than this (days) are considered stale (default 60). |
| weatherjoin.cache_refresh | When to refetch: one of "if_missing", "if_stale", "always" (default "if_missing"). |
| weatherjoin.cache_match_mode | Cache matching mode: "cover" (cached window covers requested) or "exact" (default "cover"). |
| weatherjoin.cache_param_match | Parameter matching for cache reuse: "superset" or "exact" (default "superset"). |
| weatherjoin.cache_pkg | Internal namespace used when cache_scope="user" (default "weatherjoin"). |

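
As a short illustration of setting these through base R's options(), the sketch below changes two cache options, reads them back, and then restores the previous state. The chosen values are examples only; the option names and allowed values are those listed above.

```r
# options() returns the previous values, so they can be restored afterwards.
old <- options(
  weatherjoin.cache_refresh = "if_stale",
  weatherjoin.cache_max_age_days = 30
)

# Read the values back the same way any code would look them up.
refresh <- getOption("weatherjoin.cache_refresh")
max_age <- getOption("weatherjoin.cache_max_age_days")

# Restore whatever was set before (unset options go back to NULL).
options(old)
```
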
These options control how sparse time series are split into separate provider calls. They are primarily performance controls; incorrect values will not change the meaning of returned weather values, only how much data is downloaded and cached.

| Option | Description |
|---|---|
| weatherjoin.split_penalty_hours | Gap threshold (hours). Larger values yield fewer, wider time windows (default 72). |
| weatherjoin.pad_hours | Padding (hours) added to both ends of each planned time window (default 0). |
| weatherjoin.max_parts | Maximum number of planned time windows per representative location (default 50). |
| weatherjoin.dummy_hour | Hour used when constructing daily timestamps (default 12). |
| weatherjoin.keep_rep_cols | If TRUE, keep representative-location diagnostics (rep_lon/rep_lat, distance, elevation) in outputs (default FALSE). |

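
The effect of a gap threshold can be pictured with a simplified base R sketch: sorted timestamps are grouped into one window until the gap to the next timestamp exceeds the threshold, at which point a new window starts. The function split_windows() is hypothetical and only illustrates the idea; the package's actual planner is not documented here and presumably also accounts for padding and the maximum number of parts.

```r
# Illustrative only (not the package's planner): split timestamps into windows
# wherever consecutive timestamps are more than gap_hours apart.
split_windows <- function(times, gap_hours = 72) {
  times <- sort(times)
  gaps <- as.numeric(diff(times), units = "hours")
  part <- cumsum(c(TRUE, gaps > gap_hours))   # window id for each timestamp
  data.frame(
    part  = unique(part),
    start = times[!duplicated(part)],                   # first time in window
    end   = times[!duplicated(part, fromLast = TRUE)]   # last time in window
  )
}

# Made-up event times: two early in January, two about a week later.
times <- as.POSIXct(
  c("2024-01-01 00:00:00", "2024-01-02 00:00:00",
    "2024-01-10 00:00:00", "2024-01-10 06:00:00"),
  tz = "UTC"
)

windows_72  <- split_windows(times, gap_hours = 72)   # two windows
windows_240 <- split_windows(times, gap_hours = 240)  # one wide window
```

With the larger threshold the eight-day gap no longer triggers a split, so a single wider window covers all four timestamps, at the cost of downloading the days in between.
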
Use withr for temporary changes:
withr::local_options(list( weatherjoin.split_penalty_hours = 168, weatherjoin.max_parts = 25 ))
Deletes cached files and (optionally) removes rows from the cache index.
wj_cache_clear( cache_dir = NULL, cache_scope = c("user", "project"), pkg = "weatherjoin", filter = NULL, keep_index = FALSE, dry_run = FALSE, verbose = TRUE )

| Argument | Description |
|---|---|
| cache_dir | Optional explicit cache directory. |
| cache_scope | Where to store cache by default: |
| pkg | Package name used for |
| filter | Optional expression evaluated within the cache index to select entries to remove. |
| keep_index | If |
| dry_run | If |
| verbose | If |

Invisibly returns the rows selected for deletion.
Returns the cache index (one row per cached segment).
wj_cache_list( cache_dir = NULL, cache_scope = c("user", "project"), pkg = "weatherjoin" )

| Argument | Description |
|---|---|
| cache_dir | Optional explicit cache directory. |
| cache_scope | Where to store cache by default: |
| pkg | Package name used for |

A data.table index of cached segments.
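
A cautious way to inspect the cache before deleting anything might look like the sketch below, which lists the index and then calls wj_cache_clear() with dry_run = TRUE. It uses only the documented signatures with default arguments. How these functions behave when no cache exists yet is not stated here, so the calls are guarded and the result should be checked rather than assumed.

```r
# Check for the package first; the sketch does nothing if it is not installed.
has_wj <- requireNamespace("weatherjoin", quietly = TRUE)

if (interactive() && has_wj) {
  # One row per cached segment, per the documentation above.
  idx <- weatherjoin::wj_cache_list()
  print(idx)

  # dry_run = TRUE is used here so that nothing is removed while previewing;
  # the function invisibly returns the rows selected for deletion.
  selected <- weatherjoin::wj_cache_clear(dry_run = TRUE, verbose = TRUE)
  print(selected)
}
```
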
Ensures the cache index contains required columns and correct types.
wj_cache_upgrade_index( cache_dir = NULL, cache_scope = c("user", "project"), pkg = "weatherjoin", verbose = TRUE )

| Argument | Description |
|---|---|
| cache_dir | Optional explicit cache directory. |
| cache_scope | Where to store cache by default: |
| pkg | Package name used for |
| verbose | If |

The upgraded cache index.