Recently the ABC had an article that covered the high temperatures for Perth in May from 1897 to current with a chart showing the values over time. I thought it would be fun to use {weatherOz} to recreate this and maybe dig a little deeper.
To start, you will need to install {weatherOz}, which currently was not on CRAN (when this was originally written) and {hrbrthemes} from GitHub as it is ahead of the CRAN version.
|
|
Now, the first question I had from reading the article was, “which station in Perth is this”? As that is not clear, we will see what is available from {weatherOz} keeping in mind that we also have Western Australia’s Department of Primary Industries and Regional Development’s (DPIRD) weather stations available to us for later years as well and that the SILO data are derived from BOM.
With that in mind, let’s see what stations are available for Perth.
Finding
This turned out to be more difficult than I expected.
{weatherOz} provides a find_stations_in() function1 that we can use with either an sf object or a bounding box to find stations in a geographic area of interest.
For this post, I have opted to use an sf object that I created by downloading “Greater Capital City Statistical Areas - 2021 - Shapefile” from the Australian Bureau of Statistics (ABS).
|
|
## Reading layer `GCCSA_2021_AUST_GDA2020' from data source
## `/vsizip///var/folders/vz/txwj1tx51txgw7zv_b5c5_3m0000gn/T//RtmpKQCrMw/GCCSA_2021_AUST_SHP_GDA2020.zip/GCCSA_2021_AUST_GDA2020.shp'
## using driver `ESRI Shapefile'
## replacing null geometries with empty geometries
## Simple feature collection with 35 features and 10 fields (with 19 geometries empty)
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 96.81695 ymin: -43.7405 xmax: 167.998 ymax: -9.142163
## Geodetic CRS: GDA2020
|
|
## Simple feature collection with 35 features and 10 fields (with 19 geometries empty)
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 96.81695 ymin: -43.7405 xmax: 167.998 ymax: -9.142163
## Geodetic CRS: GDA2020
## First 10 features:
## GCC_CODE21 GCC_NAME21 CHG_FLAG21 CHG_LBL21
## 1 1GSYD Greater Sydney 0 No change
## 2 1RNSW Rest of NSW 0 No change
## 3 19499 No usual address (NSW) 0 No change
## 4 19799 Migratory - Offshore - Shipping (NSW) 0 No change
## 5 2GMEL Greater Melbourne 0 No change
## 6 2RVIC Rest of Vic. 0 No change
## 7 29499 No usual address (Vic.) 0 No change
## 8 29799 Migratory - Offshore - Shipping (Vic.) 0 No change
## 9 3GBRI Greater Brisbane 0 No change
## 10 3RQLD Rest of Qld 0 No change
## STE_CODE21 STE_NAME21 AUS_CODE21 AUS_NAME21 AREASQKM21
## 1 1 New South Wales AUS Australia 12368.686
## 2 1 New South Wales AUS Australia 788428.973
## 3 1 New South Wales AUS Australia NA
## 4 1 New South Wales AUS Australia NA
## 5 2 Victoria AUS Australia 9992.608
## 6 2 Victoria AUS Australia 217503.640
## 7 2 Victoria AUS Australia NA
## 8 2 Victoria AUS Australia NA
## 9 3 Queensland AUS Australia 15842.007
## 10 3 Queensland AUS Australia 1714329.214
## LOCI_URI21
## 1 http://linked.data.gov.au/dataset/asgsed3/GCCSA/1GSYD
## 2 http://linked.data.gov.au/dataset/asgsed3/GCCSA/1RNSW
## 3 http://linked.data.gov.au/dataset/asgsed3/GCCSA/19499
## 4 http://linked.data.gov.au/dataset/asgsed3/GCCSA/19799
## 5 http://linked.data.gov.au/dataset/asgsed3/GCCSA/2GMEL
## 6 http://linked.data.gov.au/dataset/asgsed3/GCCSA/2RVIC
## 7 http://linked.data.gov.au/dataset/asgsed3/GCCSA/29499
## 8 http://linked.data.gov.au/dataset/asgsed3/GCCSA/29799
## 9 http://linked.data.gov.au/dataset/asgsed3/GCCSA/3GBRI
## 10 http://linked.data.gov.au/dataset/asgsed3/GCCSA/3RQLD
## geometry
## 1 MULTIPOLYGON (((151.2816 -3...
## 2 MULTIPOLYGON (((159.0623 -3...
## 3 MULTIPOLYGON EMPTY
## 4 MULTIPOLYGON EMPTY
## 5 MULTIPOLYGON (((144.8883 -3...
## 6 MULTIPOLYGON (((146.2929 -3...
## 7 MULTIPOLYGON EMPTY
## 8 MULTIPOLYGON EMPTY
## 9 MULTIPOLYGON (((153.2321 -2...
## 10 MULTIPOLYGON (((142.5314 -1...
|
|
Finding Weather Stations in the Greater Perth Area
With that available, let’s find stations that fall within the greater Perth area as defined by the ABS. As we need API keys to access the data from {weatherOz}, I’ll use {keyring} to safely handle credentials. The query as defined here includes closed stations, since many early records may be from closed stations.
|
|
## station_code station_name start end
## 009000 : 1 Length:87 Min. :1852-01-01 Min. :1930-01-01
## 009001 : 1 Class :character 1st Qu.:1910-01-01 1st Qu.:2001-01-01
## 009003 : 1 Mode :character Median :1962-01-01 Median :2026-01-24
## 009007 : 1 Mean :1952-09-12 Mean :2012-06-18
## 009009 : 1 3rd Qu.:1986-01-01 3rd Qu.:2026-01-24
## 009010 : 1 Max. :2026-01-01 Max. :2026-01-24
## (Other):81
## latitude longitude state elev_m
## Min. :-32.70 Min. :115.5 Length:87 Min. : 3.00
## 1st Qu.:-32.22 1st Qu.:115.8 Class :character 1st Qu.: 17.75
## Median :-32.01 Median :116.0 Mode :character Median : 40.50
## Mean :-32.06 Mean :115.9 Mean :108.37
## 3rd Qu.:-31.90 3rd Qu.:116.1 3rd Qu.:207.75
## Max. :-31.56 Max. :116.3 Max. :384.00
## NA's :7
## source status wmo
## Length:87 Length:87 Min. :94602
## Class :character Class :character 1st Qu.:94609
## Mode :character Mode :character Median :94613
## Mean :95025
## 3rd Qu.:95607
## Max. :95614
## NA's :75
This query returns 87 stations, but some are listed with “CAUTION: Do not use these observations”.
Others are from a portable station, which may not be reliable.
Let’s drop these stations.
|
|
Now there are 82 stations that start on 1852-01-01 and run to current.
Map the Stations
We can map the stations within the greater Perth area to see where they fall using {ggplot2}, who the station owner is and see if they are open or closed.
I’ve chosen to add a new column to use for the legend to indicate the station data source (owner) in a cleaner fashion using the abbreviation only to save space in the plot legend.
I’ve used the excellent Okabe and Ito palette for this colour scheme as it is colour-blind friendly and easy to see here along with Bob Rudis’s excellent {hrbrthemes}.
Lastly, using {ggdark} I’ve inverted the colour theme to use a dark background with theme_ipsum.
|
|

Visualise Station Availability
To visualise the data availability for the stations’ start and end dates, we will use {ggplot2} again.
|
|

Get Weather Data for Perth Stations
Using both the DPIRD and SILO APIs we can get weather from the mid-1800s to current.
Get DPIRD Stations’ Data
We will start with the DPIRD and DPIRD-hosted station data by creating a vector of station_code values for stations that are NOT BOM stations using subset() and then use droplevels() to drop unused factor levels from the vector.
|
|
There can be some hiccups when querying station data.
It’s a good idea to wrap it in a tryCatch() or something like purrr::possibly().
I’m a fan of using possibly() for these sorts of issues.
It makes it easy to spot stations with errors and investigate, but more importantly, it means you can use purrr::map(), lapply() or a for loop and if there is an error, you get the message you have specified and the operation carries on with the rest of the stations rather than stopping.
|
|
Now we will fill the list with weather data for the stations located in Greater Perth in the DPIRD network.
|
|
## Length Class Mode
## DFES-J Stake Hill BCoE 1 -none- character
## Floreat Park 1 -none- character
## Glen Eagle 1 -none- character
## Jarrahdale East 1 -none- character
## Jarrahdale South 1 -none- character
## Kings Park 1 -none- character
## Medina 1 -none- character
## Pinjar 1 -none- character
## Pinjar Tower CLOSED 1 -none- character
## Pinjarra 1 -none- character
## South Perth 1 -none- character
## Wanneroo 1 -none- character
## Wattleup 1 -none- character
Next, bind the list into a single data frame object using data.table::rbindlist().
|
|
## data.table 1.18.0 using 5 threads (see ?getDTthreads). Latest news: r-datatable.com
##
## Attaching package: 'data.table'
##
## The following object is masked from 'package:purrr':
##
## transpose
|
|
## Error in `rbindlist()`:
## ! Item 1 of input is not a data.frame, data.table or list
Get BOM Stations’ Data
|
|
Following the same method as with the DPIRD station network, use possibly() to ensure that there are no interruptions when fetching data.
|
|
Now we will fill the list with weather data for the stations located in Greater Perth in the BOM network.
|
|
## Length Class Mode
## Araluen 1 -none- character
## Armadale 1 -none- character
## Beenyup 1 -none- character
## Belvoir 1 -none- character
## Bentley Curtin 1 -none- character
## Bickley 1 -none- character
## Canning River Weir 1 -none- character
## Cardup 1 -none- character
## Chidlow 1 -none- character
## Churchman Brook 1 -none- character
## Elsfield 1 -none- character
## Fairbridge 1 -none- character
## Floreat Park 1 -none- character
## Fremantle 1 -none- character
## Fremantle 1 -none- character
## Garden Island 1 -none- character
## Garden Island Hsf 1 -none- character
## Gidgegannup 1 -none- character
## Gosnells City 1 -none- character
## Greenmount 1 -none- character
## Guildford Post Office 1 -none- character
## Halls Head 1 -none- character
## Herne Hill 1 -none- character
## Jandakot Aero 1 -none- character
## Jarrahdale 1 -none- character
## Kalamunda 1 -none- character
## Karnet 1 -none- character
## Karragullen 1 -none- character
## Karragullen North 1 -none- character
## Kwinana Bp Refinery 1 -none- character
## Lake Leschenaultia 1 -none- character
## Lower Chittering 1 -none- character
## Mandurah 1 -none- character
## Mandurah 1 -none- character
## Medina Research Centre 1 -none- character
## Melville 1 -none- character
## Midland 1 -none- character
## Mosman Park 1 -none- character
## Mount Victoria 1 -none- character
## Mundaring 1 -none- character
## Mundaring Weir 1 -none- character
## Nedlands Uwa 1 -none- character
## Pearce RAAF 1 -none- character
## Perth Airport 1 -none- character
## Perth Gardens 1 -none- character
## Perth Metro 1 -none- character
## Perth Regional Office 1 -none- character
## Pinjarra 1 -none- character
## Rockingham Post Office 1 -none- character
## Roleystone 1 -none- character
## Rottnest Island 1 -none- character
## Rottnest Island Lighthouse 1 -none- character
## Serpentine 1 -none- character
## Serpentine Main Dam 1 -none- character
## Springdale 1 -none- character
## Stoneville Research Stn 1 -none- character
## Subiaco 1 -none- character
## Subiaco Treatment Plant 1 -none- character
## Swanbourne 1 -none- character
## Upper Swan Research Station 1 -none- character
## Victoria Dam 1 -none- character
## Walliston 1 -none- character
## Wanneroo 1 -none- character
## Wanneroo Calm 1 -none- character
## West Swan 1 -none- character
## Whitby Falls 1 -none- character
## Wooroloo 1 -none- character
## Wungong Dam 1 -none- character
## Yanchep 1 -none- character
Unsurprisingly there were a few issues along the way.
As there are a few stations with issues, I’ve chosen to use grep() to locate the faulty stations and drop them from the list while binding it into a single data fram object.
|
|
Create Data Set for Plotting
Now that we have clean sets of data we can bind the two sets of data.
{weatherOz} intentionally sets the column names to be the same, regardless of the API queried to make this easy to do.
Using data.table::bindrows() option, fill = TRUE makes this easier than dropping columns that are not shared.
|
|
## DFES-J Stake Hill BCoE Floreat Park Glen Eagle
## Length:1 Length:1 Length:1
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Jarrahdale East Jarrahdale South Kings Park Medina
## Length:1 Length:1 Length:1 Length:1
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## Pinjar Pinjar Tower CLOSED Pinjarra South Perth
## Length:1 Length:1 Length:1 Length:1
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## Wanneroo Wattleup
## Length:1 Length:1
## Class :character Class :character
## Mode :character Mode :character
Surprisingly, there are a few NA values in the data, so we will drop those.
|
|
## Error:
## ! object 'air_tmax' not found
|
|
## DFES-J Stake Hill BCoE Floreat Park Glen Eagle
## Length:1 Length:1 Length:1
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Jarrahdale East Jarrahdale South Kings Park Medina
## Length:1 Length:1 Length:1 Length:1
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## Pinjar Pinjar Tower CLOSED Pinjarra South Perth
## Length:1 Length:1 Length:1 Length:1
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## Wanneroo Wattleup
## Length:1 Length:1
## Class :character Class :character
## Mode :character Mode :character
We are only interested in the month of May, so we will remove all the other months’ data by selecting only May (month = 5).
|
|
## Error in `month == 5`:
## ! comparison (==) is possible only for atomic and list types
Now we need to find the highest May temperatures for each station of each year.
Using .SD from {data.table}, we get this.
|
|
## Error:
## ! object 'station_code' not found
Next, create the annual mean.
|
|
## Error:
## ! object 'perth_highs' not found
Plotting May High Temperatures
Now we have a good data set for plotting.
Using {ggplot2} we can plot the high temperature for each station in May in light grey in the background with the mean stations’ maximum temperatures in the foreground. We can use {ggiraph} to make the plot interactive, similar to the ABC’s interactive version but with the extra data and greater ability to see the data points of all the stations each year.
|
|
## Error:
## ! object 'perth_highs' not found
|
|
## Error:
## ! object 'p' not found
Obviously, this data does not match ABC’s exactly, it does go back farther into the 1800s, 1852, with the inclusion of Fremantle and likely has more data in the more recent years with the inclusion of the DPIRD network data. Still, it is a fun exercise to work through getting the data, subsetting it and visualising.
-
Actually this function only now exists because I wanted to write this blog post and needed to find stations in the Perth metro area. ↩︎