COVID-19 visualisations


The recent COVID-19 outbreak has caused much disruption to people’s daily lives. As self-isolation policies are adopted by many countries around the world, many people have taken to social media to share important resources and data visualisations illustrating the severity of COVID-19. I want to be quite clear about my intentions behind this blog post: I am not an epidemiologist, biologist or medical doctor. I will refrain from making any inferences from this data (ironic for a statistician) because it would be irresponsible for me to comment on an ongoing public health crisis in which I am not an expert. I am only here to show you some interesting R coding and data visualisations.

Between mid-February 2020 and mid-March 2020, I was at Cornell University (New York state), observing the spread of COVID-19 quite closely. I was increasingly worried about the dramatic increase in the number of cases in the US and the potential shutdown of the Australian border. I was on the brink of re-booking all my flights before it was too late. Around the same time, my supervisor back in Australia asked me to design a lecture on Shiny apps, so I thought it would be useful to write an app and other visualisations to answer the following questions:

  1. What do the confirmed cases for each country look like? By how many days do the confirmed cases in each country lag behind those in China (i.e. cross-correlation)?
  2. Which Sydney-bound flights had confirmed cases? Is there a route that is safer than others?

Shiny app for confirmed cases and added cases

This app attempts to answer the first question, code: I can’t afford a server at this point, so you will need to run the app locally by following the instructions in the README of that repo.

Based on my simple visualisations at the time (~15 March 2020), I estimated that the major outbreak in the US lagged behind that of China by about 45 days. So it wasn’t too dangerous for me to be at Cornell around mid-March. However, it was definitely not ideal, as the county I was in already had two confirmed cases, and any delay in my departure could spell trouble. This unfortunately proved true: at the time of writing, the US has overtaken China in confirmed cases, and New York state accounts for the largest share of those cases.
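To make the idea of a "lag in days" concrete, here is a minimal sketch of how such a lag could be read off a cross-correlation, using `stats::ccf()` from base R. It is not the code from the app: the two curves are synthetic logistic curves that I made up, the 45-day shift is built in by construction, and differencing the cumulative series into daily added cases before calling `ccf()` is my own assumption about a sensible pre-processing step.

```r
# Synthetic cumulative case curves (invented numbers, not real data):
# country B follows the same logistic trajectory as country A, 45 days later.
days <- 1:150
true_lag <- 45
logistic_curve <- function(t, midpoint) 80000 / (1 + exp(-(t - midpoint) / 5))
cases_a <- logistic_curve(days, midpoint = 30)
cases_b <- logistic_curve(days, midpoint = 30 + true_lag)

# Assumption: compare daily added cases rather than cumulative totals,
# so that both series rise and fall instead of only increasing.
added_a <- diff(cases_a)
added_b <- diff(cases_b)

# ccf(x, y) estimates the correlation between x[t + k] and y[t] at each lag k,
# so a positive k here means that B trails A by k days.
cc <- stats::ccf(added_b, added_a, lag.max = 80, plot = FALSE)

# The lag with the largest correlation is the estimated delay in days.
estimated_lag <- cc$lag[which.max(cc$acf)]
estimated_lag
```

Calling `ccf()` with `plot = TRUE` instead draws the familiar cross-correlation plot, where the delay shows up as the position of the tallest spike. Real case counts are far noisier than these curves, so the peak is usually much less clear-cut.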

The structure of the app is quite simple:

  • The COVID-19 data is fetched with the nCov2019 package using this line of code here.
  • Cumulative confirmed cases are extracted here, the time series plot is made here and the cross-correlation plot is made here.
  • Similarly, the plots for added cases are here and here.
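For readers new to Shiny, the structure described above can be sketched as a small self-contained app. This is only an illustration of the general shape (data, UI, server), not the code in the repo: the data frame is synthetic, and every name in it (`cases`, `confirmed_plot`, `added_plot`, the countries "A" and "B") is hypothetical. The real app fetches its data with the nCov2019 package instead.

```r
library(shiny)

# Synthetic stand-in for the case data (hypothetical values, not real counts).
set.seed(1)
cases <- data.frame(
  country = rep(c("A", "B"), each = 60),
  day     = rep(1:60, times = 2),
  added   = c(rpois(60, lambda = 20), rpois(60, lambda = 35))
)
# Cumulative confirmed cases within each country.
cases$confirmed <- ave(cases$added, cases$country, FUN = cumsum)

ui <- fluidPage(
  selectInput("country", "Country", choices = unique(cases$country)),
  plotOutput("confirmed_plot"),
  plotOutput("added_plot")
)

server <- function(input, output) {
  # Reactive subset shared by both plots.
  selected <- reactive(cases[cases$country == input$country, ])

  output$confirmed_plot <- renderPlot({
    plot(selected()$day, selected()$confirmed, type = "l",
         xlab = "Day", ylab = "Cumulative confirmed cases")
  })
  output$added_plot <- renderPlot({
    plot(selected()$day, selected()$added, type = "h",
         xlab = "Day", ylab = "Added cases")
  })
}

app <- shinyApp(ui, server)
# Launch it from an interactive R session with: runApp(app)
```

Keeping the filtered data in a single `reactive()` means both plots update together when the selected country changes, and the subsetting is written only once.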

Interactive animation of flights with confirmed cases

This is a standalone RMarkdown document:

The New South Wales Health website publishes a list of flights with confirmed cases of COVID-19.

The coding behind this visualisation is also quite straightforward:

  • The data are scraped from the NSW Health website using the xml2 and rvest packages. I particularly like the elegance of the tidyverse coding style for scraping this data, though some inspiration came from this StackOverflow thread
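library(tidyverse)  # provides %>% and as_tibble()

url = ""
raw = xml2::read_html(url)

# Locate the flights table by the id of its containing div and parse it into a tibble
raw_flights_tbl = raw %>%
  rvest::html_node(xpath = ".//div[@id='ctl00_PlaceHolderMain_contentc1__ControlWrapper_RichHtmlField']/table") %>%
  rvest::html_table() %>% 
  as_tibble()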
  • The geographical locations are then queried through the Google Maps API for their longitudes and latitudes.
all_geocode = tibble(
  location = c(flights_tbl$origin, flights_tbl$destination) %>% unique,
  geocode = purrr::map(location, ggmap::geocode))
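The two steps above can be tried offline with a small sketch. Everything in it is made up for illustration: the inline HTML stands in for the NSW Health page, the flight numbers are fictional, and `fake_geocode()` is a hypothetical lookup that I use in place of `ggmap::geocode()`, which needs a Google API key registered via `ggmap::register_google()`. The coordinates are rounded approximations. The final join is my guess at a natural next step, not necessarily what the RMarkdown document does.

```r
library(dplyr)

# Inline HTML standing in for the scraped page (fictional flights).
page <- xml2::read_html('
  <div id="flights">
    <table>
      <tr><th>Flight</th><th>Origin</th><th>Destination</th></tr>
      <tr><td>XX001</td><td>Singapore</td><td>Sydney</td></tr>
      <tr><td>XX002</td><td>Los Angeles</td><td>Sydney</td></tr>
    </table>
  </div>')

# Same pattern as above: select the table by XPath, then parse it.
flights_tbl <- page %>%
  rvest::html_node(xpath = ".//div[@id='flights']/table") %>%
  rvest::html_table() %>%
  as_tibble() %>%
  rename_with(tolower)

# Hypothetical replacement for ggmap::geocode(): returns a one-row tibble
# with lon and lat columns, looked up from rounded, hard-coded coordinates.
lookup <- tibble(
  location = c("Singapore", "Los Angeles", "Sydney"),
  lon = c(103.8, -118.2, 151.2),
  lat = c(1.35, 34.1, -33.9)
)
fake_geocode <- function(place) lookup[lookup$location == place, c("lon", "lat")]

# Geocode each unique location once, then flatten the list column.
all_geocode <- tibble(
  location = unique(c(flights_tbl$origin, flights_tbl$destination)),
  geocode = purrr::map(location, fake_geocode)
) %>%
  tidyr::unnest(geocode)

# Attach coordinates to both ends of every flight.
flights_geo <- flights_tbl %>%
  left_join(all_geocode, by = c("origin" = "location")) %>%
  rename(origin_lon = lon, origin_lat = lat) %>%
  left_join(all_geocode, by = c("destination" = "location")) %>%
  rename(dest_lon = lon, dest_lat = lat)
```

Geocoding only the unique locations keeps the number of API calls down, which matters here because most flights share Sydney as their destination.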
Kevin Y.X. Wang
Senior Data Scientist

Senior Data Scientist at Illumina. PhD in Statistics.