Is Swimming with Sharks Dangerous?

Introduction

We recently went swimming with sharks with a Marine Biologist. We asked the question “Is swimming with sharks dangerous?” in the dive brief about how to swim with sharks.

This is a spatial analysis of historical shark attacks.

# Load packages
library(tidyverse)
library(sf)
library(rnaturalearth)

Data

We will use the International Shark Attack File (ISAF) global shark attack csv dataset from 1580 until July 26, 2018.

url <- "https://www.engineeringbigdata.com/wp-content/uploads/global_shark_attacks.csv"
sharks <- read_csv(url)
## Parsed with column specification:
## cols(
##   Country = col_character(),
##   Total = col_integer()
## )

Next retrieve the basemap data.

# Get the world sf dataset from rnaturalearth package
world <- ne_countries(scale = "medium", returnclass = "sf")
class(world)
## [1] "sf"         "data.frame"
# Join the basemap and sharks data
sharks_full <- left_join(sharks,world,by=c("Country"="name"))

Data Summary

Let’s take a look at summary of the joined datasets.

# View the joined datasets by descending Totals
sharks_full %>%  
      arrange(desc(Total)) 
## # A tibble: 90 x 65
##    Country Total scalerank featurecla labelrank sovereignt sov_a3 adm0_dif
##    <chr>   <int>     <int> <chr>          <dbl> <chr>      <chr>     <dbl>
##  1 USA      1407        NA <NA>              NA <NA>       <NA>         NA
##  2 Austra~   621         1 Admin-0 c~         2 Australia  AU1           1
##  3 Republ~   252        NA <NA>              NA <NA>       <NA>         NA
##  4 Brazil    104         1 Admin-0 c~         2 Brazil     BRA           0
##  5 New Ze~    51         1 Admin-0 c~         2 New Zeala~ NZ1           1
##  6 Papua ~    48         1 Admin-0 c~         2 Papua New~ PNG           0
##  7 Mascar~    42        NA <NA>              NA <NA>       <NA>         NA
##  8 Mexico     40         1 Admin-0 c~         2 Mexico     MEX           0
##  9 Bahama~    30        NA <NA>              NA <NA>       <NA>         NA
## 10 Iran       23         1 Admin-0 c~         2 Iran       IRN           0
## # ... with 80 more rows, and 57 more variables: level <dbl>, type <chr>,
## #   admin <chr>, adm0_a3 <chr>, geou_dif <dbl>, geounit <chr>,
## #   gu_a3 <chr>, su_dif <dbl>, subunit <chr>, su_a3 <chr>, brk_diff <dbl>,
## #   name_long <chr>, brk_a3 <chr>, brk_name <chr>, brk_group <chr>,
## #   abbrev <chr>, postal <chr>, formal_en <chr>, formal_fr <chr>,
## #   note_adm0 <chr>, note_brk <chr>, name_sort <chr>, name_alt <chr>,
## #   mapcolor7 <dbl>, mapcolor8 <dbl>, mapcolor9 <dbl>, mapcolor13 <dbl>,
## #   pop_est <dbl>, gdp_md_est <dbl>, pop_year <dbl>, lastcensus <dbl>,
## #   gdp_year <dbl>, economy <chr>, income_grp <chr>, wikipedia <dbl>,
## #   fips_10 <chr>, iso_a2 <chr>, iso_a3 <chr>, iso_n3 <chr>, un_a3 <chr>,
## #   wb_a2 <chr>, wb_a3 <chr>, woe_id <dbl>, adm0_a3_is <chr>,
## #   adm0_a3_us <chr>, adm0_a3_un <dbl>, adm0_a3_wb <dbl>, continent <chr>,
## #   region_un <chr>, subregion <chr>, region_wb <chr>, name_len <dbl>,
## #   long_len <dbl>, abbrev_len <dbl>, tiny <dbl>, homepart <dbl>,
## #   geometry <MULTIPOLYGON [°]>

It seems like some of the countries didn’t match up. Let’s extract the NA’s.

# Filter the NA countries from the join
sharks_full %>% 
      filter(is.na(scalerank))
## # A tibble: 21 x 65
##    Country Total scalerank featurecla labelrank sovereignt sov_a3 adm0_dif
##    <chr>   <int>     <int> <chr>          <dbl> <chr>      <chr>     <dbl>
##  1 USA      1407        NA <NA>              NA <NA>       <NA>         NA
##  2 Republ~   252        NA <NA>              NA <NA>       <NA>         NA
##  3 Mascar~    42        NA <NA>              NA <NA>       <NA>         NA
##  4 Bahama~    30        NA <NA>              NA <NA>       <NA>         NA
##  5 Fiji I~    22        NA <NA>              NA <NA>       <NA>         NA
##  6 Solomo~    11        NA <NA>              NA <NA>       <NA>         NA
##  7 Marsha~     8        NA <NA>              NA <NA>       <NA>         NA
##  8 Canary~     6        NA <NA>              NA <NA>       <NA>         NA
##  9 French~     5        NA <NA>              NA <NA>       <NA>         NA
## 10 Federa~     4        NA <NA>              NA <NA>       <NA>         NA
## # ... with 11 more rows, and 57 more variables: level <dbl>, type <chr>,
## #   admin <chr>, adm0_a3 <chr>, geou_dif <dbl>, geounit <chr>,
## #   gu_a3 <chr>, su_dif <dbl>, subunit <chr>, su_a3 <chr>, brk_diff <dbl>,
## #   name_long <chr>, brk_a3 <chr>, brk_name <chr>, brk_group <chr>,
## #   abbrev <chr>, postal <chr>, formal_en <chr>, formal_fr <chr>,
## #   note_adm0 <chr>, note_brk <chr>, name_sort <chr>, name_alt <chr>,
## #   mapcolor7 <dbl>, mapcolor8 <dbl>, mapcolor9 <dbl>, mapcolor13 <dbl>,
## #   pop_est <dbl>, gdp_md_est <dbl>, pop_year <dbl>, lastcensus <dbl>,
## #   gdp_year <dbl>, economy <chr>, income_grp <chr>, wikipedia <dbl>,
## #   fips_10 <chr>, iso_a2 <chr>, iso_a3 <chr>, iso_n3 <chr>, un_a3 <chr>,
## #   wb_a2 <chr>, wb_a3 <chr>, woe_id <dbl>, adm0_a3_is <chr>,
## #   adm0_a3_us <chr>, adm0_a3_un <dbl>, adm0_a3_wb <dbl>, continent <chr>,
## #   region_un <chr>, subregion <chr>, region_wb <chr>, name_len <dbl>,
## #   long_len <dbl>, abbrev_len <dbl>, tiny <dbl>, homepart <dbl>,
## #   geometry <MULTIPOLYGON [°]>

Data Cleaning

Use the stringr R package to clean the country names.

# Replace the mismatched country names where possible, create a new Country2 variable
sharks$Country2 <- sharks$Country %>% 
      str_replace_all( c("USA"="United States",
                         "Republic of South Africa"="South Africa",
                         #"Mascarene Islands (Reunion Island)"= ,
                         "Bahama Islands"="Bahamas",
                         "Fiji Islands"="Fiji",
                         "Solomon Islands"="Solomon Is.",
                         "Marshall Islands"="Marshall Is.",
                         #"Canary Islands"=,
                         "French Polynesia"="Fr. Polynesia",
                         "Federated States of Micronesia"="Micronesia"))

# Join the sharks and world datasets again
sharks_full2 <- left_join(world,sharks,by=c("name"="Country2"))
# Check the class of this new jined object is sf
class(sharks_full2 )
## [1] "sf"         "data.frame"

Replace the NA Totals with 0 for the mapping.

# Replace the NA Totals  with 0
sharks_full2$Total <- sharks_full2$Total %>% 
      replace_na(replace = 0)

Exploratory Data Analysis

Since the data are sf objects, we can use ggplot2 to create quick initial plots.

Sharks data

Let’s first take a look at the data using ggplot2.

 ggplot(data=sharks_full2)+
      geom_sf(aes(fill=Total))

This plot uses the default ggplot colour palette, which we will change to an oranges brewer palette with three breaks. We will also add labels to provide the context.

# Creat a ggplot map of the sharks dataset with labels, colours and breaks
 ggplot(data=sharks_full2)+
      geom_sf(aes(fill = cut_interval(sharks_full2$Total, 3)), 
          color = NA) +
      scale_fill_brewer(palette = "Oranges")+
      labs(x = NULL, 
         y = NULL, 
         title = "Total Global Shark Attacks 1580 to 2018", 
         caption = "Data Source: International Shark Attack File (ISAF) \n https://www.floridamuseum.ufl.edu/",
         fill="Total")  

Let’s use the viridis R package for palette selection, and the lesser known cividis palette which ranges from a blue to yellow. See these colorspace charts for comparison.

# Creat a ggplot map of the sharks dataset with labels, colours and breaks
 ggplot(data=sharks_full2)+
      geom_sf(aes(fill = cut_interval(sharks_full2$Total, 4)), 
          color = NA) +
      scale_fill_viridis_d(option = "cividis")+
      labs(x = NULL, 
         y = NULL, 
         title = "Total Global Shark Attacks 1580 to 2018", 
         caption = "Data Source: International Shark Attack File (ISAF) \n https://www.floridamuseum.ufl.edu/",
         fill="Total")  

Three breaks doesn’t seem to be enough to capture the South African attacks. We will use four breaks in the final plot.

Data Visualisation

Now we will create our final viz. We will change the legend so that it looks neater with rounded four breaks on a single row on the bottom left. The theme will also be changed to a solid background to make it presentation quality.

# Creat a ggplot map of the dataset
g <- sharks_full2  %>% 
  ggplot() + 
    geom_sf(aes(fill=Total),color=NA) +
      labs(x = NULL, 
         y = NULL, 
         title = "Global Shark Attacks ", 
         subtitle= "For years up to 2018",
         caption = "Data Source: International Shark Attack File (ISAF) \n https://www.floridamuseum.ufl.edu/") +
      # Dark background theme inspired by https://www.sharpsightlabs.com/blog/mapping-france-night/
    theme(text = element_text(family = "Gill Sans", color = "#E1E1E1"),
          plot.title = element_text(size = 18, color = "#E1E1E1",hjust = 0.5), # centre plot title
          plot.subtitle = element_text(size = 10,hjust = 0.5),
          plot.caption = element_text(size = 6),
          plot.background = element_rect(fill = "#000222"),
          panel.background = element_rect(fill = "#000222"),
          panel.grid.major = element_line(color = "#000222"),
          axis.text = element_blank(),
          axis.ticks = element_blank(),
          legend.background = element_blank(),
          legend.position = c(0.12, 0.32)
          )+ 
      # use the ggplot2 scale_fill_gradientn function as this sets the limits and a rounded interval rather than just splitting the range into 4 cut_interval. Extract the hex colours using the cividis function
    scale_fill_gradientn(colours = viridis::cividis(n=4),
                         limits=c(0,1500),
                         guide =   guide_legend( keyheight = unit(2, units = "mm"), 
                                                 keywidth=unit(4, units = "mm"), 
                                                 label.position = "bottom", 
                                                 title.position = 'top',
                                                 nrow=1))
g

# Save ggplot for blogpost
# ggsave(filename = "sharksmap.jpg",g)

Conclusions

It turns out that people are more dangerous than sharks as historic reported shark numbers are much lower than one would expect from shark attack news reports.

According to to the Sustainable Ocean Society:

Sharks are apex predators, which help to keep a balance in the ocean, this balance has been created over millions of years of evolution. Sharks are under threat world wide.

References

# Package citations as seen in freerangestats.info/blog/2019/03/30/afl-elo-adjusted
thankr::shoulders() %>% 
  mutate(maintainer = str_squish(gsub("<.+>", "", maintainer))) %>%
  group_by(maintainer) %>%
  summarise(`Number packages` = sum(no_packages),
            packages = paste(packages, collapse = ", ")) %>%
  knitr::kable() 
maintainer Number packages packages
Achim Zeileis 1 colorspace
Andy South 2 rnaturalearth, rnaturalearthdata
Baptiste Auguie 1 gridExtra
Brian Ripley 1 class
Brodie Gaslam 1 fansi
Charlotte Wickham 1 munsell
David Meyer 1 e1071
David Robinson 1 broom
Deepayan Sarkar 1 lattice
Dirk Eddelbuettel 2 digest, Rcpp
Edzer Pebesma 3 sf, sp, units
Erich Neuwirth 1 RColorBrewer
Gábor Csárdi 3 cli, crayon, pkgconfig
Hadley Wickham 15 assertthat, dplyr, forcats, ggplot2, gtable, haven, httr, lazyeval, modelr, plyr, rvest, scales, stringr, tidyr, tidyverse
James Hester 1 xml2
Jennifer Bryan 2 cellranger, readxl
Jeremy Stephens 1 yaml
Jeroen Ooms 2 curl, jsonlite
Jim Hester 3 glue, withr, readr
Joe Cheng 1 htmltools
Justin Talbot 1 labeling
Kevin Ushey 1 rstudioapi
Kirill Müller 4 DBI, hms, pillar, tibble
Lionel Henry 3 purrr, rlang, tidyselect
Marek Gagolewski 1 stringi
Michel Lang 1 backports
Patrick O. Perry 1 utf8
R-core 1 nlme
R Core Team 10 base, compiler, datasets, graphics, grDevices, grid, methods, stats, tools, utils
Roger Bivand 2 classInt, rgeos
Simon Garnier 2 viridis, viridisLite
Stefan Milton Bache 1 magrittr
Vitalie Spinu 1 lubridate
Winston Chang 1 R6
Yihui Xie 6 blogdown, bookdown, evaluate, knitr, rmarkdown, xfun