Which Countries have Large Scale Fishing Vessels?

05 Jun 2019

R / Viz

Introduction
Data
Data Summary
Exploratory Data Analysis
Data Cleaning
Data Visualisation
Conclusions
References

Introduction

I was inspired to analyse and visualise global fishing open data by the story produced by Jon Olav Eikenes.

This is also a sequel to my previous shark analysis post Is Swimming with Sharks Dangerous?, digging further into the potential reduction in shark numbers due to commercial fishing.

# Load packages
library(tidyverse)
library(sf)
library(rnaturalearth)
library(biscale)
library(cowplot)

Data

We will use the Global Fishing Watch Vessel Identity open data available with the Vessels metadata available under the Creative Commons Attribution-ShareAlike 4.0 International license.

Description: This table includes all mmsi that were matched to a vessel regsitry, were identified through manual review or web searchers, or were classified by the neural network. MMSI that are not included did not have enough activity during our time period (2012 to 2016) to be classified by the neural network (vessels had to have at least 500 positions over a six month period). If an mmsi matched to multiple vessels, that mmsi may be repeated in this table.

In order to download the data we need to register and sign into the Global Fishing Watch website. Therefore this data has been downloaded and saved offline.

First read in a sample using readxl package then read in the large csv file using the data.table package.

vessels_sample <- read_csv("vessels.csv",n_max = 30)

## Parsed with column specification:
## cols(
##   mmsi = col_integer(),
##   shipname = col_character(),
##   callsign = col_character(),
##   flag = col_character(),
##   imo = col_character(),
##   registry_geartype = col_character(),
##   inferred_geartype = col_character(),
##   inferred_geartype_score = col_character(),
##   inferred_subgeartype = col_character(),
##   inferred_subgeartype_score = col_character(),
##   registry_length = col_integer(),
##   inferred_length = col_double(),
##   registry_tonnage = col_integer(),
##   inferred_tonnage = col_double(),
##   registry_engine_power = col_integer(),
##   inferred_engine_power = col_double(),
##   source = col_character()
## )

vessels <- data.table::fread("vessels.csv",data.table = FALSE)

Next retrieve the basemap data from the rnaturalearth R package. We will use the medium scale for zoomed out maps of countries. This dataset includes population and GDP values for 2016-2017.

# Get the world sf dataset from rnaturalearth package
world <- ne_countries(scale = "medium", returnclass = "sf")
# Check the class of world
class(world)

## [1] "sf"         "data.frame"

Data Summary

Let’s take a look at summary of the vessel dataset.

# View the vessels datasets 
vessels_sample %>%  
      glimpse()

## Observations: 30
## Variables: 17
## $ mmsi                       <int> 235050379, 412002468, 412068795, 41...
## $ shipname                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ callsign                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ flag                       <chr> "GBR", "CHN", "CHN", "CHN", "CHN", ...
## $ imo                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ registry_geartype          <chr> NA, "fishing", "fishing", "fishing"...
## $ inferred_geartype          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ inferred_geartype_score    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ inferred_subgeartype       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ inferred_subgeartype_score <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ registry_length            <int> NA, 40, 30, NA, NA, NA, NA, NA, NA,...
## $ inferred_length            <dbl> 24.30837, NA, NA, NA, NA, NA, NA, N...
## $ registry_tonnage           <int> NA, NA, NA, 322, 362, 362, 470, 470...
## $ inferred_tonnage           <dbl> 95.89842, NA, NA, NA, NA, NA, NA, N...
## $ registry_engine_power      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ inferred_engine_power      <dbl> 344.6199, NA, NA, NA, NA, NA, NA, N...
## $ source                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,...

From the metadata it appears that the flag refers to to the country codes.

flag: an iso3 value for the flag state of the vessel. Only for vessels that have been matched to registries or have known values.

# Reduce the variables to only required variables
vessels<- vessels %>% 
      select(mmsi, flag, registry_geartype, inferred_geartype, registry_tonnage, inferred_tonnage)

View the vessels count by country.

# Count the vessels by countries 
vessels %>% 
      group_by(flag) %>% 
      count() %>% 
      arrange(desc(n))

## # A tibble: 232 x 2
## # Groups:   flag [232]
##    flag       n
##    <chr>  <int>
##  1 CHN   137089
##  2 USA    25519
##  3 ""     17522
##  4 GBR    17009
##  5 NLD    16465
##  6 DEU    10645
##  7 PAN     9778
##  8 FRA     9316
##  9 NOR     8968
## 10 RUS     6292
## # ... with 222 more rows

The countries with the largest number of vessels appear to be China, USA, Great Britain, Netherlands and Germany.

This table also shows the missing flags ie not matched to a vessel registry of the total 379693.

We will also take a look at the GDP figures by country. According to the Natural Earth release notes:

Admin-0 population and GDP values have also been updated to 2016/2017 vintage (primarily from CIA World Factbook).

Typically GDP is reported per capita (ppp).

world %>% 
      select(admin,gdp_md_est) %>% 
      st_geometry()

## Geometry set for 241 features 
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 5 geometries:

## MULTIPOLYGON (((-69.89912 12.452, -69.8957 12.4...

## MULTIPOLYGON (((74.89131 37.23164, 74.84023 37....

## MULTIPOLYGON (((14.19082 -5.875977, 14.39863 -5...

## MULTIPOLYGON (((-63.00122 18.22178, -63.16001 1...

## MULTIPOLYGON (((20.06396 42.54727, 20.10352 42....

Data Cleaning

In the vessels data there are two variables, inferred and registry for each of the tonnage and the length of the vessels. We will assume that the actual values are with the registry values or the inferred if the registry is not available. The data quality is dependent on the data collection and neural network so we will also assume in this analysis that the data given is correct.

vessels <- vessels %>%  
      mutate(geartype=ifelse(is.na(registry_geartype),inferred_geartype,registry_geartype)) %>%   
      mutate(registry_tonnage=replace_na(registry_tonnage,0)) %>%   
      mutate(inferred_tonnage=replace_na(inferred_tonnage,0)) %>% 
      mutate(tonnage=registry_tonnage+inferred_tonnage) %>% 
      select(mmsi,flag, geartype,tonnage)

Summarise the geartypes by count to view the categories.

# Count the vessels by countries 
vessels %>% 
      group_by(geartype) %>% 
      count() %>% 
      arrange(desc(n))

## # A tibble: 231 x 2
## # Groups:   geartype [231]
##    geartype               n
##    <chr>              <int>
##  1 ""                236924
##  2 passenger          48239
##  3 cargo              18185
##  4 tug                11828
##  5 fishing            11640
##  6 tanker             10757
##  7 cargo_or_tanker     9953
##  8 trawlers            5547
##  9 non_fishing         4374
## 10 other_not_fishing   3994
## # ... with 221 more rows

There are many different geartype categories. Let’s remove passenger, tankers and other non fishing vessels to focusing on fishing vessels. Since this is a large dataset this step also reduces our dataset size. This data cleaning is manually performed after visual inspection of the above data summaries so this is done on a best efforts basis.

# Remove vessels with no flag or geartype and passenger or cargo types to narrow down identifiable fishing vessels
vessels <- vessels %>% 
      filter(!is.na(flag)) %>% 
      filter(!is.na(geartype)) %>% 
      filter(!geartype %in% c("passenger","cargo_or_tanker","other_not_fishing","non_fishing","gear","supply_vessel","tanker","cargo","research","","patrol_vessel","tug","dredge_non_fishing","cargo|passenger","seismic_vessel","non_fishing|trawlers","other_not_fishing|trawlers","bunker_or_tanker","reefer","bunker","cargo|passenger|supply_vessel","specialized_reefer","cargo_or_reefer","passenger|supply_vessel","cargo_or_tanker|tug","passenger|tug","other_not_fishing|supply_vessel","cargo|supply_vessel","submarine","cargo|reefer","cargo_or_tanker|trawlers","container_reefer","reefer|fish_tender","other_not_fishing|passenger","other_not_fishing|tanker","bunker|tanker","passenger|research","fish_tender|reefer","cargo|other_not_fishing|passenger","tanker|tug","fish_tender","other_not_fishing|tug","supply_vessel|tug","cargo_or_tanker|passenger","cargo_or_tanker|research","cargo|other_not_fishing","dive_vessel","helicopter","non_fishing|patrol_vessel","other_not_fishing|patrol_vessel","supply_vessel|tanker","research|supply_vessel","passenger|tanker","other_not_fishing|research"))

Now let’s check the geartypes after cleaning out non fishing geartypes.

# Count the vessels by countries 
vessels %>% 
      group_by(geartype) %>% 
      count() %>% 
      arrange(desc(n))

## # A tibble: 177 x 2
## # Groups:   geartype [177]
##    geartype                             n
##    <chr>                            <int>
##  1 fishing                          11640
##  2 trawlers                          5547
##  3 drifting_longlines                1568
##  4 tuna_purse_seines                  835
##  5 purse_seines                       647
##  6 squid_jigger                       451
##  7 drifting_longlines|set_longlines   405
##  8 set_gillnets                       343
##  9 pole_and_line                      211
## 10 purse_seines|trawlers              208
## # ... with 167 more rows

Join the vessels and the world dataframes to create an sf object for mapping.

# Join the basemap and vessels data using dplyr verb inner_join. This deactivates the geometry although there is a joined geometry column
vessels_world <- left_join(vessels,world %>%
                                select(iso_a3, admin),
                           by= c("flag"="iso_a3"))
# Reactivate the geometry using st_sf
vessels_world <- vessels_world %>% 
      st_sf()                           
# Check vessels_world is an sf object
class(vessels_world)

## [1] "sf"         "data.frame"

Now we will aggregate our joined vessels and world data by finding the total tonnage by the country flag.

# Use dplyr to aggregate the vessels_world data
vessels_world_sum <- vessels_world %>% 
      #slice(1:200) %>% 
      group_by(flag,geartype) %>% 
      summarise(tonnage_sum=sum(tonnage)) 
# rm(vessels_world)
# Save aggregated data for GitHub
# write_csv(vessels_world_sum,"vessels_world_sum.csv")

Exploratory Data Analysis

Since the data are sf objects, we can use ggplot2 to create quick initial plots.

Vessel counts

Since we have two categorical variables and frequency counts we can view the vessel counts as a heatmap. I am using cyan and blue colours as a watery theme.

# First create a frequency table
vessels_table <- as.data.frame(table(vessels$flag,vessels$geartype))
# Use ggplot to plot the heatmap with geom_tile.
ggplot(vessels_table %>% filter(Freq>500), aes(Var1,Var2)) + 
      geom_tile(aes(fill = Freq), colour = "black") + scale_fill_gradient(low = "cyan", high = "steelblue") +
      labs(x="Country Flag",y="Vessel Gear Type")

Here we see that China, Norway, USA and Iceland have the largest number of fishing vessels for the time period, with trawlers as the second largest category.

Vessel tonnage total

Now take a look at the vessels total tonnage data using ggplot2, first as a barchart.

# Sort the tonnage_sum
vessels_world_sum %>% 
      arrange(desc(tonnage_sum))

## Simple feature collection with 765 features and 3 fields (with 21 geometries empty)
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -55.8917 xmax: 180 ymax: 83.59961
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 765 x 4
## # Groups:   flag [114]
##    flag  geartype   tonnage_sum                                    geometry
##    <chr> <chr>            <dbl>                          <MULTIPOLYGON [°]>
##  1 RUS   trawlers      1941954. (((146.0456 43.40933, 146.0323 43.40713, 1~
##  2 CHN   fishing        998685. (((110.8888 19.99194, 110.9383 19.94756, 1~
##  3 NOR   fishing        483235. (((5.08584 60.30757, 5.089063 60.18877, 4.~
##  4 CHN   squid_jig~     435090. (((110.8888 19.99194, 110.9383 19.94756, 1~
##  5 ARG   fishing        380758. (((-64.54917 -54.71621, -64.43882 -54.7393~
##  6 USA   fishing        372312. (((-155.5813 19.01201, -155.6256 18.96392,~
##  7 GBR   trawlers       309395. (((-1.065576 50.69023, -1.149365 50.65571,~
##  8 ESP   trawlers       303893. (((-17.88794 27.80957, -17.98477 27.64639,~
##  9 FRO   fishing        278015. (((-6.699463 61.44463, -6.679688 61.41431,~
## 10 RUS   fish_fact~     277884. (((146.0456 43.40933, 146.0323 43.40713, 1~
## # ... with 755 more rows

# Create a barchart using geom_col
vessels_world_sum %>%
      # Filter the data as there are too many countries to view on a barchart
      filter(tonnage_sum>100000) %>% 
      ggplot()+
      geom_col(aes(flag,tonnage_sum))+
      coord_flip()

Using raw values distorts the data as larger countries will typically have a larger number of fishing vessels. Now we will normalise the tonnage data to map a chloropleth as a proportion, with the total tonnage per capita. GDP is already per capita available from the world dataset from Natural Earth, but we will also change to per million.

tonnage_prop <- left_join(vessels_world_sum, 
                          world %>%
                                select(iso_a3, admin, pop_est, gdp_md_est) %>%  
                                st_drop_geometry(),
                           by= c("flag"="iso_a3")) %>% 
      mutate(tonnage_per_mil=(tonnage_sum/pop_est)*1000000) %>% 
      mutate(gdp_per_mil=(gdp_md_est)/1000000)

Data Visualisation

Now we will create our final viz. With the new biscale R package that will classify the prepared data into mapping classes and add a bivariate legend, and a predefined bi_theme.

# Create mapping classes
bi_data <- bi_class(tonnage_prop, x = tonnage_per_mil, y = gdp_per_mil, style = "quantile", dim = 3)
# Create a legend with the per million of population abbreviation pmp
legend <- bi_legend(pal = "DkCyan",
                    dim = 3,
                    xlab = "Total Vessel Tonnage pmp",
                    ylab = "GDP pmp",
                    size = 8)
# Create a bivariate map
map <- ggplot() +
  geom_sf(data = bi_data %>% 
                filter(bi_class !="NA-NA"), 
          aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
  bi_scale_fill(pal = "DkCyan", dim = 3) +
  labs(x = NULL, 
         y = NULL, 
         title = "Global Fishing Vessels", 
         subtitle= "For years 2012-2016",
         caption = "Data Sources:\n https://www.naturalearthdata.com/ \n https://globalfishingwatch.org/") +
  bi_theme()+
      theme(plot.caption = element_text(size = 12,hjust=0.75))
# combine map with legend
cowplot::ggdraw() +
  draw_plot(map, 0, 0, 1, 1) +
  draw_plot(legend, 0.1, .1, 0.2, 0.2)

# Save ggplot for blogpost
# ggsave(filename = "vesselsmap.jpg") # default to last plot displayed

Conclusions

China, Great Britain, France, Spain and the USA appear to be big fishers. These big fishing developed countries are represented by the darker colour combination on the scale. Surprisingly, lower income countries such as Greenland, Botswana and Libya also have large fishing vessels fleets.

This is an initial view of big fisher countries who have the propensity to consume and or sell fish. It doesn’t however show in which waters they are actively fishing or what fish life are being caught. It could be useful to obtain shark numbers to see how many are being affected by fishing activities.

References

Bivariate maps with ggplot2 and sf tutorial by CC-BY-SA license.
Biscale package vignette.

# Package citations as seen in freerangestats.info/blog/2019/03/30/afl-elo-adjusted
thankr::shoulders() %>% 
  mutate(maintainer = str_squish(gsub("<.+>", "", maintainer))) %>%
  group_by(maintainer) %>%
  summarise(`Number packages` = sum(no_packages),
            packages = paste(packages, collapse = ", ")) %>%
  knitr::kable()

maintainer	Number packages	packages
Achim Zeileis	1	colorspace
Andy South	2	rnaturalearth, rnaturalearthdata
Brian Ripley	2	class, KernSmooth
Brodie Gaslam	1	fansi
Charlotte Wickham	1	munsell
Christopher Prener	1	biscale
Claus O. Wilke	1	cowplot
David Meyer	1	e1071
David Robinson	1	broom
Deepayan Sarkar	1	lattice
Dirk Eddelbuettel	2	digest, Rcpp
Edzer Pebesma	3	sf, sp, units
Gábor Csárdi	3	cli, crayon, pkgconfig
Hadley Wickham	16	assertthat, dplyr, forcats, ggplot2, gtable, haven, httr, lazyeval, modelr, plyr, rvest, scales, stringr, tidyr, tidyverse, vctrs
James Hester	1	xml2
Jennifer Bryan	2	cellranger, readxl
Jeremy Stephens	1	yaml
Jeroen Ooms	1	jsonlite
Jim Hester	3	glue, withr, readr
Joe Cheng	1	htmltools
Justin Talbot	1	labeling
Kevin Ushey	1	rstudioapi
Kirill Müller	4	DBI, hms, pillar, tibble
Lionel Henry	3	purrr, rlang, tidyselect
Marek Gagolewski	1	stringi
Matt Dowle	1	data.table
Michel Lang	1	backports
Nathan Teetor	1	zeallot
Patrick O. Perry	1	utf8
R-core	1	nlme
R Core Team	10	base, compiler, datasets, graphics, grDevices, grid, methods, stats, tools, utils
Roger Bivand	2	classInt, rgeos
Stefan Milton Bache	1	magrittr
Vitalie Spinu	1	lubridate
Winston Chang	1	R6
Yihui Xie	6	blogdown, bookdown, evaluate, knitr, rmarkdown, xfun