Food Label Analysis

library(tidyverse)
library(kableExtra) # provides kable() and kable_styling() used in the tables below
library(reactable)
library(htmltools)

Introduction

Food labels can be confusing and hard to read, with small numbers and text in the nutrition information panel. Ingredients lists can also be long and printed in small text. This information is difficult to process on the go and without a reference point.

According to the Heart Foundation NZ, this table is a guide for what to look for on the ‘nutrition information panel’ on food labels.

Heart Foundation NZ label

As a side project, I am creating a website to visualise food labelling data from food packaging, using Node.js with the JavaScript libraries D3 and React, and possibly a MongoDB database.

This data analysis helps to understand the data behind the web application using visual interactive tables.

TL;DR: jump to the Data Visualisation section.

Data

Food and nutrition data is available from the Ministry of Health.

Plant & Food Research and the Ministry of Health jointly own the New Zealand Food Composition Database. This database provides a comprehensive collection of nutrition information panel data as seen on food packaging.

The FOODfiles™ Data is available subject to the FOODfiles™ Data Licensing terms.

I tried to download various files from the website, but the easiest data file to use for this analysis is the Standard DATA.AP file, which contains the data in table format.

Ideally, the information panel data would be available in CSV format via a direct link on the food composition website, so that it could be imported directly for better reproducibility.

standard <- readxl::read_xlsx("Standard DATA.AP.xlsx",skip = 1)

Data Cleaning

Let’s extract the nutrient columns related to the nutrition information panels.

standard_nip <- standard %>% 
  select(`Food Name`,Chapter,`Energy, total metabolisable, carbohydrate by difference, FSANZ (kJ)`,`Protein, total; calculated from total nitrogen`,`Fat, total`,`Fatty acids, total saturated`,`Sugars, total`,`Fibre, total dietary`,Sodium) %>% 
  slice(-1) %>% 
  mutate_at(vars(3:9), as.numeric)

We can extract the units of these nutrients.

units <- standard %>% 
  select(`Food Name`,Chapter,`Energy, total metabolisable, carbohydrate by difference, FSANZ (kJ)`,`Protein, total; calculated from total nitrogen`,`Fat, total`,`Fatty acids, total saturated`,`Sugars, total`,`Fibre, total dietary`,Sodium) %>% 
  slice(1)

We can rename the columns with a snake case naming convention.

names(standard_nip) <- c("food_name","chapter","energy","protein","fat_total","fat_saturated","sugars","fibre","sodium")
names(units) <- c("food_name","chapter","energy","protein","fat_total","fat_saturated","sugars","fibre","sodium")
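
The cleaning steps above rely on the first data row holding the units, which is also why the nutrient columns arrive as text and need converting. As a minimal sketch of that split, here is the same pattern on a small stand-in table; the `toy` tibble and its apple value are invented for illustration and are not taken from the FOODfiles data.

```r
library(dplyr)

# Toy stand-in for the spreadsheet: the first row holds the units, so every
# column is read as character (values are illustrative only).
toy <- tibble(
  `Food Name` = c("Food Name", "Apple", "Butter"),
  `Fat, total` = c("g/100g", "0.2", "83.6")
)

# Units row only
toy_units <- toy %>% slice(1)

# Data rows: drop the units row, convert to numeric, rename to snake case
toy_nip <- toy %>%
  slice(-1) %>%
  mutate(`Fat, total` = as.numeric(`Fat, total`)) %>%
  rename(food_name = `Food Name`, fat_total = `Fat, total`)
```

After this, `toy_nip$fat_total` is numeric and `toy_units` keeps the unit strings for later labelling.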

Exploratory Data Analysis

The raw standard dataset has 2768 rows and 89 columns; the first row holds the units, which leaves 2767 foods in standard_nip.

Now take a look at summary statistics of the standard_nip dataset with the skimr R package.

standard_nip %>% 
  skimr::skim()
Data summary
Name Piped data
Number of rows 2767
Number of columns 9
_______________________
Column type frequency:
character 2
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
food_name 0 1 4 172 0 2767 0
chapter 0 1 1 1 0 22 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
energy 0 1 868.64 712.83 0 299.33 683.43 1269.20 3700.00 ▇▅▂▁▁
protein 0 1 9.62 9.63 0 1.38 6.11 16.69 84.36 ▇▂▁▁▁
fat_total 0 1 9.97 16.69 0 0.47 3.26 12.19 100.00 ▇▁▁▁▁
fat_saturated 0 1 3.59 6.99 0 0.08 0.87 4.10 94.01 ▇▁▁▁▁
sugars 0 1 7.68 14.82 0 0.00 1.80 7.60 100.70 ▇▁▁▁▁
fibre 0 1 2.38 4.95 0 0.00 0.90 2.75 70.10 ▇▁▁▁▁
sodium 0 1 322.65 1566.46 0 9.29 65.00 340.00 38700.00 ▇▁▁▁▁

Now the units:

units %>% 
  glimpse()
## Observations: 1
## Variables: 9
## $ food_name     <chr> "Food Name"
## $ chapter       <chr> "Chapter"
## $ energy        <chr> "kJ/100g"
## $ protein       <chr> "g/100g"
## $ fat_total     <chr> "g/100g"
## $ fat_saturated <chr> "g/100g"
## $ sugars        <chr> "g/100g"
## $ fibre         <chr> "g/100g"
## $ sodium        <chr> "mg/100g"

Since they have different units, the g/100g nutrients could be compared as a group, whereas the other units would be compared individually.
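
One way to act on this grouping is to use the units row to pick out the g/100g columns and reshape only those into long format. This is a rough sketch on invented data; `toy_units`, `toy_nip`, `unit_lookup` and `gram_cols` are hypothetical names, and the numbers are made up.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins mirroring the shape of `units` and `standard_nip`
toy_units <- tibble(
  food_name = "Food Name", energy = "kJ/100g",
  protein = "g/100g", fat_total = "g/100g", sodium = "mg/100g"
)
toy_nip <- tibble(
  food_name = c("Food A", "Food B"),
  energy = c(100, 200), protein = c(1, 2),
  fat_total = c(3, 4), sodium = c(5, 6)
)

# Named character vector: column name -> unit
unit_lookup <- unlist(toy_units)
gram_cols <- names(unit_lookup)[unit_lookup == "g/100g"]

# Long format containing only the nutrients that share the g/100g unit
toy_long <- toy_nip %>%
  select(food_name, all_of(gram_cols)) %>%
  pivot_longer(-food_name, names_to = "nutrient", values_to = "g_per_100g")
```

The energy and sodium columns are left out of `toy_long` and would be examined on their own scales.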

To check that we have extracted the same nutrition information panel data as appears on food labels, look up the Butter, unsalted entry and compare it with a product label:

standard_nip %>% 
  filter(str_detect(food_name,"Butter, unsalted"))
## # A tibble: 1 x 9
##   food_name   chapter energy protein fat_total fat_saturated sugars fibre sodium
##   <chr>       <chr>    <dbl>   <dbl>     <dbl>         <dbl>  <dbl> <dbl>  <dbl>
## 1 Butter, un~ F        3110.    0.32      83.6          54.1   0.54     0    6.9

Now let’s view some tables using the kableExtra R package.

What is the food with the most energy?

standard_nip %>% 
  slice(which.max(energy)) %>% 
  select(food_name,energy) %>% 
  kable() %>%
  kable_styling()
food_name energy
Oil, sesame 3700

What is the food with the most protein?

standard_nip %>% 
  slice(which.max(protein)) %>% 
  select(food_name,protein) %>% 
  kable() %>%
  kable_styling()
food_name protein
Gelatin 84.36

What is the food with the most saturated fat? (Ranking by total fat instead simply returns oils, which are 100 g of fat per 100 g.)

standard_nip %>% 
  slice(which.max(fat_saturated)) %>% 
  select(food_name,fat_saturated) %>% 
  kable() %>%
  kable_styling()
food_name fat_saturated
Shortening, vegetable, Kremelta 94.014
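
A caveat with `slice(which.max(...))` is that it keeps only the first row when several foods share the maximum, as happens with total fat for oils. As a small sketch on invented data, `slice_max()` (available in dplyr 1.0.0 and later) can keep the ties instead; the `toy` tibble below is hypothetical.

```r
library(dplyr)

# Toy data with a tie at the maximum (values are illustrative only)
toy <- tibble(
  food_name = c("Oil A", "Oil B", "Butter"),
  fat_total = c(100, 100, 83.6)
)

# which.max() returns the index of the first maximum only
first_only <- toy %>% slice(which.max(fat_total))

# slice_max() keeps every row tied for the top value
with_ties <- toy %>% slice_max(fat_total, n = 1, with_ties = TRUE)
```

Here `first_only` has a single row while `with_ties` keeps both oils.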

What is the food with the most sugar?

standard_nip %>% 
  slice(which.max(sugars)) %>% 
  select(food_name,sugars) %>% 
  kable() %>%
  kable_styling()
food_name sugars
Flavoured drink, raspberry, dry powder 100.7

That raspberry drink powder looks very high in sugar: the recorded value is slightly above 100 g per 100 g.

What is the saltiest food?

standard_nip %>% 
  slice(which.max(sodium)) %>% 
  select(food_name,sodium) %>% 
  kable() %>%
  kable_styling()
food_name sodium
Salt, block 38700

Take a look at the miscellaneous food group, which includes herbs and condiments. What food items have the most energy, protein, saturated fat, sugars, fibre and sodium?

We can create a function that takes the column name as a string and unquotes it with the rlang R package and tidy evaluation, to extract the top 5 foods by nutrient.

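top5 <- function(nutrient) {
  # nutrient: a column name given as a string, e.g. "energy"
  col <- rlang::sym(nutrient)      # convert the string to a symbol
  standard_nip %>% 
    filter(chapter == "P") %>%     # chapter P: the miscellaneous group discussed above
    arrange(desc(!!col)) %>%       # unquote the symbol to sort by that column
    select(food_name, !!col) %>% 
    slice(1:5)
}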

top5(nutrient= "energy") %>%
  kable() %>%   
  kable_styling()
food_name energy
Seed, sesame 2607.82
Coffee whitener, powder 2327.72
Seed, mustard, yellow 2237.41
Seed, poppy, composite 2106.41
Spice, nutmeg, ground 2093.31

top5(nutrient= "protein") %>%
  kable() %>%   
  kable_styling()
food_name protein
Gelatin 84.36
Yeast, baker’s, dried 39.50
Stock, Oxo cubes 39.31
Seed, mustard, yellow 29.38
Powder, mustard 28.88

top5(nutrient= "fat_saturated") %>%
  kable() %>%   
  kable_styling()
food_name fat_saturated
Coffee whitener, powder 32.500
Spice, nutmeg, ground 25.900
Spice, mace, ground 9.510
Herb, rosemary, dried 8.398
Seed, sesame 7.672

top5(nutrient= "sugars") %>%
  kable() %>%   
  kable_styling()
food_name sugars
Coffee whitener, powder 54.90
Powder, onion 47.91
Spice, pepper, black 46.00
Spice, allspice, ground 45.51
Spice pepper, white 43.60

top5(nutrient= "fibre") %>%
  kable() %>%   
  kable_styling()
food_name fibre
Spice, cinnamon, ground 54.3
Seaweed, dried 49.2
Savory, ground 45.7
Seed, coriander 41.9
Herb, sage, ground 40.3

top5(nutrient= "sodium") %>%
  kable() %>%   
  kable_styling()
food_name sodium
Salt, block 38700
Salt, table, iodised 38400
Salt, table, uniodised 38100
Baking soda 27400
Baking powder 11800
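
The six calls above repeat the same pattern, so they could also be produced in one pass. The following is only a sketch of an alternative, not the function used in this post: `top_n_by()` is a hypothetical helper that takes the data as an argument and uses the `.data` pronoun instead of `rlang::sym()`, and the `toy` tibble is invented.

```r
library(dplyr)
library(purrr)

# Hypothetical variant of top5(): data and n are arguments, and the
# column is looked up by name through the .data pronoun
top_n_by <- function(data, nutrient, n = 5) {
  data %>%
    arrange(desc(.data[[nutrient]])) %>%
    select(food_name, all_of(nutrient)) %>%
    slice_head(n = n)
}

# Toy data (values are illustrative only)
toy <- tibble(
  food_name = c("a", "b", "c"),
  energy = c(10, 30, 20),
  sodium = c(3, 1, 2)
)

# One small table per nutrient, returned as a named list
tops <- map(c(energy = "energy", sodium = "sodium"), ~ top_n_by(toy, .x, n = 2))
```

Each element of `tops` could then be passed to `kable()` in the same way as the individual tables above.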

Data Visualisation

Now create HTML bar charts with the reactable and htmltools R packages.

I chose this fun colour palette to distinguish the units.

# Set global theme
options(reactable.theme = reactableTheme(
  style = list(fontFamily = "-apple-system, BlinkMacSystemFont, Segoe UI, Helvetica, Arial, sans-serif"),
  color = "hsl(233, 9%, 87%)",
  backgroundColor = "hsl(233, 9%, 19%)",
  borderColor = "hsl(233, 9%, 22%)",
  stripedColor = "hsl(233, 12%, 22%)",
  highlightColor = "hsl(233, 12%, 24%)",
  inputStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  selectStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  pageButtonHoverStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  pageButtonActiveStyle = list(backgroundColor = "hsl(233, 9%, 28%)")
))
# Render a bar chart with a label on the left
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
  bar <- div(style = list(background = fill, width = width, height = height))
  chart <- div(style = list(flexGrow = 1, marginLeft = "8px", background = background), bar)
  div(style = list(display = "flex", alignItems = "center"), label, chart)
}

reactable(standard_nip %>% select(-chapter), 
          columns = list(
  food_name = colDef(name = "Food Name", align = "left"),
  energy = colDef(name = "Energy (kJ/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$energy) * 100, "%")
    bar_chart(round(value,0), width = width,fill = "#E3A8CB", background = "#999999")
  }),
  protein = colDef(name = "Protein (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$protein) * 100, "%")
    bar_chart(round(value,0), width = width,fill = "#E98E10", background = "#999999")
  }),
  fat_total = colDef(name = "Fat Total (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$fat_total) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  fat_saturated = colDef(name = "Saturated Fat (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$fat_saturated) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  sugars = colDef(name = "Sugars (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$sugars) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  fibre = colDef(name = "Fibre (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$fibre) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  sodium = colDef(name = "Sodium (mg/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$sodium) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#A2DC84", background = "#999999")
  })
),  
  filterable = TRUE,
  showPageSizeOptions = TRUE,
  striped = TRUE,
  highlight = TRUE)
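
The seven `colDef()` entries above differ only in the label, the column they scale against and the fill colour, so they could be generated by a small helper. This is a rough sketch under that assumption: `bar_width()` and `bar_col()` are hypothetical helpers, `bar_chart()` is repeated from above so the block runs on its own, and the `toy` data frame is a two-row stand-in.

```r
library(reactable)
library(htmltools)

# Same bar renderer as above, repeated so this sketch is self-contained
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
  bar <- div(style = list(background = fill, width = width, height = height))
  chart <- div(style = list(flexGrow = 1, marginLeft = "8px", background = background), bar)
  div(style = list(display = "flex", alignItems = "center"), label, chart)
}

# Hypothetical helper: bar width as a percentage of the column maximum
bar_width <- function(value, max_value) paste0(value / max_value * 100, "%")

# Hypothetical helper: build one bar-chart column definition
bar_col <- function(name, values, fill) {
  max_value <- max(values, na.rm = TRUE)  # computed once, not once per cell
  colDef(name = name, align = "left", cell = function(value) {
    bar_chart(round(value, 0), width = bar_width(value, max_value),
              fill = fill, background = "#999999")
  })
}

# Toy table with one bar column
toy <- data.frame(food_name = c("Salt, block", "Butter, unsalted"), sodium = c(38700, 6.9))
tbl <- reactable(toy, columns = list(
  sodium = bar_col("Sodium (mg/100g)", toy$sodium, fill = "#A2DC84")
))
```

With this helper, each nutrient column becomes a single line, and changing the bar background or rounding only needs to happen in one place.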

Conclusion

This reactable is a great interactive tool to summarise and explore the nutrition information panel data. It is possible to sort and filter, and each bar shows a nutrient value relative to the maximum value of that nutrient across all foods.

I would like to explore the Miscellaneous food group more in the data visualisation, since the tables above show that nutrient levels in herbs and spices vary widely, with some relatively high values.

As a note, the reactable HTML output doesn't show up in the blogdown HTML output, so I saved the table as a separate HTML file and embedded it with this code snippet:

<iframe src="../img/reactable.html" width="100%" height="500" ></iframe>
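
The post does not say how the HTML file was saved. One possible way, offered as an assumption, is `saveWidget()` from the htmlwidgets package, since a reactable table is an htmlwidget. The sketch below writes a toy table to a temporary directory; in practice the file would go wherever the iframe `src` points.

```r
library(reactable)
library(htmlwidgets)

# Toy table standing in for the full reactable built above
toy_table <- reactable(data.frame(food_name = "Butter, unsalted", energy = 3110))

# Write the widget to an HTML file; a temporary directory is used here,
# whereas the post's iframe expects the file at ../img/reactable.html
out_file <- file.path(tempdir(), "reactable.html")
saveWidget(toy_table, file = out_file, selfcontained = FALSE)
```

With `selfcontained = FALSE` the JavaScript and CSS dependencies are written to a folder next to the HTML file, so both would need to be copied together; `selfcontained = TRUE` bundles everything into one file but may require pandoc.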