library(tidyverse)
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(reactable)
library(htmltools)
Introduction
Food labels can be confusing and hard to read, with small numbers and text in the Nutrition Information Panel. Ingredients lists can also be long and written in small text. This information can be difficult to process on the go and without a reference point.
According to the Heart Foundation NZ, this table is a guide for what to look for on the ‘nutrition information panel’ on food labels.
As a side project, I am creating a website to visualise food labelling data from food packaging using Node.js, D3 and React, possibly with a MongoDB database.
This data analysis helps to understand the data behind the web application using visual interactive tables.
TL;DR jump to the Data Visualisation.
Data
Food and nutrition data is available from the Ministry of Health.
Plant & Food Research and the Ministry of Health jointly own the New Zealand Food Composition Database. This database provides a comprehensive collection of nutrition information panel data as seen on food packaging.
The FOODfiles™ Data is available subject to the FOODfiles™ Data Licensing terms.
I tried downloading various files from the website, but the easiest data file to use for this analysis is Standard DATA.AP, which contains data in a table format.
Ideally, the information panel data would be available in CSV format via a direct link on the food composition website, so that it could be imported directly for better reproducibility.
standard <- readxl::read_xlsx("Standard DATA.AP.xlsx",skip = 1)
Data Cleaning
Let’s extract the nutrient columns related to the nutrition information panels.
standard_nip <- standard %>%
  select(`Food Name`, Chapter,
         `Energy, total metabolisable, carbohydrate by difference, FSANZ (kJ)`,
         `Protein, total; calculated from total nitrogen`,
         `Fat, total`, `Fatty acids, total saturated`,
         `Sugars, total`, `Fibre, total dietary`, Sodium) %>%
  slice(-1) %>%                      # drop the units row
  mutate_at(vars(3:9), as.numeric)   # convert the nutrient columns to numeric
We can extract the units of these nutrients.
units <- standard %>%
  select(`Food Name`, Chapter,
         `Energy, total metabolisable, carbohydrate by difference, FSANZ (kJ)`,
         `Protein, total; calculated from total nitrogen`, `Fat, total`,
         `Fatty acids, total saturated`, `Sugars, total`, `Fibre, total dietary`, Sodium) %>%
  slice(1)  # the first row holds the units
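The split between the units row and the data rows can be illustrated on a small made-up tibble. This is only a sketch: `raw`, `toy_units` and `toy_data` are hypothetical names, the values are invented, and `across()` is used here as the newer dplyr alternative to `mutate_at()`.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for the raw sheet: the first row carries the units,
# so every column is read as character.
raw <- tibble(
  `Food Name` = c("Food Name", "Toy food A", "Toy food B"),
  Sodium      = c("mg/100g", "6.9", "340")
)

toy_units <- raw %>% slice(1)   # units row only

toy_data <- raw %>%
  slice(-1) %>%                        # everything except the units row
  mutate(across(Sodium, as.numeric))   # text -> numeric

toy_units$Sodium
class(toy_data$Sodium)
```

Converting only after the units row is removed avoids the `NA` that `as.numeric("mg/100g")` would otherwise produce.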
We can rename the columns with a snake case naming convention.
nip_names <- c("food_name", "chapter", "energy", "protein", "fat_total", "fat_saturated", "sugars", "fibre", "sodium")
names(standard_nip) <- nip_names
names(units) <- nip_names
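Assigning names by position works as long as the column order never changes. As a possible alternative, the sketch below pairs each new name with its original name through a lookup vector passed to `rename(all_of(...))`. The tibble `toy` and the vector `lookup` are made up for illustration and cover only two of the nine columns.

```r
library(dplyr)
library(tibble)

# Made-up two-column stand-in for standard_nip before renaming.
toy <- tibble(
  `Food Name`  = "Toy food A",
  `Fat, total` = 1.2
)

# new_name = "old name": each pairing is explicit, so column order does not matter.
lookup <- c(food_name = "Food Name", fat_total = "Fat, total")

toy_renamed <- toy %>% rename(all_of(lookup))
names(toy_renamed)
```

Because `all_of()` is strict, a misspelt original name raises an error instead of silently mislabelling a column.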
Exploratory Data Analysis
The raw standard dataset has 2768 rows and 89 columns; the first row holds the units, which leaves 2767 foods.
Now take a look at summary statistics of the standard_nip dataset with the skimr R package.
standard_nip %>%
skimr::skim()
Name | Piped data |
Number of rows | 2767 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 7 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
food_name | 0 | 1 | 4 | 172 | 0 | 2767 | 0 |
chapter | 0 | 1 | 1 | 1 | 0 | 22 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
energy | 0 | 1 | 868.64 | 712.83 | 0 | 299.33 | 683.43 | 1269.20 | 3700.00 | ▇▅▂▁▁ |
protein | 0 | 1 | 9.62 | 9.63 | 0 | 1.38 | 6.11 | 16.69 | 84.36 | ▇▂▁▁▁ |
fat_total | 0 | 1 | 9.97 | 16.69 | 0 | 0.47 | 3.26 | 12.19 | 100.00 | ▇▁▁▁▁ |
fat_saturated | 0 | 1 | 3.59 | 6.99 | 0 | 0.08 | 0.87 | 4.10 | 94.01 | ▇▁▁▁▁ |
sugars | 0 | 1 | 7.68 | 14.82 | 0 | 0.00 | 1.80 | 7.60 | 100.70 | ▇▁▁▁▁ |
fibre | 0 | 1 | 2.38 | 4.95 | 0 | 0.00 | 0.90 | 2.75 | 70.10 | ▇▁▁▁▁ |
sodium | 0 | 1 | 322.65 | 1566.46 | 0 | 9.29 | 65.00 | 340.00 | 38700.00 | ▇▁▁▁▁ |
Now the units:
units %>%
glimpse()
## Observations: 1
## Variables: 9
## $ food_name <chr> "Food Name"
## $ chapter <chr> "Chapter"
## $ energy <chr> "kJ/100g"
## $ protein <chr> "g/100g"
## $ fat_total <chr> "g/100g"
## $ fat_saturated <chr> "g/100g"
## $ sugars <chr> "g/100g"
## $ fibre <chr> "g/100g"
## $ sodium <chr> "mg/100g"
Since the nutrients have different units, those measured in g/100g can be compared as a group, whereas energy (kJ/100g) and sodium (mg/100g) are compared individually.
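The grouping by unit can also be derived in code instead of by eye. The sketch below retypes the units from the `glimpse()` output above into a hypothetical `toy_units` tibble, reshapes it to long format, and lists the nutrients that share each unit.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Units retyped by hand from the glimpse() output above.
toy_units <- tibble(
  energy = "kJ/100g", protein = "g/100g", fat_total = "g/100g",
  fat_saturated = "g/100g", sugars = "g/100g", fibre = "g/100g",
  sodium = "mg/100g"
)

nutrients_by_unit <- toy_units %>%
  pivot_longer(everything(), names_to = "nutrient", values_to = "unit") %>%
  group_by(unit) %>%
  summarise(nutrients = paste(nutrient, collapse = ", "))

nutrients_by_unit
```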
To check that we have extracted the same nutrition information panel data as shown on the labels, compare the values for Butter, unsalted with the following:
standard_nip %>%
filter(str_detect(food_name,"Butter, unsalted"))
## # A tibble: 1 x 9
## food_name chapter energy protein fat_total fat_saturated sugars fibre sodium
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Butter, un~ F 3110. 0.32 83.6 54.1 0.54 0 6.9
Now let’s view some tables using the kableExtra R package.
What is the food with most energy?
standard_nip %>%
slice(which.max(energy)) %>%
select(food_name,energy) %>%
kable() %>%
kable_styling()
food_name | energy |
---|---|
Oil, sesame | 3700 |
What is the food with most protein?
standard_nip %>%
slice(which.max(protein)) %>%
select(food_name,protein) %>%
kable() %>% kable_styling()
food_name | protein |
---|---|
Gelatin | 84.36 |
What is the food with the most saturated fat? (Ranking by total fat simply returns the oils, which are 100 g of fat per 100 g.)
standard_nip %>%
slice(which.max(fat_saturated)) %>%
select(food_name,fat_saturated) %>%
kable() %>% kable_styling()
food_name | fat_saturated |
---|---|
Shortening, vegetable, Kremelta | 94.014 |
What is the food with the most sugar?
standard_nip %>%
slice(which.max(sugars)) %>%
select(food_name,sugars) %>%
kable() %>% kable_styling()
food_name | sugars |
---|---|
Flavoured drink, raspberry, dry powder | 100.7 |
That raspberry drink powder looks very high in sugar.
What is the saltiest food?
standard_nip %>%
slice(which.max(sodium)) %>%
select(food_name,sodium) %>%
kable() %>% kable_styling()
food_name | sodium |
---|---|
Salt, block | 38700 |
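The individual `which.max()` queries above could also be answered in one pass. A minimal sketch, assuming a made-up `toy_nip` tibble in place of standard_nip: reshape to long format, then keep the top row per nutrient with `slice_max()`.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up stand-in for standard_nip with only two nutrients.
toy_nip <- tibble(
  food_name = c("Toy food A", "Toy food B", "Toy food C"),
  energy    = c(3700, 900, 0),
  sodium    = c(0, 340, 38700)
)

top_by_nutrient <- toy_nip %>%
  pivot_longer(-food_name, names_to = "nutrient", values_to = "value") %>%
  group_by(nutrient) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%   # one food per nutrient
  ungroup()

top_by_nutrient
```

Setting `with_ties = FALSE` keeps a single row per nutrient, similar to `which.max()` returning only the first maximum.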
Take a look at the miscellaneous food group, which includes herbs and condiments. What food items have the most energy, protein, saturated fat, sugars, fibre and sodium?
We can create a function that takes the column name as a string and unquotes it with the rlang R package and tidy evaluation to extract the top 5 foods by nutrient.
top5 <- function(nutrient) {
  # nutrient: column name as a string, e.g. "energy"
  col <- rlang::sym(nutrient)
  standard_nip %>%
    filter(chapter == "P") %>%   # chapter "P" is the miscellaneous food group
    arrange(desc(!!col)) %>%
    select(food_name, !!col) %>%
    slice(1:5)
}
top5(nutrient= "energy") %>%
kable() %>%
kable_styling()
food_name | energy |
---|---|
Seed, sesame | 2607.82 |
Coffee whitener, powder | 2327.72 |
Seed, mustard, yellow | 2237.41 |
Seed, poppy, composite | 2106.41 |
Spice, nutmeg, ground | 2093.31 |
top5(nutrient= "protein") %>%
kable() %>%
kable_styling()
food_name | protein |
---|---|
Gelatin | 84.36 |
Yeast, baker’s, dried | 39.50 |
Stock, Oxo cubes | 39.31 |
Seed, mustard, yellow | 29.38 |
Powder, mustard | 28.88 |
top5(nutrient= "fat_saturated") %>%
kable() %>%
kable_styling()
food_name | fat_saturated |
---|---|
Coffee whitener, powder | 32.500 |
Spice, nutmeg, ground | 25.900 |
Spice, mace, ground | 9.510 |
Herb, rosemary, dried | 8.398 |
Seed, sesame | 7.672 |
top5(nutrient= "sugars") %>%
kable() %>%
kable_styling()
food_name | sugars |
---|---|
Coffee whitener, powder | 54.90 |
Powder, onion | 47.91 |
Spice, pepper, black | 46.00 |
Spice, allspice, ground | 45.51 |
Spice pepper, white | 43.60 |
top5(nutrient= "fibre") %>%
kable() %>%
kable_styling()
food_name | fibre |
---|---|
Spice, cinnamon, ground | 54.3 |
Seaweed, dried | 49.2 |
Savory, ground | 45.7 |
Seed, coriander | 41.9 |
Herb, sage, ground | 40.3 |
top5(nutrient= "sodium") %>%
kable() %>%
kable_styling()
food_name | sodium |
---|---|
Salt, block | 38700 |
Salt, table, iodised | 38400 |
Salt, table, uniodised | 38100 |
Baking soda | 27400 |
Baking powder | 11800 |
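The repeated `top5()` calls above could be generated in a loop. The following is a sketch only: `top_foods()` is a hypothetical variant that takes the data and the number of rows as arguments and uses the `.data` pronoun instead of `rlang::sym()`, and `toy_nip` is made-up data.

```r
library(dplyr)
library(purrr)
library(tibble)

# Made-up stand-in for the chapter "P" rows of standard_nip.
toy_nip <- tibble(
  food_name = c("Toy seed", "Toy spice", "Toy salt"),
  chapter   = "P",
  energy    = c(2600, 2100, 0),
  sodium    = c(10, 50, 38700)
)

# Hypothetical variant of top5(): data and row count are arguments.
top_foods <- function(data, nutrient, n = 5) {
  data %>%
    filter(chapter == "P") %>%
    arrange(desc(.data[[nutrient]])) %>%
    select(food_name, all_of(nutrient)) %>%
    slice_head(n = n)
}

# One table per nutrient, collected in a named list.
top_tables <- c("energy", "sodium") %>%
  set_names() %>%
  map(~ top_foods(toy_nip, .x, n = 2))

top_tables$sodium
```

Each element of `top_tables` can then be piped into `kable()` as before.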
Data Visualisation
Now create HTML bar charts with the reactable and htmltools R packages.
I chose this fun colour palette to distinguish the units.
# Set global theme
options(reactable.theme = reactableTheme(
  style = list(fontFamily = "-apple-system, BlinkMacSystemFont, Segoe UI, Helvetica, Arial, sans-serif"),
  color = "hsl(233, 9%, 87%)",
  backgroundColor = "hsl(233, 9%, 19%)",
  borderColor = "hsl(233, 9%, 22%)",
  stripedColor = "hsl(233, 12%, 22%)",
  highlightColor = "hsl(233, 12%, 24%)",
  inputStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  selectStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  pageButtonHoverStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  pageButtonActiveStyle = list(backgroundColor = "hsl(233, 9%, 28%)")
))
# Render a bar chart with a label on the left
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
  bar <- div(style = list(background = fill, width = width, height = height))
  chart <- div(style = list(flexGrow = 1, marginLeft = "8px", background = background), bar)
  div(style = list(display = "flex", alignItems = "center"), label, chart)
}
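The `width` argument of `bar_chart()` is a CSS percentage. As a small worked sketch, a value can be scaled against its column maximum like this; `bar_width()` is a hypothetical helper (not part of reactable), and the sodium values are taken from the tables above.

```r
# Hypothetical helper: bar width as a percentage of the column maximum.
bar_width <- function(value, max_value) {
  paste0(value / max_value * 100, "%")
}

# Sodium values from the tables above (mg/100g).
bar_width(38700, 38700)  # Salt, block: the column maximum, so a full-width bar
bar_width(27400, 38700)  # Baking soda: roughly 71% of the maximum
```

Scaling against the maximum means one extreme value, such as salt for sodium, compresses every other bar in that column.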
reactable(standard_nip %>% select(-chapter),
  columns = list(
    food_name = colDef(name = "Food Name", align = "left"),
    energy = colDef(name = "Energy (kJ/100g)", align = "left", cell = function(value) {
      width <- paste0(value / max(standard_nip$energy) * 100, "%")
      bar_chart(round(value, 0), width = width, fill = "#E3A8CB", background = "#999999")
    }),
    protein = colDef(name = "Protein (g/100g)", align = "left", cell = function(value) {
      width <- paste0(value / max(standard_nip$protein) * 100, "%")
      bar_chart(round(value, 0), width = width, fill = "#E98E10", background = "#999999")
    }),
    fat_total = colDef(name = "Fat Total (g/100g)", align = "left", cell = function(value) {
      width <- paste0(value / max(standard_nip$fat_total) * 100, "%")
      bar_chart(round(value, 0), width = width, fill = "#E98E10", background = "#999999")
    }),
    fat_saturated = colDef(name = "Saturated Fat (g/100g)", align = "left", cell = function(value) {
      width <- paste0(value / max(standard_nip$fat_saturated) * 100, "%")
      bar_chart(round(value, 0), width = width, fill = "#E98E10", background = "#999999")
    }),
    sugars = colDef(name = "Sugars (g/100g)", align = "left", cell = function(value) {
      width <- paste0(value / max(standard_nip$sugars) * 100, "%")
      bar_chart(round(value, 0), width = width, fill = "#E98E10", background = "#999999")
    }),
    fibre = colDef(name = "Fibre (g/100g)", align = "left", cell = function(value) {
      width <- paste0(value / max(standard_nip$fibre) * 100, "%")
      bar_chart(round(value, 0), width = width, fill = "#E98E10", background = "#999999")
    }),
    sodium = colDef(name = "Sodium (mg/100g)", align = "left", cell = function(value) {
      width <- paste0(value / max(standard_nip$sodium) * 100, "%")
      bar_chart(round(value, 0), width = width, fill = "#A2DC84", background = "#999999")
    })
  ),
  filterable = TRUE,
  showPageSizeOptions = TRUE,
  striped = TRUE,
  highlight = TRUE)
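The seven nutrient columns above differ only in their label, source column and fill colour, so the column definitions could be produced by a small factory function. This is a sketch, not the code used for the table above: `bar_col()` is a hypothetical helper, `toy_nip` is made-up data, and `bar_chart()` is repeated so the block runs on its own.

```r
library(reactable)
library(htmltools)

# bar_chart() as defined earlier, repeated so this block is self-contained.
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
  bar <- div(style = list(background = fill, width = width, height = height))
  chart <- div(style = list(flexGrow = 1, marginLeft = "8px", background = background), bar)
  div(style = list(display = "flex", alignItems = "center"), label, chart)
}

# Hypothetical factory: builds a bar-chart column definition for one nutrient.
bar_col <- function(name, values, fill) {
  max_value <- max(values, na.rm = TRUE)   # computed once per column
  colDef(name = name, align = "left", cell = function(value) {
    width <- paste0(value / max_value * 100, "%")
    bar_chart(round(value, 0), width = width, fill = fill, background = "#999999")
  })
}

# Made-up data standing in for standard_nip.
toy_nip <- data.frame(
  food_name = c("Toy food A", "Toy food B"),
  sugars    = c(10, 40),
  sodium    = c(340, 38700)
)

toy_table <- reactable(
  toy_nip,
  columns = list(
    food_name = colDef(name = "Food Name", align = "left"),
    sugars    = bar_col("Sugars (g/100g)", toy_nip$sugars, fill = "#E98E10"),
    sodium    = bar_col("Sodium (mg/100g)", toy_nip$sodium, fill = "#A2DC84")
  )
)
```

Computing the maximum once per column, instead of inside every cell call, also avoids recalculating it for each row.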
Conclusion
This reactable table is a great interactive tool for summarising and exploring the nutrition information panel data. It is possible to sort and filter, and also to view each nutrient value relative to the maximum for that nutrient across all foods.
I would like to explore the Miscellaneous food group further in the data visualisation since, based on these tables, nutrient levels in herbs and spices vary widely and some are relatively high.
As a note, the reactable html output doesn't show up in the blogdown output html so I saved the html output and added this code snippet: