library(tidyverse)
library(janitor)
library(scales)
library(htmltools)
library(rvest)
library(gt)
library(countrycode)
Introduction
Purpose
In this project, I wanted to experiment with the gt::
package to create a beautiful table using R. This code will walk through scraping a table from Wikipedia of the top international goal scorers in men’s soccer history.
Setup and data preparation
Required packages:
Data.world has a dataset that consists of country names and URLs associated with .svg
images of their national flags, which I’d like to include within the table for extra style.
<- read.csv("https://query.data.world/s/cnheo22w5mwowz2tfxrbb3z2i3mzya?dws=00000", header=TRUE, stringsAsFactors=FALSE) %>%
flag_db #Convert country names into 3-letter country codes
mutate(Code = countrycode(sourcevar = Country, origin = "country.name", destination = "iso3c", warn = FALSE)) %>%
select(Country, flag_URL = ImageURL)
Scrape Wikipedia Data
<- 'https://en.wikipedia.org/wiki/List_of_men%27s_footballers_with_50_or_more_international_goals' url_goals
<- url_goals %>%
(raw ::read_html() %>%
rvest::html_nodes(xpath='//*[@id="mw-content-text"]/div[1]/table[2]') %>%
rvesthtml_table() %>%
1]] %>%
.[[clean_names())
# A tibble: 80 × 10
rank player nation confederation goals caps goalsper_match career_span
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Cristiano … Portu… UEFA 130 212 0.61 2003–
2 2 Lionel Mes… Argen… CONMEBOL 109 187 0.58 2005–
3 3 Ali Daei Iran AFC 108[… 148[… 0.73 1993–2006
4 4 Sunil Chhe… India AFC 94 151[… 0.62 2005–2024
5 5 Mokhtar Da… Malay… AFC 89 142 0.63 1972–1985
6 6 Ali Mabkho… Unite… AFC 85 115 0.74 2009–
7 6 Romelu Luk… Belgi… UEFA 85 119 0.71 2010–
8 8 Ferenc Pus… Hunga… UEFA 84 89 0.94 1945–1962
9 9 Robert Lew… Poland UEFA 83 152 0.55 2008–
10 10 Godfrey Ch… Zambia CAF 79 111 0.71 1968–1980
# ℹ 70 more rows
# ℹ 2 more variables: date_of_50th_goal <chr>, ref <chr>
For data cleaning of this Wiki table, we need to: * Remove all footnotes within the table * Convert all chr columns to numeric values * Re-code one of the confederation values so that it’s clean for the by confederation table
<- raw %>%
(raw #Remove extra spaces in nation to clean this column
mutate(nation = str_replace_all(str_trim(nation), "\\s+", " ")) %>%
mutate(nation = case_when(
str_detect(nation, "^Hungary\\s+Spain$") ~ "Hungary",
#re-coding this as United Kingdom to bring in flag
str_detect(nation, "^England\\s+England\\s+amateurs$") ~ "United Kingdom",
TRUE ~ nation
%>%
)) mutate(confederation = case_when(
== "AFC / OFC[h]" ~ "AFC",
confederation TRUE ~ confederation)) %>%
#remove footnotes and transform columns to numeric values
mutate_at(vars(player, career_span, goals, caps, goalsper_match),
~ str_remove(., "\\[.*\\]")) %>%
mutate_at(vars(goals, caps, goalsper_match),
%>%
as.numeric) left_join(flag_db, by = c('nation' = 'Country')) %>%
select(flag_URL, everything()))
# A tibble: 80 × 11
flag_URL rank player nation confederation goals caps goalsper_match
<chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 https://upload.… 1 Crist… Portu… UEFA 130 212 0.61
2 https://upload.… 2 Lione… Argen… CONMEBOL 109 187 0.58
3 https://upload.… 3 Ali D… Iran AFC 108 148 0.73
4 https://upload.… 4 Sunil… India AFC 94 151 0.62
5 https://upload.… 5 Mokht… Malay… AFC 89 142 0.63
6 https://upload.… 6 Ali M… Unite… AFC 85 115 0.74
7 https://upload.… 6 Romel… Belgi… UEFA 85 119 0.71
8 https://upload.… 8 Feren… Hunga… UEFA 84 89 0.94
9 https://upload.… 9 Rober… Poland UEFA 83 152 0.55
10 https://upload.… 10 Godfr… Zambia CAF 79 111 0.71
# ℹ 70 more rows
# ℹ 3 more variables: career_span <chr>, date_of_50th_goal <chr>, ref <chr>
Now that I have the data scraped and cleaned, I’m interested in making some tables using the ::gt()
package. Tables are great for communicating summary information, so I’ll first build a graphic of the top 12 international goal scorers of all time.
<- raw %>%
df_top_scorers slice(1:12)
Then, I’ll extract the min and max values from the table for conditional formatting of the table.
<- df_top_scorers$goals %>% min()
min_goals <- df_top_scorers$goals %>% max()
max_goals
<- col_numeric(c("lightgreen", "darkgreen"),
goals_palette domain = c(min_goals, max_goals),
alpha = .75)
For the first table, I’ll put the columns and styling in place first. I’m going to build a table that includes the flag_url
column – which is messy at first – and then I’ll use a function to render the flag graphics from those URLs. This blog post is excellent – it showed me how to do this step-by-step.
<- df_top_scorers %>%
(tbl_top_scorers select(rank, player, career_span, flag_URL, nation, goals, caps, goalsper_match) %>%
gt() %>%
#rename columns
cols_label(rank = 'Rank',
player = 'Name',
career_span = 'Career Span',
nation = 'Country',
goals = 'Total Goals Scored',
caps = 'Matches',
goalsper_match = 'Goals per Match') %>%
#add table title
tab_header(title = md("**Total Goals Scored in Men's International Soccer Matches**")) %>%
tab_source_note(source_note = "Data from Wikipedia") %>%
#apply new style to all column headers
tab_style(
locations = cells_column_labels(columns = everything()),
style = list(
#thick border
cell_borders(sides = "bottom", weight = px(3)),
#make text bold
cell_text(weight = "bold")
)%>%
) #apply different style to title
tab_style(locations = cells_title(groups = "title"),
style = list(
cell_text(weight = "bold", size = 24)
%>%
)) data_color(columns = c(goals),
colors = goals_palette) %>%
opt_all_caps() %>%
opt_table_font(
font = list(
google_font("Chivo"),
default_fonts()
)%>%
) tab_options(
#remove border between column headers and title
column_labels.border.top.width = px(3),
column_labels.border.top.color = "transparent",
#remove border around the table
table.border.top.color = "transparent",
table.border.bottom.color = "transparent",
#adjust font sizes and alignment
source_notes.font.size = 12,
heading.align = "left"
))
Total Goals Scored in Men’s International Soccer Matches | |||||||
---|---|---|---|---|---|---|---|
Rank | Name | Career Span | flag_URL | Country | Total Goals Scored | Matches | Goals per Match |
1 | Cristiano Ronaldo | 2003– | https://upload.wikimedia.org/wikipedia/commons/5/5c/Flag_of_Portugal.svg | Portugal | 130 | 212 | 0.61 |
2 | Lionel Messi | 2005– | https://upload.wikimedia.org/wikipedia/commons/1/1a/Flag_of_Argentina.svg | Argentina | 109 | 187 | 0.58 |
3 | Ali Daei | 1993–2006 | https://upload.wikimedia.org/wikipedia/commons/c/ca/Flag_of_Iran.svg | Iran | 108 | 148 | 0.73 |
4 | Sunil Chhetri | 2005–2024 | https://upload.wikimedia.org/wikipedia/en/4/41/Flag_of_India.svg | India | 94 | 151 | 0.62 |
5 | Mokhtar Dahari | 1972–1985 | https://upload.wikimedia.org/wikipedia/commons/6/66/Flag_of_Malaysia.svg | Malaysia | 89 | 142 | 0.63 |
6 | Ali Mabkhout | 2009– | https://upload.wikimedia.org/wikipedia/commons/c/cb/Flag_of_the_United_Arab_Emirates.svg | United Arab Emirates | 85 | 115 | 0.74 |
6 | Romelu Lukaku | 2010– | https://upload.wikimedia.org/wikipedia/commons/9/92/Flag_of_Belgium_%28civil%29.svg | Belgium | 85 | 119 | 0.71 |
8 | Ferenc Puskás | 1945–1962 | https://upload.wikimedia.org/wikipedia/commons/c/c1/Flag_of_Hungary.svg | Hungary | 84 | 89 | 0.94 |
9 | Robert Lewandowski | 2008– | https://upload.wikimedia.org/wikipedia/en/1/12/Flag_of_Poland.svg | Poland | 83 | 152 | 0.55 |
10 | Godfrey Chitalu | 1968–1980 | https://upload.wikimedia.org/wikipedia/commons/0/06/Flag_of_Zambia.svg | Zambia | 79 | 111 | 0.71 |
10 | Neymar | 2010– | https://upload.wikimedia.org/wikipedia/en/0/05/Flag_of_Brazil.svg | Brazil | 79 | 128 | 0.62 |
12 | Hussein Saeed | 1977–1990 | https://upload.wikimedia.org/wikipedia/commons/f/f6/Flag_of_Iraq.svg | Iraq | 78 | 137 | 0.57 |
Data from Wikipedia |
<- tbl_top_scorers %>%
(tbl_top_scorers text_transform(
#Apply a function to a column
locations = cells_body(c(flag_URL)),
fn = function(x) {
#Return an image of set dimensions
web_image(
url = x,
height = 12
)
}%>%
) #Hide column header flag_URL and reduce width
cols_width(c(flag_URL) ~ px(30)) %>%
cols_label(flag_URL = ""))
Total Goals Scored in Men’s International Soccer Matches | |||||||
---|---|---|---|---|---|---|---|
Rank | Name | Career Span | Country | Total Goals Scored | Matches | Goals per Match | |
1 | Cristiano Ronaldo | 2003– | Portugal | 130 | 212 | 0.61 | |
2 | Lionel Messi | 2005– | Argentina | 109 | 187 | 0.58 | |
3 | Ali Daei | 1993–2006 | Iran | 108 | 148 | 0.73 | |
4 | Sunil Chhetri | 2005–2024 | India | 94 | 151 | 0.62 | |
5 | Mokhtar Dahari | 1972–1985 | Malaysia | 89 | 142 | 0.63 | |
6 | Ali Mabkhout | 2009– | United Arab Emirates | 85 | 115 | 0.74 | |
6 | Romelu Lukaku | 2010– | Belgium | 85 | 119 | 0.71 | |
8 | Ferenc Puskás | 1945–1962 | Hungary | 84 | 89 | 0.94 | |
9 | Robert Lewandowski | 2008– | Poland | 83 | 152 | 0.55 | |
10 | Godfrey Chitalu | 1968–1980 | Zambia | 79 | 111 | 0.71 | |
10 | Neymar | 2010– | Brazil | 79 | 128 | 0.62 | |
12 | Hussein Saeed | 1977–1990 | Iraq | 78 | 137 | 0.57 | |
Data from Wikipedia |
Top scorers by confederation
Next, I’d like to see who are the top-5 all-time goal scorers by confederation.
<- raw %>%
(df_scorers_by_confed filter(!is.na(goals)) %>%
group_by(confederation) %>%
slice(1:5))
# A tibble: 25 × 11
# Groups: confederation [5]
flag_URL rank player nation confederation goals caps goalsper_match
<chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 https://upload.… 3 Ali D… Iran AFC 108 148 0.73
2 https://upload.… 4 Sunil… India AFC 94 151 0.62
3 https://upload.… 5 Mokht… Malay… AFC 89 142 0.63
4 https://upload.… 6 Ali M… Unite… AFC 85 115 0.74
5 https://upload.… 12 Husse… Iraq AFC 78 137 0.57
6 https://upload.… 10 Godfr… Zambia CAF 79 111 0.71
7 https://upload.… 19 Kinna… Malawi CAF 71 117 0.61
8 https://upload.… 25 Hossa… Egypt CAF 69 177 0.39
9 https://upload.… 31 Didie… Ivory… CAF 65 105 0.62
10 https://upload.… 45 Moham… Egypt CAF 57 100 0.57
# ℹ 15 more rows
# ℹ 3 more variables: career_span <chr>, date_of_50th_goal <chr>, ref <chr>
#re-run the min_goals and max_goals functions
<- df_scorers_by_confed$goals %>% min(na.rm = TRUE)
min_goals <- df_scorers_by_confed$goals %>% max(na.rm = TRUE)
max_goals
<- col_numeric(c("lightgreen", "darkgreen"),
goals_palette domain = c(min_goals, max_goals),
alpha = .75)
<- df_scorers_by_confed %>%
(tbl_scorers_by_confed select(rank, player, career_span, flag_URL, nation, goals, caps, goalsper_match, confederation) %>%
gt(groupname_col = "confederation") %>%
cols_label(rank = 'Global Rank',
player = 'Name',
career_span = 'Career Span',
nation = 'Country',
goals = 'Total Goals Scored',
caps = 'Matches',
goalsper_match = 'Goals per Match') %>%
#add table title
tab_header(title = md("**Total Goals Scored in Men's International Soccer Matches**")) %>%
tab_source_note(source_note = "Data from Wikipedia") %>%
#apply new style to all column headers
tab_style(
locations = cells_column_labels(columns = everything()),
style = list(
#thick border
cell_borders(sides = "bottom", weight = px(3)),
#make text bold
cell_text(weight = "bold")
)%>%
) #apply different style to title
tab_style(locations = cells_title(groups = "title"),
style = list(
cell_text(weight = "bold", size = 24)
%>%
)) data_color(columns = c(goals),
colors = goals_palette) %>%
opt_all_caps() %>%
opt_table_font(
font = list(
google_font("Chivo"),
default_fonts()
)%>%
) tab_options(
#remove border between column headers and title
column_labels.border.top.width = px(3),
column_labels.border.top.color = "transparent",
#remove border around the table
table.border.top.color = "transparent",
table.border.bottom.color = "transparent",
#adjust font sizes and alignment
source_notes.font.size = 12,
heading.align = "left"
%>%
) text_transform(
#Apply a function to a column
locations = cells_body(c(flag_URL)),
fn = function(x) {
#Return an image of set dimensions
web_image(
url = x,
height = 12
)
}%>%
) #Hide column header flag_URL and reduce width
cols_width(c(flag_URL) ~ px(30)) %>%
cols_label(flag_URL = ""))
Total Goals Scored in Men’s International Soccer Matches | |||||||
---|---|---|---|---|---|---|---|
Global Rank | Name | Career Span | Country | Total Goals Scored | Matches | Goals per Match | |
AFC | |||||||
3 | Ali Daei | 1993–2006 | Iran | 108 | 148 | 0.73 | |
4 | Sunil Chhetri | 2005–2024 | India | 94 | 151 | 0.62 | |
5 | Mokhtar Dahari | 1972–1985 | Malaysia | 89 | 142 | 0.63 | |
6 | Ali Mabkhout | 2009– | United Arab Emirates | 85 | 115 | 0.74 | |
12 | Hussein Saeed | 1977–1990 | Iraq | 78 | 137 | 0.57 | |
CAF | |||||||
10 | Godfrey Chitalu | 1968–1980 | Zambia | 79 | 111 | 0.71 | |
19 | Kinnah Phiri | 1973–1981 | Malawi | 71 | 117 | 0.61 | |
25 | Hossam Hassan | 1985–2006 | Egypt | 69 | 177 | 0.39 | |
31 | Didier Drogba | 2002–2014 | Ivory Coast | 65 | 105 | 0.62 | |
45 | Mohamed Salah | 2011– | Egypt | 57 | 100 | 0.57 | |
CONCACAF | |||||||
22 | Stern John | 1995–2012 | Trinidad and Tobago | 70 | 115 | 0.61 | |
27 | Carlos Ruiz | 1998–2016 | Guatemala | 68 | 133 | 0.51 | |
45 | Carlos Pavón | 1993–2010 | Honduras | 57 | 101 | 0.56 | |
45 | Clint Dempsey | 2004–2018 | United States | 57 | 141 | 0.40 | |
45 | Landon Donovan | 2000–2014 | United States | 57 | 157 | 0.36 | |
CONMEBOL | |||||||
2 | Lionel Messi | 2005– | Argentina | 109 | 187 | 0.58 | |
10 | Neymar | 2010– | Brazil | 79 | 128 | 0.62 | |
13 | Pelé | 1957–1971 | Brazil | 77 | 92 | 0.84 | |
25 | Luis Suárez | 2007– | Uruguay | 69 | 142 | 0.49 | |
35 | Ronaldo | 1994–2011 | Brazil | 62 | 98 | 0.63 | |
UEFA | |||||||
1 | Cristiano Ronaldo | 2003– | Portugal | 130 | 212 | 0.61 | |
6 | Romelu Lukaku | 2010– | Belgium | 85 | 119 | 0.71 | |
8 | Ferenc Puskás | 1945–1962 | Hungary | 84 | 89 | 0.94 | |
9 | Robert Lewandowski | 2008– | Poland | 83 | 152 | 0.55 | |
14 | Vivian Woodward | 1903–1914 | United Kingdom | 75 | 53 | 1.42 | |
Data from Wikipedia |