My visual CV
TLDR; I made a visual timeline of my career (CV) completely in R. Here I will show how it was done, and in the process how to hack ggplot
objects, add images to plots, and use pretty much any font in your ggplots.
This is the final result, and if you are interested on how it was generated, read on.
Motivation
Recently I gave a presentation with an overview of my career, focusing on data analysis / bioinformatics. As my path to data analysis has not been straightforward, I wanted walk the audience through my career as we went along the several stages. Because I am someone who generally takes pride on giving good presentations, and this was about me after all, and I wanted it to be:
- Clean and with an elegant design
- Rich in features
- Playful
- With a wink a nod to data analysis and visualization
Basically me in a plot.
On top of that, I wanted to showcase my coding skills, so whatever I presented had to be done 100% with code. No post-production in Inkscape. In keeping with this mantra, the slide-deck itself was done with LaTeX
.
Because the plot is so intrinsically linked with the presentation itself, as I explain the how the plot was done, I will include my though process on certain choices and how that is linked with the presentation1.
Framework
Important for me, due to the way my career as paned out so far, and the goal of the presentation was to 2:
- depict my journey from wet lab to data analysis, with increasing amounts of time spent analyzing data.
- include a rough timeline should be present
- highlight milestones such as academic publications and degrees
- demonstrate how international the path has been
Initially I thought of using a Gantt chart as those are the go to visual representations for timelines and there are several packages available 3. However, after prototyping with pen and paper, I decided it would be too hard to incorporate all the features I wanted to have without cluttering the plot. Basically, it didn’t feel right.
Luckily inspiration was at hand in one of my favorite blogs, by an excellent data scientist and communicator, which just happened to have a similar path to data science as my own. Her visual CV is hand drawn, and looks great, but I made it a challenge to re-create that playfulness, along with the richness of information, in code (also I am terrible at drawing/painting).
Set-up
First step is to load the libraries I’ll be using:
tidyverse
, for general data wranglinglubridate
, fancy functions to work with datesdata.table
, rolling joins and general awesomenessggplot2
, plots et al.ggthemes
, I don’t think this was actually used.scales
, functions to add some spice to ggplot axiscowplot
, add images to plotsextrafont
, use of system fonts
Code
library("tidyverse")
library("lubridate")
library("data.table")
library("ggplot2")
library("ggthemes")
library("cowplot")
library("extrafont")
::loadfonts(quiet = TRUE)
extrafontlibrary("bib2df")
library("rcrossref")
My CV in a table
The data itself is structured in a straightforward way with variables for my roles, dates, location, and a mock data analysis ratio (how much data analysis was part of my role). Because this is supposed to a cartoonish overview of my CV, not a rigorous representation, the data analysis ratio is close to reality but I won’t swear it is strictly correct.
Code
<- fread("/home/antonio/Documents/personal/CV/visual_cv/timeline_cv.tsv") %>%
antonio melt(measure.vars = c("Start", "End"), variable.name = "Timpepoint",
value.name = "Date") %>%
:= as.IDate(Date)]
.[, Date
%>%
antonio head() %>%
::kable() knitr
Role | Place | Type | Country | da_ratio | Year | Timpepoint | Date |
---|---|---|---|---|---|---|---|
BSc | UA | Student | Portugal | 0.2 | 2001 | Start | 1997-09-01 |
MSc | UA | Student | Portugal | 0.3 | 2005 | Start | 2002-09-01 |
Technician | CNC | Service | Portugal | 0.1 | 2006 | Start | 2005-05-01 |
PhD | University of Leicester | Student | UK | 0.4 | 2009 | Start | 2006-02-01 |
Postdoc | MPI-CBG | Researcher | Germany | 0.6 | 2013 | Start | 2009-10-01 |
Outreach | Internet | Package | World | 1.0 | 2020 | Start | 2020-07-15 |
The Canvas
With the events of my CV in tabular format I could get started with plotting. Key features of the plot:
- A theme with as few visual elements as possible, and attention to detail
- The font carefully selected (luckily I have dozens of fonts installed in my system)
- Addition of small pictograms
The first and second items were achieved with a custom theme and the package extrafont
. extrafont
allows us to use pretty much any system font in R plots. There a couple of steps that need to be done before system font can be used:
- The first is
extrafont::font_import()
to detect and import system fonts. This needs to be done only once after installing the package, or any time a new font installed. This can take several minutes depending on how many system fonts your system has. Mine has a LOT. - The second step is to load the system fonts in the current
R
session to make them available,extrafont::loadfonts(quiet = TRUE)
Code
::font_import()
extrafont::loadfonts(quiet = TRUE)
extrafont
fonttable() %>%
filter(stringr::str_detect(FullName, "Annie"))
You can also use the third snippet of code make sure the desired font is indeed available in R
.
The theme is a variation of theme_light
to which I removed as many elements as I thought necessary give it that clean aspect. Mind you this is not for data visualization but rather to be aesthetically pleasing (ideally one should be able to combine both). The is also at this stage that the custom font was defined. By creating a theme in this manner we don’t need to add these lines to every plot.
Code
theme_set(theme_light())
theme_update(text = element_text(family = "Annie Use Your Telescope",
size = 12), legend.title = element_text(size = 6), legend.text = element_text(size = 6),
axis.title.y = element_blank(), axis.title.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_line(size = 1, linetype = "solid",
arrow = arrow(length = unit(0.1, "inches"), type = "closed")),
panel.grid = element_line(colour = "black", linetype = "dashed",
size = 0.5), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(), panel.grid.major.y = element_line(colour = "gray",
linetype = "dotted", size = 0.5), panel.border = element_blank())
With the theme set I could produce the first, and very basic, plot:
Code
<- ggplot(antonio, aes(x = Date, y = da_ratio)) + geom_smooth(method = "loess",
p color = "black", size = 1.5, span = 1, se = FALSE) + xlim(min(antonio$Date),
as.IDate("2020-12-01")) + ylim(0, 1) + geom_hline(yintercept = 0.5,
linetype = "dashed", color = "#c72e29", size = 1) + annotate(geom = "text",
x = min(antonio$Date) %m+% months(6), y = 0.35, label = "Wet lab",
color = "red", size = 5, family = "Annie Use Your Telescope") +
annotate(geom = "text", x = as.IDate("2012-12-01"), y = 0.9,
label = "Data analysis", color = "red", size = 5, family = "Annie Use Your Telescope")
p
The data science trajectory line is a smooth line, result of a linear model fitted to the time points in the CV. The model was chosen based on the very accurate statistical measure of “being easy on the eye” - smooth and in an upward trajectory after a particular time point.
To add the small icons, I ended up using magick
(via cowplot
) which I could only install with conda (well, mamba), due to missing dependencies. I did try the R way with install.packages("magick")
but gave up after the second error. Life is too short.
The icons themselves were downloaded from the Noun Project which has tons of free to use icons (with attribution). I ended up paying for these because I liked them so much, but nevertheless I will credit the author, Tippawan Sookruay.
As these images will be included in several versions of the plot, I wrapped the code in a function to save typing:
Code
<- function(gg) {
add_img ggdraw(gg) + draw_image("/home/antonio/Documents/personal/CV/visual_cv/figure/noun_pipette_3384602.svg.png",
scale = 0.1, x = 0.09, y = 0.001, hjust = 0.52, vjust = 0.25) +
draw_image("/home/antonio/Documents/personal/CV/visual_cv/figure/noun_chart_3384933.svg.png",
scale = 0.1, x = 0.4, y = 0.25)
}
<- add_img(p)
p_save
ggsave("timeline_icons.png", p_save, dpi = 600, width = 10, height = 6,
units = "cm")
So now the plots has the basics:
- A timeline on the x-axis
- rough % of data analysis as part of my job
- and a couple of cool looking icons to show how it all started and it is going
In technical terms, the most difficult part was to add the icons, and in particular placing them in the right location. Not only did it take a lot of trial an error, it was only just right once saved as a file (pdf). I wanted to have a wide plot to fit with the slides, and the plotting window is square, so some eye-balling was required. I tried a few other packages (patchwork for example), but in the end magick
+ cowplot
was the best option for my purposes.
But we are not done yet: career milestones and cities I lived in are still missing.
Publications
Until now I have worked in fairly academic milieu where one’s achievements are measured mostly by publications. whilst I don’t particularly agree with this4, it is a decent proxy for output which is what I was looking to represent. Other fields will have their own metrics of productivity (number of sales, features shipped, etc).
For full reproducibility, I tried to get my publications list from pubmed or ORCID but ran into problems. The issue with pubmed queries is that my name follows Portuguese naming conventions, and each publisher seems to ignore what I write in the provided forms and makes up a different version of my name. Not helping matters is that my first and last names are fairly common, so there were a lot of false positives in the pubmed queries. I could retrieve some of the variables I needed from ORCID but publications dates were missing for some paper.
Luckily I keep all my publications in a bibtex
file exported from zotero and ended up resorting to that. The publication data was extracted with the help of bib2df
to read in the bibtex file, and rcrossref
to retrieve publications dates which were incomplete or missing in the bibtex file.
Code
# ==========================================================================
# Get publication list
# ==========================================================================
<- "/home/antonio/Documents/personal/CV/Awesome-CV/examples/resume/AD_papers.bib"
path
<- bib2df(path)
df setDT(df)
<- df[!is.na(DOI)]$DOI
my_dois
# my_dois <-
# rorcid::identifiers(rorcid::works('0000-0002-1803-1863'))
<- rcrossref::cr_works(dois = my_dois)$data %>%
pubs setDT() %>%
c("title", "type", "container.title", "published.print",
.[, "published.online", "deposited")] %>%
:= ifelse(!is.na(published.print), published.print,
.[, Date %>%
published.online)] := ifelse(!is.na(Date), Date, deposited)] %>%
.[, Date := ifelse(nchar(Date) == 4, paste(Date, "03-01",
.[, Date sep = "-"), Date)] %>%
:= ifelse(nchar(Date) == 7, paste(Date, "01",
.[, Date sep = "-"), Date)] %>%
:= as.IDate(Date)]
.[, Date
%>%
pubs head() %>%
::kable() knitr
title | type | container.title | published.print | published.online | deposited | Date |
---|---|---|---|---|---|---|
Innate Immune Training of Granulopoiesis Promotes Anti-tumor Activity | journal-article | Cell | 2020-10 | NA | 2022-11-25 | 2020-10-01 |
The RNA binding protein human antigen R is a gatekeeper of liver homeostasis | journal-article | Hepatology | 2022-04 | 2021-12-05 | 2023-06-07 | 2022-04-01 |
Exosomal miRNAs from Prostate Cancer Impair Osteoblast Function in Mice | journal-article | International Journal of Molecular Sciences | NA | 2022-01-24 | 2022-01-26 | 2022-01-24 |
Membrane-associated cytoplasmic granules carrying the Argonaute protein WAGO-3 enable paternal epigenetic inheritance in Caenorhabditis elegans | journal-article | Nature Cell Biology | 2022-02 | 2022-02-07 | 2022-10-23 | 2022-02-01 |
Bardet-Biedl syndrome proteins modulate the release of bioactive extracellular vesicles | journal-article | Nature Communications | NA | 2021-09-27 | 2022-12-02 | 2021-09-27 |
Condensation of Ded1p Promotes a Translational Switch from Housekeeping to Stress Protein Production | journal-article | Cell | 2020-05 | NA | 2022-06-25 | 2020-05-01 |
Once that was done, I could proceed to add the publications has milestones in the plot - one point per paper.
Not so fast.
Turns out that when that smooth line was created, a linear model fit on the original time points, I lost the reference points, ratio of data analysis, on the y-axis. So now I also need to estimate the Y positions in the plot.
ggplot hacking
I have done enough custom plots, and spent enough hours googling how to do it, to know that ggplot
objects store a ton of information which can be accessed and modified (at your own risk). So what I did was to extract the x
and y
coordinates of the fitted line from the ggplot
object.
This was done in several steps:
- re-run the linear model (
spline
) - plot the fitted line
- extract the coordinates from the plot (
ggplot_build(p_smooth)$data[[1]]
)
Code
# Retrieve the smooth line (dummy timeline) to add points
# of interest
# --------------------------------------------------------------------------
<- as.data.frame(spline(antonio$Date, antonio$da_ratio))
spline_int
<- ggplot(antonio, aes(x = Date, y = da_ratio)) + geom_smooth(method = "loess",
p_smooth color = "black", size = 1.5, span = 1, se = FALSE) + xlim(min(antonio$Date),
as.IDate("2020-12-01")) + ylim(0, 1)
<- ggplot_build(p_smooth)$data[[1]] %>%
smooth_df setDT() %>%
:= as.IDate(x)] %>%
.[, Date
.[]
%>%
smooth_df head() %>%
::kable() knitr
x | y | flipped_aes | PANEL | group | colour | fill | linewidth | linetype | weight | alpha | Date |
---|---|---|---|---|---|---|---|---|---|---|---|
10105.00 | 0.2179275 | FALSE | 1 | -1 | black | grey60 | 1.5 | 1 | 1 | 0.4 | 1997-09-01 |
10211.34 | 0.2120627 | FALSE | 1 | -1 | black | grey60 | 1.5 | 1 | 1 | 0.4 | 1997-12-16 |
10317.68 | 0.2067649 | FALSE | 1 | -1 | black | grey60 | 1.5 | 1 | 1 | 0.4 | 1998-04-01 |
10424.03 | 0.2020354 | FALSE | 1 | -1 | black | grey60 | 1.5 | 1 | 1 | 0.4 | 1998-07-17 |
10530.37 | 0.1978752 | FALSE | 1 | -1 | black | grey60 | 1.5 | 1 | 1 | 0.4 | 1998-10-31 |
10636.71 | 0.1942856 | FALSE | 1 | -1 | black | grey60 | 1.5 | 1 | 1 | 0.4 | 1999-02-14 |
Now that I have the coordinates, these need to be matched with the publication dates. However, the x-axis values (dates) of the fitted plot are intervals which do not match the CV dates. To solve that problem, as nearly any data manipulation problem, data.table
and rolling joins came to the rescue. The goal was to find the closest time point of my CV in the fitted plot.
Code
## find the closest date of publications in the dummy
## timeline
<- smooth_df[pubs, roll = "nearest", on = "Date"] %>%
pubs_smooth := as.IDate(x)] %>%
.[, new_date := NULL] %>%
.[, Date setnames("new_date", "Date") %>%
c("Date", "y", "title")] %>%
.[, <= as.IDate("2012-01-01"), Publications := "Experimental"] %>%
.[Date >= as.IDate("2012-01-01"), Publications := "Analyst"] %>%
.[Date := ifelse(str_detect(title, "NXF1"),
.[, Publications "Both", Publications)] %>%
.[]
%>%
pubs_smooth head() %>%
::kable() knitr
Date | y | title | Publications |
---|---|---|---|
2020-09-01 | 0.9472880 | Innate Immune Training of Granulopoiesis Promotes Anti-tumor Activity | Analyst |
2020-09-01 | 0.9472880 | The RNA binding protein human antigen R is a gatekeeper of liver homeostasis | Analyst |
2020-09-01 | 0.9472880 | Exosomal miRNAs from Prostate Cancer Impair Osteoblast Function in Mice | Analyst |
2020-09-01 | 0.9472880 | Membrane-associated cytoplasmic granules carrying the Argonaute protein WAGO-3 enable paternal epigenetic inheritance in Caenorhabditis elegans | Analyst |
2020-09-01 | 0.9472880 | Bardet-Biedl syndrome proteins modulate the release of bioactive extracellular vesicles | Analyst |
2020-05-17 | 0.9417646 | Condensation of Ded1p Promotes a Translational Switch from Housekeeping to Stress Protein Production | Analyst |
Some of that information is not necessary but I kept it anyway to show how it looks.
A similar join was done for some key outreach dates that will also be highlighted in the final plot.
Code
## same for outreach activities
<- antonio[Role == "Outreach" & Timpepoint == "Start"]
outreach
<- smooth_df[outreach, roll = "nearest", on = "Date"] %>%
outreach_smooth := as.IDate(x)] %>%
.[, new_date := NULL] %>%
.[, Date setnames("new_date", "Date") %>%
c("Date", "y", "Type")] %>%
.[, .[]
Annotate with academic degrees
I could have used points (geom_point
) with different color/shape to highlight my academic degrees, but these belong to a different category of events than publications or other more technical milestones / achievements, and would add too much information to the lines anyway. As an alternative I decided to use arrows and text to indicate when those degrees where obtained.
Again there is nothing magic or novel about this, in fact I took inspiration from Cedric Scherer, except the need for trial and error to get the locations correct. The positions of the arrows were put on a tibble
to make it easier to plot later on:
Code
# plot annotations - academic points of interest
# --------------------------------------------------------------------------
<- tibble(x1 = c(as.IDate("2000-09-01"), as.IDate("2004-12-01"),
arrows as.IDate("2010-09-01")), x2 = c(as.IDate("2001-09-01"), as.IDate("2005-05-01"),
as.IDate("2009-08-01")), y1 = c(0.08, 0.15, 0.35), y2 = c(0.13,
0.2, 0.4))
arrows
# A tibble: 3 × 4
x1 x2 y1 y2
<IDate> <IDate> <dbl> <dbl>
1 2000-09-01 2001-09-01 0.08 0.13
2 2004-12-01 2005-05-01 0.15 0.2
3 2010-09-01 2009-08-01 0.35 0.4
Remember that in the plots show in the X-window, the arrows, text annotations, and icons where not in the correct locations. I kept saving the plots with the final dimensions to see if everything is in it’s right place. Saving as pdf gave me the best results vs saving as png. This is the same reason why I am saving the plots and linking to them in the Rmd
rather than letting rmarkdown
do all the work for me.
This is the final version:
Code
<- p + geom_jitter(data = pubs_smooth %>%
p_complete arrange(Publications), aes(x = Date, y = y, fill = Publications),
size = 2, alpha = 0.8, stroke = 1, shape = 21) + geom_point(data = outreach_smooth,
aes(x = Date, y = y), fill = "#e37e00", size = 2, alpha = 0.7,
stroke = 1, shape = 23) + scale_colour_stata() + scale_fill_stata() +
guides(fill = guide_legend(reverse = TRUE)) + labs(fill = "Publications:") +
annotate("text", x = as.IDate("1999-09-01"), y = 0.08, color = "gray20",
lineheight = 0.9, label = "BSc", size = 4, family = "Annie Use Your Telescope") +
annotate("text", x = as.IDate("2004-11-01"), y = 0.12, color = "gray20",
lineheight = 0.9, label = "MSc", size = 4, family = "Annie Use Your Telescope") +
annotate("text", x = as.IDate("2010-10-01"), y = 0.3, color = "gray20",
lineheight = 0.9, label = "PhD", size = 4, family = "Annie Use Your Telescope") +
geom_curve(data = arrows, aes(x = x1, y = y1, xend = x2,
yend = y2), arrow = arrow(length = unit(0.07, "inch")),
size = 0.4, color = "gray20", curvature = 0.3) + theme(legend.direction = "horizontal",
legend.position = c(1, 0.05), legend.justification = c(1,
0))
<- add_img(p_complete)
p_save ggsave("timeline_complete.png", p_save, dpi = 300, width = 10,
height = 6, units = "cm")
What is the information contained in the figure:
- icons depicting a start as a bench scientist and the current status as a data analyst
- a line showing a smooth upward trajectory towards data science, but
- data science only took over bench work (red line) at some point after my PhD
- milestones (papers) are also depicted as round points, and in keeping with my trajectory there some to which I contributed only lab work, others with data analysis (one with both!).
- the milestones with the diamond shape are to be described in the presentation (blog, R package maintainer).
I hope you can appreciate that this plot contains a lot of information, but doesn’t feel cluttered. Also, this is not meant to be a self-contained, self-explanatory piece of data visualization - I will be guiding the audience through the pieces of information and revealing them only when needed.
Getting there
Well, nearly done. There are still a couple of modifications to be done for the presentation:
- No legend because I wanted to talk about the milestones during the presentation without giving way too early what the points meant;
- Indication of the places of work is still missing.
The latter is crucial to the presentation because I will be highlighting each place of work, and follow that up with slides describing the my role, tasks, and accomplishments. So in effect highlighting the laces of work in a stepwise manner is critical to create breaks and setting the tone for each part of the presentation.
Removing the legend is fairly simple (once you know who to do it and I always google it).
Code
## no legend
<- p_complete + guides(fill = "none")
p_no_leg <- add_img(p_no_leg)
p_save ggsave("timeline_no_legend.png", p_save, dpi = 300, width = 10,
height = 6, units = "cm")
To show how I moved around I made used of geom_annotation
and shaded boxes to highlight each job, one at the time. Each one of these was then shown as bridge between parts of the talk.
For simplicity I am plotting only one here:
Code
## add countries
<- p_complete + guides(fill = "none")
p_no_leg add_img(p_no_leg)
Code
# ggsave('timeline_no_legend.pdf', dpi = 600, width = 10,
# height = 6, units = 'cm')
<- p_no_leg + annotate("rect", xmin = min(antonio$Date),
p_pt xmax = as.Date("2006-02-01"), ymin = -Inf, ymax = Inf, fill = "#d9e6eb",
alpha = 0.35)
<- add_img(p_pt)
p_save
ggsave("timeline_no_legend.pt.png", p_save, dpi = 600, width = 10,
height = 6, units = "cm")
Final words
And that was it. It took me quite some time to get it done, including testing packages which were not used, but in the end I am quite happy with the result. It was a lot of work, but I wanted to impress and I did.
Footnotes
Footnotes
Before any presentation I always think about who my audience will be and what is the goal. Only then do I start sketching the slides with pen and paper to organize the ideas into slides. One key message per slide.↩︎
For example https://github.com/laresbernardo/lares and https://github.com/daattali/timevis.↩︎
It’s fine to use the publication record, but someone career should not be just about that. There’s also teaching, mentoring, outreach, software, etc, which I am not sure it’s always taken into account. Not that it affects me because I am not looking to climb the academic ladder.↩︎
The slide-deck is not available, but if a few people show interest I could make put it online.↩︎
Reuse
Citation
@online{domingues2020,
author = {Domingues, António},
title = {My Visual {CV}},
date = {2020-11-25},
url = {https://amjdomingues.com/posts/2020-11-25-visual-cv/},
langid = {en}
}