This article presents the “Unbraided Ribbon Problem”, in which `geom_ribbon()` appears to misbehave when it fills the area between two intersecting lines with two different colors. To fix the problem, we use `geom_braid()` from ggbraid with `method = 'line'`.
Let’s compare the temperatures of two cities in the United States: New York, New York, and San Francisco, California.
Getting Started
ggbraid provides a data frame called `temps` with the daily average temperatures of New York and San Francisco in 2021, as recorded by the US National Weather Service (NWS) at weather.gov.¹
```r
library(ggplot2)
library(ggbraid)
library(dplyr)
library(tidyr)
data(temps)
temps
#> # A tibble: 730 × 3
#>    city     date         avg
#>    <chr>    <date>     <dbl>
#>  1 New York 2021-01-01  36.5
#>  2 New York 2021-01-02  43.5
#>  3 New York 2021-01-03  36
#>  4 New York 2021-01-04  39
#>  5 New York 2021-01-05  39
#>  6 New York 2021-01-06  37.5
#>  7 New York 2021-01-07  35.5
#>  8 New York 2021-01-08  32
#>  9 New York 2021-01-09  30.5
#> 10 New York 2021-01-10  35
#> # … with 720 more rows
```
`city` is either New York or San Francisco, `date` is a calendar date in `YYYY-MM-DD` format, and `avg` is the average temperature in degrees Fahrenheit (°F), rounded to the nearest half degree.²
What do the daily average temperatures look like?
We see much higher variability in temperatures in New York than in San Francisco. This makes sense: New York is in the Northeastern US and experiences hot, humid summers and cold, occasionally snowy winters, while San Francisco is on the West Coast, and its Mediterranean climate means its temperature does not change much from season to season.
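To put a rough number on that difference, one option is to summarise `temps` by city with dplyr. This is a sketch for exploration, not part of the ggbraid workflow; the names `temp_spread`, `min_avg`, `max_avg`, and `sd_avg` are chosen here for illustration, and any missing values are simply dropped.

```r
library(ggbraid)
library(dplyr)

data(temps)

# Range and standard deviation of the daily averages for each city:
# a wider range and a larger standard deviation mean bigger swings over the year
temp_spread <- temps %>%
  group_by(city) %>%
  summarise(
    min_avg = min(avg, na.rm = TRUE),
    max_avg = max(avg, na.rm = TRUE),
    sd_avg  = sd(avg, na.rm = TRUE)
  )
temp_spread
```

If the description above is right, the New York row should show the larger `sd_avg`.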
Before we proceed further, let’s clean up the plot a bit and assign it to a variable `p` so we can reuse it throughout the article.
```r
p <- ggplot() +
  geom_line(aes(x = date, y = avg, linetype = city), data = temps) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  scale_y_continuous(
    breaks = seq(20, 90, by = 10),
    labels = function(x, ...) format(paste(x, "°F"), ...),
    limits = c(18, 90)
  ) +
  guides(fill = "none") +
  labs(
    title = "Average Daily Temperatures in 2021",
    linetype = NULL,
    y = NULL,
    x = NULL
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(size = 15),
    plot.title.position = "plot",
    legend.position = c(0.75, 1.06),
    legend.direction = "horizontal",
    legend.key.size = unit(2, "line"),
    legend.text = element_text(size = 12),
    panel.grid.major.x = element_line(linewidth = 0.4),
    panel.grid.major.y = element_line(linewidth = 0.4),
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )
p
```
Much better.
Fun with Ribbons
Let’s fill the area between the two lines. We can do so with `geom_ribbon()` from ggplot2.
`geom_ribbon()` requires three aesthetics: `x`, `ymin`, and `ymax`. We can map `date` to `x` as we did in `geom_line()`. However, we’ll need to transform `temps` to create new variables that we can map to `ymin` and `ymax`.
We can pivot `temps` with `pivot_wider()` from the tidyr package, taking column names from `city` and values from `avg`, and then shorten the new column names to `ny` and `sf` with `rename()` from dplyr. Call the new data frame `temps_wide`.
```r
temps_wide <- temps %>%
  pivot_wider(names_from = city, values_from = avg) %>%
  rename(ny = `New York`, sf = `San Francisco`)
temps_wide
#> # A tibble: 365 × 3
#>    date          ny    sf
#>    <date>     <dbl> <dbl>
#>  1 2021-01-01  36.5  51.5
#>  2 2021-01-02  43.5  50
#>  3 2021-01-03  36    50.5
#>  4 2021-01-04  39    54.5
#>  5 2021-01-05  39    50
#>  6 2021-01-06  37.5  50.5
#>  7 2021-01-07  35.5  53
#>  8 2021-01-08  32    52.5
#>  9 2021-01-09  30.5  52
#> 10 2021-01-10  35    50.5
#> # … with 355 more rows
```
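To see what the pivot does in isolation, here is the same pipeline applied to a four-row slice in the long format of `temps`. It uses the New York and San Francisco values for January 1 and 2 shown in the printed outputs in this article; the name `toy` is only for this illustration.

```r
library(dplyr)
library(tidyr)

# Four rows in the same long format as temps: one row per city per day
toy <- tibble(
  city = c("New York", "San Francisco", "New York", "San Francisco"),
  date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02")),
  avg  = c(36.5, 51.5, 43.5, 50)
)

# Each city becomes its own column and each date becomes a single row,
# so every row carries both values a ribbon needs (ymin and ymax)
toy_wide <- toy %>%
  pivot_wider(names_from = city, values_from = avg) %>%
  rename(ny = `New York`, sf = `San Francisco`)
toy_wide
```

The result has two rows (one per date) and the columns `date`, `ny`, and `sf`, which is the shape of `temps_wide` in miniature.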
Now we can add a new layer to `p` with `geom_ribbon()` using `temps_wide`. Map `date` to `x`, `ny` to `ymin`, and `sf` to `ymax`.³ Finally, add some transparency with `alpha = 0.3`.
```r
p +
  geom_ribbon(
    aes(x = date, ymin = ny, ymax = sf),
    data = temps_wide,
    alpha = 0.3
  )
```
Great! Using `geom_ribbon()`, we’ve added a light grey ribbon that runs between the two lines.
On second thought… what if we used two colors for the ribbon? We could have one color when New York is hotter than San Francisco and another color when New York is colder than San Francisco.
This shouldn’t be hard to do. Map `sf > ny` to `fill` in `geom_ribbon()` and…
```r
p +
  geom_ribbon(
    aes(x = date, ymin = ny, ymax = sf, fill = sf > ny),
    data = temps_wide,
    alpha = 0.7
  )
```
Chaos.
What happened? Is this a bug in `geom_ribbon()`?
No, it’s not a bug. The problem is that we haven’t dealt with line intersections properly. I call this the Unbraided Ribbon Problem.
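To get a feel for how many intersections there are, one option is to count how often `sf > ny` (the value mapped to `fill`) changes from one day to the next. Each change marks a place where the lines touch at a row or cross between two rows. This is a sketch that rebuilds `temps_wide` so it runs on its own; `warmer_sf` and `n_switches` are hypothetical names.

```r
library(ggbraid)
library(dplyr)
library(tidyr)

data(temps)
temps_wide <- temps %>%
  pivot_wider(names_from = city, values_from = avg) %>%
  rename(ny = `New York`, sf = `San Francisco`)

# The logical value mapped to fill: TRUE when San Francisco is warmer
warmer_sf <- temps_wide$sf > temps_wide$ny

# Count day-to-day changes in the fill value
n_switches <- sum(warmer_sf[-1] != warmer_sf[-length(warmer_sf)], na.rm = TRUE)
n_switches

# Lengths of the runs of consecutive days that share a fill value
rle(warmer_sf)
```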
The Unbraided Ribbon Problem
Consider rows 80–82 of `temps_wide`:
date | ny | sf |
---|---|---|
2021-03-21 | 52.5 | 52.0 |
2021-03-22 | 52.0 | 52.0 |
2021-03-23 | 54.5 | 56.5 |
After we pass `temps_wide` to `geom_ribbon()` and map `date` to `x`, `ny` to `ymin`, `sf` to `ymax`, and `sf > ny` to `fill`, we get the following:
x | ymin | ymax | fill |
---|---|---|---|
18707 | 52.5 | 52.0 | FALSE |
18708 | 52.0 | 52.0 | FALSE |
18709 | 54.5 | 56.5 | TRUE |
(`x` is the integer representation of `date`: the number of days since January 1, 1970, the “Unix epoch”.)
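The conversion is easy to check in base R, where a `Date` is stored as exactly this day count:

```r
# R stores a Date as the number of days since 1970-01-01
d <- as.Date(c("2021-03-21", "2021-03-22", "2021-03-23"))
as.numeric(d)
#> [1] 18707 18708 18709

# The conversion can be reversed by supplying the origin
as.Date(18707, origin = "1970-01-01")
#> [1] "2021-03-21"
```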
Note the middle row. `ymin` and `ymax` are equal here, so this is a point where the two lines intersect. It turns out that `geom_ribbon()` needs two rows for every line intersection: one row where `fill` is `FALSE` and another where `fill` is `TRUE`.
So we must insert a new row in the data, yielding the following:
x | ymin | ymax | fill |
---|---|---|---|
18707 | 52.5 | 52.0 | FALSE |
18708 | 52.0 | 52.0 | FALSE |
18708 | 52.0 | 52.0 | TRUE |
18709 | 54.5 | 56.5 | TRUE |
We call this process braiding.
We need to braid the ribbon where the lines intersect.
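The row-duplication step for this touching case can be sketched in base R. `braid_touches()` is a hypothetical helper, not ggbraid’s implementation: a row where `ymin == ymax` is shared by the segments on either side, so it is emitted once for each distinct `fill` value of its neighbours. It does not handle lines that cross between rows or vertical segments.

```r
# Hypothetical helper (not part of ggbraid): braid a ribbon at rows where
# the two lines touch, i.e. where ymin == ymax.
braid_touches <- function(df) {
  n <- nrow(df)
  fill <- df$ymax > df$ymin
  pieces <- lapply(seq_len(n), function(i) {
    row <- df[i, , drop = FALSE]
    if (df$ymin[i] == df$ymax[i]) {
      # Fill values of the previous and next rows (absent at the edges)
      nbr <- c(if (i > 1) fill[i - 1], if (i < n) fill[i + 1])
      fills <- if (length(nbr) > 0) unique(nbr) else fill[i]
    } else {
      fills <- fill[i]
    }
    # One copy of the row per fill value it has to carry
    copies <- row[rep(1, length(fills)), , drop = FALSE]
    copies$fill <- fills
    copies
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Rows 80-82 from the table above
rows <- data.frame(
  x    = c(18707, 18708, 18709),
  ymin = c(52.5, 52.0, 54.5),
  ymax = c(52.0, 52.0, 56.5)
)
braid_touches(rows)
```

For these three rows the result has four rows, with `x` equal to 18707, 18708, 18708, 18709 and `fill` equal to `FALSE`, `FALSE`, `TRUE`, `TRUE`, matching the braided table.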
The intersection described above is not the only type that requires braiding. Sometimes the two lines intersect between two rows in the data. In these cases, we must use a mathematical formula to determine the exact point at which the lines intersect and braid the ribbon accordingly. There are also instances where both lines are vertical at the same `x`, an uncommon situation that produces an infinite number of intersection points and also requires braiding.
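For straight segments between two rows, one such formula is plain linear interpolation. If the gap `ymax - ymin` is `d1` at `x1` and `d2` at `x2`, with opposite signs, the gap shrinks to zero a fraction `t = d1 / (d1 - d2)` of the way from `x1` to `x2`. `crossing_point()` is a hypothetical helper illustrating this under those assumptions (and `x1 < x2`); it is not ggbraid’s code. By analogy with the touching case, braiding would then insert two rows at that point with `ymin = ymax`, one for each `fill` value.

```r
# Hypothetical helper (not part of ggbraid): where two straight segments
# cross between consecutive rows. Dates can be passed as.numeric().
crossing_point <- function(x1, x2, ymin1, ymin2, ymax1, ymax2) {
  d1 <- ymax1 - ymin1          # gap at the left row
  d2 <- ymax2 - ymin2          # gap at the right row
  stopifnot(d1 * d2 < 0)       # the lines must swap order between the rows
  t <- d1 / (d1 - d2)          # fraction of the way from x1 to x2
  c(x = x1 + t * (x2 - x1),    # crossing position
    y = ymin1 + t * (ymin2 - ymin1))  # both lines share this value there
}

# ymin rises from 50 to 54 while ymax falls from 52 to 50:
# they meet one third of the way along, at y = 154/3 (about 51.33)
crossing_point(x1 = 0, x2 = 1, ymin1 = 50, ymin2 = 54, ymax1 = 52, ymax2 = 50)
```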
Braiding Ribbons with ggbraid
The functions in ggbraid take care of all the braiding for you.
Simply replace `geom_ribbon()` with `geom_braid()`.
```r
p +
  geom_braid(
    aes(x = date, ymin = ny, ymax = sf, fill = sf > ny),
    data = temps_wide,
    alpha = 0.7
  )
#> `geom_braid()` using method = 'line'
```
There we go!
Notice the message from `geom_braid()` saying that it is using `method = 'line'`. Since we’ve drawn the lines with `geom_line()`, we must use `method = 'line'` to determine the point at which the lines intersect when the intersection occurs between two rows in the data. We can silence this message by explicitly including `method = 'line'` in `geom_braid()`.
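A minimal, self-contained version with the method spelled out might look like this. For brevity it builds a bare plot from `temps_wide` rather than reusing `p`; `q` is just a name for this sketch.

```r
library(ggplot2)
library(ggbraid)
library(dplyr)
library(tidyr)

data(temps)
temps_wide <- temps %>%
  pivot_wider(names_from = city, values_from = avg) %>%
  rename(ny = `New York`, sf = `San Francisco`)

# Stating method = "line" explicitly, so geom_braid() does not need to
# announce the method it picked
q <- ggplot(temps_wide) +
  geom_braid(
    aes(x = date, ymin = ny, ymax = sf, fill = sf > ny),
    method = "line",
    alpha = 0.7
  )
q
```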
`geom_braid()` takes the data provided, performs the necessary braiding operations on it with `stat_braid()`, and passes the result to `geom_ribbon()` for drawing. If we’d like, we can still use `geom_ribbon()` and set `stat = 'braid'`.
```r
p +
  geom_ribbon(
    aes(x = date, ymin = ny, ymax = sf, fill = sf > ny),
    data = temps_wide,
    stat = "braid",
    method = "line",
    alpha = 0.7
  )
```
This is the same plot as before. We’ve also silenced the message by including `method = 'line'`.
Finally, it may be helpful to label the ribbon colors so it’s clear what they represent. This could be done with a legend, which we turned off with `guides(fill = "none")` in `p`.
Another possibility is to provide text annotations on the plot.
```r
hues <- scales::hue_pal()(2)  # ggplot2 default color palette
p +
  geom_braid(
    aes(x = date, ymin = ny, ymax = sf, fill = sf > ny),
    data = temps_wide,
    method = "line",
    alpha = 0.7
  ) +
  annotate(
    "text", x = as.Date("2021-09-10"), y = 84, size = 4,
    label = "NY hotter than SF", hjust = 0, color = hues[1]
  ) +
  annotate(
    "text", x = as.Date("2021-02-20"), y = 23, size = 4,
    label = "NY colder than SF", hjust = 0, color = hues[2]
  )
```
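If a legend is preferred to annotations, one option is to relabel the fill scale. Because `sf > ny` is logical, its levels are ordered `FALSE` then `TRUE`, so the first label describes days when New York is at least as warm as San Francisco, matching `hues[1]` above. This sketch builds a bare plot instead of reusing `p`; adding the scale to `p` would also require overriding its `guides(fill = "none")`, for example with `guides(fill = "legend")`, and the legend placement inherited from `p` may need adjusting.

```r
library(ggplot2)
library(ggbraid)
library(dplyr)
library(tidyr)

data(temps)
temps_wide <- temps %>%
  pivot_wider(names_from = city, values_from = avg) %>%
  rename(ny = `New York`, sf = `San Francisco`)

q <- ggplot(temps_wide) +
  geom_braid(
    aes(x = date, ymin = ny, ymax = sf, fill = sf > ny),
    method = "line",
    alpha = 0.7
  ) +
  # Labels follow the level order of the logical fill: FALSE, then TRUE
  scale_fill_discrete(
    labels = c("NY hotter than SF", "NY colder than SF"),
    name = NULL
  )
q
```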