library(ggplot2)
ggplot(data.frame(x = rnorm(10000), y = rnorm(10000)), aes(x = x, y = y)) +
  geom_point(size = 0.01) +
  labs(title = "10,000 points produce a lot of overplotting") +
  coord_fixed(xlim = c(-3, 3), ylim = c(-3, 3))
ggplot(data.frame(x = rnorm(10000), y = rnorm(10000)), aes(x = x, y = y)) +
  geom_point(size = 1e-10) +
  labs(title = "Even very small point sizes don't fix the issue") +
  coord_fixed(xlim = c(-3, 3), ylim = c(-3, 3))
ggplot(data.frame(x = rnorm(10000), y = rnorm(10000)), aes(x = x, y = y)) +
  geom_point(size = 0.01, shape = 21, fill = "black", color = "#00000000") +
  labs(title = "Removing the stroke produces accurate point sizes") +
  coord_fixed(xlim = c(-3, 3), ylim = c(-3, 3))
The trick is to use a shape that distinguishes stroke and fill color, such as shape 21: the fill is drawn in black while the stroke is made fully transparent.
Another way to see the problem is to compare points with only a stroke (top row) and points with only a fill (bottom row) directly.
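If the same trick is needed in several plots, it can be wrapped in a small helper. The following is only a sketch: `geom_point_nostroke()` is a hypothetical name, not a ggplot2 function, and it assumes that a fully transparent stroke color is an acceptable way to hide the outline.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): a filled point whose stroke
# is fully transparent, so only the fill is visible.
geom_point_nostroke <- function(..., fill = "black") {
  geom_point(..., shape = 21, fill = fill, color = "#00000000")
}

set.seed(1)  # fixed seed so the example is reproducible
df <- data.frame(x = rnorm(10000), y = rnorm(10000))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point_nostroke(size = 0.01) +
  coord_fixed(xlim = c(-3, 3), ylim = c(-3, 3))
p
```

Because the helper fixes `shape` and `color`, passing either of them through `...` would raise a duplicate-argument error; only `fill` and the remaining `geom_point()` arguments are meant to be changed.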
ggplot(data.frame(x = seq(0, 2, length.out = 30))) +
  geom_point(aes(x = x, y = 0.2, size = x), shape = 21, fill = "#00000000", color = "black") +
  geom_point(aes(x = x, y = -0.2, size = x), shape = 21, fill = "black", color = "#00000000") +
  scale_size_identity() +
  lims(y = c(-1, 1)) +
  labs(title = "The row without a stroke vanishes")