Rethinking your plotting habits

Data visualisation block - data club

Denis Schluppeck

2023-03-29

Introduction to this block

We want to

  • explore ideas around data visualisation
  • think about better ways to show things
  • learn some practical, useful tips

(maybe with your favourite tool, language, …)

Discussion forum, webpage

Aims

Make plots / visualisations of data:

  • reproducible
  • more flexible for exploration
  • publication-ready (little or no editing by hand)
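One way to make the *reproducible* and *publication-ready* aims concrete in R with ggplot2 (the tool used for the worked example later in these slides) is to build the figure in a script and save it with a fixed size, so nothing has to be adjusted by hand. A minimal sketch, assuming R's built-in `mtcars` data as stand-in data; the file name and dimensions are illustrative only.

```r
library(ggplot2)

# Stand-in data: mtcars ships with R (illustrative only)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Fix the output size in the script itself, so re-running it always
# produces the same file (the file name is a hypothetical example)
out <- file.path(tempdir(), "mpg-vs-weight.pdf")
ggsave(out, plot = p, width = 12, height = 8, units = "cm")
```

Because size and units live in the script, a journal's column width can be matched once and then reproduced on every run.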

Principles should apply across different languages: MATLAB, Python, R, Julia, …

and different kinds of data: fMRI, EEG, psychophys, …

… many people’s default

Charts vs Graphics

  • a chart is one fixed type of visualisation, often tied to a single function like plot() or histogram(), …
  • a graphic is more flexible: it may mix different kinds of marks, … ideally it can be built up and composed from parts
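In ggplot2, a graphic in this sense is an ordinary R object that can be stored and then extended layer by layer with `+`. A minimal sketch, using the built-in `mtcars` data purely for illustration (the object names `p`, `p_points` and `p_both` are hypothetical):

```r
library(ggplot2)

# Start from data + aesthetic mapping only (mtcars is stand-in data) ...
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# ... then compose the graphic by adding layers with `+`
p_points <- p + geom_point()
p_both   <- p_points + geom_smooth(method = "lm", formula = y ~ x)
```

Each intermediate object can be printed on its own, which makes it easy to explore variants of the same graphic.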

Enter: A Grammar of Graphics

Leland Wilkinson’s classic book

A caveat

this talk won’t turn us into datavis professionals

Layered components

Data: variables \(\rightarrow\) aesthetics

  • position (x,y)
  • size (radius, area)
  • colour, shape, linewidth, …
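In ggplot2 the mapping from variables to aesthetics is declared inside `aes()`. A sketch using the built-in `mtcars` data (an illustrative choice, not the data from these slides), with one variable per aesthetic listed above:

```r
library(ggplot2)

# One variable per aesthetic (mtcars is stand-in data)
p <- ggplot(mtcars, aes(
  x      = wt,           # position (x)
  y      = mpg,          # position (y)
  size   = hp,           # size
  colour = factor(cyl),  # colour: categorical, 3 levels
  shape  = factor(am)    # shape: categorical, 2 levels
)) +
  geom_point()
```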

stat: transform the data?!

identity (draw the data as-is) seems obvious… as do many others, but NB: jitter adds small random offsets to the data
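Two quick illustrations in ggplot2, again with `mtcars` as stand-in data. `geom_bar()` defaults to `stat = "count"`, so the rows are transformed into counts before anything is drawn; `geom_point()` defaults to `stat = "identity"`, and jitter (strictly a position adjustment in ggplot2, not a stat) then moves each point by a small random amount. The jitter width and `seed` are arbitrary choices for this sketch.

```r
library(ggplot2)

# stat = "count" (the default for geom_bar): rows are turned into counts,
# one bar per cylinder class rather than one mark per car
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# stat = "identity" (the default for geom_point) draws each row as-is;
# jitter then moves points by up to +/- 0.2 horizontally (height = 0
# leaves y untouched); seed makes the random offsets reproducible
p_jitter <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))
```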

geoms

The actual marks on the plot, points, lines, polygons, …
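The same data and mapping can be drawn with different geoms; only the marks change. A sketch with `mtcars` (illustrative data) that also shows a subtle difference between two line-like geoms: `geom_line()` connects points in order of x, while `geom_path()` connects them in the order they appear in the data.

```r
library(ggplot2)

# Same data and mapping each time (mtcars is stand-in data);
# only the geom, i.e. the kind of mark, changes
base <- ggplot(mtcars, aes(x = wt, y = mpg))

p_point <- base + geom_point()  # one point per row
p_line  <- base + geom_line()   # connects rows in order of x
p_path  <- base + geom_path()   # connects rows in the order of the data
```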

scales

  • governs mapping from data to aesthetic properties
  • think about: domain / range
  • categorical or continuous data?

Wickham (2009), Figure 7
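A sketch of both kinds of scale in ggplot2, with `mtcars` as stand-in data: a manual colour scale for a categorical variable (each level in the domain gets an explicit colour), and a size scale for a continuous variable (the data domain is mapped onto a chosen range of point sizes). The colours, size range and axis limits are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl), size = hp)) +
  geom_point() +
  # categorical: each level of the domain (4, 6, 8 cylinders) is given
  # an explicit value in the range of colours
  scale_colour_manual(values = c("4" = "grey50", "6" = "orange", "8" = "red")) +
  # continuous: the domain (observed range of hp) is mapped onto point
  # sizes between 1 and 6; scale_size() scales area, not radius
  scale_size(range = c(1, 6)) +
  # the domain of a position scale can be fixed explicitly with limits
  scale_y_continuous(limits = c(0, 40))
```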

facets

  • also called conditioned or trellis plots

  • multiple panels / plots for subsets of the data

    • by cuts (e.g. quartiles)
    • by category (conditioned on a variable)

see example
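Both ways of conditioning can be sketched with `facet_wrap()`, using the illustrative `mtcars` data: by category, one panel per level of `cyl`; by cuts, `cut_number()` splits horsepower into four groups with roughly equal numbers of cars, i.e. at the quartiles. The panel label name `hp_quartile` is hypothetical.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# by category: one panel per level of cyl (conditioning on cyl)
p_cat <- base + facet_wrap(vars(cyl))

# by cuts: cut_number() splits hp into 4 groups with roughly equal
# numbers of cars (i.e. at the quartiles), one panel per group
p_cut <- base + facet_wrap(vars(hp_quartile = cut_number(hp, 4)))
```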

coordinates

Deciding on the coordinate system to use:

  • linear
  • log / semilog / sqrt
  • polar

Wickham (2009), Figure 8
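A sketch of these options in ggplot2, with `mtcars` as stand-in data. In ggplot2, log and square-root axes are most often set through scale transformations (`scale_y_log10()`, `scale_x_sqrt()`), which transform the data before any stat is computed; polar coordinates are a genuine coordinate system (`coord_polar()`), in which a single stacked bar becomes a pie chart.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

p_linear  <- base                    # default: linear Cartesian axes
p_semilog <- base + scale_y_log10()  # log y axis only ("semilog")
p_sqrt    <- base + scale_x_sqrt()   # square-root x axis

# polar: a single stacked bar drawn in polar coordinates is a pie chart
p_polar <- ggplot(mtcars, aes(x = "", fill = factor(cyl))) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```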

Example data

delta  estimate  shift  coherence  subject
-12.5        -6     -6       0.04  A
 -0.5        -8     -6       0.04  A
-16.5        -8     -6       0.04  A
-11.5        -4     -6       0.04  A
-22.5        -1     -6       0.04  A
-16.5       -14     -6       0.04  A
 -1.5        -3     -6       0.04  A
 -8.5        -3     -6       0.04  A
 -7.5        -3     -6       0.04  A
-15.5        -8     -6       0.04  A
-17.5        -2     -6       0.04  A
 -8.5       -13     -6       0.04  A
-15.5        -4     -6       0.04  A
 -3.5        -7     -6       0.04  A
-16.5       -12     -6       0.04  A
-23.5       -14     -6       0.04  A
-20.5        -5     -6       0.04  A
-14.5        -1     -6       0.04  A
-18.5        -1     -6       0.04  A
-22.5        -5     -6       0.04  A
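The code that follows refers to the data as `e`. A minimal sketch of loading the rows shown above into R; note this reproduces only the excerpt, and the later plots (faceted by `shift`, grouped by the sign of `delta`) suggest the full data set contains further rows.

```r
# The excerpt above as an R data frame called `e`
# (read.table splits columns on whitespace; blank lines are skipped)
e <- read.table(header = TRUE, text = "
delta estimate shift coherence subject
-12.5 -6 -6 0.04 A
-0.5 -8 -6 0.04 A
-16.5 -8 -6 0.04 A
-11.5 -4 -6 0.04 A
-22.5 -1 -6 0.04 A
-16.5 -14 -6 0.04 A
-1.5 -3 -6 0.04 A
-8.5 -3 -6 0.04 A
-7.5 -3 -6 0.04 A
-15.5 -8 -6 0.04 A
-17.5 -2 -6 0.04 A
-8.5 -13 -6 0.04 A
-15.5 -4 -6 0.04 A
-3.5 -7 -6 0.04 A
-16.5 -12 -6 0.04 A
-23.5 -14 -6 0.04 A
-20.5 -5 -6 0.04 A
-14.5 -1 -6 0.04 A
-18.5 -1 -6 0.04 A
-22.5 -5 -6 0.04 A
")
```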

delta \(\rightarrow\) x, estimate \(\rightarrow\) y; use points

Code
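library(ggplot2)  # ggplot(), aes(), geom_point(), ...
# e: the example data frame shown above; the native pipe |> needs R >= 4.1
e |> ggplot(aes(x = delta, y = estimate)) +
     geom_point()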

delta \(\rightarrow\) x, estimate \(\rightarrow\) y, shift \(\rightarrow\) color

Code
e |> ggplot(aes(x = delta, y = estimate, color=shift)) +
    geom_point()

fix overplotting: add a small random jitter + transparency

Code
e |> ggplot(aes(x = delta, y = estimate, color=shift)) +
    geom_jitter(alpha=0.5)

too busy: split up by using facets

Code
e |> ggplot(aes(x = delta, y = estimate, color=shift)) +
     geom_jitter(alpha=0.5) +
     facet_wrap(~shift)

can we add a density estimate?

Code
e |> ggplot(aes(x = delta, y = estimate, color=shift)) +
     geom_jitter(alpha=0.5) +
     geom_density2d(color="black", linewidth=0.4) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
     facet_wrap(~shift)

make the x and y axes use equal scaling (coord_equal) + add a line at y = 0

Code
e |> ggplot(aes(x = delta, y = estimate, color=shift)) +
     geom_jitter(alpha=0.5) +
     geom_hline(yintercept = 0, color="black") +
     facet_wrap(~shift) +
     coord_equal() 

simple change: swap the geom from points to 2d bins (bin2d) + a flourish… add regression lines, one per sign of delta

Code
e |> ggplot(aes(x = delta, y = estimate, group=sign(delta))) +
     geom_bin2d() +
     scale_fill_gradient(low = "#eeeeee", high="#ff0000") +
     geom_smooth(method="lm", formula = y ~ x, color="black") +  # one linear fit per group
     facet_wrap(~shift) +
     coord_equal() 
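`geom_smooth(method = "lm")` fits a separate linear model within each group, so `group = sign(delta)` gives one line for negative and one for positive `delta`. A sketch that checks this against a direct `lm()` fit, using only the 20 rows shown in the example table (all with negative `delta`, so there is just one group here); `geom_point()` stands in for `geom_bin2d()` to keep it short, and the name `fits` is hypothetical.

```r
library(ggplot2)

# The 20 example rows shown earlier (excerpt only)
e <- read.table(header = TRUE, text = "
delta estimate shift coherence subject
-12.5 -6 -6 0.04 A
-0.5 -8 -6 0.04 A
-16.5 -8 -6 0.04 A
-11.5 -4 -6 0.04 A
-22.5 -1 -6 0.04 A
-16.5 -14 -6 0.04 A
-1.5 -3 -6 0.04 A
-8.5 -3 -6 0.04 A
-7.5 -3 -6 0.04 A
-15.5 -8 -6 0.04 A
-17.5 -2 -6 0.04 A
-8.5 -13 -6 0.04 A
-15.5 -4 -6 0.04 A
-3.5 -7 -6 0.04 A
-16.5 -12 -6 0.04 A
-23.5 -14 -6 0.04 A
-20.5 -5 -6 0.04 A
-14.5 -1 -6 0.04 A
-18.5 -1 -6 0.04 A
-22.5 -5 -6 0.04 A
")

p <- ggplot(e, aes(x = delta, y = estimate, group = sign(delta))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "black")

# The same straight-line fits, computed directly within each sign group
fits <- lapply(split(e, sign(e$delta)), function(d) lm(estimate ~ delta, data = d))
```

With the full data set, `fits` would presumably hold one model per sign of `delta`, matching the two lines per panel in the plot above.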

Etc., etc…

Now for some discussion + thinking…

Slides, resources + links are on our webpage:

https://schluppeck.github.io/ng-data-club/