Mixing writing and python

Author

Denis Schluppeck

Session date:

2023-01-04

Background

Here is an example of a document that produces a plot from data stored in a separate CSV file.

The data in Figure 1 shows the daily interactions with the Moodle page for my second-year lab classes. Can you spot the two dominant patterns in the data?
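Because the data live in a separate CSV file, the first step is to load and tidy them. The sketch below assumes the file has the columns `theTime_day` and `n` (the names the plotting code renames). Here `io.StringIO` stands in for `2017-mysteryTimeseries.csv` and holds only the five rows shown in the table further down. Passing `parse_dates` turns the date strings into real datetimes, so they can serve as an x axis or be grouped by weekday.

```python
import io

import pandas as pd

# Stand-in for the real CSV file: same column names, first five rows only
csv_text = """theTime_day,n
2016-06-13,2
2016-07-21,1
2016-09-01,2
2016-09-09,2
2016-09-13,1
"""

# parse_dates converts the date strings into datetime64 values on load
rawdata = pd.read_csv(io.StringIO(csv_text), parse_dates=["theTime_day"])

# Give the columns more readable names
data = rawdata.rename(columns={"theTime_day": "date", "n": "interactions"})

print(data.dtypes)
```

With the real file, swap the `io.StringIO(...)` argument for the file name.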

an actual computed figure

#! /usr/bin/env python3
#
# schluppeck, 2022-12-10

import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import display, Markdown

# in terminal (but not here), we need
# mpl.use('tkagg')

file_name = "2017-mysteryTimeseries.csv"
rawdata = pd.read_csv(file_name)

data = rawdata.rename(
    columns={
        "theTime_day": "date",
        "n": "interactions",
    }
)

# can inspect first few rows like this:
# data.head()

# pd dataframe has a plot() method; on its own it would plot against the
# row number, so convert the date strings and use them as the x axis
data.assign(date=pd.to_datetime(data["date"])).plot(
    x="date", y="interactions", legend=False
)
plt.xlabel("Date")
plt.ylabel("Moodle interactions")
plt.show()
Figure 1: A line plot of some mystery data
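One way to hunt for regular patterns in a series like this is to collapse it onto the day of the week. This is a sketch, not the author's analysis: it reuses only the five sample rows from the table below, so the totals merely illustrate the calls. `Series.dt.day_name()` labels each date, and `groupby(...).sum()` adds up the interactions per weekday.

```python
import pandas as pd

# The five sample rows from data.head(); the full data set has 81 rows
data = pd.DataFrame({
    "date": pd.to_datetime(["2016-06-13", "2016-07-21", "2016-09-01",
                            "2016-09-09", "2016-09-13"]),
    "interactions": [2, 1, 2, 2, 1],
})

# Label every row with its weekday name, e.g. "Monday"
weekday = data["date"].dt.day_name()

# Total interactions per weekday, listed in calendar order (0 if absent)
order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
per_weekday = (
    data.groupby(weekday)["interactions"].sum().reindex(order, fill_value=0)
)

print(per_weekday)
```

On the full data frame, a bar chart of `per_weekday` would make any weekly rhythm easy to see.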

or a table

A badly formatted table… R, with its various packages, handles tabular data much more nicely!

data.head().style
|   | date       | interactions |
|---|------------|--------------|
| 0 | 2016-06-13 | 2            |
| 1 | 2016-07-21 | 1            |
| 2 | 2016-09-01 | 2            |
| 3 | 2016-09-09 | 2            |
| 4 | 2016-09-13 | 1            |
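If the styled table renders poorly, one workaround that stays in Python is to build a Markdown table yourself and hand it to `IPython.display.Markdown`, the same mechanism used for the computed values below. `df_to_markdown` is a hypothetical helper written for this sketch, not a pandas function (pandas does have `DataFrame.to_markdown()`, but it needs the optional `tabulate` package).

```python
import pandas as pd


def df_to_markdown(df: pd.DataFrame) -> str:
    """Hypothetical helper: render a DataFrame as a pipe-style Markdown table."""
    header = "| " + " | ".join(str(c) for c in df.columns) + " |"
    divider = "|" + "|".join("---" for _ in df.columns) + "|"
    rows = [
        "| " + " | ".join(str(v) for v in row) + " |"
        for row in df.itertuples(index=False)
    ]
    return "\n".join([header, divider, *rows])


# The first rows from the table above
data = pd.DataFrame({
    "date": ["2016-06-13", "2016-07-21", "2016-09-01"],
    "interactions": [2, 1, 2],
})

table = df_to_markdown(data)
print(table)

# Inside a notebook or Quarto document, render it with:
# from IPython.display import display, Markdown
# display(Markdown(table))
```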

or some “maths”

If you want to compute values to include in your text (so-called inline code), you can make your code emit Markdown with those values patched in. If you change `#| echo: true` to `#| echo: false`, the code is hidden!

`python data.shape[0]`
nrows, ncols = data.shape

# data.max() works column by column (latest date, highest count), so find
# the row holding the highest count first, then read that row's date
peak = data["interactions"].idxmax()
nInteractions = data.loc[peak, "interactions"]
dInteractions = data.loc[peak, "date"]

display(
    Markdown(
        f"""
### Patched up markdown

The dataframe had {nrows} rows and {ncols} columns.

The largest number of interactions was {nInteractions} on {dInteractions}.
"""
    )
)

Patched up markdown

The dataframe had 81 rows and 2 columns.

The largest number of interactions was 534 on 2016-12-08
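A caveat on pairing a maximum with its date: `DataFrame.max()` works column by column, so `data.max()` returns the largest value of each column independently, i.e. the latest date and the highest count, which need not come from the same row. To report the date on which the peak actually occurred, look up the row with `idxmax()` first. A sketch using the sample rows from the table above (ties go to the first occurrence):

```python
import pandas as pd

# Sample rows from data.head(); use the full data frame in practice
data = pd.DataFrame({
    "date": ["2016-06-13", "2016-07-21", "2016-09-01",
             "2016-09-09", "2016-09-13"],
    "interactions": [2, 1, 2, 2, 1],
})

# Column-wise maxima: latest date and highest count, possibly different rows
print(data.max())

# Index label of the row holding the highest count (first one if tied)
peak = data["interactions"].idxmax()
n_peak = data.loc[peak, "interactions"]
d_peak = data.loc[peak, "date"]

print(f"The largest number of interactions was {n_peak} on {d_peak}")
```

Here `data.max()` reports 2016-09-13 as the "maximum" date, yet that day had only one interaction; the `idxmax()` lookup pairs the count with 2016-06-13 instead.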

Notes

  • For me, to get `quarto preview` to run correctly, I also had to install the `matplotlib-inline` package (`pip3 install matplotlib-inline`).

  • Check out how conveniently the output format can be swapped out with `quarto render 01-doc-with-python