Mixing writing and `python`

Author

Denis Schluppeck

Session date:

2023-01-04

Background

Here is an example of a document that produces a plot from some data that’s stored separately.

The data in Figure 1 show the daily interactions with the Moodle page for my second-year lab classes. Can you spot the two dominant patterns in the data?

an actual computed figure

#! /usr/bin/env python3
#
# schluppeck, 2022-12-10

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import display, Markdown

# in terminal (but not here), we need
# mpl.use('tkagg')

file_name = "2017-mysteryTimeseries.csv"
rawdata = pd.read_csv(file_name)

data = rawdata.rename(
    columns={
        "theTime_day" : "date",  
        "n" : "interactions",
    }
)

# can inspect first few rows like this:
# data.head() 

# a pandas dataframe has its own plot() method; legend=False suppresses the legend
data.plot(legend=False)
plt.xlabel('Days on course')
plt.ylabel('Moodle interactions')
plt.show()

Figure 1: A line plot of some mystery data
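
Because `data.plot()` is called without an `x` argument, the plot appears to use the row index on the x-axis, so "Days on course" counts rows rather than calendar days. Here is a minimal sketch of deriving an elapsed-day count from the date column, assuming the dates are ISO-formatted strings. The small table is made up for illustration and the `day` column name is my own choice.

```python
import pandas as pd

# Hypothetical stand-in for the renamed dataframe (not the real course data)
data = pd.DataFrame(
    {
        "date": ["2016-06-13", "2016-07-21", "2016-09-01"],
        "interactions": [2, 1, 2],
    }
)

# Parse the date strings into proper timestamps
data["date"] = pd.to_datetime(data["date"])

# Days elapsed since the first recorded date ("day" is an assumed column name)
data["day"] = (data["date"] - data["date"].min()).dt.days

print(data)
```

With that column in place, `data.plot(x="day", y="interactions", legend=False)` should put elapsed days on the x-axis.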

or a table

A badly formatted table… R, with its various packages, handles tabular data much more nicely!

data.head()\
  .style

	date	interactions
0	2016-06-13	2
1	2016-07-21	1
2	2016-09-01	2
3	2016-09-09	2
4	2016-09-13	1
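
One way to get a tidier table without extra packages is to build the markdown yourself. The sketch below is only that: `to_markdown_table` is a hypothetical helper, not part of pandas, and the two rows are copied from the table above.

```python
import pandas as pd


def to_markdown_table(df: pd.DataFrame) -> str:
    """Build a pipe-style markdown table from a dataframe (hypothetical helper)."""
    header = "| " + " | ".join(str(c) for c in df.columns) + " |"
    divider = "| " + " | ".join("---" for _ in df.columns) + " |"
    rows = [
        "| " + " | ".join(str(v) for v in row) + " |"
        for row in df.itertuples(index=False)  # skip the row index
    ]
    return "\n".join([header, divider, *rows])


# First two rows of the table shown above
data = pd.DataFrame(
    {"date": ["2016-06-13", "2016-07-21"], "interactions": [2, 1]}
)

table = to_markdown_table(data)
print(table)
```

Passing the resulting string to `display(Markdown(...))`, as done elsewhere in this document, should render it as a formatted table.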

or some “maths”

If you want to compute values to include in your text (so-called inline code), you can have your code emit markdown that has been patched up with those values. If you change `#| echo: true` to `#| echo: false`, the code is hidden!

`python data.shape[0]`

nrows =  data.shape[0]
ncols = data.shape[1]

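# data.max() is column-wise: position 1 is the largest interaction count,
# position 0 the latest date (not necessarily the day of that maximum)
nInteractions = data.max().iloc[1]
dInteractions = data.max().iloc[0]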

display(
    Markdown(
    """
### Patched up markdown

The dataframe had {nrows} rows and {ncols} columns.

The largest number of interactions was {n} on {d}

""".format(nrows = nrows, ncols = ncols, n = nInteractions, d=dInteractions)))

Patched up markdown

The dataframe had 81 rows and 2 columns.

The largest number of interactions was 534 on 2016-12-08
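
A caveat on the sentence above: `data.max()` takes the maximum of each column separately, so the date it reports is the latest date in the table, which need not be the day of the peak. Here is a minimal sketch of looking up the peak row instead, using made-up values rather than the course data.

```python
import pandas as pd

# Made-up values for illustration only (not the course data)
data = pd.DataFrame(
    {
        "date": ["2016-10-03", "2016-10-04", "2016-10-05"],
        "interactions": [12, 40, 7],
    }
)

# Row label of the largest interaction count
peak_label = data["interactions"].idxmax()

# Pull the whole row, so count and date come from the same day
peak = data.loc[peak_label]

summary = (
    f"The largest number of interactions was "
    f"{peak['interactions']} on {peak['date']}"
)
print(summary)
```

Here the count and the date are read from the same row, so they are guaranteed to belong together.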

Notes

For me, to get the Quarto preview to run correctly, I also had to run `pip3 install matplotlib-inline`.
Check out how conveniently the output format can be swapped out with `quarto render 01-doc-with-python