Here is an example of a document that produces a plot from some data that’s stored separately.
The data in Figure 1 shows the daily interactions with the Moodle page for my second-year lab classes. Can you spot the two dominant patterns in the data?
an actual computed figure
Code
    #! /usr/bin/env python3
    #
    # schluppeck, 2022-12-10
    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import pandas as pd
    from IPython.display import display, Markdown

    # in terminal (but not here), we need
    # mpl.use('tkagg')

    file_name = "2017-mysteryTimeseries.csv"
    rawdata = pd.read_csv(file_name)
    data = rawdata.rename(
        columns={
            "theTime_day": "date",
            "n": "interactions",
        }
    )

    # can inspect first few rows like this:
    # data.head()

    data.plot()  # pd dataframe has plot() method
    plt.legend("")
    plt.xlabel('Days on course')
    plt.ylabel('Moodle interactions')
    plt.show()
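The plot above uses the row index as its x-axis, so days without a row in the CSV are skipped. A minimal sketch of one alternative follows: parse the dates, index by them, and fill in the missing days. The CSV text here is a made-up stand-in (only the column names `theTime_day` and `n` come from the code above), and treating a missing day as zero interactions is an assumption, not something the data confirms.

```python
import io

import pandas as pd

# Synthetic stand-in for the CSV: made-up values, real column names
csv_text = """theTime_day,n
2016-06-13,2
2016-06-15,1
2016-06-16,4
"""

raw = pd.read_csv(io.StringIO(csv_text), parse_dates=["theTime_day"])
daily = (
    raw.rename(columns={"theTime_day": "date", "n": "interactions"})
    .set_index("date")
    # Assumption: a day with no row in the file had zero interactions
    .asfreq("D", fill_value=0)
)

# Summing by weekday is one quick way to look for a weekly rhythm, if there is one
by_weekday = daily.groupby(daily.index.day_name())["interactions"].sum()

print(daily)
print(by_weekday)
```

Calling `daily.plot()` on a frame indexed like this puts calendar dates on the x-axis instead of row numbers.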
or a table
A badly formatted table… R, with its various packages, handles tabular data much more nicely!
Code
    data.head()\
      .style
|   | date       | interactions |
|---|------------|--------------|
| 0 | 2016-06-13 | 2            |
| 1 | 2016-07-21 | 1            |
| 2 | 2016-09-01 | 2            |
| 3 | 2016-09-09 | 2            |
| 4 | 2016-09-13 | 1            |
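If the styled output is not to your taste, another option is to build a Markdown table yourself and let the renderer format it. The sketch below uses a hypothetical helper, `to_markdown_table`, written with the standard library only; it is not part of pandas or Quarto. (pandas also offers `DataFrame.to_markdown()`, which needs the optional `tabulate` package.)

```python
def to_markdown_table(headers, rows):
    """Format headers and rows as a Markdown pipe table (hypothetical helper)."""
    lines = [
        "| " + " | ".join(str(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# First three rows of the table shown above
rows = [
    ("2016-06-13", 2),
    ("2016-07-21", 1),
    ("2016-09-01", 2),
]
table = to_markdown_table(["date", "interactions"], rows)
print(table)
```

Inside a code cell, the resulting string could be passed to `display(Markdown(table))` so that it renders as a table rather than as plain text.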
or some “maths”
If you want to compute values to include in your text (so-called inline code), you can make your code emit Markdown that’s been patched up with the computed values. If you change `#| echo: true` to `#| echo: false`, the code is hidden!
`python data.shape[0]`
Code
    nrows = data.shape[0]
    ncols = data.shape[1]

    # data.max() returns the column-wise maxima as a Series;
    # .iloc selects by position (plain integer keys on a Series are deprecated)
    nInteractions = data.max().iloc[1]
    dInteractions = data.max().iloc[0]

    display(
        Markdown("""
    ### Patched up markdown

    The dataframe had {nrows} rows and {ncols} columns.

    The largest number of interactions was {n} on {d}
    """.format(nrows = nrows, ncols = ncols, n = nInteractions, d=dInteractions)))
Patched up markdown
The dataframe had 81 rows and 2 columns.
The largest number of interactions was 534 on 2016-12-08
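One caveat about the sentence above: `data.max()` takes the maximum of each column separately, so the date it reports is the latest date in the data, which need not be the day with the most interactions. A minimal sketch of one way to look up the peak day instead, using `idxmax`, is shown below. The values are made up for illustration; only the column names match the dataframe above.

```python
import pandas as pd

# Synthetic stand-in data (made-up values), same column names as above
data = pd.DataFrame(
    {
        "date": ["2016-06-13", "2016-07-21", "2016-09-01"],
        "interactions": [2, 7, 3],
    }
)

# Column-wise maxima: the largest date and the largest count, independently
col_max = data.max()

# The row that actually holds the largest number of interactions
peak = data.loc[data["interactions"].idxmax()]

summary = (
    f"The largest number of interactions was {peak['interactions']} "
    f"on {peak['date']}"
)
print(summary)
```

In this toy data the column-wise maximum date is 2016-09-01, while the peak row is 2016-07-21, which is the distinction the sketch is meant to show.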
Notes
For me, to get `quarto preview` to run correctly, I also had to install `matplotlib-inline` with `pip3 install matplotlib-inline`.
Check out how conveniently the output format can be swapped out with `quarto render 01-doc-with-python