Spaces:

raymondEDS
/

DS_webclass

raymondEDS committed 12 days ago

Commit

63a7f01

1 Parent(s): 1289315

Week 2 HW

Files changed (15)

.DS_Store +0 -0
Reference files/Week2_ref/Ch02-statlearn-lab.ipynb +3229 -0
Reference files/Week2_ref/Lecture_1_basics.ipynb +0 -0
app/.DS_Store +0 -0
app/__pycache__/main.cpython-311.pyc +0 -0
app/components/__pycache__/login.cpython-311.pyc +0 -0
app/components/login.py +6 -2
app/main.py +19 -170
app/pages/.DS_Store +0 -0
app/pages/1_Week_1.py +0 -168
app/pages/__pycache__/week_1.cpython-311.pyc +0 -0
app/pages/__pycache__/week_2.cpython-311.pyc +0 -0
app/pages/week_1.py +8 -149
app/pages/week_1_WIP.py +159 -0
app/pages/week_2.py +228 -0

.DS_Store ADDED

Binary file (6.15 kB).

Reference files/Week2_ref/Ch02-statlearn-lab.ipynb ADDED

	@@ -0,0 +1,3229 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "245f0c86",
+   "metadata": {},
+   "source": [
+    "\n",
+    "# Chapter 2\n",
+    "\n",
+    "# Lab: Introduction to Python\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5ab29948",
+   "metadata": {},
+   "source": [
+    "## Getting Started"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed622870",
+   "metadata": {},
+   "source": [
+    "To run the labs in this book, you will need two things:\n",
+    "\n",
+    "* An installation of `Python3`, which is the specific version of `Python`  used in the labs. \n",
+    "* Access to  `Jupyter`, a very popular `Python` interface that runs code through a file called a *notebook*. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "844d37fc",
+   "metadata": {},
+   "source": [
+    "You can download and install  `Python3`   by following the instructions available at [anaconda.com](http://anaconda.com). "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "462ff1fe",
+   "metadata": {},
+   "source": [
+    " There are a number of ways to get access to `Jupyter`. Here are just a few:\n",
+    " \n",
+    " * Using Google's `Colaboratory` service: [colab.research.google.com/](https://colab.research.google.com/). \n",
+    " * Using `JupyterHub`, available at [jupyter.org/hub](https://jupyter.org/hub). \n",
+    " * Using your own `jupyter` installation. Installation instructions are available at [jupyter.org/install](https://jupyter.org/install). \n",
+    " \n",
+    "Please see the `Python` resources page on the book website [statlearning.com](https://www.statlearning.com) for up-to-date information about getting `Python` and `Jupyter` working on your computer. \n",
+    "\n",
+    "You will need to install the `ISLP` package, which provides access to the datasets and custom-built functions that we provide.\n",
+    "Inside a macOS or Linux terminal type `pip install ISLP`; this also installs most other packages needed in the labs. The `Python` resources page has a link to the `ISLP` documentation website.\n",
+    "\n",
+    "To run this lab, download the file `Ch02-statlearn-lab.ipynb` from the `Python` resources page. \n",
+    "Now run the following code at the command line: `jupyter lab Ch02-statlearn-lab.ipynb`.\n",
+    "\n",
+    "If you're using Windows, you can use the `start menu` to access `anaconda`, and follow the links. For example, to install `ISLP` and run this lab, you can run the same code above in an `anaconda` shell.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b46f9182",
+   "metadata": {},
+   "source": [
+    "## Basic Commands\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54060fd9",
+   "metadata": {},
+   "source": [
+    "In this lab, we will introduce some simple `Python` commands. \n",
+    " For more resources about `Python` in general, readers may want to consult the tutorial at [docs.python.org/3/tutorial/](https://docs.python.org/3/tutorial/). \n",
+    "\n",
+    "\n",
+    " \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d3dbd0e9",
+   "metadata": {},
+   "source": [
+    "Like most programming languages, `Python` uses *functions*\n",
+    "to perform operations.   To run a\n",
+    "function called `fun`, we type\n",
+    "`fun(input1,input2)`, where the inputs (or *arguments*)\n",
+    "`input1` and `input2` tell\n",
+    "`Python` how to run the function.  A function can have any number of\n",
+    "inputs. For example, the\n",
+    "`print()`  function outputs a text representation of all of its arguments to the console."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "9e8aa21f",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "fit a model with 11 variables\n"
+     ]
+    }
+   ],
+   "source": [
+    "print('fit a model with', 11, 'variables')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27d935f8",
+   "metadata": {},
+   "source": [
+    " The following command will provide information about the `print()` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d62ec119",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "print?\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "04b3e2a3",
+   "metadata": {},
+   "source": [
+    "Adding two integers in `Python` is pretty intuitive."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c64e9f4d",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "3 + 5\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cd754cba",
+   "metadata": {},
+   "source": [
+    "In `Python`, textual data is handled using\n",
+    "*strings*. For instance, `\"hello\"` and\n",
+    "`'hello'`\n",
+    "are strings. \n",
+    "We can concatenate them using the addition `+` symbol."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9abccc1f",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "\"hello\" + \"world\"\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c28db903",
+   "metadata": {},
+   "source": [
+    " A string is actually a type of *sequence*: this is a generic term for an ordered list. \n",
+    " The three most important types of sequences are lists, tuples, and strings.  \n",
+    "We introduce lists now. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5fdcc5a1",
+   "metadata": {},
+   "source": [
+    "The following command instructs `Python` to join together\n",
+    "the numbers 3, 4, and 5, and to save them as a\n",
+    "*list* named `x`. When we\n",
+    "type `x`, it gives us back the list."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "802ca33c",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "x = [3, 4, 5]\n",
+    "x\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5492ecd1",
+   "metadata": {},
+   "source": [
+    "Note that we used the brackets\n",
+    "`[]` to construct this list. \n",
+    "\n",
+    "We will often want to add two sets of numbers together. It is reasonable to try the following code,\n",
+    "though it will not produce the desired results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a8c72744",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "y = [4, 9, 7]\n",
+    "x + y\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b84f9d0e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x[2]  # lists are 0-indexed: the three entries sit at indices 0, 1, 2, so x[3] would raise an IndexError"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f42ea1d",
+   "metadata": {},
+   "source": [
+    "The result of `x + y` may appear slightly counterintuitive: why did `Python` not add the entries of the lists\n",
+    "element-by-element? \n",
+    " In `Python`, lists hold *arbitrary* objects, and are added using *concatenation*. \n",
+    " In fact, concatenation is the behavior that we saw earlier when we entered `\"hello\" + \"world\"`. \n",
+    " "
+   ]
+  },
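+  {
+   "cell_type": "markdown",
+   "id": "added-list-zip-note",
+   "metadata": {},
+   "source": [
+    "If an element-by-element sum of two plain lists is what we want, one option that needs no extra packages is to pair up the entries with the built-in `zip()` function inside a list comprehension. The following is only a minimal sketch; the names `a` and `b` are illustrative and do not appear elsewhere in the lab."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-list-zip-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Pair up corresponding entries of the two lists and add each pair.\n",
+    "# The result is a new list: [7, 13, 12].\n",
+    "[a + b for a, b in zip([3, 4, 5], [4, 9, 7])]"
+   ]
+  },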
+  {
+   "cell_type": "markdown",
+   "id": "69015df5",
+   "metadata": {},
+   "source": [
+    "This example reflects the fact that \n",
+    " `Python` is a general-purpose programming language. Much of `Python`'s  data-specific\n",
+    "functionality comes from other packages, notably `numpy`\n",
+    "and `pandas`. \n",
+    "In the next section, we will introduce the  `numpy` package. \n",
+    "See [docs.scipy.org/doc/numpy/user/quickstart.html](https://docs.scipy.org/doc/numpy/user/quickstart.html) for more information about `numpy`.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "16bfc4a2",
+   "metadata": {},
+   "source": [
+    "## Introduction to Numerical Python\n",
+    "\n",
+    "As mentioned earlier, this book makes use of functionality   that is contained in the `numpy` \n",
+    " *library*, or *package*. A package is a collection of modules that are not necessarily included in \n",
+    " the base `Python` distribution. The name `numpy` is an abbreviation for *numerical Python*. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5bed3f0",
+   "metadata": {},
+   "source": [
+    "  To access `numpy`, we must first `import` it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f1c7d1db",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c8614e7",
+   "metadata": {},
+   "source": [
+    "In the previous line, we named the `numpy` *module* `np`, an abbreviation for easier referencing."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ba1224a6",
+   "metadata": {},
+   "source": [
+    "In `numpy`, an *array* is  a generic term for a multidimensional\n",
+    "set of numbers.\n",
+    "We use the `np.array()` function to define   `x` and `y`, which are one-dimensional arrays, i.e. vectors."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e2ea2bfd",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x = np.array([3, 4, 5])\n",
+    "y = np.array([4, 9, 7])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a977e05a",
+   "metadata": {},
+   "source": [
+    "Note that if you forgot to run the `import numpy as np` command earlier, then\n",
+    "you will encounter an error in calling the `np.array()` function in the previous line. \n",
+    " The syntax `np.array()` indicates that the function being called\n",
+    "is part of the `numpy` package, which we have abbreviated as `np`. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "742431b6",
+   "metadata": {},
+   "source": [
+    "Since `x` and `y` have been defined using `np.array()`, we get a sensible result when we add them together. Compare this to our results in the previous section,\n",
+    " when we tried to add two lists without using `numpy`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "59fbf9fd",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x + y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2ceccc2b",
+   "metadata": {},
+   "source": [
+    "    \n",
+    " \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "74be6d74",
+   "metadata": {},
+   "source": [
+    "In `numpy`, matrices are typically represented as two-dimensional arrays, and vectors as one-dimensional arrays. (While it is also possible to create matrices using `np.matrix()`, we will use `np.array()` throughout the labs in this book.)\n",
+    "We can create a two-dimensional array as follows. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2279437e",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x = np.array([[1, 2], [3, 4]])\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f96f304d",
+   "metadata": {},
+   "source": [
+    "    \n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f764f7d1",
+   "metadata": {},
+   "source": [
+    "The object `x` has several \n",
+    "*attributes*, or associated objects. To access an attribute of `x`, we type `x.attribute`, where we replace `attribute`\n",
+    "with the name of the attribute. \n",
+    "For instance, we can access the `ndim` attribute of  `x` as follows. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "75bf1b1e",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "x.ndim"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4e3b83bf",
+   "metadata": {},
+   "source": [
+    "The output indicates that `x` is a two-dimensional array.  \n",
+    "Similarly, `x.dtype` is the *data type* attribute of the object `x`. This indicates that `x` is \n",
+    "comprised of 64-bit integers:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "58292240",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x.dtype"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf9cf94b",
+   "metadata": {},
+   "source": [
+    "Why is `x` comprised of integers? This is because we created `x` by passing in exclusively integers to the `np.array()` function.\n",
+    "  If\n",
+    "we had passed in any decimals, then we would have obtained an array of\n",
+    "*floating point numbers* (i.e. real-valued numbers). "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fc5fff57",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "np.array([[1, 2], [3.0, 4]]).dtype\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "41a79641",
+   "metadata": {},
+   "source": [
+    "In a `Jupyter` notebook, typing `fun?` displays the \n",
+    "documentation associated with the function `fun`, if it exists.\n",
+    "We can try this for `np.array()`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "762562a6",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "np.array?\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d4d82167",
+   "metadata": {},
+   "source": [
+    "This documentation indicates that we could create a floating point array by passing a `dtype` argument into `np.array()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "66d2b82a",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "np.array([[1, 2], [3, 4]], float).dtype\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e3ba5be",
+   "metadata": {},
+   "source": [
+    "The array `x` is two-dimensional. We can find out the number of rows and columns by looking\n",
+    "at its `shape` attribute."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "89881402",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "x.shape\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2967b644",
+   "metadata": {},
+   "source": [
+    "A *method* is a function that is associated with an\n",
+    "object. \n",
+    "For instance, given an array `x`, the expression\n",
+    "`x.sum()` sums all of its elements, using the `sum()`\n",
+    "method for arrays. \n",
+    "The call `x.sum()` automatically provides `x` as the\n",
+    "first argument to its `sum()` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0572d3f6",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x = np.array([1, 2, 3, 4])\n",
+    "x.sum()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3f49995",
+   "metadata": {},
+   "source": [
+    "We could also sum the elements of `x` by passing in `x` as an argument to the `np.sum()` function. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "33b10a6f",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x = np.array([1, 2, 3, 4])\n",
+    "np.sum(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2f3dd2c3",
+   "metadata": {},
+   "source": [
+    " As another example, the\n",
+    "`reshape()` method returns a new array with the same elements as\n",
+    "`x`, but a different shape.\n",
+    " We do this by passing in a `tuple` in our call to\n",
+    " `reshape()`, in this case `(2, 3)`.  This tuple specifies that we would like to create a two-dimensional array with \n",
+    "$2$ rows and $3$ columns. (Like lists, tuples represent a sequence of objects. Why do we need more than one way to create a sequence? There are a few differences between tuples and lists, but perhaps the most important is that elements of a tuple cannot be modified, whereas elements of a list can be.)\n",
+    " \n",
+    "In what follows, the\n",
+    "`\\n` character creates a *new line*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a32716db",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "x = np.array([1, 2, 3, 4, 5, 6])\n",
+    "print('beginning x:\\n', x)\n",
+    "x_reshape = x.reshape((2, 3))\n",
+    "print('reshaped x:\\n', x_reshape)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2483179e",
+   "metadata": {},
+   "source": [
+    "The previous output reveals that `numpy` arrays are specified as a sequence\n",
+    "of *rows*. This is  called *row-major ordering*, as opposed to *column-major ordering*. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e256575f",
+   "metadata": {},
+   "source": [
+    "`Python` (and hence `numpy`) uses 0-based\n",
+    "indexing. This means that to access the top left element of `x_reshape`, \n",
+    "we type in `x_reshape[0,0]`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3db6e1cf",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x_reshape[0, 0] "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0e10119e",
+   "metadata": {},
+   "source": [
+    "Similarly, `x_reshape[1,2]` yields the element in the second row and the third column \n",
+    "of `x_reshape`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e15c753f",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "x_reshape[1, 2] "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9c55622",
+   "metadata": {},
+   "source": [
+    "Similarly, `x[2]` yields the\n",
+    "third entry of `x`. \n",
+    "\n",
+    "Now, let's modify the top left element of `x_reshape`.  To our surprise, we discover that the first element of `x` has been modified as well!\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91c6e7d8",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "print('x before we modify x_reshape:\\n', x)\n",
+    "print('x_reshape before we modify x_reshape:\\n', x_reshape)\n",
+    "x_reshape[0, 0] = 5\n",
+    "print('x_reshape after we modify its top left element:\\n', x_reshape)\n",
+    "print('x after we modify top left element of x_reshape:\\n', x)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a840507",
+   "metadata": {},
+   "source": [
+    "Modifying `x_reshape` also modified `x` because the two objects occupy the same space in memory.\n",
+    " \n",
+    "\n",
+    "    "
+   ]
+  },
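+  {
+   "cell_type": "markdown",
+   "id": "added-shared-memory-note",
+   "metadata": {},
+   "source": [
+    "One way to check whether two arrays share memory is the `np.shares_memory()` function, and calling the `copy()` method yields an independent array. The following is a minimal sketch; the names `a`, `a_view`, and `a_copy` are illustrative and chosen so that `x` and `x_reshape` are left untouched."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-shared-memory-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np  # repeated so that this cell runs on its own\n",
+    "a = np.array([1, 2, 3, 4, 5, 6])\n",
+    "a_view = a.reshape((2, 3))         # shares its data with a\n",
+    "a_copy = a.reshape((2, 3)).copy()  # independent copy of the data\n",
+    "a_copy[0, 0] = -1                  # modifies the copy only\n",
+    "print(np.shares_memory(a, a_view), np.shares_memory(a, a_copy))\n",
+    "print('a after modifying only the copy:\\n', a)\n"
+   ]
+  },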
+  {
+   "cell_type": "markdown",
+   "id": "ec551f3e",
+   "metadata": {},
+   "source": [
+    "We just saw that we can modify an element of an array. Can we also modify a tuple? It turns out that we cannot, and trying to do so raises\n",
+    "an *exception*, or error."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "59d95dce",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "my_tuple = (3, 4, 5)\n",
+    "my_tuple[0] = 2\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d594f1af",
+   "metadata": {},
+   "source": [
+    "We now briefly mention some attributes of arrays that will come in handy. An array's `shape` attribute contains its dimensions, i.e. its size along each axis; this is always a tuple.\n",
+    "The  `ndim` attribute yields the number of dimensions, and `T` provides its transpose. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a6fde9af",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "x_reshape.shape, x_reshape.ndim, x_reshape.T\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "76d20b98",
+   "metadata": {},
+   "source": [
+    "Notice that the three individual outputs `(2,3)`, `2`, and `array([[5, 4],[2, 5], [3,6]])` are themselves output as a tuple. \n",
+    " \n",
+    "We will often want to apply functions to arrays. \n",
+    "For instance, we can compute the\n",
+    "square root of the entries using the `np.sqrt()` function: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fadb6b45",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "np.sqrt(x)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "22fab2ce",
+   "metadata": {},
+   "source": [
+    "We can also square the elements:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fda3134b",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "x**2\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1278f26b",
+   "metadata": {},
+   "source": [
+    "We can compute the square roots using the same notation, raising to the power of $1/2$ instead of 2."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "52eb335b",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "x**0.5\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "299a5a85",
+   "metadata": {},
+   "source": [
+    "Throughout this book, we will often want to generate random data. \n",
+    "The `np.random.normal()`  function generates a vector of random\n",
+    "normal variables. We can learn more about this function by looking at the help page, via a call to `np.random.normal?`.\n",
+    "The first line of the help page  reads `normal(loc=0.0, scale=1.0, size=None)`. \n",
+    " This  *signature* line tells us that the function's arguments are  `loc`, `scale`, and `size`. These are *keyword* arguments, which means that when they are passed into\n",
+    " the function, they can be referred to by name (in any order). (`Python` also uses *positional* arguments. Positional arguments do not need to use a keyword. To see an example, type in `np.sum?`. We see that `a` is a positional argument, i.e. this function assumes that the first unnamed argument that it receives is the array to be summed. By contrast, `axis` and `dtype` are keyword arguments: the position in which these arguments are entered into `np.sum()` does not matter.)\n",
+    " By default, this function will generate random normal variable(s) with mean (`loc`) $0$ and standard deviation (`scale`) $1$; furthermore, \n",
+    " a single random variable will be generated unless the argument to `size` is changed. \n",
+    "\n",
+    "We now generate 50 independent random variables from a $N(0,1)$ distribution. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ac5e9d29",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "x = np.random.normal(size=50)\n",
+    "x\n"
+   ]
+  },
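+  {
+   "cell_type": "markdown",
+   "id": "added-keyword-args-note",
+   "metadata": {},
+   "source": [
+    "To make the distinction between positional and keyword arguments concrete, the following minimal sketch calls `np.sum()` twice with the keyword arguments `axis` and `dtype` given in different orders. The array name `demo` is illustrative and is not used elsewhere in the lab."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-keyword-args-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np  # repeated so that this cell runs on its own\n",
+    "demo = np.array([[1, 2], [3, 4]])\n",
+    "# demo is passed positionally; axis and dtype are passed by keyword,\n",
+    "# so swapping their order gives the same column sums.\n",
+    "np.sum(demo, axis=0, dtype=float), np.sum(demo, dtype=float, axis=0)"
+   ]
+  },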
+  {
+   "cell_type": "markdown",
+   "id": "d77cf45a",
+   "metadata": {},
+   "source": [
+    "We create an array `y` by adding an independent $N(50,1)$ random variable to each element of `x`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "55fa905e",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "y = x + np.random.normal(loc=50, scale=1, size=50)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eacfecc9",
+   "metadata": {},
+   "source": [
+    "The `np.corrcoef()` function computes the correlation matrix between `x` and `y`. The off-diagonal elements give the \n",
+    "correlation between `x` and `y`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fde0dc19",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "np.corrcoef(x, y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a594218",
+   "metadata": {},
+   "source": [
+    "If you're following along in your own `Jupyter` notebook, then you probably noticed that you got a different set of results when you ran the past few \n",
+    "commands. In particular, \n",
+    " each\n",
+    "time we call `np.random.normal()`, we will get a different answer, as shown in the following example."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5099cf54",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "print(np.random.normal(scale=5, size=2))\n",
+    "print(np.random.normal(scale=5, size=2)) \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e209118",
+   "metadata": {},
+   "source": [
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed7697a4",
+   "metadata": {},
+   "source": [
+    "In order to ensure that our code provides exactly the same results\n",
+    "each time it is run, we can set a *random seed* \n",
+    "using the \n",
+    "`np.random.default_rng()` function.\n",
+    "This function takes an arbitrary, user-specified integer argument. If we set a random seed before \n",
+    "generating random data, then re-running our code will yield the same results. The\n",
+    "object `rng` has essentially all the random number generating methods found in `np.random`. Hence, to\n",
+    "generate normal data we use `rng.normal()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9d8074e5",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "rng = np.random.default_rng(1303)\n",
+    "print(rng.normal(scale=5, size=2))\n",
+    "rng2 = np.random.default_rng(1303)\n",
+    "print(rng2.normal(scale=5, size=2)) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "93f826ef",
+   "metadata": {},
+   "source": [
+    "Throughout the labs in this book, we use `np.random.default_rng()`  whenever we\n",
+    "perform calculations involving random quantities within `numpy`.  In principle, this\n",
+    "should enable the reader to exactly reproduce the stated results. However, as new versions of `numpy` become available, it is possible\n",
+    "that some small discrepancies may occur between the output\n",
+    "in the labs and the output\n",
+    "from `numpy`.\n",
+    "\n",
+    "The `np.mean()`,  `np.var()`, and `np.std()`  functions can be used\n",
+    "to compute the mean, variance, and standard deviation of arrays.  These functions are also\n",
+    "available as methods on the arrays."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e98472df",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "rng = np.random.default_rng(3)\n",
+    "y = rng.standard_normal(10)\n",
+    "np.mean(y), y.mean()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2870d61f",
+   "metadata": {},
+   "source": [
+    "    \n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8c2784fd",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "np.var(y), y.var(), np.mean((y - y.mean())**2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "86261a69",
+   "metadata": {},
+   "source": [
+    "Notice that by default `np.var()` divides by the sample size $n$ rather\n",
+    "than $n-1$; see the `ddof` argument in `np.var?`.\n"
+   ]
+  },
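+  {
+   "cell_type": "markdown",
+   "id": "added-ddof-note",
+   "metadata": {},
+   "source": [
+    "As a minimal sketch of the effect of `ddof`, the following compares the default divisor $n$ with the divisor $n-1$ obtained by setting `ddof=1`. The array `v` is a small illustrative example, not data from the lab."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-ddof-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np  # repeated so that this cell runs on its own\n",
+    "v = np.array([1.0, 2.0, 3.0, 4.0])\n",
+    "# The squared deviations from the mean 2.5 sum to 5,\n",
+    "# so the two variances are 5/4 and 5/3 respectively.\n",
+    "np.var(v), np.var(v, ddof=1)"
+   ]
+  },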
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e7205f2",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "np.sqrt(np.var(y)), np.std(y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d4faf901",
+   "metadata": {},
+   "source": [
+    "The `np.mean()`,  `np.var()`, and `np.std()` functions can also be applied to the rows and columns of a matrix. \n",
+    "To see this, we construct a $10 \\times 3$ matrix of $N(0,1)$ random variables, and consider computing its column means. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fce06849",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "X = rng.standard_normal((10, 3))\n",
+    "X"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6cc355d2",
+   "metadata": {},
+   "source": [
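+    "Since arrays are row-major ordered, the first axis, i.e. `axis=0`, refers to its rows. We pass this argument into the `mean()` method for the object `X`; averaging along `axis=0` collapses the rows, so the result contains one mean for each of the three columns. "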
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1403ff7a",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "X.mean(axis=0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6785c0ec",
+   "metadata": {},
+   "source": [
+    "The following yields the same result."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e9255ba",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "X.mean(0)"
+   ]
+  },
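+  {
+   "cell_type": "markdown",
+   "id": "added-axis-note",
+   "metadata": {},
+   "source": [
+    "The call `X.mean(axis=0)` above returns one mean per column. To average within each row instead, we can pass `axis=1`. The following minimal sketch uses a small illustrative matrix `M`, which is not part of the lab, so that both results are easy to check by hand."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "added-axis-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np  # repeated so that this cell runs on its own\n",
+    "M = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])\n",
+    "# axis=0 gives one mean per column; axis=1 gives one mean per row.\n",
+    "M.mean(axis=0), M.mean(axis=1)"
+   ]
+  },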
+  {
+   "cell_type": "markdown",
+   "id": "5de246dc",
+   "metadata": {},
+   "source": [
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "30b002fa",
+   "metadata": {},
+   "source": [
+    "## Graphics\n",
+    "In `Python`, common practice is to use  the library\n",
+    "`matplotlib` for graphics.\n",
+    "However, since `Python` was not written with data analysis in mind,\n",
+    "  the notion of plotting is not intrinsic to the language. \n",
+    "We will use the `subplots()` function\n",
+    "from `matplotlib.pyplot` to create a figure and the\n",
+    "axes onto which we plot our data.\n",
+    "For many more examples of how to make plots in `Python`,\n",
+    "readers are encouraged to visit [matplotlib.org/stable/gallery/](https://matplotlib.org/stable/gallery/index.html).\n",
+    "\n",
+    "In `matplotlib`, a plot consists of a *figure* and one or more *axes*. You can think of the figure as the blank canvas upon which \n",
+    "one or more plots will be displayed: it is the entire plotting window. \n",
+    "The *axes* contain important information about each plot, such as its $x$- and $y$-axis labels,\n",
+    "title,  and more. (Note that in `matplotlib`, the word *axes* is not the plural of *axis*: a plot's *axes* contains much more information \n",
+    "than just the $x$-axis and  the $y$-axis.)\n",
+    "\n",
+    "We begin by importing the `subplots()` function\n",
+    "from `matplotlib.pyplot`. We use this function\n",
+    "throughout when creating figures.\n",
+    "The function returns a tuple of length two: a figure\n",
+    "object as well as the relevant axes object. We will typically\n",
+    "pass `figsize` as a keyword argument.\n",
+    "Having created our axes, we attempt our first plot using its  `plot()` method.\n",
+    "To learn more about it, \n",
+    "type `ax.plot?`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8236e5f7",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "from matplotlib.pyplot import subplots\n",
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "x = rng.standard_normal(100)\n",
+    "y = rng.standard_normal(100)\n",
+    "ax.plot(x, y);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bbef67e6",
+   "metadata": {},
+   "source": [
+    "We pause here to note that we have *unpacked* the tuple of length two returned by `subplots()` into the two distinct\n",
+    "variables `fig` and `ax`. Unpacking\n",
+    "is typically preferred to the following equivalent but slightly more verbose code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ddc9ed4f",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "output = subplots(figsize=(8, 8))\n",
+    "fig = output[0]\n",
+    "ax = output[1]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "104d6b8f",
+   "metadata": {},
+   "source": [
+    "We see that our earlier cell produced a line plot, which is the default. To create a scatterplot, we provide an additional argument to `ax.plot()`, indicating that circles should be displayed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c64ed600",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.plot(x, y, 'o');"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "840be2a9",
+   "metadata": {},
+   "source": [
+    "Different values\n",
+    "of this additional argument can be used to produce different colored lines\n",
+    "as well as different linestyles. \n"
+   ]
+  },
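+  {
+   "cell_type": "markdown",
+   "id": "3f9a1c20",
+   "metadata": {},
+   "source": [
+    "As a brief sketch of such format strings (the choices `'r--'` and `'g+'`, and all names ending in `_demo`, are ours and serve only as an illustration):\n",
+    "the first call draws a red dashed line, and the second draws green plus-sign markers with no connecting line.\n",
+    "Sorting the $x$ values first keeps the dashed line from doubling back on itself."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7b2e4d11",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from matplotlib.pyplot import subplots\n",
+    "\n",
+    "rng_demo = np.random.default_rng(0)  # hypothetical generator, separate from `rng`\n",
+    "x_demo = np.sort(rng_demo.standard_normal(20))\n",
+    "y_demo = rng_demo.standard_normal(20)\n",
+    "fig_demo, ax_demo = subplots(figsize=(8, 8))\n",
+    "lines_dashed = ax_demo.plot(x_demo, y_demo, 'r--')    # red dashed line\n",
+    "lines_plus = ax_demo.plot(x_demo, y_demo + 1, 'g+');  # green plus markers, no line\n"
+   ]
+  },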
+  {
+   "cell_type": "markdown",
+   "id": "971b98bd",
+   "metadata": {},
+   "source": [
+    "As an alternative, we could use the  `ax.scatter()` function to create a scatterplot."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bc6245e2",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.scatter(x, y, marker='o');"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "97f36df0",
+   "metadata": {},
+   "source": [
+    "Notice that in the code blocks above, we have ended\n",
+    "the last line with a semicolon. This prevents `ax.plot(x, y)` from printing\n",
+    "text  to the notebook. However, it does not prevent a plot from being produced. \n",
+    "If we omit the trailing semicolon, then we obtain the following output:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2454807b",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.scatter(x, y, marker='o')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1230c0a6",
+   "metadata": {},
+   "source": [
+    "In what follows, we will use\n",
+    " trailing semicolons whenever the text that would be output is not\n",
+    "germane to the discussion at hand.\n",
+    "\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ccb9964",
+   "metadata": {},
+   "source": [
+    "To label our plot, we  make use of the `set_xlabel()`,  `set_ylabel()`, and  `set_title()` methods\n",
+    "of `ax`.\n",
+    "  "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1e18a793",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.scatter(x, y, marker='o')\n",
+    "ax.set_xlabel(\"this is the x-axis\")\n",
+    "ax.set_ylabel(\"this is the y-axis\")\n",
+    "ax.set_title(\"Plot of X vs Y\");"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f2d818ee",
+   "metadata": {},
+   "source": [
+    " Having access to the figure object `fig` itself means that we can go in and change some aspects and then redisplay it. Here, we change\n",
+    "  the size from `(8, 8)` to `(12, 3)`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aec3f009",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig.set_size_inches(12,3)\n",
+    "fig"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "011bf802",
+   "metadata": {},
+   "source": [
+    "Occasionally we will want to create several plots within a figure. This can be\n",
+    "achieved by passing additional arguments to `subplots()`. \n",
+    "Below, we create a  $2 \\times 3$ grid of plots\n",
+    "in a figure of size determined by the `figsize` argument. In such\n",
+    "situations, there is often a relationship between the axes in the plots. For example,\n",
+    "all plots may have a common $x$-axis. The `subplots()` function can automatically handle\n",
+    "this situation when passed the keyword argument `sharex=True`.\n",
+    "The `axes` object below is an array pointing to different plots in the figure. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2cbc7fd4",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, axes = subplots(nrows=2,\n",
+    "                     ncols=3,\n",
+    "                     figsize=(15, 5))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8ff2e6d",
+   "metadata": {},
+   "source": [
+    "We now produce a scatter plot with `'o'` in the second column of the first row and\n",
+    "a scatter plot with `'+'` in the third column of the second row."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "702f80d9",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "axes[0,1].plot(x, y, 'o')\n",
+    "axes[1,2].scatter(x, y, marker='+')\n",
+    "fig"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5b265f8b",
+   "metadata": {},
+   "source": [
+    "Type  `subplots?` to learn more about \n",
+    "`subplots()`. \n",
+    "\n",
+    "\n"
+   ]
+  },
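+  {
+   "cell_type": "markdown",
+   "id": "9d04b7a3",
+   "metadata": {},
+   "source": [
+    "The `sharex=True` option described earlier can be sketched as follows. The names `fig_shared` and `axes_shared` are ours,\n",
+    "chosen so that the `fig` and `axes` objects created above are left untouched.\n",
+    "With a shared $x$-axis, changing the limits on one plot changes them on the other plot as well."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e5c81f46",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "from matplotlib.pyplot import subplots\n",
+    "\n",
+    "fig_shared, axes_shared = subplots(nrows=2, ncols=1, figsize=(8, 8), sharex=True)\n",
+    "axes_shared[0].plot([0, 1, 2], [0, 1, 4], 'o')\n",
+    "axes_shared[1].plot([0, 1, 2], [4, 1, 0], '+')\n",
+    "axes_shared[0].set_xlim([-1, 3])  # also updates the x-limits of axes_shared[1]\n",
+    "axes_shared[1].get_xlim()\n"
+   ]
+  },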
+  {
+   "cell_type": "markdown",
+   "id": "1bd7e707",
+   "metadata": {},
+   "source": [
+    "To save the output of `fig`, we call its `savefig()`\n",
+    "method. The argument `dpi` is the dots per inch, used\n",
+    "to determine how large the figure will be in pixels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5493d229",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "fig.savefig(\"Figure.png\", dpi=400)\n",
+    "fig.savefig(\"Figure.pdf\", dpi=200);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7152d0c7",
+   "metadata": {},
+   "source": [
+    "We can continue to modify `fig` using step-by-step updates; for example, we can modify the range of the $x$-axis, re-save the figure, and even re-display it. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bd07af12",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "axes[0,1].set_xlim([-1,1])\n",
+    "fig.savefig(\"Figure_updated.jpg\")\n",
+    "fig"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b5278857",
+   "metadata": {},
+   "source": [
+    "We now create some more sophisticated plots. The \n",
+    "`ax.contour()` method  produces a  *contour plot* \n",
+    "in order to represent three-dimensional data, similar to a\n",
+    "topographical map.  It takes three arguments:\n",
+    "\n",
+    "* A vector of `x` values (the first dimension),\n",
+    "* A vector of `y` values (the second dimension), and\n",
+    "* A matrix whose elements correspond to the `z` value (the third\n",
+    "dimension) for each pair of `(x,y)` coordinates.\n",
+    "\n",
+    "To create `x` and `y`, we’ll use the command  `np.linspace(a, b, n)`, \n",
+    "which returns a vector of `n` numbers starting at  `a` and  ending at `b`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "01019508",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "x = np.linspace(-np.pi, np.pi, 50)\n",
+    "y = x\n",
+    "f = np.multiply.outer(np.cos(y), 1 / (1 + x**2))\n",
+    "ax.contour(x, y, f);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9ef3c475",
+   "metadata": {},
+   "source": [
+    "We can increase the resolution by adding more levels to the image."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7d08992f",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.contour(x, y, f, levels=45);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8e1d37a2",
+   "metadata": {},
+   "source": [
+    "To fine-tune the output of the\n",
+    "`ax.contour()`  function, take a\n",
+    "look at the help file by typing `ax.contour?`.\n",
+    " \n",
+    "The `ax.imshow()`  method is similar to \n",
+    "`ax.contour()`, except that it produces a color-coded plot\n",
+    "whose colors depend on the `z` value. This is known as a\n",
+    "*heatmap*, and is sometimes used to plot temperature in\n",
+    "weather forecasts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1f89d704",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.imshow(f);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2500a6ec",
+   "metadata": {},
+   "source": [
+    "## Sequences and Slice Notation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "07001b88",
+   "metadata": {},
+   "source": [
+    "As seen above, the\n",
+    "function `np.linspace()`  can be used to create a sequence\n",
+    "of numbers."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cd971131",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "seq1 = np.linspace(0, 10, 11)\n",
+    "seq1\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "926f96fc",
+   "metadata": {},
+   "source": [
+    "The function `np.arange()`\n",
+    " returns a sequence of numbers spaced out by `step`. If `step` is not specified, then a default value of $1$ is used. Let's create a sequence\n",
+    " that starts at $0$ and ends at $10$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aa630d16",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "seq2 = np.arange(0, 10)\n",
+    "seq2\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6908bad7",
+   "metadata": {},
+   "source": [
+    "Why isn't $10$ output above? This has to do with *slice* notation in `Python`. \n",
+    "Slice notation  \n",
+    "is used to index sequences such as lists, tuples and arrays.\n",
+    "Suppose we want to retrieve the fourth through sixth (inclusive) entries\n",
+    "of a string. We obtain a slice of the string using the indexing  notation  `[3:6]`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "89955ee2",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "\"hello world\"[3:6]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "17d73e4d",
+   "metadata": {},
+   "source": [
+    "In the code block above, the notation `3:6` is shorthand for  `slice(3,6)` when used inside\n",
+    "`[]`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "517f592d",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "\"hello world\"[slice(3,6)]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "680fe656",
+   "metadata": {},
+   "source": [
+    "You might have expected  `slice(3,6)` to output the fourth through seventh characters in the text string (recalling that  `Python` begins its indexing at zero),  but instead it output  the fourth through sixth. \n",
+    " This also explains why the earlier `np.arange(0, 10)` command output only the integers from $0$ to $9$. \n",
+    "See the documentation `slice?` for useful options in creating slices."
+   ]
+  },
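+  {
+   "cell_type": "markdown",
+   "id": "a2f6093c",
+   "metadata": {},
+   "source": [
+    "The same half-open convention holds when a step is supplied: the start value is included and the stop value is excluded.\n",
+    "A short sketch, using values and variable names of our own choosing:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4c7d2e98",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "seq3 = np.arange(0, 10, 2)   # step of 2; the stop value 10 is excluded\n",
+    "seq4 = np.arange(0, 11)      # raising the stop to 11 includes 10\n",
+    "sub = 'hello world'[0:11:2]  # every second character of the string\n",
+    "seq3, seq4, sub\n"
+   ]
+  },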
+  {
+   "cell_type": "markdown",
+   "id": "522a2761",
+   "metadata": {},
+   "source": [
+    "## Indexing Data\n",
+    "To begin, we  create a two-dimensional `numpy` array."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "35927abd",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A = np.array(np.arange(16)).reshape((4, 4))\n",
+    "A\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27c88984",
+   "metadata": {},
+   "source": [
+    "Typing `A[1,2]` retrieves the element corresponding to the second row and third\n",
+    "column. (As usual, `Python` indexes from $0.$)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "78ee7f5b",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A[1,2]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd65ec1c",
+   "metadata": {},
+   "source": [
+    "The first number after the open-bracket symbol `[`\n",
+    " refers to the row, and the second number refers to the column. \n",
+    "\n",
+    "### Indexing Rows, Columns, and Submatrices\n",
+    " To select multiple rows at a time, we can pass in a list\n",
+    "  specifying our selection. For instance, `[1,3]` will retrieve the second and fourth rows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "16212696",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A[[1,3]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b8b3ce3",
+   "metadata": {},
+   "source": [
+    "To select the first and third columns, we pass in  `[0,2]` as the second argument in the square brackets.\n",
+    "In this case we need to supply the first argument `:` \n",
+    "which selects all rows."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d5f473d2",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A[:,[0,2]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "471ed1b4",
+   "metadata": {},
+   "source": [
+    "Now, suppose that we want to select the submatrix made up of the second and fourth \n",
+    "rows as well as the first and third columns. This is where\n",
+    "indexing gets slightly tricky. It is natural to try  to use lists to retrieve the rows and columns:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c89646d6",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A[[1,3],[0,2]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9cbf1ff9",
+   "metadata": {},
+   "source": [
+    " Oops --- what happened? We got a one-dimensional array of length two identical to"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "87f6b4f2",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "np.array([A[1,0],A[3,2]])\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a93dc96",
+   "metadata": {},
+   "source": [
+    "Similarly, the following code fails to extract the submatrix consisting of the second and fourth rows and the first, third, and fourth columns:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5da5bda8",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A[[1,3],[0,2,3]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f4fd2f83",
+   "metadata": {},
+   "source": [
+    "We can see what has gone wrong here. When supplied with two indexing lists, the `numpy` interpretation is that these provide pairs of $i,j$ indices for a series of entries. That is why the pair of lists must have the same length. However, that was not our intent, since we are looking for a submatrix.\n",
+    "\n",
+    "One easy way to do this is as follows. We first create a submatrix by subsetting the rows of `A`, and then on the fly we make a further submatrix by subsetting its columns.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ac48a95b",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "A[[1,3]][:,[0,2]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a09467cd",
+   "metadata": {},
+   "source": [
+    "There are more efficient ways of achieving the same result.\n",
+    "\n",
+    "The *convenience function* `np.ix_()` allows us  to extract a submatrix\n",
+    "using lists, by creating an intermediate *mesh* object."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ee195cc4",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "idx = np.ix_([1,3],[0,2,3])\n",
+    "A[idx]\n"
+   ]
+  },
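+  {
+   "cell_type": "markdown",
+   "id": "b81e5a07",
+   "metadata": {},
+   "source": [
+    "To see what this mesh looks like, the sketch below inspects the object returned by `np.ix_()`.\n",
+    "It is a tuple of two integer arrays: one of shape `(2, 1)` holding the row indices and one of shape `(1, 3)` holding the column indices.\n",
+    "`numpy` broadcasts the two against each other to form every (row, column) pair.\n",
+    "The names `A_demo`, `mesh`, `row_idx`, and `col_idx` are ours."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6e3f90d2",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "A_demo = np.arange(16).reshape((4, 4))  # same values as A above\n",
+    "mesh = np.ix_([1, 3], [0, 2, 3])\n",
+    "row_idx, col_idx = mesh                 # unpack the tuple of index arrays\n",
+    "row_idx.shape, col_idx.shape, A_demo[mesh].shape\n"
+   ]
+  },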
+  {
+   "cell_type": "markdown",
+   "id": "b7177cb9",
+   "metadata": {},
+   "source": [
+    "Alternatively, we can subset matrices efficiently using slices.\n",
+    "  \n",
+    "The slice\n",
+    "`1:4:2` captures the second and fourth items of a sequence, while the slice `0:3:2` captures\n",
+    "the first and third items (the third element in a slice sequence is the step size)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "48917bb5",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "A[1:4:2,0:3:2]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c647dbf0",
+   "metadata": {},
+   "source": [
+    "Why are we able to retrieve a submatrix directly using slices but not using lists?\n",
+    "It's because they are different `Python` types, and\n",
+    "are treated differently by `numpy`.\n",
+    "Slices can be used to extract objects from arbitrary sequences, such as strings, lists, and tuples, while the use of lists for indexing is more limited."
+   ]
+  },
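+  {
+   "cell_type": "markdown",
+   "id": "d4907be1",
+   "metadata": {},
+   "source": [
+    "To make this concrete, the sketch below (with a list `letters` invented for illustration) applies the same slice to a list, a tuple, and a string.\n",
+    "Indexing a plain `Python` list with another list is not supported and raises a `TypeError`; that kind of indexing is a `numpy` feature."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "58ab3c6f",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "letters = ['a', 'b', 'c', 'd', 'e']\n",
+    "from_list = letters[1:4:2]          # second and fourth items of a list\n",
+    "from_tuple = tuple(letters)[1:4:2]  # the same slice applied to a tuple\n",
+    "from_string = 'abcde'[1:4:2]        # and to a string\n",
+    "try:\n",
+    "    letters[[1, 3]]                 # a list is not a valid index for a list\n",
+    "    list_index_works = True\n",
+    "except TypeError:\n",
+    "    list_index_works = False\n",
+    "from_list, from_tuple, from_string, list_index_works\n"
+   ]
+  },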
+  {
+   "cell_type": "markdown",
+   "id": "2dce8961",
+   "metadata": {},
+   "source": [
+    "### Boolean Indexing\n",
+    "In `numpy`, a *Boolean* is a type  that equals either   `True` or  `False` (also represented as $1$ and $0$, respectively).\n",
+    "The next line creates a vector of $0$'s, represented as Booleans, of length equal to the first dimension of `A`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d4caf22",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "keep_rows = np.zeros(A.shape[0], bool)\n",
+    "keep_rows"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d83fadb5",
+   "metadata": {},
+   "source": [
+    "We now set two of the elements to `True`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "348820e3",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "keep_rows[[1,3]] = True\n",
+    "keep_rows\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a0fb487d",
+   "metadata": {},
+   "source": [
+    "Note that the elements of `keep_rows`, when viewed as integers, are the same as the\n",
+    "values of `np.array([0,1,0,1])`. Below, we use  `==` to verify their equality. When\n",
+    "applied to two arrays, the `==`   operation is applied elementwise."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4aafe45b",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "np.all(keep_rows == np.array([0,1,0,1]))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "603c0c53",
+   "metadata": {},
+   "source": [
+    "(Here, the function `np.all()` has checked whether\n",
+    "all entries of an array are `True`. A similar function, `np.any()`, can be used to check whether any entries of an array are `True`.)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b0a449d1",
+   "metadata": {},
+   "source": [
+    "   However, even though `np.array([0,1,0,1])`  and `keep_rows` are equal according to `==`, they index different sets of rows!\n",
+    "The former retrieves the first, second, first, and second rows of `A`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1be6a588",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A[np.array([0,1,0,1])]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e45bbebe",
+   "metadata": {},
+   "source": [
+    "By contrast, `keep_rows` retrieves only the second and fourth rows of `A`, i.e. the rows for which the Boolean equals `True`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e83da57b",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "A[keep_rows]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "374d34a7",
+   "metadata": {},
+   "source": [
+    "This example shows that Booleans and integers are treated differently by `numpy`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "25db74bf",
+   "metadata": {},
+   "source": [
+    "We again make use of the `np.ix_()` function\n",
+    " to create a mesh containing the second and fourth rows, and the first,  third, and fourth columns. This time, we apply the function to Booleans,\n",
+    " rather than lists."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "09675294",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "keep_cols = np.zeros(A.shape[1], bool)\n",
+    "keep_cols[[0, 2, 3]] = True\n",
+    "idx_bool = np.ix_(keep_rows, keep_cols)\n",
+    "A[idx_bool]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0166c179",
+   "metadata": {},
+   "source": [
+    "We can also mix a list with an array of Booleans in the arguments to `np.ix_()`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a85614e4",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "idx_mixed = np.ix_([1,3], keep_cols)\n",
+    "A[idx_mixed]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b3541e0c",
+   "metadata": {},
+   "source": [
+    "For more details on indexing in `numpy`, readers are referred\n",
+    "to the `numpy` tutorial mentioned earlier.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ab75f168",
+   "metadata": {},
+   "source": [
+    "## Loading Data\n",
+    "\n",
+    "Data sets often contain different types of data, and may have names associated with the rows or columns. \n",
+    "For these reasons, they typically are best accommodated using a\n",
+    " *data frame*. \n",
+    " We can think of a data frame  as a sequence\n",
+    "of arrays of identical length; these are the columns. Entries in the\n",
+    "different arrays can be combined to form a row.\n",
+    " The `pandas`\n",
+    "library can be used to create and work with data frame objects."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ca018d13",
+   "metadata": {},
+   "source": [
+    "### Reading in a Data Set\n",
+    "\n",
+    "The first step of most analyses involves importing a data set into\n",
+    "`Python`.  \n",
+    " Before attempting to load\n",
+    "a data set, we must make sure that `Python` knows where to find the file containing it. \n",
+    "If the\n",
+    "file is in the same location\n",
+    "as this notebook file, then we are all set. \n",
+    "Otherwise, \n",
+    "the command\n",
+    "`os.chdir()`  can be used to *change directory*. (You will need to call `import os` before calling `os.chdir()`.) "
+   ]
+  },
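+  {
+   "cell_type": "markdown",
+   "id": "0c62f8d5",
+   "metadata": {},
+   "source": [
+    "A minimal sketch of this step follows. The directory name stored in `data_dir` is hypothetical and should be replaced with the folder that actually holds your files.\n",
+    "The check with `os.path.isdir()` is our addition, so that the cell does nothing when the folder does not exist."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f17b6a39",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "data_dir = 'path/to/data'   # hypothetical location of the data files\n",
+    "start_dir = os.getcwd()     # remember the current working directory\n",
+    "if os.path.isdir(data_dir):\n",
+    "    os.chdir(data_dir)      # change directory only if the folder exists\n",
+    "current_dir = os.getcwd()\n",
+    "current_dir\n"
+   ]
+  },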
+  {
+   "cell_type": "markdown",
+   "id": "b76342df",
+   "metadata": {},
+   "source": [
+    "We will begin by reading in `Auto.csv`, available on the book website. This is a comma-separated file, and can be read in using `pd.read_csv()`: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff81e644",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "Auto = pd.read_csv('Auto.csv')\n",
+    "Auto\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "42d6a799",
+   "metadata": {},
+   "source": [
+    "The book website also has a whitespace-delimited version of this data, called `Auto.data`. This can be read in as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5b45aa7f",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto = pd.read_csv('Auto.data', sep=r'\\s+')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f942c457",
+   "metadata": {},
+   "source": [
+    " Both `Auto.csv` and `Auto.data` are simply text\n",
+    "files. Before loading data into `Python`, it is a good idea to view it using\n",
+    "a text editor or other software, such as Microsoft Excel.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1aceff38",
+   "metadata": {},
+   "source": [
+    "We now take a look at the column of `Auto` corresponding to the variable `horsepower`: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "413f626a",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto['horsepower']\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fd11e757",
+   "metadata": {},
+   "source": [
+    "We see that the `dtype` of this column is `object`. \n",
+    "It turns out that all values of the `horsepower` column were interpreted as strings when reading\n",
+    "in the data. \n",
+    "We can find out why by looking at the unique values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "57b86346",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "np.unique(Auto['horsepower'])\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f0aee233",
+   "metadata": {},
+   "source": [
+    "We see the culprit is the value `?`, which is being used to encode missing values.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b7b032d4",
+   "metadata": {},
+   "source": [
+    "To fix the problem, we must provide `pd.read_csv()` with an argument called `na_values`.\n",
+    "Now,  each instance of  `?` in the file is replaced with the\n",
+    "value `np.nan`, which means *not a number*:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a9698b26",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "Auto = pd.read_csv('Auto.data',\n",
+    "                   na_values=['?'],\n",
+    "                   sep=r'\\s+')\n",
+    "Auto['horsepower'].sum()\n"
+   ]
+  },
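+  {
+   "cell_type": "markdown",
+   "id": "8a5d17c4",
+   "metadata": {},
+   "source": [
+    "The effect of `na_values` can be checked on a small in-memory example.\n",
+    "The three rows below are made up for illustration and are not taken from `Auto.data`; `io.StringIO` lets `pd.read_csv()` read them as if they were a file.\n",
+    "We assume the separator `sep=r'\\s+'`, which matches any run of whitespace.\n",
+    "Without `na_values`, the `?` keeps the column from being read as numeric; with it, the `?` becomes `np.nan` and is skipped by `sum()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2be09f73",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "import io\n",
+    "import pandas as pd\n",
+    "\n",
+    "toy_text = 'mpg horsepower\\n18.0 130\\n25.0 ?\\n22.0 95\\n'  # made-up rows\n",
+    "toy_raw = pd.read_csv(io.StringIO(toy_text), sep=r'\\s+')\n",
+    "toy_clean = pd.read_csv(io.StringIO(toy_text), sep=r'\\s+', na_values=['?'])\n",
+    "toy_raw['horsepower'].dtype, toy_clean['horsepower'].dtype, toy_clean['horsepower'].sum()\n"
+   ]
+  },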
+  {
+   "cell_type": "markdown",
+   "id": "13cb364e",
+   "metadata": {},
+   "source": [
+    "The `Auto.shape`  attribute tells us that the data has 397\n",
+    "observations, or rows, and nine variables, or columns."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4877cb2c",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "Auto.shape\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3fdc6f47",
+   "metadata": {},
+   "source": [
+    "There are\n",
+    "various ways to deal with  missing data. \n",
+    "In this case, since only five of the rows contain missing\n",
+    "observations,  we choose to use the `Auto.dropna()` method to simply remove these rows."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2ba1d33d",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "Auto_new = Auto.dropna()\n",
+    "Auto_new.shape\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ac9748d9",
+   "metadata": {},
+   "source": [
+    "### Basics of Selecting Rows and Columns\n",
+    " \n",
+    "We can use `Auto.columns`  to check the variable names."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3d03baab",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "Auto = Auto_new # overwrite the previous value\n",
+    "Auto.columns\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d24d4d42",
+   "metadata": {},
+   "source": [
+    "Accessing the rows and columns of a data frame is similar, but not identical, to accessing the rows and columns of an array. \n",
+    "Recall that the first argument to the `[]` method\n",
+    "is always applied to the rows of the array.  \n",
+    "Similarly, \n",
+    "passing in a slice to the `[]` method creates a data frame whose *rows* are determined by the slice:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "410b4dd7",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto[:3]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ea0be7b",
+   "metadata": {},
+   "source": [
+    "Similarly, an array of Booleans can be used to subset the rows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3540804d",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "idx_80 = Auto['year'] > 80\n",
+    "Auto[idx_80]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a02221a2",
+   "metadata": {},
+   "source": [
+    "However, if we pass  in a list of strings to the `[]` method, then we obtain a data frame containing the corresponding set of *columns*. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "66d174f1",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto[['mpg', 'horsepower']]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54bef6a3",
+   "metadata": {},
+   "source": [
+    "Since we did not specify an *index* column when we loaded our data frame, the rows are labeled using integers\n",
+    "0 to 396."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "52789c77",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto.index\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3f5fcb26",
+   "metadata": {},
+   "source": [
+    "We can use the\n",
+    "`set_index()` method to re-name the rows using the contents of `Auto['name']`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d83650bf",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re = Auto.set_index('name')\n",
+    "Auto_re\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "880d79d9",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.columns\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dbee53b8",
+   "metadata": {},
+   "source": [
+    "We see that the column `'name'` is no longer there.\n",
+    " \n",
+    "Now that the index has been set to `name`, we can access rows of the data \n",
+    "frame by `name` using the `loc[]` method of\n",
+    "`Auto_re`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c01f4095",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "rows = ['amc rebel sst', 'ford torino']\n",
+    "Auto_re.loc[rows]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29688cab",
+   "metadata": {},
+   "source": [
+    "As an alternative to using the index name, we could retrieve the 4th and 5th rows of `Auto_re` using the `iloc[]` method:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a4202eb8",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.iloc[[3,4]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5427ede0",
+   "metadata": {},
+   "source": [
+    "We can also use it to retrieve the 1st, 3rd and 4th columns of `Auto_re`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "948b2d07",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.iloc[:,[0,2,3]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b83d56eb",
+   "metadata": {},
+   "source": [
+    "We can extract the 4th and 5th rows, as well as the 1st, 3rd and 4th columns, using\n",
+    "a single call to `iloc[]`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1cfdcc5c",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.iloc[[3,4],[0,2,3]]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2bde6514",
+   "metadata": {},
+   "source": [
+    "Index entries need not be unique: there are several cars  in the data frame named `ford galaxie 500`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd9c5cda",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.loc['ford galaxie 500', ['mpg', 'origin']]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d097282",
+   "metadata": {},
+   "source": [
+    "### More on Selecting Rows and Columns\n",
+    "Suppose now that we want to create a data frame consisting of the  `weight` and `origin`  of the subset of cars with \n",
+    "`year` greater than 80 --- i.e. those built after 1980.\n",
+    "To do this, we first create a Boolean array that indexes the rows.\n",
+    "The `loc[]` method allows for Boolean entries as well as strings:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6d431cb5",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "idx_80 = Auto_re['year'] > 80\n",
+    "Auto_re.loc[idx_80, ['weight', 'origin']]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "838a03e0",
+   "metadata": {},
+   "source": [
+    "To do this more concisely, we can use an anonymous function called a `lambda`: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fac41ce1",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.loc[lambda df: df['year'] > 80, ['weight', 'origin']]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "08e61254",
+   "metadata": {},
+   "source": [
+    "The `lambda` call creates a function that takes a single\n",
+    "argument, here `df`, and returns `df['year']>80`.\n",
+    "Since it is created inside the `loc[]` method for the\n",
+    "dataframe `Auto_re`, that dataframe will be the argument supplied.\n",
+    "As another example of using a `lambda`, suppose that\n",
+    "we want all cars built after 1980 that achieve greater than 30 miles per gallon:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b0885654",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.loc[lambda df: (df['year'] > 80) & (df['mpg'] > 30),\n",
+    "            ['weight', 'origin']\n",
+    "           ]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d87fc459",
+   "metadata": {},
+   "source": [
+    "The symbol `&` computes an element-wise *and* operation.\n",
+    "As another example, suppose that we want to retrieve all `Ford` and `Datsun`\n",
+    "cars with `displacement` less than 300. We check whether each `name` entry contains either the string `ford` or `datsun` using the `str.contains()` method of the `index` attribute\n",
+    "of the dataframe:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "213945a6",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto_re.loc[lambda df: (df['displacement'] < 300)\n",
+    "                       & (df.index.str.contains('ford')\n",
+    "                       | df.index.str.contains('datsun')),\n",
+    "            ['weight', 'origin']\n",
+    "           ]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a940fd1",
+   "metadata": {},
+   "source": [
+    "Here, the symbol `|` computes an element-wise *or* operation.\n",
+    " \n",
+    "In summary, a powerful set of operations is available to index the rows and columns of data frames. For integer-based queries, use the `iloc[]` method. For string and Boolean\n",
+    "selections, use the `loc[]` method. For functional queries that filter rows, use the `loc[]` method\n",
+    "with a function (typically a `lambda`) in the rows argument.\n",
+    "\n",
+    "## For Loops\n",
+    "A `for` loop is a standard tool in many languages that\n",
+    "repeatedly evaluates some chunk of code while\n",
+    "varying different values inside the code.\n",
+    "For example, suppose we loop over elements of a list and compute their sum."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a3c4060a",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "total = 0\n",
+    "for value in [3,2,19]:\n",
+    "    total += value\n",
+    "print('Total is: {0}'.format(total))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9117e3a1",
+   "metadata": {},
+   "source": [
+    "The indented code beneath the line with the `for` statement is run\n",
+    "for each value in the sequence\n",
+    "specified in the `for` statement. The loop ends either\n",
+    "when the cell ends or when code is indented at the same level\n",
+    "as the original `for` statement.\n",
+    "We see that the final line above, which prints the total, is executed\n",
+    "only once, after the `for` loop has terminated. Loops\n",
+    "can be nested by additional indentation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f2bffb69",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "total = 0\n",
+    "for value in [2,3,19]:\n",
+    "    for weight in [3, 2, 1]:\n",
+    "        total += value * weight\n",
+    "print('Total is: {0}'.format(total))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9f99e85b",
+   "metadata": {},
+   "source": [
+    "Above, we summed over each combination of `value` and `weight`.\n",
+    "We also took advantage of the *increment* notation\n",
+    "in `Python`: the expression `a += b` is equivalent\n",
+    "to `a = a + b`. Besides\n",
+    "being a convenient notation, this can save time in computationally\n",
+    "heavy tasks in which the intermediate value of `a+b` need not\n",
+    "be explicitly created.\n",
+    "\n",
+    "Perhaps a more\n",
+    "common task would be to sum over `(value, weight)` pairs. For instance,\n",
+    "to compute the average value of a random variable that takes on\n",
+    "possible values 2, 3 or 19 with probability 0.2, 0.3, 0.5 respectively\n",
+    "we would compute the weighted sum. Tasks such as this\n",
+    "can often be accomplished using the `zip()`  function that\n",
+    "loops over a sequence of tuples."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ee827a53",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "total = 0\n",
+    "for value, weight in zip([2,3,19],\n",
+    "                         [0.2,0.3,0.5]):\n",
+    "    total += weight * value\n",
+    "print('Weighted average is: {0}'.format(total))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dec18466",
+   "metadata": {},
+   "source": [
+    "### String Formatting\n",
+    "In the code chunk above we also printed a string\n",
+    "displaying the total. However, the object `total`\n",
+    "is a number (here a float) and not a string.\n",
+    "Inserting the value of something into\n",
+    "a string is a common task, made\n",
+    "simple using\n",
+    "some of the powerful string formatting\n",
+    "tools in `Python`.\n",
+    "Many data cleaning tasks involve\n",
+    "manipulating and programmatically\n",
+    "producing strings.\n",
+    "\n",
+    "For example, we may want to loop over the columns of a data frame and\n",
+    "print the percent missing in each column.\n",
+    "Let’s create a data frame `D` with columns in which roughly 20% of the entries are missing, i.e. set\n",
+    "to `np.nan`.  We’ll create the\n",
+    "values in `D` from a normal distribution with mean 0 and variance 1 using `rng.standard_normal()`\n",
+    "and then overwrite some random entries using `rng.choice()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3a097fbc",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 2
+   },
+   "outputs": [],
+   "source": [
+    "rng = np.random.default_rng(1)\n",
+    "A = rng.standard_normal((127, 5))\n",
+    "M = rng.choice([0, np.nan], p=[0.8,0.2], size=A.shape)\n",
+    "A += M\n",
+    "D = pd.DataFrame(A, columns=['food',\n",
+    "                             'bar',\n",
+    "                             'pickle',\n",
+    "                             'snack',\n",
+    "                             'popcorn'])\n",
+    "D[:3]\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e064e170",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "for col in D.columns:\n",
+    "    template = 'Column \"{0}\" has {1:.2%} missing values'\n",
+    "    print(template.format(col,\n",
+    "          np.isnan(D[col]).mean()))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a3e4dd8",
+   "metadata": {},
+   "source": [
+    "We see that the `template.format()` method expects two arguments `{0}`\n",
+    "and `{1:.2%}`, and the latter includes some formatting\n",
+    "information. In particular, it specifies that the second argument should be expressed as a percent with two decimal digits.\n",
+    "\n",
+    "The reference\n",
+    "[docs.python.org/3/library/string.html](https://docs.python.org/3/library/string.html)\n",
+    "includes many helpful and more complex examples."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d8fd496a",
+   "metadata": {},
+   "source": [
+    "## Additional Graphical and Numerical Summaries\n",
+    "We can use the `ax.plot()` or  `ax.scatter()`  functions to display the quantitative variables. However, simply typing the variable names will produce an error message,\n",
+    "because `Python` does not know to look in the  `Auto`  data set for those variables."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c915ca52",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.plot(horsepower, mpg, 'o');"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "63d47021",
+   "metadata": {},
+   "source": [
+    "We can address this by accessing the columns directly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "65cd6d02",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "ax.plot(Auto['horsepower'], Auto['mpg'], 'o');\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "726836f0",
+   "metadata": {},
+   "source": [
+    "Alternatively, we can use the `plot()` method with the call `Auto.plot()`.\n",
+    "Using this method,\n",
+    "the variables  can be accessed by name.\n",
+    "The plot methods of a data frame return a familiar object:\n",
+    "an axes. We can use it to update the plot as we did previously: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "76b5c0b1",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "ax = Auto.plot.scatter('horsepower', 'mpg')\n",
+    "ax.set_title('Horsepower vs. MPG');"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "69c46251",
+   "metadata": {},
+   "source": [
+    "If we want to save\n",
+    "the figure that contains a given axes, we can find the relevant figure\n",
+    "by accessing the `figure` attribute:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "183a2c2b",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "fig = ax.figure\n",
+    "fig.savefig('horsepower_mpg.png');"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6f10cb46",
+   "metadata": {},
+   "source": [
+    "We can further instruct the data frame to plot to a particular axes object. In this\n",
+    "case the corresponding `plot()` method will return the\n",
+    "modified axes we passed in as an argument. Note that\n",
+    "when we request a one-dimensional grid of plots, the object `axes` is similarly\n",
+    "one-dimensional. We place our scatter plot in the middle plot of a row of three plots\n",
+    "within a figure."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "75fbb981",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "fig, axes = subplots(ncols=3, figsize=(15, 5))\n",
+    "Auto.plot.scatter('horsepower', 'mpg', ax=axes[1]);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "53ffc0da",
+   "metadata": {},
+   "source": [
+    "Note also that the columns of a data frame can be accessed as attributes: try typing in `Auto.horsepower`. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1c4705e0",
+   "metadata": {},
+   "source": [
+    "We now consider the `cylinders` variable. Typing in `Auto.cylinders.dtype` reveals that it is being treated as a quantitative variable. \n",
+    "However, since there is only a small number of possible values for this variable, we may wish to treat it as \n",
+    " qualitative.  Below, we replace\n",
+    "the `cylinders` column with a categorical version of `Auto.cylinders`. The function `pd.Series()`  owes its name to the fact that `pandas` is often used in time series applications."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "55b3a1cc",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto.cylinders = pd.Series(Auto.cylinders, dtype='category')\n",
+    "Auto.cylinders.dtype\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "adc75408",
+   "metadata": {},
+   "source": [
+    " Now that `cylinders` is qualitative, we can display it using\n",
+    " the `boxplot()` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f3d88794",
+   "metadata": {
+    "execution": {}
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "Auto.boxplot('mpg', by='cylinders', ax=ax);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "62d6582f",
+   "metadata": {},
+   "source": [
+    "The `hist()`  method can be used to plot a *histogram*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "eea49f5b",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "Auto.hist('mpg', ax=ax);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c5a5933c",
+   "metadata": {},
+   "source": [
+    "The color of the bars and the number of bins can be changed:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d5bcfff8",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "fig, ax = subplots(figsize=(8, 8))\n",
+    "Auto.hist('mpg', color='red', bins=12, ax=ax);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "60c36b6c",
+   "metadata": {},
+   "source": [
+    " See `Auto.hist?` for more plotting\n",
+    "options.\n",
+    " \n",
+    "We can use the `pd.plotting.scatter_matrix()`   function to create a *scatterplot matrix* to visualize all of the pairwise relationships between the columns in\n",
+    "a data frame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "edb66cae",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "pd.plotting.scatter_matrix(Auto);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b162bd9",
+   "metadata": {},
+   "source": [
+    " We can also produce scatterplots\n",
+    "for a subset of the variables."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4f5d25d9",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "pd.plotting.scatter_matrix(Auto[['mpg',\n",
+    "                                 'displacement',\n",
+    "                                 'weight']]);\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cae5dfc",
+   "metadata": {},
+   "source": [
+    "The `describe()`  method produces a numerical summary of each column in a data frame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce7b23e2",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto[['mpg', 'weight']].describe()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d5042294",
+   "metadata": {},
+   "source": [
+    "We can also produce a summary of just a single column."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a6545d2f",
+   "metadata": {
+    "execution": {},
+    "lines_to_next_cell": 0
+   },
+   "outputs": [],
+   "source": [
+    "Auto['cylinders'].describe()\n",
+    "Auto['mpg'].describe()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c2ea7f81",
+   "metadata": {},
+   "source": [
+    "To exit `Jupyter`,  select `File / Close and Halt`.\n",
+    "\n",
+    " \n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "cell_metadata_filter": "-all",
+   "formats": "Rmd,ipynb",
+   "main_language": "python"
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

Reference files/Week2_ref/Lecture_1_basics.ipynb ADDED Viewed

The diff for this file is too large to render. See raw diff

app/.DS_Store ADDED Viewed

Binary file (6.15 kB). View file

app/__pycache__/main.cpython-311.pyc CHANGED Viewed

Binary files a/app/__pycache__/main.cpython-311.pyc and b/app/__pycache__/main.cpython-311.pyc differ

app/components/__pycache__/login.cpython-311.pyc CHANGED Viewed

Binary files a/app/components/__pycache__/login.cpython-311.pyc and b/app/components/__pycache__/login.cpython-311.pyc differ

app/components/login.py CHANGED Viewed

@@ -5,7 +5,11 @@ def login():
     Display a login form and return True if login is successful, False otherwise.
     """
     st.title("Login to Data Science Course App")
     # Create a form for login
     with st.form("login_form"):
         username = st.text_input("Username")
@@ -14,7 +18,7 @@ def login():
         if submit_button:
             # Check credentials (test account)
-            if username == "student" and password == "123":
                 # Store login state in session
                 st.session_state.logged_in = True
                 st.session_state.username = username

     Display a login form and return True if login is successful, False otherwise.
     """
     st.title("Login to Data Science Course App")
+    #usernames
+    usernames = ["admin", "student", "manxiii"]
+    passwords = ["admin", "123", "manxi123"]
     # Create a form for login
     with st.form("login_form"):
         username = st.text_input("Username")
         if submit_button:
             # Check credentials (test account)
+            if username in usernames and password == passwords[usernames.index(username)]:
                 # Store login state in session
                 st.session_state.logged_in = True
                 st.session_state.username = username

app/main.py CHANGED Viewed

@@ -12,6 +12,10 @@ sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 # Import the login component
 from app.components.login import login
 # Page configuration
 st.set_page_config(
     page_title="Data Science Course App",
@@ -101,6 +105,11 @@ def sidebar_navigation():
         if st.session_state.logged_in:
             st.write(f"Welcome, {st.session_state.username}!")
             # Logout button
             if st.button("Logout"):
                 st.session_state.logged_in = False
@@ -120,156 +129,15 @@ def sidebar_navigation():
                 st.rerun()
 def show_week_content():
-    st.markdown("""
-    ## Week 1: Research Topic Selection and Literature Review
-    This week, you'll learn how to:
-    - Select a suitable research topic
-    - Conduct a literature review
-    - Define your research objectives
-    - Create a research proposal
-    """)
-    # Topic Selection Section
-    st.header("1. Topic Selection")
-    st.markdown("""
-    ### Guidelines for Selecting Your Research Topic:
-    - Choose a topic that interests you
-    - Ensure sufficient data availability
-    - Consider the scope and complexity
-    - Check for existing research gaps
-    """)
-    # Interactive Topic Selection
-    st.subheader("Topic Selection Form")
-    with st.form("topic_form"):
-        research_area = st.selectbox(
-            "Select your research area",
-            ["Computer Vision", "NLP", "Time Series", "Recommendation Systems", "Other"]
-        )
-        topic = st.text_input("Proposed Research Topic")
-        problem_statement = st.text_area("Brief Problem Statement")
-        motivation = st.text_area("Why is this research important?")
-        submitted = st.form_submit_button("Submit Topic")
-        if submitted:
-            st.success("Topic submitted successfully! We'll review and provide feedback.")
-    # Linear Regression Visualization
-    st.header("2. Linear Regression Demo")
-    st.markdown("""
-    ### Understanding Linear Regression
-    Linear regression is a fundamental machine learning algorithm that models the relationship between a dependent variable and one or more independent variables.
-    Below is an interactive demonstration of simple linear regression.
-    """)
-    # Create interactive controls
-    col1, col2 = st.columns(2)
-    with col1:
-        n_points = st.slider("Number of data points", 10, 100, 50)
-        noise = st.slider("Noise level", 0.1, 2.0, 0.5)
-    with col2:
-        slope = st.slider("True slope", -2.0, 2.0, 1.0)
-        intercept = st.slider("True intercept", -5.0, 5.0, 0.0)
-    # Generate synthetic data
-    np.random.seed(42)
-    X = np.random.rand(n_points) * 10
-    y = slope * X + intercept + np.random.normal(0, noise, n_points)
-    # Fit linear regression
-    X_reshaped = X.reshape(-1, 1)
-    model = LinearRegression()
-    model.fit(X_reshaped, y)
-    y_pred = model.predict(X_reshaped)
-    # Create the plot
-    fig = go.Figure()
-    # Add scatter plot of actual data
-    fig.add_trace(go.Scatter(
-        x=X,
-        y=y,
-        mode='markers',
-        name='Actual Data',
-        marker=dict(color='blue')
-    ))
-    # Add regression line
-    fig.add_trace(go.Scatter(
-        x=X,
-        y=y_pred,
-        mode='lines',
-        name='Regression Line',
-        line=dict(color='red')
-    ))
-    # Update layout
-    fig.update_layout(
-        title='Linear Regression Visualization',
-        xaxis_title='X',
-        yaxis_title='Y',
-        showlegend=True,
-        height=500
-    )
-    # Display the plot
-    st.plotly_chart(fig, use_container_width=True)
-    # Display regression coefficients
-    st.markdown(f"""
-    ### Regression Results
-    - Estimated slope: {model.coef_[0]:.2f}
-    - Estimated intercept: {model.intercept_:.2f}
-    - R² score: {model.score(X_reshaped, y):.2f}
-    """)
-    # Literature Review Section
-    st.header("3. Literature Review")
-    st.markdown("""
-    ### Steps for Conducting Literature Review:
-    1. Search for relevant papers
-    2. Read and analyze key papers
-    3. Identify research gaps
-    4. Document your findings
-    """)
-    # Literature Review Template
-    st.subheader("Literature Review Template")
-    with st.expander("Download Template"):
-        st.download_button(
-            label="Download Literature Review Template",
-            data="Literature Review Template\n\n1. Introduction\n2. Related Work\n3. Methodology\n4. Results\n5. Discussion\n6. Conclusion",
-            file_name="literature_review_template.txt",
-            mime="text/plain"
-        )
-    # Weekly Assignment
-    st.header("Weekly Assignment")
-    st.markdown("""
-    ### Assignment 1: Research Proposal
-    1. Select your research topic
-    2. Write a brief problem statement
-    3. Conduct initial literature review
-    4. Submit your research proposal
-    **Due Date:** End of Week 1
-    """)
-    # Assignment Submission
-    st.subheader("Submit Your Assignment")
-    with st.form("assignment_form"):
-        proposal_file = st.file_uploader("Upload your research proposal (PDF or DOC)")
-        comments = st.text_area("Additional comments or questions")
-        if st.form_submit_button("Submit Assignment"):
-            if proposal_file is not None:
-                st.success("Assignment submitted successfully!")
-            else:
-                st.error("Please upload your research proposal.")
 # Main content
 def main():
@@ -280,33 +148,14 @@ def main():
         return
     # User is logged in, show course content
-    if st.session_state.current_week == 1:
         show_week_content()
     else:
         st.title("Data Science Research Paper Course")
         st.markdown("""
         ## Welcome to the Data Science Research Paper Course! 📚
-        This 10-week course will guide you through the process of creating a machine learning research paper.
-        Each week, you'll learn new concepts and complete tasks that build upon each other.
-        ### Getting Started
-        1. Use the sidebar to navigate between weeks
-        2. Complete the weekly tasks and assignments
-        3. Track your progress using the progress bar
-        4. Submit your work for feedback
-        ### Course Overview
-        - Week 1: Research Topic Selection and Literature Review
-        - Week 2: Data Collection and Preprocessing
-        - Week 3: Exploratory Data Analysis
-        - Week 4: Feature Engineering
-        - Week 5: Model Selection and Baseline
-        - Week 6: Model Training and Optimization
-        - Week 7: Model Evaluation
-        - Week 8: Results Analysis
-        - Week 9: Paper Writing
-        - Week 10: Final Review and Submission
         """)
 if __name__ == "__main__":

 # Import the login component
 from app.components.login import login
+# Import week pages
+from app.pages import week_1
+from app.pages import week_2
 # Page configuration
 st.set_page_config(
     page_title="Data Science Course App",
         if st.session_state.logged_in:
             st.write(f"Welcome, {st.session_state.username}!")
+            # Debug button to show current week
+            if st.session_state.username == "admin":
+                if st.button("Debug: Show Current Week"):
+                    st.write(f"Current week: {st.session_state.current_week}")
             # Logout button
             if st.button("Logout"):
                 st.session_state.logged_in = False
                 st.rerun()
 def show_week_content():
+    # Debug print to show current week
+    st.write(f"Debug: Current week is {st.session_state.current_week}")
+    if st.session_state.current_week == 1:
+        week_1.show()
+    elif st.session_state.current_week == 2:
+        week_2.show()
+    else:
+        st.warning("Content for this week is not yet available.")
 # Main content
 def main():
         return
     # User is logged in, show course content
+    if st.session_state.current_week in [1, 2]:
         show_week_content()
     else:
         st.title("Data Science Research Paper Course")
         st.markdown("""
         ## Welcome to the Data Science Research Paper Course! 📚
+        This section has not been released yet.
         """)
 if __name__ == "__main__":

app/pages/.DS_Store ADDED Viewed

Binary file (6.15 kB). View file

app/pages/1_Week_1.py DELETED Viewed

@@ -1,168 +0,0 @@
-import streamlit as st
-import numpy as np
-import plotly.graph_objects as go
-from sklearn.linear_model import LinearRegression
-# Page configuration
-st.set_page_config(
-    page_title="Week 1 - Research Topic Selection",
-    page_icon="📚",
-    layout="wide"
-)
-# Check if user is logged in
-if not st.session_state.get("logged_in", False):
-    st.warning("Please log in to access this page.")
-    st.stop()
-# Main content
-st.markdown("""
-## Week 1: Research Topic Selection and Literature Review
-This week, you'll learn how to:
-- Select a suitable research topic
-- Conduct a literature review
-- Define your research objectives
-- Create a research proposal
-""")
-# Topic Selection Section
-st.header("1. Topic Selection")
-st.markdown("""
-### Guidelines for Selecting Your Research Topic:
-- Choose a topic that interests you
-- Ensure sufficient data availability
-- Consider the scope and complexity
-- Check for existing research gaps
-""")
-# Interactive Topic Selection
-st.subheader("Topic Selection Form")
-with st.form("topic_form"):
-    research_area = st.selectbox(
-        "Select your research area",
-        ["Computer Vision", "NLP", "Time Series", "Recommendation Systems", "Other"]
-    )
-    topic = st.text_input("Proposed Research Topic")
-    problem_statement = st.text_area("Brief Problem Statement")
-    motivation = st.text_area("Why is this research important?")
-    submitted = st.form_submit_button("Submit Topic")
-    if submitted:
-        st.success("Topic submitted successfully! We'll review and provide feedback.")
-# Linear Regression Visualization
-st.header("2. Linear Regression Demo")
-st.markdown("""
-### Understanding Linear Regression
-Linear regression is a fundamental machine learning algorithm that models the relationship between a dependent variable and one or more independent variables.
-Below is an interactive demonstration of simple linear regression.
-""")
-# Create interactive controls
-col1, col2 = st.columns(2)
-with col1:
-    n_points = st.slider("Number of data points", 10, 100, 50)
-    noise = st.slider("Noise level", 0.1, 2.0, 0.5)
-with col2:
-    slope = st.slider("True slope", -2.0, 2.0, 1.0)
-    intercept = st.slider("True intercept", -5.0, 5.0, 0.0)
-# Generate synthetic data
-np.random.seed(42)
-X = np.random.rand(n_points) * 10
-y = slope * X + intercept + np.random.normal(0, noise, n_points)
-# Fit linear regression
-X_reshaped = X.reshape(-1, 1)
-model = LinearRegression()
-model.fit(X_reshaped, y)
-y_pred = model.predict(X_reshaped)
-# Create the plot
-fig = go.Figure()
-# Add scatter plot of actual data
-fig.add_trace(go.Scatter(
-    x=X,
-    y=y,
-    mode='markers',
-    name='Actual Data',
-    marker=dict(color='blue')
-))
-# Add regression line
-fig.add_trace(go.Scatter(
-    x=X,
-    y=y_pred,
-    mode='lines',
-    name='Regression Line',
-    line=dict(color='red')
-))
-# Update layout
-fig.update_layout(
-    title='Linear Regression Visualization',
-    xaxis_title='X',
-    yaxis_title='Y',
-    showlegend=True,
-    height=500
-)
-# Display the plot
-st.plotly_chart(fig, use_container_width=True)
-# Display regression coefficients
-st.markdown(f"""
-### Regression Results
-- Estimated slope: {model.coef_[0]:.2f}
-- Estimated intercept: {model.intercept_:.2f}
-- R² score: {model.score(X_reshaped, y):.2f}
-""")
-# Literature Review Section
-st.header("3. Literature Review")
-st.markdown("""
-### Steps for Conducting Literature Review:
-1. Search for relevant papers
-2. Read and analyze key papers
-3. Identify research gaps
-4. Document your findings
-""")
-# Literature Review Template
-st.subheader("Literature Review Template")
-with st.expander("Download Template"):
-    st.download_button(
-        label="Download Literature Review Template",
-        data="Literature Review Template\n\n1. Introduction\n2. Related Work\n3. Methodology\n4. Results\n5. Discussion\n6. Conclusion",
-        file_name="literature_review_template.txt",
-        mime="text/plain"
-    )
-# Weekly Assignment
-st.header("Weekly Assignment")
-st.markdown("""
-### Assignment 1: Research Proposal
-1. Select your research topic
-2. Write a brief problem statement
-3. Conduct initial literature review
-4. Submit your research proposal
-**Due Date:** End of Week 1
-""")
-# Assignment Submission
-st.subheader("Submit Your Assignment")
-with st.form("assignment_form"):
-    proposal_file = st.file_uploader("Upload your research proposal (PDF or DOC)")
-    comments = st.text_area("Additional comments or questions")
-    if st.form_submit_button("Submit Assignment"):
-        if proposal_file is not None:
-            st.success("Assignment submitted successfully!")
-        else:
-            st.error("Please upload your research proposal.")

app/pages/__pycache__/week_1.cpython-311.pyc ADDED Viewed

Binary file (891 Bytes). View file

app/pages/__pycache__/week_2.cpython-311.pyc ADDED Viewed

Binary file (10.5 kB). View file

app/pages/week_1.py CHANGED Viewed

@@ -3,157 +3,16 @@ import numpy as np
 import plotly.graph_objects as go
 from sklearn.linear_model import LinearRegression
-def show_week_content():
     st.markdown("""
-    ## Week 1: Research Topic Selection and Literature Review
-    This week, you'll learn how to:
-    - Select a suitable research topic
-    - Conduct a literature review
-    - Define your research objectives
-    - Create a research proposal
     """)
-    # Topic Selection Section
-    st.header("1. Topic Selection")
-    st.markdown("""
-    ### Guidelines for Selecting Your Research Topic:
-    - Choose a topic that interests you
-    - Ensure sufficient data availability
-    - Consider the scope and complexity
-    - Check for existing research gaps
-    """)
-    # Interactive Topic Selection
-    st.subheader("Topic Selection Form")
-    with st.form("topic_form"):
-        research_area = st.selectbox(
-            "Select your research area",
-            ["Computer Vision", "NLP", "Time Series", "Recommendation Systems", "Other"]
-        )
-        topic = st.text_input("Proposed Research Topic")
-        problem_statement = st.text_area("Brief Problem Statement")
-        motivation = st.text_area("Why is this research important?")
-        submitted = st.form_submit_button("Submit Topic")
-        if submitted:
-            st.success("Topic submitted successfully! We'll review and provide feedback.")
-    # Linear Regression Visualization
-    st.header("2. Linear Regression Demo")
-    st.markdown("""
-    ### Understanding Linear Regression
-    Linear regression is a fundamental machine learning algorithm that models the relationship between a dependent variable and one or more independent variables.
-    Below is an interactive demonstration of simple linear regression.
-    """)
-    # Create interactive controls
-    col1, col2 = st.columns(2)
-    with col1:
-        n_points = st.slider("Number of data points", 10, 100, 50)
-        noise = st.slider("Noise level", 0.1, 2.0, 0.5)
-    with col2:
-        slope = st.slider("True slope", -2.0, 2.0, 1.0)
-        intercept = st.slider("True intercept", -5.0, 5.0, 0.0)
-    # Generate synthetic data
-    np.random.seed(42)
-    X = np.random.rand(n_points) * 10
-    y = slope * X + intercept + np.random.normal(0, noise, n_points)
-    # Fit linear regression
-    X_reshaped = X.reshape(-1, 1)
-    model = LinearRegression()
-    model.fit(X_reshaped, y)
-    y_pred = model.predict(X_reshaped)
-    # Create the plot
-    fig = go.Figure()
-    # Add scatter plot of actual data
-    fig.add_trace(go.Scatter(
-        x=X,
-        y=y,
-        mode='markers',
-        name='Actual Data',
-        marker=dict(color='blue')
-    ))
-    # Add regression line
-    fig.add_trace(go.Scatter(
-        x=X,
-        y=y_pred,
-        mode='lines',
-        name='Regression Line',
-        line=dict(color='red')
-    ))
-    # Update layout
-    fig.update_layout(
-        title='Linear Regression Visualization',
-        xaxis_title='X',
-        yaxis_title='Y',
-        showlegend=True,
-        height=500
-    )
-    # Display the plot
-    st.plotly_chart(fig, use_container_width=True)
-    # Display regression coefficients
-    st.markdown(f"""
-    ### Regression Results
-    - Estimated slope: {model.coef_[0]:.2f}
-    - Estimated intercept: {model.intercept_:.2f}
-    - R² score: {model.score(X_reshaped, y):.2f}
-    """)
-    # Literature Review Section
-    st.header("3. Literature Review")
-    st.markdown("""
-    ### Steps for Conducting Literature Review:
-    1. Search for relevant papers
-    2. Read and analyze key papers
-    3. Identify research gaps
-    4. Document your findings
-    """)
-    # Literature Review Template
-    st.subheader("Literature Review Template")
-    with st.expander("Download Template"):
-        st.download_button(
-            label="Download Literature Review Template",
-            data="Literature Review Template\n\n1. Introduction\n2. Related Work\n3. Methodology\n4. Results\n5. Discussion\n6. Conclusion",
-            file_name="literature_review_template.txt",
-            mime="text/plain"
-        )
-    # Weekly Assignment
-    st.header("Weekly Assignment")
     st.markdown("""
-    ### Assignment 1: Research Proposal
-    1. Select your research topic
-    2. Write a brief problem statement
-    3. Conduct initial literature review
-    4. Submit your research proposal
-    **Due Date:** End of Week 1
     """)
-    # Assignment Submission
-    st.subheader("Submit Your Assignment")
-    with st.form("assignment_form"):
-        proposal_file = st.file_uploader("Upload your research proposal (PDF or DOC)")
-        comments = st.text_area("Additional comments or questions")
-        if st.form_submit_button("Submit Assignment"):
-            if proposal_file is not None:
-                st.success("Assignment submitted successfully!")
-            else:
-                st.error("Please upload your research proposal.")
 if __name__ == "__main__":
-    show_week_content()

 import plotly.graph_objects as go
 from sklearn.linear_model import LinearRegression
+# Week 1 content in person (separate name so the online version below does not replace it)
+def show_in_person():
     st.markdown("""
+    ## Week 1 content in person
     """)
+# Week 1 content online
+def show():
     st.markdown("""
+    ## Week 1 content not online yet
     """)
 if __name__ == "__main__":
+    show()
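
Review note: a general Python point for page modules that expose a single `show()` entry point. If the same function name is defined twice in one module, the later definition silently replaces the earlier one. The sketch below illustrates this and one possible way to keep two variants; `show_in_person`, `show_online`, and `show_for` are hypothetical names, and the functions return strings rather than rendering Streamlit elements.

```python
def show():
    return "in person"


def show():  # same name: this rebinding replaces the function above
    return "online"


print(show())  # prints "online"


# One way to keep both variants: distinct names plus a small selector.
def show_in_person():
    return "in person"


def show_online():
    return "online"


def show_for(mode="online"):
    variants = {"in_person": show_in_person, "online": show_online}
    return variants[mode]()


print(show_for("in_person"))
```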

app/pages/week_1_WIP.py ADDED Viewed

	@@ -0,0 +1,159 @@

+import streamlit as st
+import numpy as np
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+def show():
+    st.markdown("""
+    ## Week 1: Research Topic Selection and Literature Review
+    This week, you'll learn how to:
+    - Select a suitable research topic
+    - Conduct a literature review
+    - Define your research objectives
+    - Create a research proposal
+    """)
+    # Topic Selection Section
+    st.header("1. Topic Selection")
+    st.markdown("""
+    ### Guidelines for Selecting Your Research Topic:
+    - Choose a topic that interests you
+    - Ensure sufficient data availability
+    - Consider the scope and complexity
+    - Check for existing research gaps
+    """)
+    # Interactive Topic Selection
+    st.subheader("Topic Selection Form")
+    with st.form("topic_form"):
+        research_area = st.selectbox(
+            "Select your research area",
+            ["Computer Vision", "NLP", "Time Series", "Recommendation Systems", "Other"]
+        )
+        topic = st.text_input("Proposed Research Topic")
+        problem_statement = st.text_area("Brief Problem Statement")
+        motivation = st.text_area("Why is this research important?")
+        submitted = st.form_submit_button("Submit Topic")
+        if submitted:
+            st.success("Topic submitted successfully! We'll review and provide feedback.")
+    # Linear Regression Visualization
+    st.header("2. Linear Regression Demo")
+    st.markdown("""
+    ### Understanding Linear Regression
+    Linear regression is a fundamental machine learning algorithm that models the relationship between a dependent variable and one or more independent variables.
+    Below is an interactive demonstration of simple linear regression.
+    """)
+    # Create interactive controls
+    col1, col2 = st.columns(2)
+    with col1:
+        n_points = st.slider("Number of data points", 10, 100, 50)
+        noise = st.slider("Noise level", 0.1, 2.0, 0.5)
+    with col2:
+        slope = st.slider("True slope", -2.0, 2.0, 1.0)
+        intercept = st.slider("True intercept", -5.0, 5.0, 0.0)
+    # Generate synthetic data
+    np.random.seed(42)
+    X = np.random.rand(n_points) * 10
+    y = slope * X + intercept + np.random.normal(0, noise, n_points)
+    # Fit linear regression
+    X_reshaped = X.reshape(-1, 1)
+    model = LinearRegression()
+    model.fit(X_reshaped, y)
+    y_pred = model.predict(X_reshaped)
+    # Create the plot
+    fig = go.Figure()
+    # Add scatter plot of actual data
+    fig.add_trace(go.Scatter(
+        x=X,
+        y=y,
+        mode='markers',
+        name='Actual Data',
+        marker=dict(color='blue')
+    ))
+    # Add regression line
+    fig.add_trace(go.Scatter(
+        x=X,
+        y=y_pred,
+        mode='lines',
+        name='Regression Line',
+        line=dict(color='red')
+    ))
+    # Update layout
+    fig.update_layout(
+        title='Linear Regression Visualization',
+        xaxis_title='X',
+        yaxis_title='Y',
+        showlegend=True,
+        height=500
+    )
+    # Display the plot
+    st.plotly_chart(fig, use_container_width=True)
+    # Display regression coefficients
+    st.markdown(f"""
+    ### Regression Results
+    - Estimated slope: {model.coef_[0]:.2f}
+    - Estimated intercept: {model.intercept_:.2f}
+    - R² score: {model.score(X_reshaped, y):.2f}
+    """)
+    # Literature Review Section
+    st.header("3. Literature Review")
+    st.markdown("""
+    ### Steps for Conducting Literature Review:
+    1. Search for relevant papers
+    2. Read and analyze key papers
+    3. Identify research gaps
+    4. Document your findings
+    """)
+    # Literature Review Template
+    st.subheader("Literature Review Template")
+    with st.expander("Download Template"):
+        st.download_button(
+            label="Download Literature Review Template",
+            data="Literature Review Template\n\n1. Introduction\n2. Related Work\n3. Methodology\n4. Results\n5. Discussion\n6. Conclusion",
+            file_name="literature_review_template.txt",
+            mime="text/plain"
+        )
+    # Weekly Assignment
+    st.header("Weekly Assignment")
+    st.markdown("""
+    ### Assignment 1: Research Proposal
+    1. Select your research topic
+    2. Write a brief problem statement
+    3. Conduct initial literature review
+    4. Submit your research proposal
+    **Due Date:** End of Week 1
+    """)
+    # Assignment Submission
+    st.subheader("Submit Your Assignment")
+    with st.form("assignment_form"):
+        proposal_file = st.file_uploader("Upload your research proposal (PDF or DOC)")
+        comments = st.text_area("Additional comments or questions")
+        if st.form_submit_button("Submit Assignment"):
+            if proposal_file is not None:
+                st.success("Assignment submitted successfully!")
+            else:
+                st.error("Please upload your research proposal.")
+if __name__ == "__main__":
+    show()
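
Review note: the regression demo in this file relies on scikit-learn's `LinearRegression`. For a single feature, the fitted slope and intercept have a closed form, which can serve as a sanity check on the displayed numbers. The sketch below uses only the standard library; `fit_line` is a hypothetical helper, and the data is noise-free so the true parameters are recovered exactly.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # true slope 2, true intercept 1, no noise
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=2.00, intercept=1.00
```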

app/pages/week_2.py ADDED Viewed

	@@ -0,0 +1,228 @@

+import streamlit as st
+import numpy as np
+import plotly.graph_objects as go
+import io
+import sys
+import pandas as pd
+from contextlib import redirect_stdout
+import matplotlib.pyplot as plt
+import seaborn as sns
+# Initialize session state for notebook-like cells
+if 'cells' not in st.session_state:
+    st.session_state.cells = []
+if 'df' not in st.session_state:
+    st.session_state.df = None
+def capture_output(code, df=None):
+    """Helper function to capture print output"""
+    f = io.StringIO()
+    with redirect_stdout(f):
+        try:
+            # Create a dictionary of variables to use in exec
+            variables = {'pd': pd, 'np': np, 'plt': plt, 'sns': sns}
+            if df is not None:
+                variables['df'] = df
+            exec(code, variables)
+        except Exception as e:
+            return f"Error: {str(e)}"
+    return f.getvalue()
+def show():
+    st.markdown("""
+    ## Week 2: Python Basics - Part 1: Coding Exercises
+    In this first part, we'll learn some fundamental Python concepts through hands-on exercises:
+    - Importing libraries
+    - Using print statements
+    - Basic arithmetic operations
+    - Working with lists
+    """)
+    # Importing Libraries Section
+    st.header("1. Importing Libraries")
+    st.markdown("""
+    Python has a rich ecosystem of libraries. To use them, we need to import them first.
+    """)
+    with st.expander("Import Example"):
+        st.code("""
+    # Importing a library
+    import math
+    # Using a function from the library
+    print(math.sqrt(16))  # This will print 4.0
+        """, line_numbers=True)
+    # Interactive Import Exercise
+    st.subheader("Try it yourself!")
+    import_code = st.text_area("Try importing and using the math library:",
+                             "import math\nprint(math.sqrt(25))",
+                             height=100)
+    if st.button("Run Import Code"):
+        output = capture_output(import_code)
+        st.code(output, line_numbers=True)
+    # Print Statements Section
+    st.header("2. Print Statements")
+    st.markdown("""
+    The print() function is used to display output to the console.
+    """)
+    with st.expander("Print Examples"):
+        st.code("""
+    # Basic print
+    print("Hello, World!")
+    # Print with variables
+    name = "Alice"
+    print(f"Hello, {name}!")
+    # Print multiple items
+    print("The answer is:", 42)
+        """, line_numbers=True)
+    # Interactive Print Exercise
+    st.subheader("Try it yourself!")
+    print_code = st.text_area("Try some print statements:",
+                            'print("Hello, World!")\nname = "Python"\nprint(f"Hello, {name}!")',
+                            height=100)
+    if st.button("Run Print Code"):
+        output = capture_output(print_code)
+        st.code(output, line_numbers=True)
+    # Basic Arithmetic Section
+    st.header("3. Basic Arithmetic")
+    st.markdown("""
+    Python can perform basic mathematical operations.
+    """)
+    with st.expander("Arithmetic Examples"):
+        st.code("""
+    # Addition
+    result = 5 + 3
+    print(result)  # Prints 8
+    # Subtraction
+    result = 10 - 4
+    print(result)  # Prints 6
+    # Multiplication
+    result = 6 * 7
+    print(result)  # Prints 42
+    # Division
+    result = 15 / 3
+    print(result)  # Prints 5.0
+        """, line_numbers=True)
+    # Interactive Arithmetic Exercise
+    st.subheader("Try it yourself!")
+    arithmetic_code = st.text_area("Try some arithmetic operations:",
+                                 'print(5 + 3)\nprint(10 - 4)\nprint(6 * 7)\nprint(15 / 3)',
+                                 height=100)
+    if st.button("Run Arithmetic Code"):
+        output = capture_output(arithmetic_code)
+        st.code(output, line_numbers=True)
+    # Lists Section
+    st.header("4. Lists")
+    st.markdown("""
+    Lists are used to store multiple items in a single variable.
+    """)
+    with st.expander("List Examples"):
+        st.code("""
+    # Creating a list
+    fruits = ["apple", "banana", "cherry"]
+    # Accessing list items
+    print(fruits[0])  # Prints apple
+    # Adding to a list
+    fruits.append("orange")
+    print(fruits)  # Prints ['apple', 'banana', 'cherry', 'orange']
+    # List length
+    print(len(fruits))  # Prints 4
+        """, line_numbers=True)
+    # Interactive List Exercise
+    st.subheader("Try it yourself!")
+    list_code = st.text_area("Try working with lists:",
+                           'fruits = ["apple", "banana", "cherry"]\nprint(fruits[0])\nfruits.append("orange")\nprint(fruits)\nprint(len(fruits))',
+                           height=100)
+    if st.button("Run List Code"):
+        output = capture_output(list_code)
+        st.code(output, line_numbers=True)
+    # Practice Exercise
+    st.header("Practice Exercise")
+    st.markdown("""
+    ### Try this exercise:
+    Create a program that:
+    1. Imports the math library
+    2. Creates a list of numbers
+    3. Uses a loop to print each number and its square root
+    """)
+    # Interactive Practice Exercise
+    st.subheader("Try your solution!")
+    practice_code = st.text_area("Write your solution here:",
+                               'import math\n\nnumbers = [4, 9, 16, 25]\n\nfor num in numbers:\n    print(f"Number: {num}, Square root: {math.sqrt(num)}")',
+                               height=150)
+    if st.button("Run Practice Code"):
+        output = capture_output(practice_code)
+        st.code(output, line_numbers=True)
+    st.markdown("""
+    ## Part 2: Data Cleaning Lab
+    In this lab, we'll learn how to clean and prepare data using pandas. We'll work with the Advertising dataset and practice common data cleaning techniques.
+    This lab is hosted in a Jupyter notebook environment. We will create a new notebook for this lab.
+    """)
+    st.markdown("""
+    ## Week 2: Reference Material
+    Please refer to the following links:
+    - [Pandas Documentation](https://pandas.pydata.org/docs/)
+    - [Numpy Documentation](https://numpy.org/doc/)
+    - [Matplotlib Documentation](https://matplotlib.org/stable/users/index.html)
+    - [Seaborn Documentation](https://seaborn.pydata.org/index.html)
+    To learn more about Python, use the following links:
+    - [Introduction to Statistical Learning](https://www.statlearning.com/resources-python)
+    - [Learning Python notebook](https://github.com/intro-stat-learning/ISLP_labs/blob/stable/Ch02-statlearn-lab.ipynb)
+    For the dataset used in class today:
+    - [Advertising Dataset](https://www.statlearning.com/s/Advertising.csv)
+    """)
+    # Weekly Assignment
+    st.header("Weekly Assignment")
+    st.markdown("""
+    ### Assignment 2: Python Basics
+    1. Import the dataset that you studied last week: https://github.com/hollandstam1/thesis/blob/main/_book/Quantifying-Art-Historical-Narratives.pdf
+    2. Create a new notebook and load the dataset
+    3. Explore the dataset by answering the following questions:
+        - How many rows and columns are there in the dataset?
+        - What are the variables in the dataset?
+        - What is the data type of each variable?
+        - What is the range of each variable?
+        - What is the mean of each variable?
+    **Due Date:** End of Week 2
+    """)
+    '''
+    # Assignment Submission
+    st.subheader("Submit Your Assignment")
+    with st.form("assignment_form"):
+        script_file = st.file_uploader("Upload your Python script (.py)")
+        comments = st.text_area("Additional comments or questions")
+        if st.form_submit_button("Submit Assignment"):
+            if script_file is not None:
+                st.success("Assignment submitted successfully!")
+            else:
+                st.error("Please upload your Python script.")'''
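
Review note: `capture_output` in this file returns only the error string when the executed code raises, so anything printed before the exception is dropped. A minimal sketch of a variant that keeps the partial output and appends the error follows. `run_snippet` is a hypothetical name, not part of this commit, and like the original it uses `exec`, which should only be given code from trusted users.

```python
import io
from contextlib import redirect_stdout


def run_snippet(code, extra_globals=None):
    """Run code and return everything it printed, plus any error message."""
    buffer = io.StringIO()
    # Copy the globals so one snippet cannot mutate the caller's dictionary
    namespace = dict(extra_globals or {})
    try:
        with redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Append the error after whatever was printed before the failure
        buffer.write(f"Error: {type(exc).__name__}: {exc}\n")
    return buffer.getvalue()


result = run_snippet('print("before")\n1 / 0')
print(result)
```

Here `result` holds the line `before` followed by a line reporting the `ZeroDivisionError`, so a learner sees how far their code got before it failed.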