raymondEDS committed on
Commit
3215313
·
1 Parent(s): 63a7f01

removing files

Reference files/Week2_ref/Ch02-statlearn-lab.ipynb DELETED
@@ -1,3229 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "id": "245f0c86",
6
- "metadata": {},
7
- "source": [
8
- "\n",
9
- "# Chapter 2\n",
10
- "\n",
11
- "# Lab: Introduction to Python\n",
12
- "\n"
13
- ]
14
- },
15
- {
16
- "cell_type": "markdown",
17
- "id": "5ab29948",
18
- "metadata": {},
19
- "source": [
20
- "## Getting Started"
21
- ]
22
- },
23
- {
24
- "cell_type": "markdown",
25
- "id": "ed622870",
26
- "metadata": {},
27
- "source": [
28
- "To run the labs in this book, you will need two things:\n",
29
- "\n",
30
- "* An installation of `Python3`, which is the specific version of `Python` used in the labs. \n",
31
- "* Access to `Jupyter`, a very popular `Python` interface that runs code through a file called a *notebook*. "
32
- ]
33
- },
34
- {
35
- "cell_type": "markdown",
36
- "id": "844d37fc",
37
- "metadata": {},
38
- "source": [
39
- "You can download and install `Python3` by following the instructions available at [anaconda.com](http://anaconda.com). "
40
- ]
41
- },
42
- {
43
- "cell_type": "markdown",
44
- "id": "462ff1fe",
45
- "metadata": {},
46
- "source": [
47
- " There are a number of ways to get access to `Jupyter`. Here are just a few:\n",
48
- " \n",
49
- " * Using Google's `Colaboratory` service: [colab.research.google.com/](https://colab.research.google.com/). \n",
50
- " * Using `JupyterHub`, available at [jupyter.org/hub](https://jupyter.org/hub). \n",
51
- " * Using your own `jupyter` installation. Installation instructions are available at [jupyter.org/install](https://jupyter.org/install). \n",
52
- " \n",
53
- "Please see the `Python` resources page on the book website [statlearning.com](https://www.statlearning.com) for up-to-date information about getting `Python` and `Jupyter` working on your computer. \n",
54
- "\n",
55
- "You will need to install the `ISLP` package, which provides access to the datasets and custom-built functions that we provide.\n",
56
- "Inside a macOS or Linux terminal type `pip install ISLP`; this also installs most other packages needed in the labs. The `Python` resources page has a link to the `ISLP` documentation website.\n",
57
- "\n",
58
- "To run this lab, download the file `Ch2-statlearn-lab.ipynb` from the `Python` resources page. \n",
59
- "Now run the following code at the command line: `jupyter lab Ch2-statlearn-lab.ipynb`.\n",
60
- "\n",
61
- "If you're using Windows, you can use the `start menu` to access `anaconda`, and follow the links. For example, to install `ISLP` and run this lab, you can run the same code above in an `anaconda` shell.\n"
62
- ]
63
- },
64
- {
65
- "cell_type": "markdown",
66
- "id": "b46f9182",
67
- "metadata": {},
68
- "source": [
69
- "## Basic Commands\n"
70
- ]
71
- },
72
- {
73
- "cell_type": "markdown",
74
- "id": "54060fd9",
75
- "metadata": {},
76
- "source": [
77
- "In this lab, we will introduce some simple `Python` commands. \n",
78
- " For more resources about `Python` in general, readers may want to consult the tutorial at [docs.python.org/3/tutorial/](https://docs.python.org/3/tutorial/). \n",
79
- "\n",
80
- "\n",
81
- " \n"
82
- ]
83
- },
84
- {
85
- "cell_type": "markdown",
86
- "id": "d3dbd0e9",
87
- "metadata": {},
88
- "source": [
89
- "Like most programming languages, `Python` uses *functions*\n",
90
- "to perform operations. To run a\n",
91
- "function called `fun`, we type\n",
92
- "`fun(input1,input2)`, where the inputs (or *arguments*)\n",
93
- "`input1` and `input2` tell\n",
94
- "`Python` how to run the function. A function can have any number of\n",
95
- "inputs. For example, the\n",
96
- "`print()` function outputs a text representation of all of its arguments to the console."
97
- ]
98
- },
99
- {
100
- "cell_type": "code",
101
- "execution_count": 1,
102
- "id": "9e8aa21f",
103
- "metadata": {
104
- "execution": {}
105
- },
106
- "outputs": [
107
- {
108
- "name": "stdout",
109
- "output_type": "stream",
110
- "text": [
111
- "fit a model with 11 variables\n"
112
- ]
113
- }
114
- ],
115
- "source": [
116
- "print('fit a model with', 11, 'variables')\n"
117
- ]
118
- },
119
- {
120
- "cell_type": "markdown",
121
- "id": "27d935f8",
122
- "metadata": {},
123
- "source": [
124
- " The following command will provide information about the `print()` function."
125
- ]
126
- },
127
- {
128
- "cell_type": "code",
129
- "execution_count": null,
130
- "id": "d62ec119",
131
- "metadata": {
132
- "execution": {}
133
- },
134
- "outputs": [],
135
- "source": [
136
- "print?\n"
137
- ]
138
- },
139
- {
140
- "cell_type": "markdown",
141
- "id": "04b3e2a3",
142
- "metadata": {},
143
- "source": [
144
- "Adding two integers in `Python` is pretty intuitive."
145
- ]
146
- },
147
- {
148
- "cell_type": "code",
149
- "execution_count": null,
150
- "id": "c64e9f4d",
151
- "metadata": {
152
- "execution": {}
153
- },
154
- "outputs": [],
155
- "source": [
156
- "3 + 5\n"
157
- ]
158
- },
159
- {
160
- "cell_type": "markdown",
161
- "id": "cd754cba",
162
- "metadata": {},
163
- "source": [
164
- "In `Python`, textual data is handled using\n",
165
- "*strings*. For instance, `\"hello\"` and\n",
166
- "`'hello'`\n",
167
- "are strings. \n",
168
- "We can concatenate them using the addition `+` symbol."
169
- ]
170
- },
171
- {
172
- "cell_type": "code",
173
- "execution_count": null,
174
- "id": "9abccc1f",
175
- "metadata": {
176
- "execution": {}
177
- },
178
- "outputs": [],
179
- "source": [
180
- "\"hello\" + \"world\"\n"
181
- ]
182
- },
183
- {
184
- "cell_type": "markdown",
185
- "id": "c28db903",
186
- "metadata": {},
187
- "source": [
188
- " A string is actually a type of *sequence*: this is a generic term for an ordered collection of items. \n",
189
- " The three most important types of sequences are lists, tuples, and strings. \n",
190
- "We introduce lists now. "
191
- ]
192
- },
193
- {
194
- "cell_type": "markdown",
195
- "id": "5fdcc5a1",
196
- "metadata": {},
197
- "source": [
198
- "The following command instructs `Python` to join together\n",
199
- "the numbers 3, 4, and 5, and to save them as a\n",
200
- "*list* named `x`. When we\n",
201
- "type `x`, it gives us back the list."
202
- ]
203
- },
204
- {
205
- "cell_type": "code",
206
- "execution_count": null,
207
- "id": "802ca33c",
208
- "metadata": {
209
- "execution": {}
210
- },
211
- "outputs": [],
212
- "source": [
213
- "x = [3, 4, 5]\n",
214
- "x\n"
215
- ]
216
- },
217
- {
218
- "cell_type": "markdown",
219
- "id": "5492ecd1",
220
- "metadata": {},
221
- "source": [
222
- "Note that we used the brackets\n",
223
- "`[]` to construct this list. \n",
224
- "\n",
225
- "We will often want to add two sets of numbers together. It is reasonable to try the following code,\n",
226
- "though it will not produce the desired results."
227
- ]
228
- },
229
- {
230
- "cell_type": "code",
231
- "execution_count": null,
232
- "id": "a8c72744",
233
- "metadata": {
234
- "execution": {}
235
- },
236
- "outputs": [],
237
- "source": [
238
- "y = [4, 9, 7]\n",
239
- "x + y\n"
240
- ]
241
- },
242
- {
243
- "cell_type": "code",
244
- "execution_count": null,
245
- "id": "b84f9d0e",
246
- "metadata": {},
247
- "outputs": [],
248
- "source": [
249
- "x[2]"
250
- ]
251
- },
252
- {
253
- "cell_type": "markdown",
254
- "id": "8f42ea1d",
255
- "metadata": {},
256
- "source": [
257
- "The result may appear slightly counterintuitive: why did `Python` not add the entries of the lists\n",
258
- "element-by-element? \n",
259
- " In `Python`, lists hold *arbitrary* objects, and are added using *concatenation*. \n",
260
- "  In fact, concatenation is the behavior that we saw earlier when we entered `\"hello\" + \"world\"`. \n",
261
- " "
262
- ]
263
- },
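The contrast between concatenation and element-by-element addition can be sketched in plain `Python`, before `numpy` is introduced. This is a minimal illustration only; the names `a`, `b`, `concatenated`, and `elementwise` are hypothetical and do not appear in the lab.

```python
# Plain Python lists: `+` concatenates rather than adding element-wise.
a = [3, 4, 5]
b = [4, 9, 7]
concatenated = a + b

# One way to add element-wise without numpy: pair up entries with zip().
elementwise = [u + v for u, v in zip(a, b)]

print(concatenated)  # [3, 4, 5, 4, 9, 7]
print(elementwise)   # [7, 13, 12]
```

The list comprehension works, but it is more verbose than array arithmetic, which is one reason the lab moves on to `numpy`.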
264
- {
265
- "cell_type": "markdown",
266
- "id": "69015df5",
267
- "metadata": {},
268
- "source": [
269
- "This example reflects the fact that \n",
270
- " `Python` is a general-purpose programming language. Much of `Python`'s data-specific\n",
271
- "functionality comes from other packages, notably `numpy`\n",
272
- "and `pandas`. \n",
273
- "In the next section, we will introduce the `numpy` package. \n",
274
- "See [docs.scipy.org/doc/numpy/user/quickstart.html](https://docs.scipy.org/doc/numpy/user/quickstart.html) for more information about `numpy`.\n"
275
- ]
276
- },
277
- {
278
- "cell_type": "markdown",
279
- "id": "16bfc4a2",
280
- "metadata": {},
281
- "source": [
282
- "## Introduction to Numerical Python\n",
283
- "\n",
284
- "As mentioned earlier, this book makes use of functionality that is contained in the `numpy` \n",
285
- " *library*, or *package*. A package is a collection of modules that are not necessarily included in \n",
286
- " the base `Python` distribution. The name `numpy` is an abbreviation for *numerical Python*. "
287
- ]
288
- },
289
- {
290
- "cell_type": "markdown",
291
- "id": "f5bed3f0",
292
- "metadata": {},
293
- "source": [
294
- " To access `numpy`, we must first `import` it."
295
- ]
296
- },
297
- {
298
- "cell_type": "code",
299
- "execution_count": null,
300
- "id": "f1c7d1db",
301
- "metadata": {
302
- "execution": {},
303
- "lines_to_next_cell": 0
304
- },
305
- "outputs": [],
306
- "source": [
307
- "import numpy as np "
308
- ]
309
- },
310
- {
311
- "cell_type": "markdown",
312
- "id": "5c8614e7",
313
- "metadata": {},
314
- "source": [
315
- "In the previous line, we named the `numpy` *module* `np`, an abbreviation for easier referencing."
316
- ]
317
- },
318
- {
319
- "cell_type": "markdown",
320
- "id": "ba1224a6",
321
- "metadata": {},
322
- "source": [
323
- "In `numpy`, an *array* is a generic term for a multidimensional\n",
324
- "set of numbers.\n",
325
- "We use the `np.array()` function to define `x` and `y`, which are one-dimensional arrays, i.e. vectors."
326
- ]
327
- },
328
- {
329
- "cell_type": "code",
330
- "execution_count": null,
331
- "id": "e2ea2bfd",
332
- "metadata": {
333
- "execution": {},
334
- "lines_to_next_cell": 0
335
- },
336
- "outputs": [],
337
- "source": [
338
- "x = np.array([3, 4, 5])\n",
339
- "y = np.array([4, 9, 7])"
340
- ]
341
- },
342
- {
343
- "cell_type": "markdown",
344
- "id": "a977e05a",
345
- "metadata": {},
346
- "source": [
347
- "Note that if you forgot to run the `import numpy as np` command earlier, then\n",
348
- "you will encounter an error in calling the `np.array()` function in the previous line. \n",
349
- " The syntax `np.array()` indicates that the function being called\n",
350
- "is part of the `numpy` package, which we have abbreviated as `np`. "
351
- ]
352
- },
353
- {
354
- "cell_type": "markdown",
355
- "id": "742431b6",
356
- "metadata": {},
357
- "source": [
358
- "Since `x` and `y` have been defined using `np.array()`, we get a sensible result when we add them together. Compare this to our results in the previous section,\n",
359
- " when we tried to add two lists without using `numpy`. "
360
- ]
361
- },
362
- {
363
- "cell_type": "code",
364
- "execution_count": null,
365
- "id": "59fbf9fd",
366
- "metadata": {
367
- "execution": {},
368
- "lines_to_next_cell": 0
369
- },
370
- "outputs": [],
371
- "source": [
372
- "x + y"
373
- ]
374
- },
375
- {
376
- "cell_type": "markdown",
377
- "id": "2ceccc2b",
378
- "metadata": {},
379
- "source": [
380
- " \n",
381
- " \n"
382
- ]
383
- },
384
- {
385
- "cell_type": "markdown",
386
- "id": "74be6d74",
387
- "metadata": {},
388
- "source": [
389
- "In `numpy`, matrices are typically represented as two-dimensional arrays, and vectors as one-dimensional arrays. (While it is also possible to create matrices using `np.matrix()`, we will use `np.array()` throughout the labs in this book.)\n",
390
- "We can create a two-dimensional array as follows. "
391
- ]
392
- },
393
- {
394
- "cell_type": "code",
395
- "execution_count": null,
396
- "id": "2279437e",
397
- "metadata": {
398
- "execution": {},
399
- "lines_to_next_cell": 0
400
- },
401
- "outputs": [],
402
- "source": [
403
- "x = np.array([[1, 2], [3, 4]])\n",
404
- "x"
405
- ]
406
- },
407
- {
408
- "cell_type": "markdown",
409
- "id": "f96f304d",
410
- "metadata": {},
411
- "source": [
412
- " \n",
413
- "\n"
414
- ]
415
- },
416
- {
417
- "cell_type": "markdown",
418
- "id": "f764f7d1",
419
- "metadata": {},
420
- "source": [
421
- "The object `x` has several \n",
422
- "*attributes*, or associated objects. To access an attribute of `x`, we type `x.attribute`, where we replace `attribute`\n",
423
- "with the name of the attribute. \n",
424
- "For instance, we can access the `ndim` attribute of `x` as follows. "
425
- ]
426
- },
427
- {
428
- "cell_type": "code",
429
- "execution_count": null,
430
- "id": "75bf1b1e",
431
- "metadata": {
432
- "execution": {}
433
- },
434
- "outputs": [],
435
- "source": [
436
- "x.ndim"
437
- ]
438
- },
439
- {
440
- "cell_type": "markdown",
441
- "id": "4e3b83bf",
442
- "metadata": {},
443
- "source": [
444
- "The output indicates that `x` is a two-dimensional array. \n",
445
- "Similarly, `x.dtype` is the *data type* attribute of the object `x`. This indicates that `x` is \n",
446
- "comprised of 64-bit integers:"
447
- ]
448
- },
449
- {
450
- "cell_type": "code",
451
- "execution_count": null,
452
- "id": "58292240",
453
- "metadata": {
454
- "execution": {},
455
- "lines_to_next_cell": 0
456
- },
457
- "outputs": [],
458
- "source": [
459
- "x.dtype"
460
- ]
461
- },
462
- {
463
- "cell_type": "markdown",
464
- "id": "cf9cf94b",
465
- "metadata": {},
466
- "source": [
467
- "Why is `x` comprised of integers? This is because we created `x` by passing in exclusively integers to the `np.array()` function.\n",
468
- " If\n",
469
- "we had passed in any decimals, then we would have obtained an array of\n",
470
- "*floating point numbers* (i.e. real-valued numbers). "
471
- ]
472
- },
473
- {
474
- "cell_type": "code",
475
- "execution_count": null,
476
- "id": "fc5fff57",
477
- "metadata": {
478
- "execution": {},
479
- "lines_to_next_cell": 2
480
- },
481
- "outputs": [],
482
- "source": [
483
- "np.array([[1, 2], [3.0, 4]]).dtype\n"
484
- ]
485
- },
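As a small sketch of how `np.array()` infers the data type, the two arrays below differ by a single decimal entry. The names `ints` and `mixed` are hypothetical. The exact integer width can depend on the platform and the `numpy` version, so the example checks only the *kind* of the integer type.

```python
import numpy as np

# All inputs are integers, so numpy infers an integer dtype.
ints = np.array([[1, 2], [3, 4]])

# A single decimal entry promotes the whole array to floating point.
mixed = np.array([[1, 2], [3.0, 4]])

print(ints.dtype.kind)  # 'i' (signed integer)
print(mixed.dtype)      # float64
```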
486
- {
487
- "cell_type": "markdown",
488
- "id": "41a79641",
489
- "metadata": {},
490
- "source": [
491
- "Typing `fun?` will cause `Python` to display \n",
492
- "documentation associated with the function `fun`, if it exists.\n",
493
- "We can try this for `np.array()`. "
494
- ]
495
- },
496
- {
497
- "cell_type": "code",
498
- "execution_count": null,
499
- "id": "762562a6",
500
- "metadata": {
501
- "execution": {},
502
- "lines_to_next_cell": 0
503
- },
504
- "outputs": [],
505
- "source": [
506
- "np.array?\n"
507
- ]
508
- },
509
- {
510
- "cell_type": "markdown",
511
- "id": "d4d82167",
512
- "metadata": {},
513
- "source": [
514
- "This documentation indicates that we could create a floating point array by passing a `dtype` argument into `np.array()`."
515
- ]
516
- },
517
- {
518
- "cell_type": "code",
519
- "execution_count": null,
520
- "id": "66d2b82a",
521
- "metadata": {
522
- "execution": {},
523
- "lines_to_next_cell": 2
524
- },
525
- "outputs": [],
526
- "source": [
527
- "np.array([[1, 2], [3, 4]], float).dtype\n"
528
- ]
529
- },
530
- {
531
- "cell_type": "markdown",
532
- "id": "1e3ba5be",
533
- "metadata": {},
534
- "source": [
535
- "The array `x` is two-dimensional. We can find out the number of rows and columns by looking\n",
536
- "at its `shape` attribute."
537
- ]
538
- },
539
- {
540
- "cell_type": "code",
541
- "execution_count": null,
542
- "id": "89881402",
543
- "metadata": {
544
- "execution": {},
545
- "lines_to_next_cell": 2
546
- },
547
- "outputs": [],
548
- "source": [
549
- "x.shape\n"
550
- ]
551
- },
552
- {
553
- "cell_type": "markdown",
554
- "id": "2967b644",
555
- "metadata": {},
556
- "source": [
557
- "A *method* is a function that is associated with an\n",
558
- "object. \n",
559
- "For instance, given an array `x`, the expression\n",
560
- "`x.sum()` sums all of its elements, using the `sum()`\n",
561
- "method for arrays. \n",
562
- "The call `x.sum()` automatically provides `x` as the\n",
563
- "first argument to its `sum()` method."
564
- ]
565
- },
566
- {
567
- "cell_type": "code",
568
- "execution_count": null,
569
- "id": "0572d3f6",
570
- "metadata": {
571
- "execution": {},
572
- "lines_to_next_cell": 0
573
- },
574
- "outputs": [],
575
- "source": [
576
- "x = np.array([1, 2, 3, 4])\n",
577
- "x.sum()"
578
- ]
579
- },
580
- {
581
- "cell_type": "markdown",
582
- "id": "e3f49995",
583
- "metadata": {},
584
- "source": [
585
- "We could also sum the elements of `x` by passing in `x` as an argument to the `np.sum()` function. "
586
- ]
587
- },
588
- {
589
- "cell_type": "code",
590
- "execution_count": null,
591
- "id": "33b10a6f",
592
- "metadata": {
593
- "execution": {},
594
- "lines_to_next_cell": 0
595
- },
596
- "outputs": [],
597
- "source": [
598
- "x = np.array([1, 2, 3, 4])\n",
599
- "np.sum(x)"
600
- ]
601
- },
602
- {
603
- "cell_type": "markdown",
604
- "id": "2f3dd2c3",
605
- "metadata": {},
606
- "source": [
607
- " As another example, the\n",
608
- "`reshape()` method returns a new array with the same elements as\n",
609
- "`x`, but a different shape.\n",
610
- " We do this by passing in a `tuple` in our call to\n",
611
- " `reshape()`, in this case `(2, 3)`. This tuple specifies that we would like to create a two-dimensional array with \n",
612
- "$2$ rows and $3$ columns. (Like lists, tuples represent a sequence of objects. Why do we need more than one way to create a sequence? There are a few differences between tuples and lists, but perhaps the most important is that elements of a tuple cannot be modified, whereas elements of a list can be.)\n",
613
- " \n",
614
- "In what follows, the\n",
615
- "`\\n` character creates a *new line*."
616
- ]
617
- },
618
- {
619
- "cell_type": "code",
620
- "execution_count": null,
621
- "id": "a32716db",
622
- "metadata": {
623
- "execution": {}
624
- },
625
- "outputs": [],
626
- "source": [
627
- "x = np.array([1, 2, 3, 4, 5, 6])\n",
628
- "print('beginning x:\\n', x)\n",
629
- "x_reshape = x.reshape((2, 3))\n",
630
- "print('reshaped x:\\n', x_reshape)\n"
631
- ]
632
- },
633
- {
634
- "cell_type": "markdown",
635
- "id": "2483179e",
636
- "metadata": {},
637
- "source": [
638
- "The previous output reveals that `numpy` arrays are specified as a sequence\n",
639
- "of *rows*. This is called *row-major ordering*, as opposed to *column-major ordering*. "
640
- ]
641
- },
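To make the row-major versus column-major distinction concrete, the sketch below reshapes the same six numbers both ways, using the `order` argument of `reshape()`. The names `v`, `row_major`, and `col_major` are hypothetical; the lab itself only uses the default ordering.

```python
import numpy as np

v = np.arange(1, 7)  # array([1, 2, 3, 4, 5, 6])

# Default (order='C'): fill one row at a time.
row_major = v.reshape((2, 3))

# order='F': fill one column at a time (column-major).
col_major = v.reshape((2, 3), order='F')

print(row_major)  # [[1 2 3], [4 5 6]]
print(col_major)  # [[1 3 5], [2 4 6]]
```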
642
- {
643
- "cell_type": "markdown",
644
- "id": "e256575f",
645
- "metadata": {},
646
- "source": [
647
- "`Python` (and hence `numpy`) uses 0-based\n",
648
- "indexing. This means that to access the top left element of `x_reshape`, \n",
649
- "we type in `x_reshape[0,0]`."
650
- ]
651
- },
652
- {
653
- "cell_type": "code",
654
- "execution_count": null,
655
- "id": "3db6e1cf",
656
- "metadata": {
657
- "execution": {},
658
- "lines_to_next_cell": 0
659
- },
660
- "outputs": [],
661
- "source": [
662
- "x_reshape[0, 0] "
663
- ]
664
- },
665
- {
666
- "cell_type": "markdown",
667
- "id": "0e10119e",
668
- "metadata": {},
669
- "source": [
670
- "Similarly, `x_reshape[1,2]` yields the element in the second row and the third column \n",
671
- "of `x_reshape`. "
672
- ]
673
- },
674
- {
675
- "cell_type": "code",
676
- "execution_count": null,
677
- "id": "e15c753f",
678
- "metadata": {
679
- "execution": {},
680
- "lines_to_next_cell": 0
681
- },
682
- "outputs": [],
683
- "source": [
684
- "x_reshape[1, 2] "
685
- ]
686
- },
687
- {
688
- "cell_type": "markdown",
689
- "id": "f9c55622",
690
- "metadata": {},
691
- "source": [
692
- "Similarly, `x[2]` yields the\n",
693
- "third entry of `x`. \n",
694
- "\n",
695
- "Now, let's modify the top left element of `x_reshape`. To our surprise, we discover that the first element of `x` has been modified as well!\n",
696
- "\n"
697
- ]
698
- },
699
- {
700
- "cell_type": "code",
701
- "execution_count": null,
702
- "id": "91c6e7d8",
703
- "metadata": {
704
- "execution": {}
705
- },
706
- "outputs": [],
707
- "source": [
708
- "print('x before we modify x_reshape:\\n', x)\n",
709
- "print('x_reshape before we modify x_reshape:\\n', x_reshape)\n",
710
- "x_reshape[0, 0] = 5\n",
711
- "print('x_reshape after we modify its top left element:\\n', x_reshape)\n",
712
- "print('x after we modify top left element of x_reshape:\\n', x)\n"
713
- ]
714
- },
715
- {
716
- "cell_type": "markdown",
717
- "id": "8a840507",
718
- "metadata": {},
719
- "source": [
720
- "Modifying `x_reshape` also modified `x` because the two objects occupy the same space in memory.\n",
721
- " \n",
722
- "\n",
723
- " "
724
- ]
725
- },
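One way to check whether two arrays share memory, and to avoid the side effect when it is unwanted, is sketched below. It assumes a contiguous input array, for which `reshape()` typically returns a view. The names `base`, `view`, and `independent` are hypothetical.

```python
import numpy as np

base = np.array([1, 2, 3, 4, 5, 6])

# For a contiguous array, reshape() typically returns a view on the same data.
view = base.reshape((2, 3))

# Calling copy() forces separate storage.
independent = base.reshape((2, 3)).copy()

print(np.shares_memory(base, view))         # True
print(np.shares_memory(base, independent))  # False

independent[0, 0] = 99  # leaves base untouched
view[0, 0] = 5          # also changes base[0]
print(base)             # [5 2 3 4 5 6]
```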
726
- {
727
- "cell_type": "markdown",
728
- "id": "ec551f3e",
729
- "metadata": {},
730
- "source": [
731
- "We just saw that we can modify an element of an array. Can we also modify a tuple? It turns out that we cannot: trying to do so raises\n",
732
- "an *exception*, or error."
733
- ]
734
- },
735
- {
736
- "cell_type": "code",
737
- "execution_count": null,
738
- "id": "59d95dce",
739
- "metadata": {
740
- "execution": {},
741
- "lines_to_next_cell": 2
742
- },
743
- "outputs": [],
744
- "source": [
745
- "my_tuple = (3, 4, 5)\n",
746
- "my_tuple[0] = 2\n"
747
- ]
748
- },
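The exception from the cell above can be caught rather than left to stop execution. The sketch below is one possible way to do that, together with the usual workaround of building a new tuple; the names `message` and `new_tuple` are hypothetical.

```python
my_tuple = (3, 4, 5)

# Item assignment on a tuple raises a TypeError, which we catch here.
try:
    my_tuple[0] = 2
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# Tuples cannot be changed in place, so we build a new one instead.
new_tuple = (2,) + my_tuple[1:]
print(new_tuple)  # (2, 4, 5)
```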
749
- {
750
- "cell_type": "markdown",
751
- "id": "d594f1af",
752
- "metadata": {},
753
- "source": [
754
- "We now briefly mention some attributes of arrays that will come in handy. An array's `shape` attribute contains its dimensions (the length along each axis); this is always a tuple.\n",
755
- "The `ndim` attribute yields the number of dimensions, and `T` provides its transpose. "
756
- ]
757
- },
758
- {
759
- "cell_type": "code",
760
- "execution_count": null,
761
- "id": "a6fde9af",
762
- "metadata": {
763
- "execution": {}
764
- },
765
- "outputs": [],
766
- "source": [
767
- "x_reshape.shape, x_reshape.ndim, x_reshape.T\n"
768
- ]
769
- },
770
- {
771
- "cell_type": "markdown",
772
- "id": "76d20b98",
773
- "metadata": {},
774
- "source": [
775
- "Notice that the three individual outputs `(2,3)`, `2`, and `array([[5, 4],[2, 5], [3,6]])` are themselves output as a tuple. \n",
776
- " \n",
777
- "We will often want to apply functions to arrays. \n",
778
- "For instance, we can compute the\n",
779
- "square root of the entries using the `np.sqrt()` function: "
780
- ]
781
- },
782
- {
783
- "cell_type": "code",
784
- "execution_count": null,
785
- "id": "fadb6b45",
786
- "metadata": {
787
- "execution": {}
788
- },
789
- "outputs": [],
790
- "source": [
791
- "np.sqrt(x)\n"
792
- ]
793
- },
794
- {
795
- "cell_type": "markdown",
796
- "id": "22fab2ce",
797
- "metadata": {},
798
- "source": [
799
- "We can also square the elements:"
800
- ]
801
- },
802
- {
803
- "cell_type": "code",
804
- "execution_count": null,
805
- "id": "fda3134b",
806
- "metadata": {
807
- "execution": {}
808
- },
809
- "outputs": [],
810
- "source": [
811
- "x**2\n"
812
- ]
813
- },
814
- {
815
- "cell_type": "markdown",
816
- "id": "1278f26b",
817
- "metadata": {},
818
- "source": [
819
- "We can compute the square roots using the same notation, raising to the power of $1/2$ instead of 2."
820
- ]
821
- },
822
- {
823
- "cell_type": "code",
824
- "execution_count": null,
825
- "id": "52eb335b",
826
- "metadata": {
827
- "execution": {},
828
- "lines_to_next_cell": 2
829
- },
830
- "outputs": [],
831
- "source": [
832
- "x**0.5\n"
833
- ]
834
- },
835
- {
836
- "cell_type": "markdown",
837
- "id": "299a5a85",
838
- "metadata": {},
839
- "source": [
840
- "Throughout this book, we will often want to generate random data. \n",
841
- "The `np.random.normal()` function generates a vector of random\n",
842
- "normal variables. We can learn more about this function by looking at the help page, via a call to `np.random.normal?`.\n",
843
- "The first line of the help page reads `normal(loc=0.0, scale=1.0, size=None)`. \n",
844
- " This *signature* line tells us that the function's arguments are `loc`, `scale`, and `size`. These are *keyword* arguments, which means that when they are passed into\n",
845
- " the function, they can be referred to by name (in any order). (`Python` also uses *positional* arguments. Positional arguments do not need to use a keyword. To see an example, type in `np.sum?`. We see that `a` is a positional argument, i.e. this function assumes that the first unnamed argument that it receives is the array to be summed. By contrast, `axis` and `dtype` are keyword arguments: when they are passed by name, the position in which these arguments are entered into `np.sum()` does not matter.)\n",
846
- " By default, this function will generate random normal variable(s) with mean (`loc`) $0$ and standard deviation (`scale`) $1$; furthermore, \n",
847
- " a single random variable will be generated unless the argument to `size` is changed. \n",
848
- "\n",
849
- "We now generate 50 independent random variables from a $N(0,1)$ distribution. "
850
- ]
851
- },
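The difference between keyword and positional arguments can be illustrated with `np.random.normal()` itself. This is a sketch only, and the names `draws_a`, `draws_b`, and `draws_c` are hypothetical. The three calls request the same distribution, although the random values drawn will differ from call to call.

```python
import numpy as np

# Keyword arguments can be given in any order.
draws_a = np.random.normal(loc=50, scale=1, size=4)
draws_b = np.random.normal(size=4, scale=1, loc=50)

# Without keywords, the order must follow the signature: loc, scale, size.
draws_c = np.random.normal(50, 1, 4)

print(draws_a.shape, draws_b.shape, draws_c.shape)  # (4,) (4,) (4,)
```

Naming the arguments is usually easier to read, since the reader does not need to remember the order in the signature.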
852
- {
853
- "cell_type": "code",
854
- "execution_count": null,
855
- "id": "ac5e9d29",
856
- "metadata": {
857
- "execution": {}
858
- },
859
- "outputs": [],
860
- "source": [
861
- "x = np.random.normal(size=50)\n",
862
- "x\n"
863
- ]
864
- },
865
- {
866
- "cell_type": "markdown",
867
- "id": "d77cf45a",
868
- "metadata": {},
869
- "source": [
870
- "We create an array `y` by adding an independent $N(50,1)$ random variable to each element of `x`."
871
- ]
872
- },
873
- {
874
- "cell_type": "code",
875
- "execution_count": null,
876
- "id": "55fa905e",
877
- "metadata": {
878
- "execution": {},
879
- "lines_to_next_cell": 0
880
- },
881
- "outputs": [],
882
- "source": [
883
- "y = x + np.random.normal(loc=50, scale=1, size=50)"
884
- ]
885
- },
886
- {
887
- "cell_type": "markdown",
888
- "id": "eacfecc9",
889
- "metadata": {},
890
- "source": [
891
- "The `np.corrcoef()` function computes the correlation matrix between `x` and `y`. The off-diagonal elements give the \n",
892
- "correlation between `x` and `y`. "
893
- ]
894
- },
895
- {
896
- "cell_type": "code",
897
- "execution_count": null,
898
- "id": "fde0dc19",
899
- "metadata": {
900
- "execution": {}
901
- },
902
- "outputs": [],
903
- "source": [
904
- "np.corrcoef(x, y)"
905
- ]
906
- },
907
- {
908
- "cell_type": "markdown",
909
- "id": "8a594218",
910
- "metadata": {},
911
- "source": [
912
- "If you're following along in your own `Jupyter` notebook, then you probably noticed that you got a different set of results when you ran the past few \n",
913
- "commands. In particular, \n",
914
- " each\n",
915
- "time we call `np.random.normal()`, we will get a different answer, as shown in the following example."
916
- ]
917
- },
918
- {
919
- "cell_type": "code",
920
- "execution_count": null,
921
- "id": "5099cf54",
922
- "metadata": {
923
- "execution": {},
924
- "lines_to_next_cell": 0
925
- },
926
- "outputs": [],
927
- "source": [
928
- "print(np.random.normal(scale=5, size=2))\n",
929
- "print(np.random.normal(scale=5, size=2)) \n"
930
- ]
931
- },
932
- {
933
- "cell_type": "markdown",
934
- "id": "2e209118",
935
- "metadata": {},
936
- "source": [
937
- " "
938
- ]
939
- },
940
- {
941
- "cell_type": "markdown",
942
- "id": "ed7697a4",
943
- "metadata": {},
944
- "source": [
945
- "In order to ensure that our code provides exactly the same results\n",
946
- "each time it is run, we can set a *random seed* \n",
947
- "using the \n",
948
- "`np.random.default_rng()` function.\n",
949
- "This function takes an arbitrary, user-specified integer argument. If we set a random seed before \n",
950
- "generating random data, then re-running our code will yield the same results. The\n",
951
- "object `rng` has essentially all the random number generating methods found in `np.random`. Hence, to\n",
952
- "generate normal data we use `rng.normal()`."
953
- ]
954
- },
955
- {
956
- "cell_type": "code",
957
- "execution_count": null,
958
- "id": "9d8074e5",
959
- "metadata": {
960
- "execution": {}
961
- },
962
- "outputs": [],
963
- "source": [
964
- "rng = np.random.default_rng(1303)\n",
965
- "print(rng.normal(scale=5, size=2))\n",
966
- "rng2 = np.random.default_rng(1303)\n",
967
- "print(rng2.normal(scale=5, size=2)) "
968
- ]
969
- },
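The effect of the seed can also be checked programmatically instead of by comparing printed output. The sketch below assumes nothing beyond `np.random.default_rng()`; the names `first`, `second`, and `other`, and the second seed `1304`, are hypothetical.

```python
import numpy as np

# Two generators created with the same seed produce identical draws.
first = np.random.default_rng(1303).normal(scale=5, size=2)
second = np.random.default_rng(1303).normal(scale=5, size=2)

# A different seed gives a different stream.
other = np.random.default_rng(1304).normal(scale=5, size=2)

print(np.array_equal(first, second))  # True
print(np.array_equal(first, other))   # False
```

Note that a single generator advances its internal state with every call, so calling `rng.normal()` twice on the same `rng` gives different values each time.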
970
- {
971
- "cell_type": "markdown",
972
- "id": "93f826ef",
973
- "metadata": {},
974
- "source": [
975
- "Throughout the labs in this book, we use `np.random.default_rng()` whenever we\n",
976
- "perform calculations involving random quantities within `numpy`. In principle, this\n",
977
- "should enable the reader to exactly reproduce the stated results. However, as new versions of `numpy` become available, it is possible\n",
978
- "that some small discrepancies may occur between the output\n",
979
- "in the labs and the output\n",
980
- "from `numpy`.\n",
981
- "\n",
982
- "The `np.mean()`, `np.var()`, and `np.std()` functions can be used\n",
983
- "to compute the mean, variance, and standard deviation of arrays. These functions are also\n",
984
- "available as methods on the arrays."
985
- ]
986
- },
987
- {
988
- "cell_type": "code",
989
- "execution_count": null,
990
- "id": "e98472df",
991
- "metadata": {
992
- "execution": {},
993
- "lines_to_next_cell": 0
994
- },
995
- "outputs": [],
996
- "source": [
997
- "rng = np.random.default_rng(3)\n",
998
- "y = rng.standard_normal(10)\n",
999
- "np.mean(y), y.mean()"
1000
- ]
1001
- },
1002
- {
1003
- "cell_type": "markdown",
1004
- "id": "2870d61f",
1005
- "metadata": {},
1006
- "source": [
1007
- " \n"
1008
- ]
1009
- },
1010
- {
1011
- "cell_type": "code",
1012
- "execution_count": null,
1013
- "id": "8c2784fd",
1014
- "metadata": {
1015
- "execution": {},
1016
- "lines_to_next_cell": 2
1017
- },
1018
- "outputs": [],
1019
- "source": [
1020
- "np.var(y), y.var(), np.mean((y - y.mean())**2)"
1021
- ]
1022
- },
1023
- {
1024
- "cell_type": "markdown",
1025
- "id": "86261a69",
1026
- "metadata": {},
1027
- "source": [
1028
- "Notice that by default `np.var()` divides by the sample size $n$ rather\n",
1029
- "than $n-1$; see the `ddof` argument in `np.var?`.\n"
1030
- ]
1031
- },
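The role of `ddof` can be seen on a small array whose variance is easy to compute by hand. This is a minimal sketch; the names `data`, `pop_var`, and `sample_var` are hypothetical.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])  # mean 2.5, squared deviations sum to 5

pop_var = np.var(data)             # divides by n = 4
sample_var = np.var(data, ddof=1)  # divides by n - 1 = 3

print(pop_var)     # 1.25
print(sample_var)  # approximately 1.6667
```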
1032
- {
1033
- "cell_type": "code",
1034
- "execution_count": null,
1035
- "id": "7e7205f2",
1036
- "metadata": {
1037
- "execution": {}
1038
- },
1039
- "outputs": [],
1040
- "source": [
1041
- "np.sqrt(np.var(y)), np.std(y)"
1042
- ]
1043
- },
1044
- {
1045
- "cell_type": "markdown",
1046
- "id": "d4faf901",
1047
- "metadata": {},
1048
- "source": [
1049
- "The `np.mean()`, `np.var()`, and `np.std()` functions can also be applied to the rows and columns of a matrix. \n",
1050
- "To see this, we construct a $10 \\times 3$ matrix of $N(0,1)$ random variables, and consider computing the mean of each column. "
1051
- ]
1052
- },
1053
- {
1054
- "cell_type": "code",
1055
- "execution_count": null,
1056
- "id": "fce06849",
1057
- "metadata": {
1058
- "execution": {}
1059
- },
1060
- "outputs": [],
1061
- "source": [
1062
- "X = rng.standard_normal((10, 3))\n",
1063
- "X"
1064
- ]
1065
- },
1066
- {
1067
- "cell_type": "markdown",
1068
- "id": "6cc355d2",
1069
- "metadata": {},
1070
- "source": [
1071
- "Since arrays are row-major ordered, the first axis, i.e. `axis=0`, refers to its rows. We pass this argument into the `mean()` method for the object `X`, which averages over the rows and returns one mean per column. "
1072
- ]
1073
- },
1074
- {
1075
- "cell_type": "code",
1076
- "execution_count": null,
1077
- "id": "1403ff7a",
1078
- "metadata": {
1079
- "execution": {}
1080
- },
1081
- "outputs": [],
1082
- "source": [
1083
- "X.mean(axis=0)"
1084
- ]
1085
- },
1086
- {
1087
- "cell_type": "markdown",
1088
- "id": "6785c0ec",
1089
- "metadata": {},
1090
- "source": [
1091
- "The following yields the same result."
1092
- ]
1093
- },
1094
- {
1095
- "cell_type": "code",
1096
- "execution_count": null,
1097
- "id": "7e9255ba",
1098
- "metadata": {
1099
- "execution": {},
1100
- "lines_to_next_cell": 0
1101
- },
1102
- "outputs": [],
1103
- "source": [
1104
- "X.mean(0)"
1105
- ]
1106
- },
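Because the meaning of `axis` is easy to mix up, the following sketch compares the two choices on a matrix of the same shape as `X`. The names `M`, `col_means`, and `row_means`, and the seed `0`, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((10, 3))

# axis=0 collapses the rows: one mean per column.
col_means = M.mean(axis=0)

# axis=1 collapses the columns: one mean per row.
row_means = M.mean(axis=1)

print(col_means.shape)  # (3,)
print(row_means.shape)  # (10,)
```

A quick way to remember this is that the axis passed in is the one that disappears from the shape of the result.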
1115
- {
1116
- "cell_type": "markdown",
1117
- "id": "30b002fa",
1118
- "metadata": {},
1119
- "source": [
1120
- "## Graphics\n",
1121
- "In `Python`, common practice is to use the library\n",
1122
- "`matplotlib` for graphics.\n",
1123
- "However, since `Python` was not written with data analysis in mind,\n",
1124
- " the notion of plotting is not intrinsic to the language. \n",
1125
- "We will use the `subplots()` function\n",
1126
- "from `matplotlib.pyplot` to create a figure and the\n",
1127
- "axes onto which we plot our data.\n",
1128
- "For many more examples of how to make plots in `Python`,\n",
1129
- "readers are encouraged to visit [matplotlib.org/stable/gallery/](https://matplotlib.org/stable/gallery/index.html).\n",
1130
- "\n",
1131
- "In `matplotlib`, a plot consists of a *figure* and one or more *axes*. You can think of the figure as the blank canvas upon which \n",
1132
- "one or more plots will be displayed: it is the entire plotting window. \n",
1133
- "The *axes* contain important information about each plot, such as its $x$- and $y$-axis labels,\n",
1134
- "title, and more. (Note that in `matplotlib`, the word *axes* is not the plural of *axis*: a plot's *axes* contains much more information \n",
1135
- "than just the $x$-axis and the $y$-axis.)\n",
1136
- "\n",
1137
- "We begin by importing the `subplots()` function\n",
1138
- "from `matplotlib.pyplot`. We use this function\n",
1139
- "throughout when creating figures.\n",
1140
- "The function returns a tuple of length two: a figure\n",
1141
- "object as well as the relevant axes object. We will typically\n",
1142
- "pass `figsize` as a keyword argument.\n",
1143
- "Having created our axes, we attempt our first plot using its `plot()` method.\n",
1144
- "To learn more about it, \n",
1145
- "type `ax.plot?`."
1146
- ]
1147
- },
1148
- {
1149
- "cell_type": "code",
1150
- "execution_count": null,
1151
- "id": "8236e5f7",
1152
- "metadata": {
1153
- "execution": {}
1154
- },
1155
- "outputs": [],
1156
- "source": [
1157
- "from matplotlib.pyplot import subplots\n",
1158
- "fig, ax = subplots(figsize=(8, 8))\n",
1159
- "x = rng.standard_normal(100)\n",
1160
- "y = rng.standard_normal(100)\n",
1161
- "ax.plot(x, y);\n"
1162
- ]
1163
- },
1164
- {
1165
- "cell_type": "markdown",
1166
- "id": "bbef67e6",
1167
- "metadata": {},
1168
- "source": [
1169
- "We pause here to note that we have *unpacked* the tuple of length two returned by `subplots()` into the two distinct\n",
1170
- "variables `fig` and `ax`. Unpacking\n",
1171
- "is typically preferred to the following equivalent but slightly more verbose code:"
1172
- ]
1173
- },
1174
- {
1175
- "cell_type": "code",
1176
- "execution_count": null,
1177
- "id": "ddc9ed4f",
1178
- "metadata": {
1179
- "execution": {}
1180
- },
1181
- "outputs": [],
1182
- "source": [
1183
- "output = subplots(figsize=(8, 8))\n",
1184
- "fig = output[0]\n",
1185
- "ax = output[1]"
1186
- ]
1187
- },
1188
- {
1189
- "cell_type": "markdown",
1190
- "id": "104d6b8f",
1191
- "metadata": {},
1192
- "source": [
1193
- "We see that our earlier cell produced a line plot, which is the default. To create a scatterplot, we provide an additional argument to `ax.plot()`, indicating that circles should be displayed."
1194
- ]
1195
- },
1196
- {
1197
- "cell_type": "code",
1198
- "execution_count": null,
1199
- "id": "c64ed600",
1200
- "metadata": {
1201
- "execution": {},
1202
- "lines_to_next_cell": 0
1203
- },
1204
- "outputs": [],
1205
- "source": [
1206
- "fig, ax = subplots(figsize=(8, 8))\n",
1207
- "ax.plot(x, y, 'o');"
1208
- ]
1209
- },
1210
- {
1211
- "cell_type": "markdown",
1212
- "id": "840be2a9",
1213
- "metadata": {},
1214
- "source": [
1215
- "Different values\n",
1216
- "of this additional argument can be used to produce different colored lines\n",
1217
- "as well as different linestyles. \n"
1218
- ]
1219
- },
1220
- {
1221
- "cell_type": "markdown",
1222
- "id": "971b98bd",
1223
- "metadata": {},
1224
- "source": [
1225
- "As an alternative, we could use the `ax.scatter()` method to create a scatterplot."
1226
- ]
1227
- },
1228
- {
1229
- "cell_type": "code",
1230
- "execution_count": null,
1231
- "id": "bc6245e2",
1232
- "metadata": {
1233
- "execution": {}
1234
- },
1235
- "outputs": [],
1236
- "source": [
1237
- "fig, ax = subplots(figsize=(8, 8))\n",
1238
- "ax.scatter(x, y, marker='o');"
1239
- ]
1240
- },
1241
- {
1242
- "cell_type": "markdown",
1243
- "id": "97f36df0",
1244
- "metadata": {},
1245
- "source": [
1246
- "Notice that in the code blocks above, we have ended\n",
1247
- "the last line with a semicolon. This prevents `ax.plot(x, y)` from printing\n",
1248
- "text to the notebook. However, it does not prevent a plot from being produced. \n",
1249
- " If we omit the trailing semicolon, then we obtain the following output: "
1250
- ]
1251
- },
1252
- {
1253
- "cell_type": "code",
1254
- "execution_count": null,
1255
- "id": "2454807b",
1256
- "metadata": {
1257
- "execution": {},
1258
- "lines_to_next_cell": 0
1259
- },
1260
- "outputs": [],
1261
- "source": [
1262
- "fig, ax = subplots(figsize=(8, 8))\n",
1263
- "ax.scatter(x, y, marker='o')\n"
1264
- ]
1265
- },
1266
- {
1267
- "cell_type": "markdown",
1268
- "id": "1230c0a6",
1269
- "metadata": {},
1270
- "source": [
1271
- "In what follows, we will use\n",
1272
- " trailing semicolons whenever the text that would be output is not\n",
1273
- "germane to the discussion at hand."
1277
- ]
1278
- },
1279
- {
1280
- "cell_type": "markdown",
1281
- "id": "0ccb9964",
1282
- "metadata": {},
1283
- "source": [
1284
- "To label our plot, we make use of the `set_xlabel()`, `set_ylabel()`, and `set_title()` methods\n",
1285
- "of `ax`.\n",
1286
- " "
1287
- ]
1288
- },
1289
- {
1290
- "cell_type": "code",
1291
- "execution_count": null,
1292
- "id": "1e18a793",
1293
- "metadata": {
1294
- "execution": {}
1295
- },
1296
- "outputs": [],
1297
- "source": [
1298
- "fig, ax = subplots(figsize=(8, 8))\n",
1299
- "ax.scatter(x, y, marker='o')\n",
1300
- "ax.set_xlabel(\"this is the x-axis\")\n",
1301
- "ax.set_ylabel(\"this is the y-axis\")\n",
1302
- "ax.set_title(\"Plot of X vs Y\");"
1303
- ]
1304
- },
1305
- {
1306
- "cell_type": "markdown",
1307
- "id": "f2d818ee",
1308
- "metadata": {},
1309
- "source": [
1310
- " Having access to the figure object `fig` itself means that we can go in and change some aspects and then redisplay it. Here, we change\n",
1311
- " the size from `(8, 8)` to `(12, 3)`.\n"
1312
- ]
1313
- },
1314
- {
1315
- "cell_type": "code",
1316
- "execution_count": null,
1317
- "id": "aec3f009",
1318
- "metadata": {
1319
- "execution": {},
1320
- "lines_to_next_cell": 0
1321
- },
1322
- "outputs": [],
1323
- "source": [
1324
- "fig.set_size_inches(12,3)\n",
1325
- "fig"
1326
- ]
1327
- },
1336
- {
1337
- "cell_type": "markdown",
1338
- "id": "011bf802",
1339
- "metadata": {},
1340
- "source": [
1341
- "Occasionally we will want to create several plots within a figure. This can be\n",
1342
- "achieved by passing additional arguments to `subplots()`. \n",
1343
- "Below, we create a $2 \\times 3$ grid of plots\n",
1344
- "in a figure of size determined by the `figsize` argument. In such\n",
1345
- "situations, there is often a relationship between the axes in the plots. For example,\n",
1346
- "all plots may have a common $x$-axis. The `subplots()` function can automatically handle\n",
1347
- "this situation when passed the keyword argument `sharex=True`.\n",
1348
- "The `axes` object below is an array pointing to different plots in the figure. "
1349
- ]
1350
- },
1351
- {
1352
- "cell_type": "code",
1353
- "execution_count": null,
1354
- "id": "2cbc7fd4",
1355
- "metadata": {
1356
- "execution": {},
1357
- "lines_to_next_cell": 0
1358
- },
1359
- "outputs": [],
1360
- "source": [
1361
- "fig, axes = subplots(nrows=2,\n",
1362
- " ncols=3,\n",
1363
- " figsize=(15, 5))"
1364
- ]
1365
- },
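
The grid above is created without `sharex=True`. As a hedged sketch of what that keyword argument does, the code below builds a separate figure (the names `fig_shared` and `axes_shared` are assumptions, not names used in the lab) and changes the $x$-limits of a single plot.

```python
from matplotlib.pyplot import subplots

# Same 2 x 3 layout as above, but with a shared x-axis
fig_shared, axes_shared = subplots(nrows=2,
                                   ncols=3,
                                   figsize=(15, 5),
                                   sharex=True)

# Changing the x-limits of one plot...
axes_shared[0, 0].set_xlim(-1, 1)

# ...is reflected in every other plot of the grid
print(axes_shared.shape)
print(axes_shared[1, 2].get_xlim())
```

With a shared axis, the limits only need to be set once, which keeps the panels directly comparable.
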
1366
- {
1367
- "cell_type": "markdown",
1368
- "id": "b8ff2e6d",
1369
- "metadata": {},
1370
- "source": [
1371
- "We now produce a scatter plot with `'o'` in the second column of the first row and\n",
1372
- "a scatter plot with `'+'` in the third column of the second row."
1373
- ]
1374
- },
1375
- {
1376
- "cell_type": "code",
1377
- "execution_count": null,
1378
- "id": "702f80d9",
1379
- "metadata": {
1380
- "execution": {},
1381
- "lines_to_next_cell": 0
1382
- },
1383
- "outputs": [],
1384
- "source": [
1385
- "axes[0,1].plot(x, y, 'o')\n",
1386
- "axes[1,2].scatter(x, y, marker='+')\n",
1387
- "fig"
1388
- ]
1389
- },
1390
- {
1391
- "cell_type": "markdown",
1392
- "id": "5b265f8b",
1393
- "metadata": {},
1394
- "source": [
1395
- "Type `subplots?` to learn more about \n",
1396
- "`subplots()`."
1399
- ]
1400
- },
1401
- {
1402
- "cell_type": "markdown",
1403
- "id": "1bd7e707",
1404
- "metadata": {},
1405
- "source": [
1406
- "To save the output of `fig`, we call its `savefig()`\n",
1407
- "method. The argument `dpi` is the dots per inch, used\n",
1408
- "to determine how large the figure will be in pixels."
1409
- ]
1410
- },
1411
- {
1412
- "cell_type": "code",
1413
- "execution_count": null,
1414
- "id": "5493d229",
1415
- "metadata": {
1416
- "execution": {},
1417
- "lines_to_next_cell": 2
1418
- },
1419
- "outputs": [],
1420
- "source": [
1421
- "fig.savefig(\"Figure.png\", dpi=400)\n",
1422
- "fig.savefig(\"Figure.pdf\", dpi=200);\n"
1423
- ]
1424
- },
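
The relationship between `figsize`, `dpi`, and the size in pixels is simple arithmetic: each dimension in inches is multiplied by `dpi`. The sketch below illustrates this under the assumption that the figure is saved with the default bounding box; the names `fig_demo`, `width_px`, and `height_px` are introduced here for illustration only.

```python
from matplotlib.pyplot import subplots

fig_demo, ax_demo = subplots(figsize=(15, 5))   # width and height in inches
dpi = 400

# Pixel dimensions are inches times dots per inch
width_px, height_px = (fig_demo.get_size_inches() * dpi).astype(int)
print(width_px, height_px)
```

For a $15 \times 5$ inch figure saved at `dpi=400`, this gives $6000 \times 2000$ pixels.
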
1425
- {
1426
- "cell_type": "markdown",
1427
- "id": "7152d0c7",
1428
- "metadata": {},
1429
- "source": [
1430
- "We can continue to modify `fig` using step-by-step updates; for example, we can modify the range of the $x$-axis, re-save the figure, and even re-display it. "
1431
- ]
1432
- },
1433
- {
1434
- "cell_type": "code",
1435
- "execution_count": null,
1436
- "id": "bd07af12",
1437
- "metadata": {
1438
- "execution": {}
1439
- },
1440
- "outputs": [],
1441
- "source": [
1442
- "axes[0,1].set_xlim([-1,1])\n",
1443
- "fig.savefig(\"Figure_updated.jpg\")\n",
1444
- "fig"
1445
- ]
1446
- },
1447
- {
1448
- "cell_type": "markdown",
1449
- "id": "b5278857",
1450
- "metadata": {},
1451
- "source": [
1452
- "We now create some more sophisticated plots. The \n",
1453
- "`ax.contour()` method produces a *contour plot* \n",
1454
- "in order to represent three-dimensional data, similar to a\n",
1455
- "topographical map. It takes three arguments:\n",
1456
- "\n",
1457
- "* A vector of `x` values (the first dimension),\n",
1458
- "* A vector of `y` values (the second dimension), and\n",
1459
- "* A matrix whose elements correspond to the `z` value (the third\n",
1460
- "dimension) for each pair of `(x,y)` coordinates.\n",
1461
- "\n",
1462
- "To create `x` and `y`, we’ll use the command `np.linspace(a, b, n)`, \n",
1463
- "which returns a vector of `n` numbers starting at `a` and ending at `b`."
1464
- ]
1465
- },
1466
- {
1467
- "cell_type": "code",
1468
- "execution_count": null,
1469
- "id": "01019508",
1470
- "metadata": {
1471
- "execution": {},
1472
- "lines_to_next_cell": 0
1473
- },
1474
- "outputs": [],
1475
- "source": [
1476
- "fig, ax = subplots(figsize=(8, 8))\n",
1477
- "x = np.linspace(-np.pi, np.pi, 50)\n",
1478
- "y = x\n",
1479
- "f = np.multiply.outer(np.cos(y), 1 / (1 + x**2))\n",
1480
- "ax.contour(x, y, f);\n"
1481
- ]
1482
- },
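
The call to `np.multiply.outer()` above builds the matrix of `z` values: entry $(i,j)$ is $\cos(y_i) / (1 + x_j^2)$. As a minimal sketch, the same matrix can be obtained with broadcasting; the name `f_broadcast` is an assumption introduced for this comparison.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 50)
y = x

# Outer product: f[i, j] = cos(y[i]) * 1 / (1 + x[j]**2)
f = np.multiply.outer(np.cos(y), 1 / (1 + x**2))

# Equivalent construction: a column vector times a row vector
f_broadcast = np.cos(y)[:, None] * (1 / (1 + x**2))[None, :]

print(f.shape, np.allclose(f, f_broadcast))
```

Rows of `f` therefore correspond to values of `y` and columns to values of `x`, which is the layout that `ax.contour(x, y, f)` expects.
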
1483
- {
1484
- "cell_type": "markdown",
1485
- "id": "9ef3c475",
1486
- "metadata": {},
1487
- "source": [
1488
- "We can increase the resolution by adding more levels to the image."
1489
- ]
1490
- },
1491
- {
1492
- "cell_type": "code",
1493
- "execution_count": null,
1494
- "id": "7d08992f",
1495
- "metadata": {
1496
- "execution": {},
1497
- "lines_to_next_cell": 0
1498
- },
1499
- "outputs": [],
1500
- "source": [
1501
- "fig, ax = subplots(figsize=(8, 8))\n",
1502
- "ax.contour(x, y, f, levels=45);"
1503
- ]
1504
- },
1505
- {
1506
- "cell_type": "markdown",
1507
- "id": "8e1d37a2",
1508
- "metadata": {},
1509
- "source": [
1510
- "To fine-tune the output of the\n",
1511
- "`ax.contour()` function, take a\n",
1512
- "look at the help file by typing `ax.contour?`.\n",
1513
- " \n",
1514
- "The `ax.imshow()` method is similar to \n",
1515
- "`ax.contour()`, except that it produces a color-coded plot\n",
1516
- "whose colors depend on the `z` value. This is known as a\n",
1517
- "*heatmap*, and is sometimes used to plot temperature in\n",
1518
- "weather forecasts."
1519
- ]
1520
- },
1521
- {
1522
- "cell_type": "code",
1523
- "execution_count": null,
1524
- "id": "1f89d704",
1525
- "metadata": {
1526
- "execution": {},
1527
- "lines_to_next_cell": 2
1528
- },
1529
- "outputs": [],
1530
- "source": [
1531
- "fig, ax = subplots(figsize=(8, 8))\n",
1532
- "ax.imshow(f);\n"
1533
- ]
1534
- },
1535
- {
1536
- "cell_type": "markdown",
1537
- "id": "2500a6ec",
1538
- "metadata": {},
1539
- "source": [
1540
- "## Sequences and Slice Notation"
1541
- ]
1542
- },
1543
- {
1544
- "cell_type": "markdown",
1545
- "id": "07001b88",
1546
- "metadata": {},
1547
- "source": [
1548
- "As seen above, the\n",
1549
- "function `np.linspace()` can be used to create a sequence\n",
1550
- "of numbers."
1551
- ]
1552
- },
1553
- {
1554
- "cell_type": "code",
1555
- "execution_count": null,
1556
- "id": "cd971131",
1557
- "metadata": {
1558
- "execution": {},
1559
- "lines_to_next_cell": 2
1560
- },
1561
- "outputs": [],
1562
- "source": [
1563
- "seq1 = np.linspace(0, 10, 11)\n",
1564
- "seq1\n"
1565
- ]
1566
- },
1567
- {
1568
- "cell_type": "markdown",
1569
- "id": "926f96fc",
1570
- "metadata": {},
1571
- "source": [
1572
- "The function `np.arange()`\n",
1573
- " returns a sequence of numbers spaced out by `step`. If `step` is not specified, then a default value of $1$ is used. Let's create a sequence\n",
1574
- " that starts at $0$ and ends at $10$."
1575
- ]
1576
- },
1577
- {
1578
- "cell_type": "code",
1579
- "execution_count": null,
1580
- "id": "aa630d16",
1581
- "metadata": {
1582
- "execution": {}
1583
- },
1584
- "outputs": [],
1585
- "source": [
1586
- "seq2 = np.arange(0, 10)\n",
1587
- "seq2\n"
1588
- ]
1589
- },
1590
- {
1591
- "cell_type": "markdown",
1592
- "id": "6908bad7",
1593
- "metadata": {},
1594
- "source": [
1595
- "Why isn't $10$ output above? This has to do with *slice* notation in `Python`. \n",
1596
- "Slice notation \n",
1597
- "is used to index sequences such as lists, tuples and arrays.\n",
1598
- "Suppose we want to retrieve the fourth through sixth (inclusive) entries\n",
1599
- "of a string. We obtain a slice of the string using the indexing notation `[3:6]`."
1600
- ]
1601
- },
1602
- {
1603
- "cell_type": "code",
1604
- "execution_count": null,
1605
- "id": "89955ee2",
1606
- "metadata": {
1607
- "execution": {},
1608
- "lines_to_next_cell": 0
1609
- },
1610
- "outputs": [],
1611
- "source": [
1612
- "\"hello world\"[3:6]"
1613
- ]
1614
- },
1615
- {
1616
- "cell_type": "markdown",
1617
- "id": "17d73e4d",
1618
- "metadata": {},
1619
- "source": [
1620
- "In the code block above, the notation `3:6` is shorthand for `slice(3,6)` when used inside\n",
1621
- "`[]`. "
1622
- ]
1623
- },
1624
- {
1625
- "cell_type": "code",
1626
- "execution_count": null,
1627
- "id": "517f592d",
1628
- "metadata": {
1629
- "execution": {}
1630
- },
1631
- "outputs": [],
1632
- "source": [
1633
- "\"hello world\"[slice(3,6)]\n"
1634
- ]
1635
- },
1636
- {
1637
- "cell_type": "markdown",
1638
- "id": "680fe656",
1639
- "metadata": {},
1640
- "source": [
1641
- "You might have expected `slice(3,6)` to output the fourth through seventh characters in the text string (recalling that `Python` begins its indexing at zero), but instead it output the fourth through sixth: a slice includes its start index but excludes its stop index. \n",
1642
- " This also explains why the earlier `np.arange(0, 10)` command output only the integers from $0$ to $9$. \n",
1643
- "See the documentation `slice?` for useful options in creating slices. "
1664
- ]
1665
- },
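
The half-open convention described above can be illustrated with a short sketch. The variable names (`seq_to_nine`, `seq_to_ten`, `seq_even`, `text`) are assumptions chosen for this example.

```python
import numpy as np

seq_to_nine = np.arange(0, 10)    # the stop value 10 is excluded
seq_to_ten = np.arange(0, 11)     # raise the stop to 11 to include 10
seq_even = np.arange(0, 10, 2)    # a step of 2, still excluding the stop value

text = "hello world"
piece = text[3:6]                 # characters at positions 3, 4 and 5

print(seq_to_nine[-1], seq_to_ten[-1], seq_even, repr(piece))
```

A convenient consequence is that the length of a slice `a:b` with step $1$ is simply `b - a`.
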
1666
- {
1667
- "cell_type": "markdown",
1668
- "id": "522a2761",
1669
- "metadata": {},
1670
- "source": [
1671
- "## Indexing Data\n",
1672
- "To begin, we create a two-dimensional `numpy` array."
1673
- ]
1674
- },
1675
- {
1676
- "cell_type": "code",
1677
- "execution_count": null,
1678
- "id": "35927abd",
1679
- "metadata": {
1680
- "execution": {}
1681
- },
1682
- "outputs": [],
1683
- "source": [
1684
- "A = np.array(np.arange(16)).reshape((4, 4))\n",
1685
- "A\n"
1686
- ]
1687
- },
1688
- {
1689
- "cell_type": "markdown",
1690
- "id": "27c88984",
1691
- "metadata": {},
1692
- "source": [
1693
- "Typing `A[1,2]` retrieves the element corresponding to the second row and third\n",
1694
- "column. (As usual, `Python` indexes from $0.$)"
1695
- ]
1696
- },
1697
- {
1698
- "cell_type": "code",
1699
- "execution_count": null,
1700
- "id": "78ee7f5b",
1701
- "metadata": {
1702
- "execution": {}
1703
- },
1704
- "outputs": [],
1705
- "source": [
1706
- "A[1,2]\n"
1707
- ]
1708
- },
1709
- {
1710
- "cell_type": "markdown",
1711
- "id": "dd65ec1c",
1712
- "metadata": {},
1713
- "source": [
1714
- "The first number after the open-bracket symbol `[`\n",
1715
- " refers to the row, and the second number refers to the column. \n",
1716
- "\n",
1717
- "### Indexing Rows, Columns, and Submatrices\n",
1718
- " To select multiple rows at a time, we can pass in a list\n",
1719
- " specifying our selection. For instance, `[1,3]` will retrieve the second and fourth rows:"
1720
- ]
1721
- },
1722
- {
1723
- "cell_type": "code",
1724
- "execution_count": null,
1725
- "id": "16212696",
1726
- "metadata": {
1727
- "execution": {}
1728
- },
1729
- "outputs": [],
1730
- "source": [
1731
- "A[[1,3]]\n"
1732
- ]
1733
- },
1734
- {
1735
- "cell_type": "markdown",
1736
- "id": "0b8b3ce3",
1737
- "metadata": {},
1738
- "source": [
1739
- "To select the first and third columns, we pass in `[0,2]` as the second argument in the square brackets.\n",
1740
- "In this case we need to supply the first argument `:` \n",
1741
- "which selects all rows."
1742
- ]
1743
- },
1744
- {
1745
- "cell_type": "code",
1746
- "execution_count": null,
1747
- "id": "d5f473d2",
1748
- "metadata": {
1749
- "execution": {}
1750
- },
1751
- "outputs": [],
1752
- "source": [
1753
- "A[:,[0,2]]\n"
1754
- ]
1755
- },
1756
- {
1757
- "cell_type": "markdown",
1758
- "id": "471ed1b4",
1759
- "metadata": {},
1760
- "source": [
1761
- "Now, suppose that we want to select the submatrix made up of the second and fourth \n",
1762
- "rows as well as the first and third columns. This is where\n",
1763
- "indexing gets slightly tricky. It is natural to try to use lists to retrieve the rows and columns:"
1764
- ]
1765
- },
1766
- {
1767
- "cell_type": "code",
1768
- "execution_count": null,
1769
- "id": "c89646d6",
1770
- "metadata": {
1771
- "execution": {}
1772
- },
1773
- "outputs": [],
1774
- "source": [
1775
- "A[[1,3],[0,2]]\n"
1776
- ]
1777
- },
1778
- {
1779
- "cell_type": "markdown",
1780
- "id": "9cbf1ff9",
1781
- "metadata": {},
1782
- "source": [
1783
- " Oops --- what happened? We got a one-dimensional array of length two identical to"
1784
- ]
1785
- },
1786
- {
1787
- "cell_type": "code",
1788
- "execution_count": null,
1789
- "id": "87f6b4f2",
1790
- "metadata": {
1791
- "execution": {}
1792
- },
1793
- "outputs": [],
1794
- "source": [
1795
- "np.array([A[1,0],A[3,2]])\n"
1796
- ]
1797
- },
1798
- {
1799
- "cell_type": "markdown",
1800
- "id": "9a93dc96",
1801
- "metadata": {},
1802
- "source": [
1803
- " Similarly, the following code fails to extract the submatrix consisting of the second and fourth rows and the first, third, and fourth columns; here `numpy` raises an `IndexError` because the two lists have different lengths:"
1804
- ]
1805
- },
1806
- {
1807
- "cell_type": "code",
1808
- "execution_count": null,
1809
- "id": "5da5bda8",
1810
- "metadata": {
1811
- "execution": {}
1812
- },
1813
- "outputs": [],
1814
- "source": [
1815
- "A[[1,3],[0,2,3]]\n"
1816
- ]
1817
- },
1818
- {
1819
- "cell_type": "markdown",
1820
- "id": "f4fd2f83",
1821
- "metadata": {},
1822
- "source": [
1823
- "We can see what has gone wrong here. When supplied with two indexing lists, the `numpy` interpretation is that these provide pairs of $i,j$ indices for a series of entries. That is why the pair of lists must have the same length. However, that was not our intent, since we are looking for a submatrix.\n",
1824
- "\n",
1825
- "One easy way to do this is as follows. We first create a submatrix by subsetting the rows of `A`, and then on the fly we make a further submatrix by subsetting its columns.\n"
1826
- ]
1827
- },
1828
- {
1829
- "cell_type": "code",
1830
- "execution_count": null,
1831
- "id": "ac48a95b",
1832
- "metadata": {
1833
- "execution": {},
1834
- "lines_to_next_cell": 0
1835
- },
1836
- "outputs": [],
1837
- "source": [
1838
- "A[[1,3]][:,[0,2]]\n"
1839
- ]
1840
- },
1849
- {
1850
- "cell_type": "markdown",
1851
- "id": "a09467cd",
1852
- "metadata": {},
1853
- "source": [
1854
- "There are more efficient ways of achieving the same result.\n",
1855
- "\n",
1856
- "The *convenience function* `np.ix_()` allows us to extract a submatrix\n",
1857
- "using lists, by creating an intermediate *mesh* object."
1858
- ]
1859
- },
1860
- {
1861
- "cell_type": "code",
1862
- "execution_count": null,
1863
- "id": "ee195cc4",
1864
- "metadata": {
1865
- "execution": {},
1866
- "lines_to_next_cell": 2
1867
- },
1868
- "outputs": [],
1869
- "source": [
1870
- "idx = np.ix_([1,3],[0,2,3])\n",
1871
- "A[idx]\n"
1872
- ]
1873
- },
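
To see why `np.ix_()` yields a submatrix, it can help to inspect the mesh it returns. The following is a minimal sketch; the names `rows`, `cols`, and `sub` are assumptions introduced for illustration.

```python
import numpy as np

A = np.arange(16).reshape((4, 4))

# np.ix_() returns one index array per dimension, shaped so that they broadcast
rows, cols = np.ix_([1, 3], [0, 2, 3])
print(rows.shape, cols.shape)

# Broadcasting a (2, 1) array against a (1, 3) array gives every (row, column) pair
sub = A[rows, cols]
print(sub)
```

The row indices form a column vector and the column indices form a row vector, so `numpy` pairs every selected row with every selected column instead of matching them entry by entry.
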
1874
- {
1875
- "cell_type": "markdown",
1876
- "id": "b7177cb9",
1877
- "metadata": {},
1878
- "source": [
1879
- "Alternatively, we can subset matrices efficiently using slices.\n",
1880
- " \n",
1881
- "The slice\n",
1882
- "`1:4:2` captures the second and fourth items of a sequence, while the slice `0:3:2` captures\n",
1883
- "the first and third items (the third element in a slice sequence is the step size)."
1884
- ]
1885
- },
1886
- {
1887
- "cell_type": "code",
1888
- "execution_count": null,
1889
- "id": "48917bb5",
1890
- "metadata": {
1891
- "execution": {},
1892
- "lines_to_next_cell": 0
1893
- },
1894
- "outputs": [],
1895
- "source": [
1896
- "A[1:4:2,0:3:2]\n"
1897
- ]
1898
- },
1907
- {
1908
- "cell_type": "markdown",
1909
- "id": "c647dbf0",
1910
- "metadata": {},
1911
- "source": [
1912
- "Why are we able to retrieve a submatrix directly using slices but not using lists?\n",
1913
- "It's because they are different `Python` types, and\n",
1914
- "are treated differently by `numpy`.\n",
1915
- "Slices can be used to extract objects from arbitrary sequences, such as strings, lists, and tuples, while the use of lists for indexing is more limited."
1927
- ]
1928
- },
1929
- {
1930
- "cell_type": "markdown",
1931
- "id": "2dce8961",
1932
- "metadata": {},
1933
- "source": [
1934
- "### Boolean Indexing\n",
1935
- "In `numpy`, a *Boolean* is a type that equals either `True` or `False` (also represented as $1$ and $0$, respectively).\n",
1936
- "The next line creates a vector of $0$'s, represented as Booleans, of length equal to the first dimension of `A`. "
1937
- ]
1938
- },
1939
- {
1940
- "cell_type": "code",
1941
- "execution_count": null,
1942
- "id": "5d4caf22",
1943
- "metadata": {
1944
- "execution": {},
1945
- "lines_to_next_cell": 0
1946
- },
1947
- "outputs": [],
1948
- "source": [
1949
- "keep_rows = np.zeros(A.shape[0], bool)\n",
1950
- "keep_rows"
1951
- ]
1952
- },
1953
- {
1954
- "cell_type": "markdown",
1955
- "id": "d83fadb5",
1956
- "metadata": {},
1957
- "source": [
1958
- "We now set two of the elements to `True`. "
1959
- ]
1960
- },
1961
- {
1962
- "cell_type": "code",
1963
- "execution_count": null,
1964
- "id": "348820e3",
1965
- "metadata": {
1966
- "execution": {}
1967
- },
1968
- "outputs": [],
1969
- "source": [
1970
- "keep_rows[[1,3]] = True\n",
1971
- "keep_rows\n"
1972
- ]
1973
- },
1974
- {
1975
- "cell_type": "markdown",
1976
- "id": "a0fb487d",
1977
- "metadata": {},
1978
- "source": [
1979
- "Note that the elements of `keep_rows`, when viewed as integers, are the same as the\n",
1980
- "values of `np.array([0,1,0,1])`. Below, we use `==` to verify their equality. When\n",
1981
- "applied to two arrays, the `==` operation is applied elementwise."
1982
- ]
1983
- },
1984
- {
1985
- "cell_type": "code",
1986
- "execution_count": null,
1987
- "id": "4aafe45b",
1988
- "metadata": {
1989
- "execution": {}
1990
- },
1991
- "outputs": [],
1992
- "source": [
1993
- "np.all(keep_rows == np.array([0,1,0,1]))\n"
1994
- ]
1995
- },
1996
- {
1997
- "cell_type": "markdown",
1998
- "id": "603c0c53",
1999
- "metadata": {},
2000
- "source": [
2001
- "(Here, the function `np.all()` has checked whether\n",
2002
- "all entries of an array are `True`. A similar function, `np.any()`, can be used to check whether any entries of an array are `True`.)"
2003
- ]
2004
- },
2005
- {
2006
- "cell_type": "markdown",
2007
- "id": "b0a449d1",
2008
- "metadata": {},
2009
- "source": [
2010
- " However, even though `np.array([0,1,0,1])` and `keep_rows` are equal according to `==`, they index different sets of rows!\n",
2011
- "The former retrieves the first, second, first, and second rows of `A`. "
2012
- ]
2013
- },
2014
- {
2015
- "cell_type": "code",
2016
- "execution_count": null,
2017
- "id": "1be6a588",
2018
- "metadata": {
2019
- "execution": {}
2020
- },
2021
- "outputs": [],
2022
- "source": [
2023
- "A[np.array([0,1,0,1])]\n"
2024
- ]
2025
- },
2026
- {
2027
- "cell_type": "markdown",
2028
- "id": "e45bbebe",
2029
- "metadata": {},
2030
- "source": [
2031
- " By contrast, `keep_rows` retrieves only the second and fourth rows of `A` --- i.e. the rows for which the Boolean equals `True`. "
2032
- ]
2033
- },
2034
- {
2035
- "cell_type": "code",
2036
- "execution_count": null,
2037
- "id": "e83da57b",
2038
- "metadata": {
2039
- "execution": {}
2040
- },
2041
- "outputs": [],
2042
- "source": [
2043
- "A[keep_rows]\n"
2044
- ]
2045
- },
2046
- {
2047
- "cell_type": "markdown",
2048
- "id": "374d34a7",
2049
- "metadata": {},
2050
- "source": [
2051
- "This example shows that Booleans and integers are treated differently by `numpy`."
2052
- ]
2053
- },
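
The difference between the two kinds of index can be made explicit by converting one into the other. The sketch below is illustrative only: `np.flatnonzero()` is a standard `numpy` function, but its use here and the names `by_mask`, `by_int`, and `positions` are not part of the lab.

```python
import numpy as np

A = np.arange(16).reshape((4, 4))
keep_rows = np.zeros(A.shape[0], bool)
keep_rows[[1, 3]] = True

by_mask = A[keep_rows]               # Boolean mask: keeps rows where the mask is True
by_int = A[keep_rows.astype(int)]    # integers 0,1,0,1: rows 0, 1, 0, 1 in that order

# Positions of the True entries, usable as an ordinary integer index
positions = np.flatnonzero(keep_rows)

print(by_mask.shape, by_int.shape, positions)
```

Indexing with `positions` gives the same rows as indexing with the Boolean mask itself.
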
2054
- {
2055
- "cell_type": "markdown",
2056
- "id": "25db74bf",
2057
- "metadata": {},
2058
- "source": [
2059
- "We again make use of the `np.ix_()` function\n",
2060
- " to create a mesh containing the second and fourth rows, and the first, third, and fourth columns. This time, we apply the function to Booleans,\n",
2061
- " rather than lists."
2062
- ]
2063
- },
2064
- {
2065
- "cell_type": "code",
2066
- "execution_count": null,
2067
- "id": "09675294",
2068
- "metadata": {
2069
- "execution": {}
2070
- },
2071
- "outputs": [],
2072
- "source": [
2073
- "keep_cols = np.zeros(A.shape[1], bool)\n",
2074
- "keep_cols[[0, 2, 3]] = True\n",
2075
- "idx_bool = np.ix_(keep_rows, keep_cols)\n",
2076
- "A[idx_bool]\n"
2077
- ]
2078
- },
2079
- {
2080
- "cell_type": "markdown",
2081
- "id": "0166c179",
2082
- "metadata": {},
2083
- "source": [
2084
- "We can also mix a list with an array of Booleans in the arguments to `np.ix_()`:"
2085
- ]
2086
- },
2087
- {
2088
- "cell_type": "code",
2089
- "execution_count": null,
2090
- "id": "a85614e4",
2091
- "metadata": {
2092
- "execution": {},
2093
- "lines_to_next_cell": 0
2094
- },
2095
- "outputs": [],
2096
- "source": [
2097
- "idx_mixed = np.ix_([1,3], keep_cols)\n",
2098
- "A[idx_mixed]\n"
2099
- ]
2100
- },
2109
- {
2110
- "cell_type": "markdown",
2111
- "id": "b3541e0c",
2112
- "metadata": {},
2113
- "source": [
2114
- "For more details on indexing in `numpy`, readers are referred\n",
2115
- "to the `numpy` tutorial mentioned earlier.\n"
2116
- ]
2117
- },
2118
- {
2119
- "cell_type": "markdown",
2120
- "id": "ab75f168",
2121
- "metadata": {},
2122
- "source": [
2123
- "## Loading Data\n",
2124
- "\n",
2125
- "Data sets often contain different types of data, and may have names associated with the rows or columns. \n",
2126
- "For these reasons, they typically are best accommodated using a\n",
2127
- " *data frame*. \n",
2128
- " We can think of a data frame as a sequence\n",
2129
- "of arrays of identical length; these are the columns. Entries in the\n",
2130
- "different arrays can be combined to form a row.\n",
2131
- " The `pandas`\n",
2132
- "library can be used to create and work with data frame objects."
2133
- ]
2134
- },
2135
- {
2136
- "cell_type": "markdown",
2137
- "id": "ca018d13",
2138
- "metadata": {},
2139
- "source": [
2140
- "### Reading in a Data Set\n",
2141
- "\n",
2142
- "The first step of most analyses involves importing a data set into\n",
2143
- "`Python`. \n",
2144
- " Before attempting to load\n",
2145
- "a data set, we must make sure that `Python` knows where to find the file containing it. \n",
2146
- "If the\n",
2147
- "file is in the same location\n",
2148
- "as this notebook file, then we are all set. \n",
2149
- "Otherwise, \n",
2150
- "the command\n",
2151
- "`os.chdir()` can be used to *change directory*. (You will need to call `import os` before calling `os.chdir()`.) "
2152
- ]
2153
- },
2154
- {
2155
- "cell_type": "markdown",
2156
- "id": "b76342df",
2157
- "metadata": {},
2158
- "source": [
2159
- "We will begin by reading in `Auto.csv`, available on the book website. This is a comma-separated file, and can be read in using `pd.read_csv()`: "
2160
- ]
2161
- },
2162
- {
2163
- "cell_type": "code",
2164
- "execution_count": null,
2165
- "id": "ff81e644",
2166
- "metadata": {
2167
- "execution": {}
2168
- },
2169
- "outputs": [],
2170
- "source": [
2171
- "import pandas as pd\n",
2172
- "Auto = pd.read_csv('Auto.csv')\n",
2173
- "Auto\n"
2174
- ]
2175
- },
2176
- {
2177
- "cell_type": "markdown",
2178
- "id": "42d6a799",
2179
- "metadata": {},
2180
- "source": [
2181
- "The book website also has a whitespace-delimited version of this data, called `Auto.data`. This can be read in as follows:"
2182
- ]
2183
- },
2184
- {
2185
- "cell_type": "code",
2186
- "execution_count": null,
2187
- "id": "5b45aa7f",
2188
- "metadata": {
2189
- "execution": {},
2190
- "lines_to_next_cell": 0
2191
- },
2192
- "outputs": [],
2193
- "source": [
2194
- "Auto = pd.read_csv('Auto.data', sep=r'\\s+')\n"
2195
- ]
2196
- },
2197
- {
2198
- "cell_type": "markdown",
2199
- "id": "f942c457",
2200
- "metadata": {},
2201
- "source": [
2202
- " Both `Auto.csv` and `Auto.data` are simply text\n",
2203
- "files. Before loading data into `Python`, it is a good idea to view it using\n",
2204
- "a text editor or other software, such as Microsoft Excel.\n",
2205
- "\n"
2206
- ]
2207
- },
2208
- {
2209
- "cell_type": "markdown",
2210
- "id": "1aceff38",
2211
- "metadata": {},
2212
- "source": [
2213
- "We now take a look at the column of `Auto` corresponding to the variable `horsepower`: "
2214
- ]
2215
- },
2216
- {
2217
- "cell_type": "code",
2218
- "execution_count": null,
2219
- "id": "413f626a",
2220
- "metadata": {
2221
- "execution": {},
2222
- "lines_to_next_cell": 0
2223
- },
2224
- "outputs": [],
2225
- "source": [
2226
- "Auto['horsepower']\n"
2227
- ]
2228
- },
2229
- {
2230
- "cell_type": "markdown",
2231
- "id": "fd11e757",
2232
- "metadata": {},
2233
- "source": [
2234
- "We see that the `dtype` of this column is `object`. \n",
2235
- "It turns out that all values of the `horsepower` column were interpreted as strings when reading\n",
2236
- "in the data. \n",
2237
- "We can find out why by looking at the unique values."
2238
- ]
2239
- },
2240
- {
2241
- "cell_type": "code",
2242
- "execution_count": null,
2243
- "id": "57b86346",
2244
- "metadata": {
2245
- "execution": {},
2246
- "lines_to_next_cell": 0
2247
- },
2248
- "outputs": [],
2249
- "source": [
2250
- "np.unique(Auto['horsepower'])\n"
2251
- ]
2252
- },
2253
- {
2254
- "cell_type": "markdown",
2255
- "id": "f0aee233",
2256
- "metadata": {},
2257
- "source": [
2258
- "We see the culprit is the value `?`, which is being used to encode missing values.\n",
2259
- "\n"
2260
- ]
2261
- },
2262
- {
2263
- "cell_type": "markdown",
2264
- "id": "b7b032d4",
2265
- "metadata": {},
2266
- "source": [
2267
- "To fix the problem, we must provide `pd.read_csv()` with an argument called `na_values`.\n",
2268
- "Now, each instance of `?` in the file is replaced with the\n",
2269
- "value `np.nan`, which means *not a number*:"
2270
- ]
2271
- },
2272
- {
2273
- "cell_type": "code",
2274
- "execution_count": null,
2275
- "id": "a9698b26",
2276
- "metadata": {
2277
- "execution": {},
2278
- "lines_to_next_cell": 2
2279
- },
2280
- "outputs": [],
2281
- "source": [
2282
- "Auto = pd.read_csv('Auto.data',\n",
2283
- " na_values=['?'],\n",
2284
- " delim_whitespace=True)\n",
2285
- "Auto['horsepower'].sum()\n"
2286
- ]
2287
- },
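The effect of `na_values` can be checked without the data file. The sketch below is our own illustration rather than part of the lab: it reads a made-up three-row table from a string via `io.StringIO`, and it assumes `sep=r'\s+'` as the whitespace separator, which newer `pandas` releases recommend in place of `delim_whitespace=True`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for Auto.data: whitespace-delimited,
# with '?' marking a missing horsepower value.
text = """mpg horsepower name
18.0 130 chevrolet
25.0 ? ford
21.0 95 datsun
"""

# Without na_values, the '?' forces the whole column to be read as strings.
raw = pd.read_csv(io.StringIO(text), sep=r'\s+')

# With na_values, '?' becomes np.nan and the column is numeric.
clean = pd.read_csv(io.StringIO(text), sep=r'\s+', na_values=['?'])

print(raw['horsepower'].tolist())
print(clean['horsepower'].dtype)
print(clean['horsepower'].sum())   # missing values are skipped by sum()
```

Because `sum()` skips missing values by default, the total is computed from the two observed entries only.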
2288
- {
2289
- "cell_type": "markdown",
2290
- "id": "13cb364e",
2291
- "metadata": {},
2292
- "source": [
2293
- "The `Auto.shape` attribute tells us that the data has 397\n",
2294
- "observations, or rows, and nine variables, or columns."
2295
- ]
2296
- },
2297
- {
2298
- "cell_type": "code",
2299
- "execution_count": null,
2300
- "id": "4877cb2c",
2301
- "metadata": {
2302
- "execution": {}
2303
- },
2304
- "outputs": [],
2305
- "source": [
2306
- "Auto.shape\n"
2307
- ]
2308
- },
2309
- {
2310
- "cell_type": "markdown",
2311
- "id": "3fdc6f47",
2312
- "metadata": {},
2313
- "source": [
2314
- "There are\n",
2315
- "various ways to deal with missing data. \n",
2316
- "In this case, since only five of the rows contain missing\n",
2317
- "observations, we choose to use the `Auto.dropna()` method to simply remove these rows."
2318
- ]
2319
- },
2320
- {
2321
- "cell_type": "code",
2322
- "execution_count": null,
2323
- "id": "2ba1d33d",
2324
- "metadata": {
2325
- "execution": {},
2326
- "lines_to_next_cell": 2
2327
- },
2328
- "outputs": [],
2329
- "source": [
2330
- "Auto_new = Auto.dropna()\n",
2331
- "Auto_new.shape\n"
2332
- ]
2333
- },
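To see what `dropna()` does to the shape and to the row labels, here is a minimal sketch on a made-up three-row data frame (the name `df` and its values are hypothetical, not taken from `Auto`).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the second row has a missing horsepower.
df = pd.DataFrame({'mpg': [18.0, 25.0, 21.0],
                   'horsepower': [130.0, np.nan, 95.0]})

# Count the rows containing at least one missing entry.
n_incomplete = int(df.isna().any(axis=1).sum())

# dropna() returns a new frame and leaves df itself unchanged.
df_new = df.dropna()

print(n_incomplete, df.shape, df_new.shape)
print(list(df_new.index))   # surviving rows keep their original labels
```

Note that the surviving rows keep their original integer labels, so the index of the new data frame can have gaps.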
2334
- {
2335
- "cell_type": "markdown",
2336
- "id": "ac9748d9",
2337
- "metadata": {},
2338
- "source": [
2339
- "### Basics of Selecting Rows and Columns\n",
2340
- " \n",
2341
- "We can use `Auto.columns` to check the variable names."
2342
- ]
2343
- },
2344
- {
2345
- "cell_type": "code",
2346
- "execution_count": null,
2347
- "id": "3d03baab",
2348
- "metadata": {
2349
- "execution": {},
2350
- "lines_to_next_cell": 2
2351
- },
2352
- "outputs": [],
2353
- "source": [
2354
- "Auto = Auto_new # overwrite the previous value\n",
2355
- "Auto.columns\n"
2356
- ]
2357
- },
2358
- {
2359
- "cell_type": "markdown",
2360
- "id": "d24d4d42",
2361
- "metadata": {},
2362
- "source": [
2363
- "Accessing the rows and columns of a data frame is similar, but not identical, to accessing the rows and columns of an array. \n",
2364
- "Recall that the first argument to the `[]` method\n",
2365
- "is always applied to the rows of the array. \n",
2366
- "Similarly, \n",
2367
- "passing in a slice to the `[]` method creates a data frame whose *rows* are determined by the slice:"
2368
- ]
2369
- },
2370
- {
2371
- "cell_type": "code",
2372
- "execution_count": null,
2373
- "id": "410b4dd7",
2374
- "metadata": {
2375
- "execution": {},
2376
- "lines_to_next_cell": 0
2377
- },
2378
- "outputs": [],
2379
- "source": [
2380
- "Auto[:3]\n"
2381
- ]
2382
- },
2383
- {
2384
- "cell_type": "markdown",
2385
- "id": "4ea0be7b",
2386
- "metadata": {},
2387
- "source": [
2388
- "Similarly, an array of Booleans can be used to subset the rows:"
2389
- ]
2390
- },
2391
- {
2392
- "cell_type": "code",
2393
- "execution_count": null,
2394
- "id": "3540804d",
2395
- "metadata": {
2396
- "execution": {},
2397
- "lines_to_next_cell": 0
2398
- },
2399
- "outputs": [],
2400
- "source": [
2401
- "idx_80 = Auto['year'] > 80\n",
2402
- "Auto[idx_80]\n"
2403
- ]
2404
- },
2405
- {
2406
- "cell_type": "markdown",
2407
- "id": "a02221a2",
2408
- "metadata": {},
2409
- "source": [
2410
- "However, if we pass in a list of strings to the `[]` method, then we obtain a data frame containing the corresponding set of *columns*. "
2411
- ]
2412
- },
2413
- {
2414
- "cell_type": "code",
2415
- "execution_count": null,
2416
- "id": "66d174f1",
2417
- "metadata": {
2418
- "execution": {},
2419
- "lines_to_next_cell": 0
2420
- },
2421
- "outputs": [],
2422
- "source": [
2423
- "Auto[['mpg', 'horsepower']]\n"
2424
- ]
2425
- },
2426
- {
2427
- "cell_type": "markdown",
2428
- "id": "54bef6a3",
2429
- "metadata": {},
2430
- "source": [
2431
- "Since we did not specify an *index* column when we loaded our data frame, the rows are labeled using integers\n",
2432
- "0 to 396."
2433
- ]
2434
- },
2435
- {
2436
- "cell_type": "code",
2437
- "execution_count": null,
2438
- "id": "52789c77",
2439
- "metadata": {
2440
- "execution": {},
2441
- "lines_to_next_cell": 0
2442
- },
2443
- "outputs": [],
2444
- "source": [
2445
- "Auto.index\n"
2446
- ]
2447
- },
2448
- {
2449
- "cell_type": "markdown",
2450
- "id": "3f5fcb26",
2451
- "metadata": {},
2452
- "source": [
2453
- "We can use the\n",
2454
- "`set_index()` method to re-name the rows using the contents of `Auto['name']`. "
2455
- ]
2456
- },
2457
- {
2458
- "cell_type": "code",
2459
- "execution_count": null,
2460
- "id": "d83650bf",
2461
- "metadata": {
2462
- "execution": {}
2463
- },
2464
- "outputs": [],
2465
- "source": [
2466
- "Auto_re = Auto.set_index('name')\n",
2467
- "Auto_re\n"
2468
- ]
2469
- },
2470
- {
2471
- "cell_type": "code",
2472
- "execution_count": null,
2473
- "id": "880d79d9",
2474
- "metadata": {
2475
- "execution": {},
2476
- "lines_to_next_cell": 0
2477
- },
2478
- "outputs": [],
2479
- "source": [
2480
- "Auto_re.columns\n"
2481
- ]
2482
- },
2483
- {
2484
- "cell_type": "markdown",
2485
- "id": "dbee53b8",
2486
- "metadata": {},
2487
- "source": [
2488
- "We see that the column `'name'` is no longer there.\n",
2489
- " \n",
2490
- "Now that the index has been set to `name`, we can access rows of the data \n",
2491
- "frame by `name` using the `loc[]` method of\n",
2492
- "`Auto_re`:"
2493
- ]
2494
- },
2495
- {
2496
- "cell_type": "code",
2497
- "execution_count": null,
2498
- "id": "c01f4095",
2499
- "metadata": {
2500
- "execution": {},
2501
- "lines_to_next_cell": 0
2502
- },
2503
- "outputs": [],
2504
- "source": [
2505
- "rows = ['amc rebel sst', 'ford torino']\n",
2506
- "Auto_re.loc[rows]\n"
2507
- ]
2508
- },
2509
- {
2510
- "cell_type": "markdown",
2511
- "id": "29688cab",
2512
- "metadata": {},
2513
- "source": [
2514
- "As an alternative to using the index name, we could retrieve the 4th and 5th rows of `Auto_re` using the `iloc[]` method:"
2515
- ]
2516
- },
2517
- {
2518
- "cell_type": "code",
2519
- "execution_count": null,
2520
- "id": "a4202eb8",
2521
- "metadata": {
2522
- "execution": {},
2523
- "lines_to_next_cell": 0
2524
- },
2525
- "outputs": [],
2526
- "source": [
2527
- "Auto_re.iloc[[3,4]]\n"
2528
- ]
2529
- },
2530
- {
2531
- "cell_type": "markdown",
2532
- "id": "5427ede0",
2533
- "metadata": {},
2534
- "source": [
2535
- "We can also use it to retrieve the 1st, 3rd and 4th columns of `Auto_re`:"
2536
- ]
2537
- },
2538
- {
2539
- "cell_type": "code",
2540
- "execution_count": null,
2541
- "id": "948b2d07",
2542
- "metadata": {
2543
- "execution": {},
2544
- "lines_to_next_cell": 0
2545
- },
2546
- "outputs": [],
2547
- "source": [
2548
- "Auto_re.iloc[:,[0,2,3]]\n"
2549
- ]
2550
- },
2551
- {
2552
- "cell_type": "markdown",
2553
- "id": "b83d56eb",
2554
- "metadata": {},
2555
- "source": [
2556
- "We can extract the 4th and 5th rows, as well as the 1st, 3rd and 4th columns, using\n",
2557
- "a single call to `iloc[]`:"
2558
- ]
2559
- },
2560
- {
2561
- "cell_type": "code",
2562
- "execution_count": null,
2563
- "id": "1cfdcc5c",
2564
- "metadata": {
2565
- "execution": {},
2566
- "lines_to_next_cell": 0
2567
- },
2568
- "outputs": [],
2569
- "source": [
2570
- "Auto_re.iloc[[3,4],[0,2,3]]\n"
2571
- ]
2572
- },
2573
- {
2574
- "cell_type": "markdown",
2575
- "id": "2bde6514",
2576
- "metadata": {},
2577
- "source": [
2578
- "Index entries need not be unique: there are several cars in the data frame named `ford galaxie 500`."
2579
- ]
2580
- },
2581
- {
2582
- "cell_type": "code",
2583
- "execution_count": null,
2584
- "id": "fd9c5cda",
2585
- "metadata": {
2586
- "execution": {},
2587
- "lines_to_next_cell": 0
2588
- },
2589
- "outputs": [],
2590
- "source": [
2591
- "Auto_re.loc['ford galaxie 500', ['mpg', 'origin']]\n"
2592
- ]
2593
- },
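The difference between label-based and position-based selection can be tried on a small example. The sketch below uses a made-up four-row data frame (the names `cars` and `cars_re` and all values are hypothetical) in which one car name is repeated.

```python
import pandas as pd

# Hypothetical toy data; note that the name 'ford pinto' appears twice.
cars = pd.DataFrame({'name': ['ford pinto', 'amc rebel sst',
                              'ford pinto', 'ford torino'],
                     'mpg': [26.0, 16.0, 23.0, 17.0],
                     'weight': [2451, 3433, 2639, 3449],
                     'year': [74, 70, 75, 70]})

# set_index() moves 'name' from the columns to the row labels.
cars_re = cars.set_index('name')

# loc[] selects by label, iloc[] by integer position.
by_label = cars_re.loc[['amc rebel sst', 'ford torino']]
by_position = cars_re.iloc[[1, 3]]

# Rows and columns can be selected by position in a single call.
block = cars_re.iloc[[1, 3], [0, 2]]

# A repeated label returns every matching row.
repeated = cars_re.loc['ford pinto', ['mpg', 'year']]

print(by_label.equals(by_position))
print(block)
print(repeated)
```

Here the two selections `by_label` and `by_position` pick out the same rows, once by name and once by position.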
2594
- {
2595
- "cell_type": "markdown",
2596
- "id": "4d097282",
2597
- "metadata": {},
2598
- "source": [
2599
- "### More on Selecting Rows and Columns\n",
2600
- "Suppose now that we want to create a data frame consisting of the `weight` and `origin` of the subset of cars with \n",
2601
- "`year` greater than 80 --- i.e. those built after 1980.\n",
2602
- "To do this, we first create a Boolean array that indexes the rows.\n",
2603
- "The `loc[]` method allows for Boolean entries as well as strings:"
2604
- ]
2605
- },
2606
- {
2607
- "cell_type": "code",
2608
- "execution_count": null,
2609
- "id": "6d431cb5",
2610
- "metadata": {
2611
- "execution": {},
2612
- "lines_to_next_cell": 2
2613
- },
2614
- "outputs": [],
2615
- "source": [
2616
- "idx_80 = Auto_re['year'] > 80\n",
2617
- "Auto_re.loc[idx_80, ['weight', 'origin']]\n"
2618
- ]
2619
- },
2620
- {
2621
- "cell_type": "markdown",
2622
- "id": "838a03e0",
2623
- "metadata": {},
2624
- "source": [
2625
- "To do this more concisely, we can use an anonymous function called a `lambda`: "
2626
- ]
2627
- },
2628
- {
2629
- "cell_type": "code",
2630
- "execution_count": null,
2631
- "id": "fac41ce1",
2632
- "metadata": {
2633
- "execution": {},
2634
- "lines_to_next_cell": 0
2635
- },
2636
- "outputs": [],
2637
- "source": [
2638
- "Auto_re.loc[lambda df: df['year'] > 80, ['weight', 'origin']]\n"
2639
- ]
2640
- },
2641
- {
2642
- "cell_type": "markdown",
2643
- "id": "08e61254",
2644
- "metadata": {},
2645
- "source": [
2646
- "The `lambda` call creates a function that takes a single\n",
2647
- "argument, here `df`, and returns `df['year']>80`.\n",
2648
- "Since it is created inside the `loc[]` method for the\n",
2649
- "dataframe `Auto_re`, that dataframe will be the argument supplied.\n",
2650
- "As another example of using a `lambda`, suppose that\n",
2651
- "we want all cars built after 1980 that achieve greater than 30 miles per gallon:"
2652
- ]
2653
- },
2654
- {
2655
- "cell_type": "code",
2656
- "execution_count": null,
2657
- "id": "b0885654",
2658
- "metadata": {
2659
- "execution": {},
2660
- "lines_to_next_cell": 0
2661
- },
2662
- "outputs": [],
2663
- "source": [
2664
- "Auto_re.loc[lambda df: (df['year'] > 80) & (df['mpg'] > 30),\n",
2665
- " ['weight', 'origin']\n",
2666
- " ]\n"
2667
- ]
2668
- },
2669
- {
2670
- "cell_type": "markdown",
2671
- "id": "d87fc459",
2672
- "metadata": {},
2673
- "source": [
2674
- "The symbol `&` computes an element-wise *and* operation.\n",
2675
- "As another example, suppose that we want to retrieve all `Ford` and `Datsun`\n",
2676
- "cars with `displacement` less than 300. We check whether each `name` entry contains either the string `ford` or `datsun` using the `str.contains()` method of the `index` attribute of \n",
2677
- "the dataframe:"
2678
- ]
2679
- },
2680
- {
2681
- "cell_type": "code",
2682
- "execution_count": null,
2683
- "id": "213945a6",
2684
- "metadata": {
2685
- "execution": {},
2686
- "lines_to_next_cell": 0
2687
- },
2688
- "outputs": [],
2689
- "source": [
2690
- "Auto_re.loc[lambda df: (df['displacement'] < 300)\n",
2691
- " & (df.index.str.contains('ford')\n",
2692
- " | df.index.str.contains('datsun')),\n",
2693
- " ['weight', 'origin']\n",
2694
- " ]\n"
2695
- ]
2696
- },
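A Boolean array built ahead of time and a `lambda` passed to `loc[]` select the same rows. The following sketch checks this on a made-up data frame (the name `cars` and its values are hypothetical) and also shows the element-wise `&` and `|` operators.

```python
import pandas as pd

# Hypothetical toy data, indexed by car name.
cars = pd.DataFrame({'year': [70, 81, 82, 82],
                     'mpg': [18.0, 34.1, 27.0, 38.0],
                     'weight': [3504, 1985, 2950, 2125]},
                    index=['ford torino', 'datsun 210',
                           'ford mustang gl', 'honda civic'])

# Version 1: build the Boolean array first, then index with it.
mask = (cars['year'] > 80) & (cars['mpg'] > 30)
via_mask = cars.loc[mask, ['weight']]

# Version 2: let loc[] call a function on the data frame itself.
via_lambda = cars.loc[lambda df: (df['year'] > 80) & (df['mpg'] > 30),
                      ['weight']]

# Element-wise "or" of two string tests on the index.
ford_or_datsun = (cars.index.str.contains('ford')
                  | cars.index.str.contains('datsun'))

print(via_mask.equals(via_lambda))
print(cars.loc[ford_or_datsun, ['weight']])
```

The parentheses around each comparison are needed because `&` and `|` bind more tightly than `>` and `<` in `Python`.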
2697
- {
2698
- "cell_type": "markdown",
2699
- "id": "8a940fd1",
2700
- "metadata": {},
2701
- "source": [
2702
- "Here, the symbol `|` computes an element-wise *or* operation.\n",
2703
- " \n",
2704
- "In summary, a powerful set of operations is available to index the rows and columns of data frames. For integer-based queries, use the `iloc[]` method. For string and Boolean\n",
2705
- "selections, use the `loc[]` method. For functional queries that filter rows, use the `loc[]` method\n",
2706
- "with a function (typically a `lambda`) in the rows argument.\n",
2707
- "\n",
2708
- "## For Loops\n",
2709
- "A `for` loop is a standard tool in many languages that\n",
2710
- "repeatedly evaluates some chunk of code while\n",
2711
- "varying different values inside the code.\n",
2712
- "For example, suppose we loop over elements of a list and compute their sum."
2713
- ]
2714
- },
2715
- {
2716
- "cell_type": "code",
2717
- "execution_count": null,
2718
- "id": "a3c4060a",
2719
- "metadata": {
2720
- "execution": {},
2721
- "lines_to_next_cell": 0
2722
- },
2723
- "outputs": [],
2724
- "source": [
2725
- "total = 0\n",
2726
- "for value in [3,2,19]:\n",
2727
- " total += value\n",
2728
- "print('Total is: {0}'.format(total))\n"
2729
- ]
2730
- },
2731
- {
2732
- "cell_type": "markdown",
2733
- "id": "9117e3a1",
2734
- "metadata": {},
2735
- "source": [
2736
- "The indented code beneath the line with the `for` statement is run\n",
2737
- "for each value in the sequence\n",
2738
- "specified in the `for` statement. The loop ends either\n",
2739
- "when the cell ends or when code is indented at the same level\n",
2740
- "as the original `for` statement.\n",
2741
- "We see that the final line above, which prints the total, is executed\n",
2742
- "only once after the for loop has terminated. Loops\n",
2743
- "can be nested by additional indentation."
2744
- ]
2745
- },
2746
- {
2747
- "cell_type": "code",
2748
- "execution_count": null,
2749
- "id": "f2bffb69",
2750
- "metadata": {
2751
- "execution": {},
2752
- "lines_to_next_cell": 0
2753
- },
2754
- "outputs": [],
2755
- "source": [
2756
- "total = 0\n",
2757
- "for value in [2,3,19]:\n",
2758
- " for weight in [3, 2, 1]:\n",
2759
- " total += value * weight\n",
2760
- "print('Total is: {0}'.format(total))"
2761
- ]
2762
- },
2763
- {
2764
- "cell_type": "markdown",
2765
- "id": "9f99e85b",
2766
- "metadata": {},
2767
- "source": [
2768
- "Above, we summed over each combination of `value` and `weight`.\n",
2769
- "We also took advantage of the *increment* notation\n",
2770
- "in `Python`: the expression `a += b` is equivalent\n",
2771
- "to `a = a + b`. Besides\n",
2772
- "being a convenient notation, this can save time in computationally\n",
2773
- "heavy tasks in which the intermediate value of `a+b` need not\n",
2774
- "be explicitly created.\n",
2775
- "\n",
2776
- "Perhaps a more\n",
2777
- "common task would be to sum over `(value, weight)` pairs. For instance,\n",
2778
- "to compute the average value of a random variable that takes on\n",
2779
- "possible values 2, 3 or 19 with probabilities 0.2, 0.3 and 0.5, respectively,\n",
2780
- "we would compute the weighted sum. Tasks such as this\n",
2781
- "can often be accomplished using the `zip()` function that\n",
2782
- "loops over a sequence of tuples."
2783
- ]
2784
- },
2785
- {
2786
- "cell_type": "code",
2787
- "execution_count": null,
2788
- "id": "ee827a53",
2789
- "metadata": {
2790
- "execution": {}
2791
- },
2792
- "outputs": [],
2793
- "source": [
2794
- "total = 0\n",
2795
- "for value, weight in zip([2,3,19],\n",
2796
- " [0.2,0.3,0.5]):\n",
2797
- " total += weight * value\n",
2798
- "print('Weighted average is: {0}'.format(total))\n"
2799
- ]
2800
- },
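The contrast between nested loops and `zip()` is easiest to see with the same two lists in both. The sketch below is our own illustration (the variable names `nested_total` and `paired_total` are hypothetical): the nested loops visit all nine combinations, while `zip()` visits only the three matched pairs.

```python
values = [2, 3, 19]
weights = [3, 2, 1]

# Nested loops: every value is combined with every weight (3 x 3 products).
nested_total = 0
for value in values:
    for weight in weights:
        nested_total += value * weight

# zip(): the i-th value is combined only with the i-th weight (3 products).
paired_total = 0
for value, weight in zip(values, weights):
    paired_total += value * weight

print('Nested total is: {0}'.format(nested_total))
print('Paired total is: {0}'.format(paired_total))
```

The nested total equals `sum(values) * sum(weights)`, whereas the paired total is the usual weighted sum.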
2801
- {
2802
- "cell_type": "markdown",
2803
- "id": "dec18466",
2804
- "metadata": {},
2805
- "source": [
2806
- "### String Formatting\n",
2807
- "In the code chunk above we also printed a string\n",
2808
- "displaying the total. However, the object `total`\n",
2809
- "is a number and not a string.\n",
2810
- "Inserting the value of something into\n",
2811
- "a string is a common task, made\n",
2812
- "simple using\n",
2813
- "some of the powerful string formatting\n",
2814
- "tools in `Python`.\n",
2815
- "Many data cleaning tasks involve\n",
2816
- "manipulating and programmatically\n",
2817
- "producing strings.\n",
2818
- "\n",
2819
- "For example, we may want to loop over the columns of a data frame and\n",
2820
- "print the percent missing in each column.\n",
2821
- "Let’s create a data frame `D` with columns in which about 20% of the entries are missing, i.e. set\n",
2822
- "to `np.nan`. We’ll create the\n",
2823
- "values in `D` from a normal distribution with mean 0 and variance 1 using `rng.standard_normal()`\n",
2824
- "and then overwrite some random entries using `rng.choice()`."
2825
- ]
2826
- },
2827
- {
2828
- "cell_type": "code",
2829
- "execution_count": null,
2830
- "id": "3a097fbc",
2831
- "metadata": {
2832
- "execution": {},
2833
- "lines_to_next_cell": 2
2834
- },
2835
- "outputs": [],
2836
- "source": [
2837
- "rng = np.random.default_rng(1)\n",
2838
- "A = rng.standard_normal((127, 5))\n",
2839
- "M = rng.choice([0, np.nan], p=[0.8,0.2], size=A.shape)\n",
2840
- "A += M\n",
2841
- "D = pd.DataFrame(A, columns=['food',\n",
2842
- " 'bar',\n",
2843
- " 'pickle',\n",
2844
- " 'snack',\n",
2845
- " 'popcorn'])\n",
2846
- "D[:3]\n"
2847
- ]
2848
- },
2849
- {
2850
- "cell_type": "code",
2851
- "execution_count": null,
2852
- "id": "e064e170",
2853
- "metadata": {
2854
- "execution": {},
2855
- "lines_to_next_cell": 0
2856
- },
2857
- "outputs": [],
2858
- "source": [
2859
- "for col in D.columns:\n",
2860
- " template = 'Column \"{0}\" has {1:.2%} missing values'\n",
2861
- " print(template.format(col,\n",
2862
- " np.isnan(D[col]).mean()))\n"
2863
- ]
2864
- },
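The column-by-column computation can also be done in one step. The sketch below rebuilds a data frame in the same way as above and compares the two approaches; the use of `D.isna().mean()` is our addition, not part of the lab, and the names `loop_fracs` and `vector_fracs` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
A = rng.standard_normal((127, 5))
M = rng.choice([0, np.nan], p=[0.8, 0.2], size=A.shape)
D = pd.DataFrame(A + M,
                 columns=['food', 'bar', 'pickle', 'snack', 'popcorn'])

# Column-by-column version, as in the lab: one fraction per column.
loop_fracs = {col: np.isnan(D[col]).mean() for col in D.columns}

# Vectorized version: isna() marks missing entries, and mean()
# over each column gives the fraction of entries that are missing.
vector_fracs = D.isna().mean()

for col in D.columns:
    template = 'Column "{0}" has {1:.2%} missing values'
    print(template.format(col, vector_fracs[col]))
```

Since each entry is set to missing independently with probability 0.2, the observed fractions vary around 20% rather than matching it exactly.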
2865
- {
2866
- "cell_type": "markdown",
2867
- "id": "7a3e4dd8",
2868
- "metadata": {},
2869
- "source": [
2870
- "We see that the `template.format()` method expects two arguments `{0}`\n",
2871
- "and `{1:.2%}`, and the latter includes some formatting\n",
2872
- "information. In particular, it specifies that the second argument should be expressed as a percent with two decimal digits.\n",
2873
- "\n",
2874
- "The reference\n",
2875
- "[docs.python.org/3/library/string.html](https://docs.python.org/3/library/string.html)\n",
2876
- "includes many helpful and more complex examples."
2877
- ]
2878
- },
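As a small illustration of the format specification, the sketch below fills the same template with made-up values (the number 0.1654 is hypothetical) and shows an equivalent f-string, which is an alternative syntax not used elsewhere in this lab.

```python
# Positional fields: {0} is the first argument, {1:.2%} the second,
# shown as a percentage with two decimal digits.
template = 'Column "{0}" has {1:.2%} missing values'
message = template.format('food', 0.1654)

# The same result with an f-string, which reads the variables directly.
name, frac = 'food', 0.1654
f_message = f'Column "{name}" has {frac:.2%} missing values'

# A fixed number of decimal digits, without the percent conversion.
fixed = '{0:.3f}'.format(3.14159)

print(message)
print(f_message)
print(fixed)
```

The `%` code multiplies the value by 100 before formatting, so fractions should be passed in, not percentages.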
2879
- {
2880
- "cell_type": "markdown",
2881
- "id": "d8fd496a",
2882
- "metadata": {},
2883
- "source": [
2884
- "## Additional Graphical and Numerical Summaries\n",
2885
- "We can use the `ax.plot()` or `ax.scatter()` functions to display the quantitative variables. However, simply typing the variable names will produce an error message,\n",
2886
- "because `Python` does not know to look in the `Auto` data set for those variables."
2887
- ]
2888
- },
2889
- {
2890
- "cell_type": "code",
2891
- "execution_count": null,
2892
- "id": "c915ca52",
2893
- "metadata": {
2894
- "execution": {},
2895
- "lines_to_next_cell": 0
2896
- },
2897
- "outputs": [],
2898
- "source": [
2899
- "fig, ax = subplots(figsize=(8, 8))\n",
2900
- "ax.plot(horsepower, mpg, 'o');"
2901
- ]
2902
- },
2903
- {
2904
- "cell_type": "markdown",
2905
- "id": "63d47021",
2906
- "metadata": {},
2907
- "source": [
2908
- "We can address this by accessing the columns directly:"
2909
- ]
2910
- },
2911
- {
2912
- "cell_type": "code",
2913
- "execution_count": null,
2914
- "id": "65cd6d02",
2915
- "metadata": {
2916
- "execution": {},
2917
- "lines_to_next_cell": 0
2918
- },
2919
- "outputs": [],
2920
- "source": [
2921
- "fig, ax = subplots(figsize=(8, 8))\n",
2922
- "ax.plot(Auto['horsepower'], Auto['mpg'], 'o');\n"
2923
- ]
2924
- },
2925
- {
2926
- "cell_type": "markdown",
2927
- "id": "726836f0",
2928
- "metadata": {},
2929
- "source": [
2930
- "Alternatively, we can use the `plot()` method with the call `Auto.plot()`.\n",
2931
- "Using this method,\n",
2932
- "the variables can be accessed by name.\n",
2933
- "The plot methods of a data frame return a familiar object:\n",
2934
- "an axes. We can use it to update the plot as we did previously: "
2935
- ]
2936
- },
2937
- {
2938
- "cell_type": "code",
2939
- "execution_count": null,
2940
- "id": "76b5c0b1",
2941
- "metadata": {
2942
- "execution": {},
2943
- "lines_to_next_cell": 0
2944
- },
2945
- "outputs": [],
2946
- "source": [
2947
- "ax = Auto.plot.scatter('horsepower', 'mpg')\n",
2948
- "ax.set_title('Horsepower vs. MPG');"
2949
- ]
2950
- },
2951
- {
2952
- "cell_type": "markdown",
2953
- "id": "69c46251",
2954
- "metadata": {},
2955
- "source": [
2956
- "If we want to save\n",
2957
- "the figure that contains a given axes, we can find the relevant figure\n",
2958
- "by accessing the `figure` attribute:"
2959
- ]
2960
- },
2961
- {
2962
- "cell_type": "code",
2963
- "execution_count": null,
2964
- "id": "183a2c2b",
2965
- "metadata": {
2966
- "execution": {}
2967
- },
2968
- "outputs": [],
2969
- "source": [
2970
- "fig = ax.figure\n",
2971
- "fig.savefig('horsepower_mpg.png');"
2972
- ]
2973
- },
2974
- {
2975
- "cell_type": "markdown",
2976
- "id": "6f10cb46",
2977
- "metadata": {},
2978
- "source": [
2979
- "We can further instruct the data frame to plot to a particular axes object. In this\n",
2980
- "case the corresponding `plot()` method will return the\n",
2981
- "modified axes we passed in as an argument. Note that\n",
2982
- "when we request a one-dimensional grid of plots, the object `axes` is similarly\n",
2983
- "one-dimensional. We place our scatter plot in the middle plot of a row of three plots\n",
2984
- "within a figure."
2985
- ]
2986
- },
2987
- {
2988
- "cell_type": "code",
2989
- "execution_count": null,
2990
- "id": "75fbb981",
2991
- "metadata": {
2992
- "execution": {}
2993
- },
2994
- "outputs": [],
2995
- "source": [
2996
- "fig, axes = subplots(ncols=3, figsize=(15, 5))\n",
2997
- "Auto.plot.scatter('horsepower', 'mpg', ax=axes[1]);\n"
2998
- ]
2999
- },
3000
- {
3001
- "cell_type": "markdown",
3002
- "id": "53ffc0da",
3003
- "metadata": {},
3004
- "source": [
3005
- "Note also that the columns of a data frame can be accessed as attributes: try typing in `Auto.horsepower`. "
3006
- ]
3007
- },
3008
- {
3009
- "cell_type": "markdown",
3010
- "id": "1c4705e0",
3011
- "metadata": {},
3012
- "source": [
3013
- "We now consider the `cylinders` variable. Typing in `Auto.cylinders.dtype` reveals that it is being treated as a quantitative variable. \n",
3014
- "However, since there is only a small number of possible values for this variable, we may wish to treat it as \n",
3015
- " qualitative. Below, we replace\n",
3016
- "the `cylinders` column with a categorical version of `Auto.cylinders`. The function `pd.Series()` owes its name to the fact that `pandas` is often used in time series applications."
3017
- ]
3018
- },
3019
- {
3020
- "cell_type": "code",
3021
- "execution_count": null,
3022
- "id": "55b3a1cc",
3023
- "metadata": {
3024
- "execution": {},
3025
- "lines_to_next_cell": 0
3026
- },
3027
- "outputs": [],
3028
- "source": [
3029
- "Auto.cylinders = pd.Series(Auto.cylinders, dtype='category')\n",
3030
- "Auto.cylinders.dtype\n"
3031
- ]
3032
- },
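The conversion to a categorical variable can be tried on a short made-up series (the name `cyl` and its values are hypothetical). The sketch also uses `astype('category')`, an equivalent route that this lab does not use.

```python
import pandas as pd

# Hypothetical cylinder counts for six cars.
cyl = pd.Series([8, 4, 6, 4, 8, 4])
print(cyl.dtype)                     # an integer dtype: quantitative

# Conversion as in the lab: build a new Series with dtype 'category'.
cyl_cat = pd.Series(cyl, dtype='category')

# An equivalent conversion using astype().
cyl_cat2 = cyl.astype('category')

# The distinct levels and how often each one occurs.
levels = list(cyl_cat.cat.categories)
counts = cyl_cat.value_counts()

print(cyl_cat.dtype)
print(levels)
print(counts)
```

The values themselves are unchanged; only the way `pandas` interprets them differs, which in turn affects plots and summaries.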
3033
- {
3034
- "cell_type": "markdown",
3035
- "id": "adc75408",
3036
- "metadata": {},
3037
- "source": [
3038
- " Now that `cylinders` is qualitative, we can display it using\n",
3039
- " the `boxplot()` method."
3040
- ]
3041
- },
3042
- {
3043
- "cell_type": "code",
3044
- "execution_count": null,
3045
- "id": "f3d88794",
3046
- "metadata": {
3047
- "execution": {}
3048
- },
3049
- "outputs": [],
3050
- "source": [
3051
- "fig, ax = subplots(figsize=(8, 8))\n",
3052
- "Auto.boxplot('mpg', by='cylinders', ax=ax);\n"
3053
- ]
3054
- },
3055
- {
3056
- "cell_type": "markdown",
3057
- "id": "62d6582f",
3058
- "metadata": {},
3059
- "source": [
3060
- "The `hist()` method can be used to plot a *histogram*."
3061
- ]
3062
- },
3063
- {
3064
- "cell_type": "code",
3065
- "execution_count": null,
3066
- "id": "eea49f5b",
3067
- "metadata": {
3068
- "execution": {},
3069
- "lines_to_next_cell": 0
3070
- },
3071
- "outputs": [],
3072
- "source": [
3073
- "fig, ax = subplots(figsize=(8, 8))\n",
3074
- "Auto.hist('mpg', ax=ax);\n"
3075
- ]
3076
- },
3077
- {
3078
- "cell_type": "markdown",
3079
- "id": "c5a5933c",
3080
- "metadata": {},
3081
- "source": [
3082
- "The color of the bars and the number of bins can be changed:"
3083
- ]
3084
- },
3085
- {
3086
- "cell_type": "code",
3087
- "execution_count": null,
3088
- "id": "d5bcfff8",
3089
- "metadata": {
3090
- "execution": {},
3091
- "lines_to_next_cell": 0
3092
- },
3093
- "outputs": [],
3094
- "source": [
3095
- "fig, ax = subplots(figsize=(8, 8))\n",
3096
- "Auto.hist('mpg', color='red', bins=12, ax=ax);\n"
3097
- ]
3098
- },
3099
- {
3100
- "cell_type": "markdown",
3101
- "id": "60c36b6c",
3102
- "metadata": {},
3103
- "source": [
3104
- " See `Auto.hist?` for more plotting\n",
3105
- "options.\n",
3106
- " \n",
3107
- "We can use the `pd.plotting.scatter_matrix()` function to create a *scatterplot matrix* to visualize all of the pairwise relationships between the columns in\n",
3108
- "a data frame."
3109
- ]
3110
- },
3111
- {
3112
- "cell_type": "code",
3113
- "execution_count": null,
3114
- "id": "edb66cae",
3115
- "metadata": {
3116
- "execution": {},
3117
- "lines_to_next_cell": 0
3118
- },
3119
- "outputs": [],
3120
- "source": [
3121
- "pd.plotting.scatter_matrix(Auto);\n"
3122
- ]
3123
- },
3124
- {
3125
- "cell_type": "markdown",
3126
- "id": "0b162bd9",
3127
- "metadata": {},
3128
- "source": [
3129
- " We can also produce scatterplots\n",
3130
- "for a subset of the variables."
3131
- ]
3132
- },
3133
- {
3134
- "cell_type": "code",
3135
- "execution_count": null,
3136
- "id": "4f5d25d9",
3137
- "metadata": {
3138
- "execution": {},
3139
- "lines_to_next_cell": 0
3140
- },
3141
- "outputs": [],
3142
- "source": [
3143
- "pd.plotting.scatter_matrix(Auto[['mpg',\n",
3144
- " 'displacement',\n",
3145
- " 'weight']]);\n"
3146
- ]
3147
- },
3148
- {
3149
- "cell_type": "markdown",
3150
- "id": "8cae5dfc",
3151
- "metadata": {},
3152
- "source": [
3153
- "The `describe()` method produces a numerical summary of each column in a data frame."
3154
- ]
3155
- },
3156
- {
3157
- "cell_type": "code",
3158
- "execution_count": null,
3159
- "id": "ce7b23e2",
3160
- "metadata": {
3161
- "execution": {},
3162
- "lines_to_next_cell": 0
3163
- },
3164
- "outputs": [],
3165
- "source": [
3166
- "Auto[['mpg', 'weight']].describe()\n"
3167
- ]
3168
- },
3169
- {
3170
- "cell_type": "markdown",
3171
- "id": "d5042294",
3172
- "metadata": {},
3173
- "source": [
3174
- "We can also produce a summary of just a single column."
3175
- ]
3176
- },
3177
- {
3178
- "cell_type": "code",
3179
- "execution_count": null,
3180
- "id": "a6545d2f",
3181
- "metadata": {
3182
- "execution": {},
3183
- "lines_to_next_cell": 0
3184
- },
3185
- "outputs": [],
3186
- "source": [
3187
- "Auto['cylinders'].describe()\n",
3188
- "Auto['mpg'].describe()\n"
3189
- ]
3190
- },
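The output of `describe()` depends on the type of the column. The sketch below uses a made-up data frame (the name `df` and its values are hypothetical) with one quantitative and one categorical column. In a notebook, only the value of the last expression in a cell is displayed, so we use `print()` to see both summaries.

```python
import pandas as pd

# Hypothetical toy data with a numeric and a categorical column.
df = pd.DataFrame({'mpg': [18.0, 25.0, 21.0, 30.0],
                   'cylinders': pd.Series([8, 4, 6, 4], dtype='category')})

# Quantitative column: count, mean, std, min, quartiles and max.
num_summary = df['mpg'].describe()

# Categorical column: count, number of levels, most common level
# and its frequency.
cat_summary = df['cylinders'].describe()

print(num_summary)
print(cat_summary)
```

Each summary is itself a series, so individual entries can be extracted by name, for example `num_summary['mean']`.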
3191
- {
3192
- "cell_type": "markdown",
3193
- "id": "c2ea7f81",
3194
- "metadata": {},
3195
- "source": [
3196
- "To exit `Jupyter`, select `File / Close and Halt`.\n",
3197
- "\n",
3198
- " \n",
3199
- "\n"
3200
- ]
3201
- }
3202
- ],
3203
- "metadata": {
3204
- "jupytext": {
3205
- "cell_metadata_filter": "-all",
3206
- "formats": "Rmd,ipynb",
3207
- "main_language": "python"
3208
- },
3209
- "kernelspec": {
3210
- "display_name": "Python 3 (ipykernel)",
3211
- "language": "python",
3212
- "name": "python3"
3213
- },
3214
- "language_info": {
3215
- "codemirror_mode": {
3216
- "name": "ipython",
3217
- "version": 3
3218
- },
3219
- "file_extension": ".py",
3220
- "mimetype": "text/x-python",
3221
- "name": "python",
3222
- "nbconvert_exporter": "python",
3223
- "pygments_lexer": "ipython3",
3224
- "version": "3.10.4"
3225
- }
3226
- },
3227
- "nbformat": 4,
3228
- "nbformat_minor": 5
3229
- }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Reference files/Week2_ref/Lecture_1_basics.ipynb DELETED
The diff for this file is too large to render. See raw diff