tmp
/
pip-install-ghxuqwgs
/numpy_78e94bf2b6094bf9a1f3d92042f9bf46
/doc
/source
/user
/whatisnumpy.rst
************** | |
What is NumPy? | |
************** | |
NumPy is the fundamental package for scientific computing in Python. | |
It is a Python library that provides a multidimensional array object, | |
various derived objects (such as masked arrays and matrices), and an | |
assortment of routines for fast operations on arrays, including | |
mathematical, logical, shape manipulation, sorting, selecting, I/O, | |
discrete Fourier transforms, basic linear algebra, basic statistical | |
operations, random simulation and much more. | |
At the core of the NumPy package, is the `ndarray` object. This | |
encapsulates *n*-dimensional arrays of homogeneous data types, with | |
many operations being performed in compiled code for performance. | |
There are several important differences between NumPy arrays and the | |
standard Python sequences: | |
- NumPy arrays have a fixed size at creation, unlike Python lists | |
(which can grow dynamically). Changing the size of an `ndarray` will | |
create a new array and delete the original. | |
- The elements in a NumPy array are all required to be of the same | |
data type, and thus will be the same size in memory. The exception: | |
one can have arrays of (Python, including NumPy) objects, thereby | |
allowing for arrays of different sized elements. | |
- NumPy arrays facilitate advanced mathematical and other types of | |
operations on large numbers of data. Typically, such operations are | |
executed more efficiently and with less code than is possible using | |
Python's built-in sequences. | |
- A growing plethora of scientific and mathematical Python-based | |
packages are using NumPy arrays; though these typically support | |
Python-sequence input, they convert such input to NumPy arrays prior | |
to processing, and they often output NumPy arrays. In other words, | |
in order to efficiently use much (perhaps even most) of today's | |
scientific/mathematical Python-based software, just knowing how to | |
use Python's built-in sequence types is insufficient - one also | |
needs to know how to use NumPy arrays. | |
The points about sequence size and speed are particularly important in | |
scientific computing. As a simple example, consider the case of | |
multiplying each element in a 1-D sequence with the corresponding | |
element in another sequence of the same length. If the data are | |
stored in two Python lists, ``a`` and ``b``, we could iterate over | |
each element:: | |
c = [] | |
for i in range(len(a)): | |
c.append(a[i]*b[i]) | |
This produces the correct answer, but if ``a`` and ``b`` each contain | |
millions of numbers, we will pay the price for the inefficiencies of | |
looping in Python. We could accomplish the same task much more | |
quickly in C by writing (for clarity we neglect variable declarations | |
and initializations, memory allocation, etc.) | |
:: | |
for (i = 0; i < rows; i++): { | |
c[i] = a[i]*b[i]; | |
} | |
This saves all the overhead involved in interpreting the Python code | |
and manipulating Python objects, but at the expense of the benefits | |
gained from coding in Python. Furthermore, the coding work required | |
increases with the dimensionality of our data. In the case of a 2-D | |
array, for example, the C code (abridged as before) expands to | |
:: | |
for (i = 0; i < rows; i++): { | |
for (j = 0; j < columns; j++): { | |
c[i][j] = a[i][j]*b[i][j]; | |
} | |
} | |
NumPy gives us the best of both worlds: element-by-element operations | |
are the "default mode" when an `ndarray` is involved, but the | |
element-by-element operation is speedily executed by pre-compiled C | |
code. In NumPy | |
:: | |
c = a * b | |
does what the earlier examples do, at near-C speeds, but with the code | |
simplicity we expect from something based on Python. Indeed, the NumPy | |
idiom is even simpler! This last example illustrates two of NumPy's | |
features which are the basis of much of its power: vectorization and | |
broadcasting. | |
Vectorization describes the absence of any explicit looping, indexing, | |
etc., in the code - these things are taking place, of course, just | |
"behind the scenes" in optimized, pre-compiled C code. Vectorized | |
code has many advantages, among which are: | |
- vectorized code is more concise and easier to read | |
- fewer lines of code generally means fewer bugs | |
- the code more closely resembles standard mathematical notation | |
(making it easier, typically, to correctly code mathematical | |
constructs) | |
- vectorization results in more "Pythonic" code. Without | |
vectorization, our code would be littered with inefficient and | |
difficult to read ``for`` loops. | |
Broadcasting is the term used to describe the implicit | |
element-by-element behavior of operations; generally speaking, in | |
NumPy all operations, not just arithmetic operations, but | |
logical, bit-wise, functional, etc., behave in this implicit | |
element-by-element fashion, i.e., they broadcast. Moreover, in the | |
example above, ``a`` and ``b`` could be multidimensional arrays of the | |
same shape, or a scalar and an array, or even two arrays of with | |
different shapes, provided that the smaller array is "expandable" to | |
the shape of the larger in such a way that the resulting broadcast is | |
unambiguous. For detailed "rules" of broadcasting see | |
`numpy.doc.broadcasting`. | |
NumPy fully supports an object-oriented approach, starting, once | |
again, with `ndarray`. For example, `ndarray` is a class, possessing | |
numerous methods and attributes. Many of its methods mirror | |
functions in the outer-most NumPy namespace, giving the programmer | |
complete freedom to code in whichever paradigm she prefers and/or | |
which seems most appropriate to the task at hand. | |