NumPy
We’ve just used NumPy to load files, but it has many more uses.
The NumPy documentation can be found here.
To see the power of NumPy we’re going to create an array containing some data
import numpy as np
temperatures = np.array([200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400])
Remember, you only need to import a module once per notebook!
Say we wanted to square each element. Since this is a NumPy array, we can simply use
temperatures**2
array([ 40000, 48400, 57600, 67600, 78400, 90000, 102400, 115600, 129600, 144400, 160000])
NumPy arrays behave very similarly to vectors. If you are not overly familiar with vectors or linear algebra, do not panic. For the purposes of this course we won’t be worrying too much about the mathematical background describing why NumPy arrays work like they do, but rather how we can use them to our advantage.
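To make the contrast with a plain list concrete, here is a minimal sketch that squares the same values both ways. The names temperatures_list, squared_list and squared_array are illustrative only and are not part of the course material.

```python
import numpy as np

# A plain Python list needs an explicit loop (here a list comprehension),
# because temperatures_list**2 would raise a TypeError
temperatures_list = [200, 220, 240]
squared_list = [t**2 for t in temperatures_list]

# A NumPy array applies the power operator to every element at once
temperatures = np.array(temperatures_list)
squared_array = temperatures**2

print(squared_list)   # [40000, 48400, 57600]
print(squared_array)  # [40000 48400 57600]
```

Both approaches give the same numbers; the array version simply avoids writing the loop by hand.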
Vector arithmetic
We’ve seen that NumPy arrays can be used with the power operator ** to raise each element to a power, but what other features do they have?
Create the following NumPy arrays
array_1 = np.array([1, 5, 7])
array_2 = np.array([3, 1, 2])
and carry out the following operations
array_1 * 2
array_1 + 10
array_1 - array_2
array_1 + array_2
array_1 / array_2
array_1 * array_1
Notice that every single operation is defined - they all give a result rather than an error. In every case the operations are applied in what we refer to as an element-wise fashion - each element is independently affected by the operation.
Take the first operation - multiplying the array by two. The output of this operation is an array which still has three elements, but each value has been doubled.
If we now look at the result of adding two NumPy arrays we see that the first element of array_1 is added to the first element of array_2. Similarly, the second elements of each array are summed, and the third and so-on.
As an equation this is
\[\begin{bmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{N} \end{bmatrix} + \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{N} \end{bmatrix} = \begin{bmatrix} a_{1} + b_{1} \\ a_{2} + b_{2} \\ \vdots \\ a_{N} + b_{N} \end{bmatrix},\]
which is precisely how vector addition works, hence the term vector arithmetic.
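For reference, the element-wise behaviour described above can be checked with a short sketch that uses the same two arrays. The values in the comments list the elements of each result rather than the exact printed formatting.

```python
import numpy as np

array_1 = np.array([1, 5, 7])
array_2 = np.array([3, 1, 2])

print(array_1 * 2)        # elements: 2, 10, 14
print(array_1 + 10)       # elements: 11, 15, 17
print(array_1 - array_2)  # elements: -2, 4, 5
print(array_1 + array_2)  # elements: 4, 6, 9
print(array_1 / array_2)  # elements: roughly 0.333, 5.0, 3.5
print(array_1 * array_1)  # elements: 1, 25, 49
```

Note that dividing two integer arrays gives an array of floats, just as dividing two int objects with / gives a float.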
Homogeneity
A rather important feature of NumPy arrays is that, unlike the other data structures we have looked at, they must contain homogeneous data - i.e. the data must all be of the same type.
For a list we’re free to mix types
example_list = [1, 'a', True]
print(example_list)
but this is not possible for a NumPy array
example_array = np.array(example_list)
print(example_array)
['1' 'a' 'True']
You can see that each element has been turned into a string (notice the quotation marks around '1' and 'True'). This is usually not what we want, so if you need a sequence of different data types, use a list, not a NumPy array.
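One way to see which single type an array has settled on is to inspect its dtype attribute. The sketch below is illustrative: the variable names are made up for this example, and it uses dtype.kind (a one-letter code) so that the result does not depend on the platform’s default integer size.

```python
import numpy as np

integers = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])
mixed_types = np.array([1, 'a', True])

print(integers.dtype.kind)       # i (integer)
print(mixed_numbers.dtype.kind)  # f (floating point): the ints were converted
print(mixed_types.dtype.kind)    # U (Unicode string): everything became text
```

In each case NumPy picks one type that can hold every element, which is why mixing numbers and text leaves you with an array of strings.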
Indexing and slicing
We’ve now seen how NumPy arrays behave quite differently from other data structures with respect to mathematical operations, but in many other ways they are no different to lists.
We can access any element of a NumPy array by indexing it just like a list
example_array = np.array([1, 2, 3, 4, 5])
example_array[2]
3
Similarly, we can slice a NumPy array to retrieve only certain elements
example_array[1:4]
array([2, 3, 4])
example_array[3:0:-1]
array([4, 3, 2])
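The other list indexing tricks carry over as well. As a small sketch (the variable names are illustrative), negative indices and slices with a step behave exactly as they would for a list:

```python
import numpy as np

example_array = np.array([1, 2, 3, 4, 5])

last_element = example_array[-1]      # 5
every_other = example_array[::2]      # elements: 1, 3, 5
reversed_array = example_array[::-1]  # elements: 5, 4, 3, 2, 1

print(last_element)
print(every_other)
print(reversed_array)
```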
Just like we can have nested lists, we can have multidimensional NumPy arrays
nested_list = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
example_array = np.array(nested_list)
print(example_array)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Again, these can be indexed just like regular list objects
example_array[0]
array([1, 2, 3])
To index multidimensional arrays we can use the same syntax as a list
example_array[2][1]
but NumPy also allows us to use a more compact syntax
example_array[2, 1]
Just like one-dimensional NumPy arrays behave like vectors, two-dimensional arrays behave like matrices. You can therefore think of array indexing syntax as
name_of_array[row_index, column_index]
For example, we obtain the element in the third row and first column as
example_array[2, 0]
If you are not particularly keen on vectors or matrices, you do not have to use this mental model all of the time. For the most part, you can think of NumPy arrays as fancy lists that allow you to perform certain mathematical operations on sequences of numbers very quickly and efficiently.
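The row and column picture extends naturally to slices. The sketch below goes slightly beyond the examples above by using a bare colon to mean "everything along this axis"; the variable names are illustrative.

```python
import numpy as np

example_array = np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])

element = example_array[2, 0]        # third row, first column: 7
second_row = example_array[1, :]     # elements: 4, 5, 6
second_column = example_array[:, 1]  # elements: 2, 5, 8
top_right = example_array[0:2, 1:3]  # rows 0-1, columns 1-2: [[2, 3], [5, 6]]

print(element)
print(second_row)
print(second_column)
print(top_right)
```

Selecting a whole column like this is awkward with a nested list, which is one practical reason to prefer the compact comma syntax.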
More NumPy functions
Aside from the array function, NumPy also provides us with many more useful functions that are well worth being aware of.
Previously, we learnt about the math library, which gives us access to various mathematical functions that go beyond the simple operators available in Python automatically such as + or /:
import math
math.log(2)
The math functions are designed for int and float objects, but they do not work with data structures such as a list.
Similarly, NumPy arrays are incompatible with math.log
example_array = np.array([1, 2, 3, 4, 5])
math.log(example_array)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[32], line 3
      1 example_array = np.array([1, 2, 3, 4, 5])
----> 3 math.log(example_array)
TypeError: only length-1 arrays can be converted to Python scalars
However, NumPy gives us the function np.log which does work with NumPy arrays
np.log(example_array)
array([0. , 0.69314718, 1.09861229, 1.38629436, 1.60943791])
Functions from the math module will only work on single values, whereas those from NumPy are usually compatible with sequences of values.
All of the functions math.sin, math.cos, math.exp, ... have equivalents in NumPy, but there are also functions such as np.mean, np.sum, and np.std (standard deviation) too.
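As a quick illustration of these functions, here is a minimal sketch applied to the same five-element array. Note that np.std computes the population standard deviation by default (it divides by the number of elements).

```python
import numpy as np

example_array = np.array([1, 2, 3, 4, 5])

mean_value = np.mean(example_array)    # 3.0
total = np.sum(example_array)          # 15
spread = np.std(example_array)         # square root of 2, about 1.414
exponentials = np.exp(example_array)   # e**1, e**2, ... element by element

print(mean_value)
print(total)
print(spread)
print(exponentials)
```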
You might be wondering why there is a NumPy version of the sum function, given that the built-in version works just fine on NumPy arrays.
sum(example_array)
15
The NumPy version of the sum function calculates the sum over a given axis. In our one-dimensional example, this makes no difference, but if we have a two-dimensional array it can be quite useful.
To see the difference, let’s use sum on a two-dimensional array.
example_array = np.array([[1, 2, 3, 4],
                          [5, 6, 7, 8]])
sum(example_array)
array([ 6, 8, 10, 12])
The built-in sum function can only add the rows together, giving one total per column.
With the np.sum function, by contrast, we have different options for how to sum
option_1 = np.sum(example_array, axis=0)
option_2 = np.sum(example_array, axis=1)
option_3 = np.sum(example_array)
print(f'Sum along axis 0 = {option_1}')
print(f'Sum along axis 1 = {option_2}')
print(f'Sum along all values = {option_3}')
Sum along axis 0 = [ 6 8 10 12]
Sum along axis 1 = [10 26]
Sum along all values = 36
By using the axis keyword argument we can sum only along the rows or only along the columns, and by leaving it out we sum every element into a single number.
In this course, you almost certainly won’t have to worry about the different ways in which you can sum over a NumPy array, but if you’re interested:
- axis=0 corresponds to summing across rows (which is what the built-in sum function does).
- axis=1 corresponds to summing across columns.
- Providing no axis argument at all allows us to sum all values.
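One way to keep the axes straight is to look at the shape of the result: the axis you sum over is the one that disappears. The sketch below uses illustrative variable names and also shows that np.mean accepts the same axis argument.

```python
import numpy as np

example_array = np.array([[1, 2, 3, 4],
                          [5, 6, 7, 8]])

print(example_array.shape)  # (2, 4): 2 rows, 4 columns

# axis=0 collapses the 2 rows, leaving one value per column
column_totals = np.sum(example_array, axis=0)  # shape (4,)

# axis=1 collapses the 4 columns, leaving one value per row
row_totals = np.sum(example_array, axis=1)     # shape (2,)

# Other functions such as np.mean take the same argument
column_means = np.mean(example_array, axis=0)  # elements: 3.0, 4.0, 5.0, 6.0

print(column_totals.shape, row_totals.shape)
print(column_means)
```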
arange and linspace
We were introduced to the range function back in Session 4. It allows us to generate sequences of integers
for number in range(0, 22, 2):
    print(number)
Here we’ve printed all of the even numbers between 0 and 20 by passing the appropriate arguments to range (remember that the optional third argument is the step size between values).
NumPy provides its own version of this function called arange which returns a NumPy array
np.arange(0, 22, 2)
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
The np.arange function takes its arguments in exactly the same format as the built-in range function, but rather than returning a sequence of int objects, it returns a NumPy array.
To go along with the np.arange function, we also have the linspace function which returns a given number of linearly spaced values between a minimum and maximum value.
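As a minimal sketch of the difference between the two, the example below asks np.linspace for five values between 0 and 1 and compares it with np.arange using a step of 0.25. The variable names are illustrative.

```python
import numpy as np

# linspace: we choose HOW MANY values we want; both end points are included
evenly_spaced = np.linspace(0, 1, 5)
print(evenly_spaced)  # elements: 0.0, 0.25, 0.5, 0.75, 1.0

# arange: we choose the STEP SIZE; the stop value itself is excluded
stepped = np.arange(0, 1, 0.25)
print(stepped)        # elements: 0.0, 0.25, 0.5, 0.75
```

In short, reach for np.arange when you know the spacing you want and np.linspace when you know how many points you need.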
So NumPy arrays are quite a versatile data structure, and are particularly useful for scientific computing where we might want to apply the same operation to many elements in a relatively compact fashion.