NumPy

We’ve just used NumPy to load files, but it has many more uses.

Note

The NumPy documentation can be found at https://numpy.org/doc/.

To see the power of NumPy we’re going to create an array containing some data

import numpy as np

temperatures = np.array([200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400])
Note: Syntax

The syntax for creating a NumPy array is very similar to that of a list, but we simply wrap the list in a call to np.array.

np.array([...values...])
Tip

Remember, you only need to import a module once per notebook!

Say we wanted to square each element. Since this is a NumPy array, we can simply use

temperatures**2
array([ 40000,  48400,  57600,  67600,  78400,  90000, 102400, 115600, 129600, 144400, 160000])

How would you square each element of the following list?

temperatures = [200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400]

Because this is a list, we cannot use ** 2: the power operator is not defined for lists. Instead we must use a comprehension

temperatures = [
    t**2
    for t in temperatures
]
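
As a quick check, the sketch below squares the same values both ways and confirms that the results agree. The names values, squared_list and squared_array are ours, chosen only for illustration.

```python
import numpy as np

values = [200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400]

# List: square each element with a comprehension
squared_list = [t**2 for t in values]

# NumPy array: the power operator acts on every element at once
squared_array = np.array(values) ** 2

# Both approaches give the same numbers
print(squared_list == squared_array.tolist())  # True
```

The array version needs no loop, which is why it is usually easier to read.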

NumPy arrays behave very similarly to vectors. If you are not overly familiar with vectors or linear algebra, do not panic. For the purposes of this course we won’t be worrying too much about the mathematical background describing why NumPy arrays work like they do, but rather how we can use them to our advantage.

Vector arithmetic

We’ve seen that NumPy arrays can be used with the power operator ** to raise each element to a power, but what other features do they have?

Create the following NumPy arrays

array_1 = np.array([1, 5, 7])
array_2 = np.array([3, 1, 2])

and carry out the following operations

  1. array_1 * 2
  2. array_1 + 10
  3. array_1 - array_2
  4. array_1 + array_2
  5. array_1 / array_2
  6. array_1 * array_1

Notice that every single operation is defined - they all give a result rather than an error. In every case the operations are applied in what we refer to as an element-wise fashion - each element is independently affected by the operation.

Take the first operation - multiplying the array by two. The output of this operation is an array which still has three elements, but each value has been doubled.

Compare the above result with that of the following operation

[1, 5, 7] * 2

The list is doubled in length - a completely different result compared to the same operation applied to an equivalent NumPy array.

If we wanted to multiply each element by two we’d have to use a list comprehension. By now you should realise that these become very tedious very quickly and also make the code harder to read.
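
The three behaviours described above can be put side by side in a short sketch. The variable names here are illustrative and not part of the course material.

```python
import numpy as np

values = [1, 5, 7]

repeated = values * 2                   # list: the sequence is repeated
doubled_list = [v * 2 for v in values]  # list: comprehension doubles each element
doubled_array = np.array(values) * 2    # array: each element is doubled directly

print(repeated)       # [1, 5, 7, 1, 5, 7]
print(doubled_list)   # [2, 10, 14]
print(doubled_array)  # [ 2 10 14]
```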

If we now look at the result of adding two NumPy arrays, we see that the first element of array_1 is added to the first element of array_2. Similarly, the second elements of each array are summed, then the third, and so on.

As an equation this is

\[\begin{bmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{N} \end{bmatrix} + \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{N} \end{bmatrix} = \begin{bmatrix} a_{1} + b_{1} \\ a_{2} + b_{2} \\ \vdots \\ a_{N} + b_{N} \end{bmatrix},\]

which is precisely how vector addition works, hence the term vector arithmetic.
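
To connect the equation to code, the sketch below adds the two arrays from the exercise and compares the result with the same sum written out pair by pair. It uses the built-in zip function to pair up elements, which is our choice for illustration.

```python
import numpy as np

array_1 = np.array([1, 5, 7])
array_2 = np.array([3, 1, 2])

# Vector addition: elements in matching positions are summed
vector_sum = array_1 + array_2

# The same calculation written out pair by pair for plain lists
pairwise_sum = [a + b for a, b in zip([1, 5, 7], [3, 1, 2])]

print(vector_sum)    # [4 6 9]
print(pairwise_sum)  # [4, 6, 9]
```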

Homogeneity

A rather important feature of NumPy arrays is that, unlike the other data structures we have looked at, they must contain homogeneous data - i.e. the data must all be of the same type.

For a list we’re free to mix types

example_list = [1, 'a', True]

print(example_list)

but this is not possible for a NumPy array

example_array = np.array(example_list)

print(example_array)
['1' 'a' 'True']

You can see that each element has been turned into a string (notice the quotation marks around '1' and 'True'). This is usually not what we want, so if you need a sequence of different data types, use a list, not a NumPy array.
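
One way to see which single type an array has settled on is its dtype attribute. The sketch below is a small illustration with hypothetical variable names. It also shows that mixing integers and floats converts everything to floats, which is usually harmless.

```python
import numpy as np

mixed = np.array([1, 'a', True])
numbers = np.array([1, 2.5, 3])

# dtype reports the single type shared by every element
print(mixed.dtype.kind)  # U (a unicode string type)
print(numbers.dtype)     # float64 (the integers were converted to floats)
```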

Indexing and slicing

We’ve now seen how NumPy arrays behave quite differently from other data structures with respect to mathematical operations, but in many other ways they are no different to lists.

We can access any element of a NumPy array by indexing it just like a list

example_array = np.array([1, 2, 3, 4, 5])
example_array[2]
3

Similarly, we can slice a NumPy array to retrieve only certain elements

example_array[1:4]
array([2, 3, 4])
example_array[3:0:-1]
array([4, 3, 2])

Just like we can have nested lists, we can have multidimensional NumPy arrays

nested_list = [[1, 2, 3], 
               [4, 5, 6], 
               [7, 8, 9]]

example_array = np.array(nested_list)

print(example_array)
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Again, these can be indexed just like regular list objects

example_array[0]
array([1, 2, 3])

To reach an individual element of a multidimensional array, we can use the same syntax as for a nested list

example_array[2][1]

but NumPy also allows us to use a more compact syntax

example_array[2, 1]

Just like one-dimensional NumPy arrays behave like vectors, two-dimensional arrays behave like matrices. You can therefore think of the array indexing syntax as

name_of_array[row_index, column_index]

For example we obtain the element in the third row and first column as

example_array[2, 0]
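
The row and column picture can be sketched as follows. Using a colon to select a whole row or column goes slightly beyond what is shown above, and is included here only as an illustration of how slicing combines with the compact syntax.

```python
import numpy as np

example_array = np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])

# Third row, first column (indices start at 0)
element = example_array[2, 0]

# A colon selects everything along that dimension
third_row = example_array[2, :]
first_column = example_array[:, 0]

print(element)       # 7
print(third_row)     # [7 8 9]
print(first_column)  # [1 4 7]
```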

If you are not particularly keen on vectors or matrices, you do not have to use this mental model all of the time. For the most part, you can think of NumPy arrays as fancy lists that allow you to perform certain mathematical operations on sequences of numbers very quickly and efficiently.

More NumPy functions

Aside from the array function, NumPy also provides us with many more useful functions that are well worth being aware of.

Previously, we learnt about the math library, which gives us access to various mathematical functions that go beyond the simple operators available in Python automatically such as + or /:

import math

math.log(2)

The math functions are designed for int and float objects, but they do not work with data structures such as a list.

What happens when you run the following code - why?

numbers = [1, 2, 3, 4, 5]

math.log(numbers)

How might this be changed so that we take the natural logarithm of each element?

We can use a list comprehension

log_numbers = [
    math.log(value)
    for value in numbers
]

Similarly, NumPy arrays are incompatible with math.log

example_array = np.array([1, 2, 3, 4, 5])

math.log(example_array)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[32], line 3
      1 example_array = np.array([1, 2, 3, 4, 5])
----> 3 math.log(example_array)

TypeError: only length-1 arrays can be converted to Python scalars

However, NumPy gives us the function np.log which does work with NumPy arrays

np.log(example_array)
array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791])
Important

Functions from the math module will only work on single values, whereas those from NumPy are usually compatible with sequences of values.
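
As a rough check of this point, the sketch below computes the same logarithms with a math.log comprehension and with a single np.log call, then confirms that they agree. The use of np.allclose to compare floating-point values is our choice for the comparison.

```python
import math
import numpy as np

numbers = [1, 2, 3, 4, 5]

# math.log handles one value at a time, so we loop
log_list = [math.log(value) for value in numbers]

# np.log handles the whole array in one call
log_array = np.log(np.array(numbers))

print(np.allclose(log_list, log_array))  # True
```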

All of the functions math.sin, math.cos, math.exp, ... have equivalents in NumPy, but there are also functions such as np.mean, np.sum, and np.std (standard deviation) too.

Calculate the sum, mean, and standard deviation of the following values using a NumPy array

200, 220, 198, 181, 201, 156
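
One possible solution is sketched below, with variable names of our own choosing. Note that np.std divides by the number of values by default (the population standard deviation). If you need the sample standard deviation, pass ddof=1.

```python
import numpy as np

values = np.array([200, 220, 198, 181, 201, 156])

total = np.sum(values)
mean = np.mean(values)
# Population standard deviation (ddof=0 is the default)
std = np.std(values)

print(f'Sum = {total}')
print(f'Mean = {mean:.2f}')
print(f'Standard deviation = {std:.2f}')
```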

You might be wondering why there is a NumPy version of the sum function, given that the built-in version works just fine on NumPy arrays.

example_array = np.array([200, 220, 198, 181, 201, 156])
sum(example_array)
1156

The NumPy version of the sum function calculates the sum over a given axis. In our one-dimensional example, this makes no difference, but if we have a two-dimensional array it can be quite useful.

To see the difference, let’s use sum on a two-dimensional array.

example_array = np.array([[1, 2, 3, 4], 
                          [5, 6, 7, 8]])

sum(example_array)
array([ 6,  8, 10, 12])

The built-in sum function can only add the rows together, giving one total per column.

Whereas with the np.sum function we have different options for how to sum

option_1 = np.sum(example_array, axis=0)
option_2 = np.sum(example_array, axis=1)
option_3 = np.sum(example_array)

print(f'Sum along axis 0 = {option_1}')
print(f'Sum along axis 1 = {option_2}')
print(f'Sum along all values = {option_3}')
Sum along axis 0 = [ 6  8 10 12]
Sum along axis 1 = [10 26]
Sum along all values = 36

By using the axis keyword argument, we can add the rows together, add the columns together, or sum every element into a single number.

In this course, you almost certainly won’t have to worry about the different ways in which you can sum over a NumPy array, but for reference:

  • axis=0 sums down the rows, giving one total per column (which is what the built-in sum function does).
  • axis=1 sums across the columns, giving one total per row.
  • Providing no axis argument at all allows us to sum all values.
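
One way to remember which axis is which is to look at shapes: the axis you sum over is the one that disappears. The sketch below illustrates this with the shape attribute. The names column_totals and row_totals are ours.

```python
import numpy as np

example_array = np.array([[1, 2, 3, 4],
                          [5, 6, 7, 8]])

column_totals = np.sum(example_array, axis=0)
row_totals = np.sum(example_array, axis=1)

print(example_array.shape)  # (2, 4): 2 rows, 4 columns
print(column_totals.shape)  # (4,): the row axis is gone, one total per column
print(row_totals.shape)     # (2,): the column axis is gone, one total per row
```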

arange and linspace

We were introduced to the range function back in Session 4. It allows us to generate sequences of integers

for number in range(0, 22, 2):
    print(number)

Here we’ve printed all of the even numbers between 0 and 20 by passing the appropriate arguments to range (remember that the optional third argument is the step size between values).

NumPy provides its own version of this function called arange which returns a NumPy array

np.arange(0, 22, 2)
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

The arguments passed to the np.arange function have exactly the same format as those of the built-in range function, except that rather than returning a sequence of int objects, we now get a NumPy array.
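
The sketch below checks that np.arange and range produce the same values for the same arguments. It also shows one difference not covered above: np.arange accepts a non-integer step, which range does not.

```python
import numpy as np

evens = np.arange(0, 22, 2)

# Same values as the built-in range, but stored in a NumPy array
print(evens.tolist() == list(range(0, 22, 2)))  # True

# Unlike range, np.arange also accepts a non-integer step
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75]
```

As with range, the end value is not included in either result.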

To go along with the np.arange function, we also have the linspace function which returns a given number of linearly spaced values between a minimum and maximum value.

What array does the following code generate?

np.linspace(0, 1, 5)

What does each of the input arguments correspond to?

Note: Syntax

The general syntax for np.linspace is

np.linspace(start, end, number_of_values)

where, unlike in range and arange, the end value is included in the returned array.
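
As a small worked example of this syntax, the sketch below generates five values between 0 and 1 and works out the spacing between them. Because both ends are included, five values leave four gaps, so the spacing is (end - start) / (number_of_values - 1).

```python
import numpy as np

values = np.linspace(0, 1, 5)
print(values.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Five values including both ends leave four equal gaps
spacing = (1 - 0) / (5 - 1)
print(spacing)  # 0.25
```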

So NumPy arrays are quite a versatile data structure, and are particularly useful for scientific computing, where we might want to apply the same operation to many elements in a relatively compact fashion.