# About this notebook

Here we will learn about `numpy`, one of the most important Python packages,
because it works around one of Python's biggest weaknesses.
Since Python is interpreted rather than compiled, loops, and thus operations on sequences (e.g. lists), are slow
compared to compiled languages that work on typed, contiguous memory.
`numpy` solves this issue with its main data structure, the `array`, which is similar to a list
but allows for fast operations on its items.

With the information presented here and additional numpy functions you can find online,
you'll have all the basics you need to solve the numpy tasks in `03_exercises_numpy_scipy_visualization.ipynb`.


# 0. Import packages

Pure Python is not what you'll want to use when you do scientific programming. One of Python's strengths is the large number of packages that you can use.
Normally, you'd need to install these packages on your computer, but Google Colab has many packages pre-installed.

We'll present the most important packages for scientific computation today!

In [None]:
# different ways to import packages
import scipy # import the package 'scipy'
import numpy as np # import the package 'numpy' under the abbreviation 'np'
from matplotlib import colors # import the submodule 'colors' from the matplotlib package
from matplotlib.colors import LogNorm # import the class 'LogNorm' from that submodule
import matplotlib.pyplot as plt # import the submodule 'pyplot' of matplotlib as abbrev. 'plt'


How exactly you need to import a package, class, function, ... depends a bit on the package itself. When in doubt, just google it :) 'stackoverflow' usually knows the answer!

# 1. Numpy

In [None]:
import numpy as np # numpy is usually imported as 'np'

In general: there is a numpy function for (almost) everything, and numpy functions are almost always faster than looping over array elements in Python!

## the array

In [None]:
# setup
a = [1, 2, 5, 3, 7, 10] # list
b = np.array(a) # instantiation of the array class of numpy
print(a*3) # repeat list 3 times
print(b*3) # multiply every item with 3

[1, 2, 5, 3, 7, 10, 1, 2, 5, 3, 7, 10, 1, 2, 5, 3, 7, 10]
[ 3  6 15  9 21 30]


### the main argument for numpy

It's fast, because its functions run as compiled code on arrays whose items all share one type, instead of interpreting a Python loop item by item.

**Use numpy functions wherever you can, instead of lists & loops!**

In [None]:
def list_add(list_1, list_2):
    list_3 = []
    for l1, l2 in zip(list_1, list_2):
        list_3.append(l1+l2)
    return list_3

In [None]:
a = list(range(5000))
b = list(range(1, 5001))

In [None]:
%%timeit
list_add(a, b)

144 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [None]:
a = np.arange(5000)
b = np.arange(1, 5001)

In [None]:
%%timeit
a+b

1 µs ± 8.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
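The comparison above relies on the notebook magic `%%timeit`. Outside a notebook, roughly the same measurement can be sketched with the standard-library `timeit` module. The names `a_list`, `a_arr`, `n_runs` and the number of runs are choices made for this sketch, and the measured times depend on your machine.

```python
import timeit

import numpy as np


def list_add(list_1, list_2):
    """Add two equally long lists item by item with a Python loop."""
    return [l1 + l2 for l1, l2 in zip(list_1, list_2)]


a_list = list(range(5000))
b_list = list(range(1, 5001))
a_arr = np.arange(5000)
b_arr = np.arange(1, 5001)

# both approaches must give the same numbers
same_result = list_add(a_list, b_list) == (a_arr + b_arr).tolist()
print("same result:", same_result)

n_runs = 200  # arbitrary choice for this sketch, small so it finishes quickly
t_list = timeit.timeit(lambda: list_add(a_list, b_list), number=n_runs) / n_runs
t_arr = timeit.timeit(lambda: a_arr + b_arr, number=n_runs) / n_runs
print(f"list loop: {t_list * 1e6:.1f} microseconds per call")
print(f"numpy add: {t_arr * 1e6:.1f} microseconds per call")
```

Checking that both versions return the same numbers before timing them is a cheap safeguard against comparing two different computations.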


## lots of useful functions

In [None]:
# type casting: all items in an array have the same type!
# you can pre-define the type when instantiating the array
a_list = [1, 1.9, 2.3]
a = np.array(a_list, dtype=float)
print(a) # the int 1 was converted to the float 1.
# you can access the array's type:
print(a.dtype)


[1.  1.9 2.3]
float64


In [None]:
# if you don't give a type, numpy picks a common type that fits all items (here: float)
a = np.array(a_list)
print(a, a.dtype)

[1.  1.9 2.3] float64
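Going the other way, from float to int, does lose information. As a small sketch (the variable names `a_int` and `a_round` are made up for this example), `astype` converts an existing array to another type:

```python
import numpy as np

a = np.array([1, 1.9, 2.3])        # inferred as float64
a_int = a.astype(int)              # converting to int cuts off the decimals (no rounding)
a_round = np.round(a).astype(int)  # round first if that is what you want

print(a_int, a_int.dtype.kind)     # kind 'i' means signed integer
print(a_round)
print(a)                           # astype returns a new array, a itself is unchanged
```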


### 2D arrays

In [None]:
# Creating 1D and 2D arrays
# arrays can have as many dimensions as you like!
# (lists can do that, too. but it tends to get messy)

a = np.array([[1,2,3], [4,5,6]])
b = np.array([4,5,6])
print(a)
print(a[0][1]) # row 0, column 1; a[0, 1] is equivalent
print(b[1])


[[1 2 3]
 [4 5 6]]
2
5


In [None]:
# you can access a couple of important properties like this:
print(a.shape, b.shape)
print(a.ndim, b.ndim)


(2, 3) (3,)
2 1
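With more than one dimension, indexing and slicing take one entry per axis, separated by commas. A short sketch using the same 2D array as above:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a[0, 1])    # single item: row 0, column 1 (same as a[0][1])
print(a[1])       # whole second row
print(a[:, 1])    # second column, taken from every row
print(a[:, 1:])   # all rows, columns 1 and 2
print(a.size)     # total number of items
print(a.T.shape)  # the transposed array has its axes swapped
```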


In [None]:
print(a*b)
print(b*a) # normal multiplication is commutative! it's not implemented as matrix multiplication
# but as element-wise multiplication: b is repeated along the rows of a ("broadcasting")

[[ 4 10 18]
 [16 25 36]]
[[ 4 10 18]
 [16 25 36]]
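The multiplication above works because the shape `(3,)` of `b` matches the last axis of `a`. A sketch of what happens when the shapes do not match, and one way to fix it (the array `c` is made up for this example):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.array([4, 5, 6])               # shape (3,): matches the last axis of a
c = np.array([10, 20])                # shape (2,): does NOT match the last axis of a

print(a * b)  # b is applied to every row of a

try:
    a * c
except ValueError as err:
    print("ValueError:", err)

# give c an explicit second axis, shape (2, 1), so it is applied to every column
scaled = a * c[:, np.newaxis]
print(scaled)
```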


In [None]:
# you can call all kinds of basic mathematical operations on arrays
# they are executed element-wise if both arrays have the same shape; a single number is applied to every item
print(b)
print(b*b)
print(b ** 2)
print(b - 3)

[4 5 6]
[16 25 36]
[16 25 36]
[1 2 3]


In [None]:
a = np.array([[1, 2], [4, 5]])
b = np.array([[4, 5], [7, 8]])
print(a * b)
print(a @ b) # this is matrix multiplication
print(b @ a) # which is NOT commutative

[[ 4 10]
 [28 40]]
[[18 21]
 [51 60]]
[[24 33]
 [39 54]]
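To see what `@` computes, the matrix product can be written out with explicit loops. This is only a sketch for illustration; in practice you would always use `@`.

```python
import numpy as np

a = np.array([[1, 2], [4, 5]])
b = np.array([[4, 5], [7, 8]])

# matrix product by hand: c[i, j] = sum over k of a[i, k] * b[k, j]
c = np.zeros((2, 2), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            c[i, j] += a[i, k] * b[k, j]

print(c)
print(np.array_equal(c, a @ b))      # same result as the @ operator
print(np.array_equal(a @ b, b @ a))  # the order matters for matrix products
```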


### sequence functions

In [None]:
a = np.arange(20) # numpy's counterpart to the built-in range function
# similar behavior as range(20)!
# but it returns an array instead of a lazy range object, so it consumes more memory
print(a)
print()
# reshape the array into a 2D form
a = a.reshape( (5,4) )
print(a)
print()
# reshape into a different 2D form
a = a.reshape( (2,10) )
print(a)

# ... and back to 1D:
print(a.flatten())


[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [None]:
print(a.reshape((3,3))) # doesn't work: 20 items cannot fill a 3x3 array

ValueError: cannot reshape array of size 20 into shape (3,3)
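If you only know one of the two dimensions, you can pass `-1` for the other and numpy works it out from the array size. The sketch below also shows a detail that is easy to miss: for an array like this one, `reshape` returns a view on the same data, while `flatten` returns a copy.

```python
import numpy as np

a = np.arange(20)

# -1 lets numpy work out the missing dimension from the array size
b = a.reshape((-1, 4))
print(b.shape)

# reshape returns a view here: b shares its memory with a
b[0, 0] = 99
print(a[0])

# flatten always returns an independent copy
flat = b.flatten()
flat[0] = -1
print(b[0, 0])
```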

In [None]:
# some default array constructors
ones = np.ones( (3,4), dtype=int ) # array of ones with the given shape
zeros = np.zeros_like(ones) # array of zeros with the same shape and type as the given array
print(ones)
print(zeros)

[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
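A few more constructors follow the same pattern. This is a short sketch, with variable names chosen only for this example:

```python
import numpy as np

sevens = np.full((2, 3), 7)               # every item set to the same value
identity = np.eye(3)                      # 2D identity matrix (floats by default)
zeros = np.zeros((2, 2))                  # float64 unless a dtype is given
like = np.ones_like(sevens, dtype=float)  # shape of `sevens`, but with float items

print(sevens)
print(identity)
print(zeros.dtype, like.dtype)
```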


In [None]:
# convenient sequence generators in numpy
print(np.arange(10))
print(np.linspace(0, 3, 5))
print(np.logspace(0, 3, 5))
print(np.log10(np.logspace(0, 3, 5)))

[0 1 2 3 4 5 6 7 8 9]
[0.   0.75 1.5  2.25 3.  ]
[   1.            5.62341325   31.6227766   177.827941   1000.        ]
[0.   0.75 1.5  2.25 3.  ]
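The outputs above suggest how the three generators relate to each other. A small sketch that makes the relations explicit:

```python
import numpy as np

# logspace(start, stop, num) is 10 ** linspace(start, stop, num)
log_matches = np.allclose(np.logspace(0, 3, 5), 10 ** np.linspace(0, 3, 5))
print(log_matches)

# arange takes a step and excludes the stop value ...
by_step = np.arange(0, 3, 0.75)
print(by_step)
# ... linspace takes the number of points and includes the stop value
by_count = np.linspace(0, 3, 5)
print(by_count)

# endpoint=False makes linspace exclude the stop value as well
no_end = np.linspace(0, 3, 4, endpoint=False)
print(no_end)
```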


### math functions

In [None]:
# built-in efficient mathematical functions in numpy
rng = np.arange(3)
print(rng)
print(np.sin(rng))
print(np.exp(rng))
print(np.power(10, rng))

[0 1 2]
[0.         0.84147098 0.90929743]
[1.         2.71828183 7.3890561 ]
[  1  10 100]
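These functions do the same as their counterparts in Python's `math` module, but for every item of the array at once. A sketch of the comparison (the array `x` is made up for this example):

```python
import math

import numpy as np

x = np.linspace(0, 2, 5)

# numpy applies the function to every item at once ...
y_numpy = np.sin(x)
# ... which gives the same numbers as calling math.sin item by item
y_loop = [math.sin(value) for value in x]
print(np.allclose(y_numpy, y_loop))

# functions can be chained without writing a loop
chained = np.sqrt(np.exp(x) ** 2)
print(np.allclose(chained, np.exp(x)))
```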


### sorting

In [None]:
# sorting
randnr = np.random.randint(0,10,3) # generate some random integers (see below)
order = np.argsort(randnr) # returns the indices that would sort the array
print("numbers:", randnr)

print("sorted numbers:", np.sort(randnr)) # sort directly

print("indices for sorting the numbers", order)
print(randnr[order]) # ... or sort via the indices

numbers: [1 5 0]
sorted numbers: [0 1 5]
indices for sorting the numbers [2 0 1]
[0 1 5]
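The indices from `argsort` become useful when a second array should follow the order of the first. A sketch with made-up example data (`names` and `scores` are hypothetical):

```python
import numpy as np

# hypothetical example data: names and their matching scores
names = np.array(["a", "b", "c", "d"])
scores = np.array([3, 9, 1, 7])

order = np.argsort(scores)    # indices that sort the scores in ascending order
print(names[order])           # names ordered by their score
print(names[order[::-1]])     # reversed indices give descending order
print(np.sort(scores)[::-1])  # descending sort of the scores themselves
```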


### masks

In [None]:
a = np.random.randint(0,10,10) #  generate some random integers

In [None]:
print(a)

[5 8 1 2 6 8 1 2 1 9]


In [None]:
print(a>4)

[ True  True False False  True  True False False False  True]


In [None]:
# data cuts and selection
b = np.arange(10)*1./4
print(a, b)
# simple mask selection
print(a[a>4]) # print all elements of a that are larger than 4
print(np.where(a<4, a, b)) # take the item from a where the condition is met, otherwise the item from b
# the arrays need to have matching shapes here!

[5 8 1 2 6 8 1 2 1 9] [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.   2.25]
[5 8 6 8 9]
[0.   0.25 1.   2.   1.   1.25 1.   2.   1.   2.25]
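Masks can be combined and can also be used for assignment. The sketch below reuses the random numbers printed above as a fixed array so that the result is reproducible; `clipped` is a name chosen for this example.

```python
import numpy as np

a = np.array([5, 8, 1, 2, 6, 8, 1, 2, 1, 9])  # the random numbers printed above

# combine conditions with & (and), | (or), ~ (not); the parentheses are required
mask = (a > 1) & (a < 8)
print(a[mask])
print(np.count_nonzero(mask))  # how many items pass the cut

# a mask can also be used to assign values
clipped = a.copy()
clipped[clipped > 5] = 5
print(clipped)
```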


## statistical functions

In [None]:
a = np.random.uniform(0,1,1000) # uniform random numbers between 0 and 1
np.sum(a) # sum

493.0386450424794

In [None]:
np.mean(a), np.median(a), np.std(a) # mean, median, standard deviation

(0.4930386450424794, 0.4944301015830824, 0.27904388361367816)
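For 2D arrays, the statistical functions accept an `axis` argument to reduce along one dimension only. A sketch with a small made-up array; it also shows the `ddof` argument of `np.std`, which switches to the sample standard deviation.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(np.mean(data))          # over all items
print(np.mean(data, axis=0))  # down the rows: one value per column
print(np.mean(data, axis=1))  # along the columns: one value per row

# np.std divides by N by default; ddof=1 divides by N - 1 (sample standard deviation)
std_population = np.std(data[0])
std_sample = np.std(data[0], ddof=1)
print(std_population, std_sample)
```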

## random numbers

numpy has a couple of useful random number functions. More elaborate functions can be found in 'scipy.stats' (we will work with scipy later on)

In [None]:
print(np.random.randint(0, 10, size=10)) # 10 random integers from 0 to 9 (the upper limit is excluded)

[3 7 9 2 7 9 7 8 0 4]


In [None]:
print(np.random.uniform(low=10, high=20, size=4)) # random uniform numbers
print(np.random.normal( loc=3, scale=5, size=(3,4))) # random normal numbers

[17.11312806 10.82612668 16.22535048 16.23275977]
[[ 1.80184525  4.47064923 -8.16619781 14.14000314]
 [ 7.24689405 -7.01816295  0.66046916 -5.91051168]
 [ 3.56125214  3.93933437  3.87858367  5.66685642]]
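The functions above give different numbers on every run. If you need reproducible results, one option is a seeded generator object created with `np.random.default_rng`, which the numpy documentation recommends for new code. A sketch, assuming a numpy version that provides `default_rng` (1.17 or newer); the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value 42 is an arbitrary choice

ints = rng.integers(0, 10, size=10)   # like randint: the upper limit 10 is excluded
uniform = rng.uniform(low=10, high=20, size=4)
normal = rng.normal(loc=3, scale=5, size=(3, 4))
print(ints)
print(uniform)
print(normal)

# a second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
reproduced = np.array_equal(ints, rng_again.integers(0, 10, size=10))
print(reproduced)
```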
