These lecture notebooks are here:

https://github.com/spacetelescope/pylunch/tree/master/session3

A prettified version is here:

http://spacetelescope.github.io/pylunch

If you have not yet signed up for the mailing list, please do so here:

http://bit.ly/stsci-pylunch-signup

If you want to do a more challenging set of exercises:

https://github.com/spacetelescope/pylunch/blob/master/session3/numpy100-qs.ipynb

Numpy

NumPy is short for "Numeric Python" or "Numerical Python". It is an open source extension module for Python that provides fast precompiled functions for mathematical and numerical routines. NumPy also adds powerful data structures for efficient computation with multi-dimensional arrays and matrices, including very large ones, and supplies a large library of high-level mathematical functions that operate on them.

An important, subtle difference from, e.g., MATLAB or IDL: numpy arrays are not part of the core language, so they can be developed, extended, and modified without installing a new Python.

Advantages of using Numpy with Python:

  • array-oriented computing
  • efficiently implemented multi-dimensional arrays
  • designed for scientific computation
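
As a small illustration of what "array-oriented computing" means, the sketch below converts a handful of made-up temperatures in two ways: with a plain Python loop and with a single array expression. The variable names and values are only for illustration.

```python
import numpy as np

# Hypothetical data: five temperatures in Celsius.
celsius_list = [0.0, 12.5, 25.0, 37.0, 100.0]

# Pure-Python approach: loop over each element.
fahrenheit_list = [c * 9 / 5 + 32 for c in celsius_list]

# Array-oriented approach: one expression acts on the whole array.
celsius = np.array(celsius_list)
fahrenheit = celsius * 9 / 5 + 32

print(fahrenheit_list)
print(fahrenheit)
```

Both approaches give the same numbers; the array version avoids the explicit loop and scales much better to large arrays.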

But before we dive into Numpy, let's take a detour through Python data types...

Python Data Types

We will look at five of Python's standard data types:

  • Number
  • String
  • List
  • Tuple
  • Dictionary
In [29]:
# Numbers can be integers (including long), float, complex or boolean
a = 5
print('Integer:', a, type(a))

b = 51924361948403939480293840938
print('Long integer:', b, type(b))

c = 10.7
print('Float', c, type(c))

t, f = True, False
print('Boolean: ',t, type(t))

d = 9.322e-36j
print('Complex:', d, type(d))
Integer: 5 <class 'int'>
Long integer: 51924361948403939480293840938 <class 'int'>
Float 10.7 <class 'float'>
Boolean:  True <class 'bool'>
Complex: 9.322e-36j <class 'complex'>
In [30]:
# Strings are straightforward, as we saw with 'Hello World!'

string = 'A moose once bit my sister.'
print(string)
A moose once bit my sister.
In [31]:
# Lists are the ones most similar to arrays, but not quite.
ll = [1.2, 23.6, 'foo', 11] ### <---- lists use square brackets!
print(ll)

# Lists are easy to append to! Use them when you do not know the size of the input in advance.
ll.append('temp')
print(ll)

import numpy as np
tt = np.random.rand(20)
num = []
for t in tt:
    if t > 0.5:
        num.append(t)
        
print(len(num))
[1.2, 23.6, 'foo', 11]
[1.2, 23.6, 'foo', 11, 'temp']
6
In [32]:
# Tuples: similar to lists, but with an interesting quality: once created they cannot be changed.
# i.e., they are "immutable": they cannot be sorted in place, appended to, etc.
# This is good for certain cases (e.g. they can be keys to a dictionary, or let you "protect" data),
# but they are generally less useful for scientific computing than lists/arrays

tup = (1,2,3,6.7) ### <---- use rounded brackets for tuples
print(tup, type(tup))
(1, 2, 3, 6.7) <class 'tuple'>
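
To make the two tuple properties mentioned in the cell above concrete, here is a minimal sketch. The dictionary of coordinate pairs is made up for illustration.

```python
tup = (1, 2, 3, 6.7)

# Attempting to change an element raises a TypeError.
try:
    tup[0] = 99
except TypeError as err:
    print('Cannot modify a tuple:', err)

# Tuples of hashable items can serve as dictionary keys (lists cannot).
# The coordinate pairs and labels here are hypothetical.
positions = {(0, 0): 'origin', (1, 2): 'target'}
print(positions[(1, 2)])
```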
In [33]:
# Dictionaries: we are just mentioning here that they exist. More elsewhere.

dd = {}
dd['lock'] = 1
dd['key'] = 2
print(dd)
{'key': 2, 'lock': 1}

Numpy Data Types

Numpy arrays are a different data type, beyond the five above: the ndarray. Numpy arrays can only contain one type of data but there are lots of options as to what that type is. A full list of Numpy data types can be found here:

http://docs.scipy.org/doc/numpy/user/basics.types.html
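
A quick sketch of what "only one type of data" means in practice: mixing integers and floats produces a float array, and values assigned into an existing array are converted to that array's type. The values are arbitrary.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # the integers are converted to floats
print(ints.dtype, mixed.dtype)

# Assigning a float into an integer array truncates it to fit the array's type.
ints[0] = 7.9
print(ints)
```

The exact integer type printed for `ints` depends on the platform, so it is worth checking `dtype` whenever the type matters.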

  • Always use the smallest data type that is appropriate for your data.
  • Do not append to numpy arrays: it is hugely inefficient! This is what lists are good for.
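
The two points above can be sketched as follows. The first part compares the memory used by the same values stored with two integer types; the second collects an unknown number of values in a list and converts to an array once at the end. The seed and the 0.5 threshold are arbitrary choices for illustration.

```python
import numpy as np

# The same 1000 values stored with two different data types.
big = np.arange(1000, dtype=np.int64)
small = np.arange(1000, dtype=np.int16)   # values up to 999 fit in 16 bits
print(big.nbytes, small.nbytes)           # bytes used by each array

# Collect an unknown number of values in a list, then convert once.
rng = np.random.default_rng(seed=0)       # seed chosen arbitrarily
values = []
for v in rng.random(20):
    if v > 0.5:
        values.append(v)
selected = np.array(values)
print(selected.dtype, selected.shape)
```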
In [67]:
# values evenly spaced within an interval, specify the STEP:
# np.arange(start, stop, step)
np.arange(0,10,1, dtype=float) # if you don't specify the data type, NumPy infers it from the arguments (integers here)
Out[67]:
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
In [68]:
# values evenly spaced within an interval, specify the NUMBER OF VALUES:
# np.linspace(start, stop, num=10)
np.linspace(0,9,10, dtype=int)

# there is also np.logspace
#np.logspace(0,1,10)
Out[68]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [69]:
# create array from existing data:

a = [1,2,3,4,5]
b = np.array(a)
b
Out[69]:
array([1, 2, 3, 4, 5])
In [70]:
# another way to get a pre-filled array is to set all values to ones or zeros, or to leave them uninitialized ("empty"):
a = np.ones(10)
print('Ones:', a)

b = np.zeros(10)
print('Zeros:', b)

c = np.empty(10, dtype=str)
print('Empty:', c)
Ones: [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
Zeros: [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
Empty: ['' '' '' '' '' '' '' '' '' '']
In [73]:
# if you already have an array A and want one that is the same size but with different values,
# here are a couple handy ways to accomplish this

a = [[1,2,3],[4,5,6],[7,8,9]]

b = np.zeros_like(a, dtype=float)
print('Zeros: ', b)

c = np.ones_like(a, dtype=float)
print('Ones: ', c)

d = np.empty_like(a, dtype=float)  # values are uninitialized, so they may be anything
print('Empty: ', d)
Zeros:  [[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
Ones:  [[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
Empty:  [[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]

You can explore other options on your own.

Numpy Exercises

1) Create a one-dimensional array, print the length and shape of the array

2) Create a two-dimensional array, print the length and shape of the array

3) Add the two arrays created above -- what happens?

4) Create a 100 x 100 array of integers, and trim off the top/bottom rows, and left/right columns

5) Write out and use an index array to select out positive values from this array

np.array([1, -1, -2, 3, -5])

6) Experiment with arange, ones, and zeros to create arrays of different shapes

7) Using a boolean array mask, select out the elements of the following array between 5 and 10:

np.array([0.6429498677659073, 1.150547235455569, 1.1915607017440888, 8.283179653420964, 5.1635384867953595, 8.06221365954315, 5.941607350505754, 9.426996923221827, 9.828300195624534, 8.061581259382875, 9.350471376998248, 2.5337332496612266, 3.8933693630535062, 7.854245437743151, 0.7965058455412621, 2.7207245408915623, 4.693244676240291, 1.3620057998648716, 8.880004623574631, 6.504379354779315])

Boolean Array Indexing or Why You Should Never Use np.where

If you come from IDL, you probably LOVE the "where" function. A similar function exists in Numpy:

In [39]:
ll = np.linspace(0,20,20)
idx = np.where(ll > 10)
print('Indexes: ', idx)
print('Selection: ', ll[idx])
Indexes:  (array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),)
Selection:  [ 10.52631579  11.57894737  12.63157895  13.68421053  14.73684211
  15.78947368  16.84210526  17.89473684  18.94736842  20.        ]

But. You should not use it. Why? Because boolean arrays.

In [40]:
a = np.array([1,2,3,5,8,13])
a > 3 # the result is a boolean array!
Out[40]:
array([False, False, False,  True,  True,  True], dtype=bool)
In [47]:
# expressions can be combined: 
# "|" == "or"
# "&" == "and"
# must use | and & with numpy arrays

print((a > 3) | (a == 1))

print((a > 2) & (a < 10))
[ True False False  True  True  True]
[False False  True  True  True False]
In [48]:
# "~" is the inverse operator:
~np.array([True, True, False])
Out[48]:
array([False, False,  True], dtype=bool)
In [ ]:
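# How is this useful?
# (ra is not defined elsewhere in these notes; here it is a made-up array of right ascensions in degrees)
ra = np.random.uniform(0, 360, 1000)

idx = (ra > 11.1324) & (ra < 31.5134)
selected = ra[idx]
not_selected = ra[~idx]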
In [1]:
x = np.random.randn(100000)
In [2]:
%timeit np.where(x<3)
The slowest run took 6.62 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 96.7 µs per loop
In [3]:
%timeit x<3
The slowest run took 6.38 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 28.3 µs per loop
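
As a quick check that nothing is lost by skipping np.where, the sketch below applies both approaches to the same array used earlier in this section and confirms that they select the same elements.

```python
import numpy as np

ll = np.linspace(0, 20, 20)

mask = ll > 10              # boolean array, same shape as ll
idx = np.where(ll > 10)     # tuple of integer index arrays

# Both forms pick out the same elements.
print(np.array_equal(ll[mask], ll[idx]))

# The mask can also be inverted or counted directly.
print(mask.sum(), (~mask).sum())
```

The boolean mask keeps the shape of the original array, which makes it easy to combine with other conditions or to invert with `~`.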

Numpy Array Indexing

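  • zero-indexed: the first element is at index 0
  • non-inclusiveness: a slice start:stop excludes the stop index
  • 2-D slices are written [y1:y2, x1:x2] (rows first, then columns)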
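
A minimal sketch of these three indexing rules on a small 3 x 4 array (the array contents are arbitrary):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # 3 rows (y), 4 columns (x)
print(arr)

print(arr[0, 0])      # first element: indexing starts at zero
print(arr[0, 1:3])    # columns 1 and 2 of row 0; the stop index 3 is excluded
print(arr[1:3, 0:2])  # rows 1-2 (y1:y2), columns 0-1 (x1:x2)
```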

Other Handy Numpy Functions

In [ ]:
# Randoms
# https://docs.scipy.org/doc/numpy-dev/reference/routines.random.html
In [ ]:
# Linear algebra:
# https://docs.scipy.org/doc/numpy-dev/reference/routines.linalg.html
In [ ]:
# Stats
# https://docs.scipy.org/doc/numpy-dev/reference/routines.statistics.html
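
For a taste of each of the three groups of routines linked above, here is a short sketch. The seed, the sample size, and the small linear system are made up for illustration.

```python
import numpy as np

# Randoms: draw samples from a normal distribution (seed chosen arbitrarily).
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# Stats: summary statistics of the samples.
print(samples.mean(), samples.std(), np.median(samples))

# Linear algebra: solve the small system A x = b (hypothetical values).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)
```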

Other Resources

  • Numpy documentation:

http://docs.scipy.org/doc/numpy/

  • Numpy documentation quickstart:

http://docs.scipy.org/doc/numpy/user/quickstart.html

  • Numpy documentation basics:

http://docs.scipy.org/doc/numpy/user/basics.html

  • Good Tutorial:

http://www.python-course.eu/numpy.php

  • Another tutorial, which has a similar approach to ours:

http://cs231n.github.io/python-numpy-tutorial/