A tool for making Python as fast as C/Fortran

In what way is Python "slow"?

  • It is interpreted*: python is a piece of software (written in C) that executes python code on-the-fly. C/Fortran instead are compiled directly to machine-understadable binary code.

* Actually Python is sorta compiled. But the above is still mostly true.

  • In practice this often isn't important because the basic operations are still the same. But numerical for loops (especially with function calls) can be a problem.
  • This is why numpy was invented - it basically does efficient looping. But sometimes it's not well suited to a problem...

Which is clearer?

I want to know the values from values, which are aligned to ids2, corresponding to the IDs in ids1.

In [ ]:
sorti = np.argsort(ids2)
idsforvals = np.searchsorted(ids2[sorti], ids1)
new_array = values[sorti][idsforvals]

# there's actually a better way to do this in newer numpy versions... 
# but it's a contrived example to start with.
In [ ]:
for i in ids1:
    for j in ids2:
        if i == j:
            new_array[i] = values[j]

Sometimes loops are just easier to understand... and they can also be faster if they are at "close-to-metal" speed (i.e., C or Fortran)

What does Cython do about that?

  • Cython is an "optimising static compiler" for Python
  • It can be run on regular Python... But that usually only has a small effect (not worth the trouble), because the C it makes still does everything python does.
  • But you can tell it to behave "like C" by specifying types for variables. This can speed things up by orders of magnitude. Turns Cython into a "hybrid" language mixing syntax of Python with capabilities of C.
  • Bonus: you can also call C functions directly
  • The price is flexibility (and clarity): you can no longer use Python techniques like freely casting an integer to a float

Now lets do a "hello world" example

  • First make sure you have cython and pkg-config installed (conda install cython pkg-config should work).
  • Look at the file hello_world.pyx in the spacetelescope/pylunch repo.
  • Open up a terminal and go to the repo and do cython hello_world.pyx. It should create a hello_world.c file.
  • Now compile that file. This can be tricky to know the right flags. The pkg-config program can help you with that - this should work (Note that python3 should be changed to python if you're on python 2.x):

     clang -bundle $(pkg-config --libs --cflags python3) ./hello_world.c -o hello_world.so

    If you have trouble, check out http://cython.readthedocs.io/en/latest/src/reference/compilation.html. If you still have trouble, try the method in http://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html, or just wait for a bit.

  • Once you've got it compiled, you should have a hello_world.so file, which acts like a Python package. So start up python and do:

    >>> import hello_world
    >>> hello_world.do_hello()
    hello world from python
    hello world from c

There's a much easier way when experimenting: the cython notebook extension!

In [1]:
%load_ext Cython
In [2]:
%%cython

from libc.stdio cimport printf

def do_hello():
    print("hello world from python")
    printf("hello world from c\n")
In [3]:
do_hello()
hello world from python

Note that the C-printed line is missing from the notebook. Watch out for subtle quirks like this!

(In this case it's because the C print function goes straight to the terminal rather than getting caught by the notebook - you should see it in your terminal if that's how you started the notebook server.)

What if you want to use it in multiple notebooks or a releasable package?

In a real code that you're releasing or packaging up for your own use over multiple notebooks, you'll want to use a proper setup.py, as then you won't have to manually compile anything - the python building packages will take care of that.

To continue our "hello world" example, you'll see a setup.py file along side the hello_world.pyx. Examine that file with your favorite text editor, and modify it to compile the hello_world.pyx. Then you just do python setup.py build. That will create a build/lib.<a-bunch-of-stuff-that-depends-on-your-computer> directory. cd into that one, start up python and do:

>>> import hello_world
>>> hello_world.do_hello()

which should output the exact same thing as we saw above.

You can see a bit more detail about this in the Cython docs, or you can try using the Astropy affiliated package template, which comes out-of-the-box ready to include Cython code along side regular Python.

Why would we want to do this?

Consider the two examples below, one in Python and the other in Cython

In [4]:
def fib_py(n):
    """Compute the Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return b
In [5]:
%%cython

def fib_cy(n):
    """Compute the Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return b
In [17]:
print('Python version:')
%timeit -n 1 -r 1 fib_py(1000000)

print('Cython version:')
%timeit -n 1 -r 1 fib_cy(1000000)
Python version:
1 loop, best of 1: 4.29 µs per loop
Cython version:
1 loop, best of 1: 2.3 µs per loop

OK, so Cython gives a 2x speedup...

Aside

The `-n 1 -r 1` bit above makes it do just *one* run of the code. That's important here because some machines will cache the results, giving inaccurate timing measurements of the actual algorithm. Sometimes that's *not* what you want, because the caching actually helps in real practical code... But for this example it's deceptive.

How can we do better?

  • cython -a is your friend. It shows where cython produces a lot of python-related function calls, and is a hint that you can add C types to speed things up. The "game" is to try to turn all the yellow to white in performance-sensitive parts.
In [7]:
%%cython -a

def fib_cy(n):
    """Compute the Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return b
Out[7]:
Cython: _cython_magic_22b8b43cd7c9ba617dc48be29183d5ba.pyx

Generated by Cython 0.23.4

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

 1: 
+2: def fib_cy(n):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_1fib_cy(PyObject *__pyx_self, PyObject *__pyx_v_n); /*proto*/
static char __pyx_doc_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_fib_cy[] = "Compute the Fibonacci series up to n.";
static PyMethodDef __pyx_mdef_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_1fib_cy = {"fib_cy", (PyCFunction)__pyx_pw_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_1fib_cy, METH_O, __pyx_doc_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_fib_cy};
static PyObject *__pyx_pw_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_1fib_cy(PyObject *__pyx_self, PyObject *__pyx_v_n) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fib_cy (wrapper)", 0);
  __pyx_r = __pyx_pf_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_fib_cy(__pyx_self, ((PyObject *)__pyx_v_n));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_fib_cy(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_n) {
  PyObject *__pyx_v_a = NULL;
  PyObject *__pyx_v_b = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fib_cy", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba.fib_cy", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_a);
  __Pyx_XDECREF(__pyx_v_b);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple_ = PyTuple_Pack(3, __pyx_n_s_n, __pyx_n_s_a, __pyx_n_s_b); if (unlikely(!__pyx_tuple_)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_tuple_);
  __Pyx_GIVEREF(__pyx_tuple_);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_22b8b43cd7c9ba617dc48be29183d5ba_1fib_cy, NULL, __pyx_n_s_cython_magic_22b8b43cd7c9ba617d); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_fib_cy, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 3:     """Compute the Fibonacci series up to n."""
+4:     a, b = 0, 1
  __pyx_t_1 = __pyx_int_0;
  __Pyx_INCREF(__pyx_t_1);
  __pyx_t_2 = __pyx_int_1;
  __Pyx_INCREF(__pyx_t_2);
  __pyx_v_a = __pyx_t_1;
  __pyx_t_1 = 0;
  __pyx_v_b = __pyx_t_2;
  __pyx_t_2 = 0;
+5:     while b < n:
  while (1) {
    __pyx_t_2 = PyObject_RichCompare(__pyx_v_b, __pyx_v_n, Py_LT); __Pyx_XGOTREF(__pyx_t_2); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 5; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_3 = __Pyx_PyObject_IsTrue(__pyx_t_2); if (unlikely(__pyx_t_3 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 5; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    if (!__pyx_t_3) break;
+6:         a, b = b, a + b
    __pyx_t_2 = __pyx_v_b;
    __Pyx_INCREF(__pyx_t_2);
    __pyx_t_1 = PyNumber_Add(__pyx_v_a, __pyx_v_b); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF_SET(__pyx_v_a, __pyx_t_2);
    __pyx_t_2 = 0;
    __Pyx_DECREF_SET(__pyx_v_b, __pyx_t_1);
    __pyx_t_1 = 0;
  }
+7:     return b
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_b);
  __pyx_r = __pyx_v_b;
  goto __pyx_L0;

Looks like we spend a lot of time doing the "guts" of the work in the last one. Can we add some C types where we know they should exist?

In [8]:
%%cython -a

def fib_cy2(n):
    """Compute the Fibonacci series up to n."""
    cdef int a, b
    
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return b
Out[8]:
Cython: _cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27.pyx

Generated by Cython 0.23.4

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

 1: 
+2: def fib_cy2(n):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_1fib_cy2(PyObject *__pyx_self, PyObject *__pyx_v_n); /*proto*/
static char __pyx_doc_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_fib_cy2[] = "Compute the Fibonacci series up to n.";
static PyMethodDef __pyx_mdef_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_1fib_cy2 = {"fib_cy2", (PyCFunction)__pyx_pw_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_1fib_cy2, METH_O, __pyx_doc_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_fib_cy2};
static PyObject *__pyx_pw_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_1fib_cy2(PyObject *__pyx_self, PyObject *__pyx_v_n) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fib_cy2 (wrapper)", 0);
  __pyx_r = __pyx_pf_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_fib_cy2(__pyx_self, ((PyObject *)__pyx_v_n));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_fib_cy2(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_n) {
  int __pyx_v_a;
  int __pyx_v_b;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fib_cy2", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27.fib_cy2", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple_ = PyTuple_Pack(3, __pyx_n_s_n, __pyx_n_s_a, __pyx_n_s_b); if (unlikely(!__pyx_tuple_)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_tuple_);
  __Pyx_GIVEREF(__pyx_tuple_);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_6fb1cbaec6ed5cb545ad1a6b1595db27_1fib_cy2, NULL, __pyx_n_s_cython_magic_6fb1cbaec6ed5cb545); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_fib_cy2, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 3:     """Compute the Fibonacci series up to n."""
 4:     cdef int a, b
 5: 
+6:     a, b = 0, 1
  __pyx_t_1 = 0;
  __pyx_t_2 = 1;
  __pyx_v_a = __pyx_t_1;
  __pyx_v_b = __pyx_t_2;
+7:     while b < n:
  while (1) {
    __pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_b); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = PyObject_RichCompare(__pyx_t_3, __pyx_v_n, Py_LT); __Pyx_XGOTREF(__pyx_t_4); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __pyx_t_5 = __Pyx_PyObject_IsTrue(__pyx_t_4); if (unlikely(__pyx_t_5 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    if (!__pyx_t_5) break;
+8:         a, b = b, a + b
    __pyx_t_2 = __pyx_v_b;
    __pyx_t_1 = (__pyx_v_a + __pyx_v_b);
    __pyx_v_a = __pyx_t_2;
    __pyx_v_b = __pyx_t_1;
  }
+9:     return b
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_4 = __Pyx_PyInt_From_int(__pyx_v_b); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_r = __pyx_t_4;
  __pyx_t_4 = 0;
  goto __pyx_L0;

Still spending some time dealing with the n, as it's a python integer. lets cast it in advance before the inner loop.

In [9]:
%%cython -a

def fib_cy3(n):
    """Compute the Fibonacci series up to n."""
    cdef int a = 0
    cdef int b = 1
    cdef int c_n = int(n)
    
    while b < c_n:
        a, b = b, a + b
    return b
Out[9]:
Cython: _cython_magic_b2ecfce6889f340454f93e2264899042.pyx

Generated by Cython 0.23.4

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

 01: 
+02: def fib_cy3(n):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_b2ecfce6889f340454f93e2264899042_1fib_cy3(PyObject *__pyx_self, PyObject *__pyx_v_n); /*proto*/
static char __pyx_doc_46_cython_magic_b2ecfce6889f340454f93e2264899042_fib_cy3[] = "Compute the Fibonacci series up to n.";
static PyMethodDef __pyx_mdef_46_cython_magic_b2ecfce6889f340454f93e2264899042_1fib_cy3 = {"fib_cy3", (PyCFunction)__pyx_pw_46_cython_magic_b2ecfce6889f340454f93e2264899042_1fib_cy3, METH_O, __pyx_doc_46_cython_magic_b2ecfce6889f340454f93e2264899042_fib_cy3};
static PyObject *__pyx_pw_46_cython_magic_b2ecfce6889f340454f93e2264899042_1fib_cy3(PyObject *__pyx_self, PyObject *__pyx_v_n) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fib_cy3 (wrapper)", 0);
  __pyx_r = __pyx_pf_46_cython_magic_b2ecfce6889f340454f93e2264899042_fib_cy3(__pyx_self, ((PyObject *)__pyx_v_n));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_b2ecfce6889f340454f93e2264899042_fib_cy3(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_n) {
  int __pyx_v_a;
  int __pyx_v_b;
  int __pyx_v_c_n;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("fib_cy3", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("_cython_magic_b2ecfce6889f340454f93e2264899042.fib_cy3", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple_ = PyTuple_Pack(4, __pyx_n_s_n, __pyx_n_s_a, __pyx_n_s_b, __pyx_n_s_c_n); if (unlikely(!__pyx_tuple_)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_tuple_);
  __Pyx_GIVEREF(__pyx_tuple_);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_b2ecfce6889f340454f93e2264899042_1fib_cy3, NULL, __pyx_n_s_cython_magic_b2ecfce6889f340454); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_fib_cy3, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 03:     """Compute the Fibonacci series up to n."""
+04:     cdef int a = 0
  __pyx_v_a = 0;
+05:     cdef int b = 1
  __pyx_v_b = 1;
+06:     cdef int c_n = int(n)
  __pyx_t_1 = PyNumber_Int(__pyx_v_n); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyInt_As_int(__pyx_t_1); if (unlikely((__pyx_t_2 == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_c_n = __pyx_t_2;
 07: 
+08:     while b < c_n:
  while (1) {
    __pyx_t_3 = ((__pyx_v_b < __pyx_v_c_n) != 0);
    if (!__pyx_t_3) break;
+09:         a, b = b, a + b
    __pyx_t_2 = __pyx_v_b;
    __pyx_t_4 = (__pyx_v_a + __pyx_v_b);
    __pyx_v_a = __pyx_t_2;
    __pyx_v_b = __pyx_t_4;
  }
+10:     return b
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_b); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 10; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;
In [10]:
print('Baseline Python:')
%timeit -n 1 -r 1  fib_py(100000000)
print('No C types:')
%timeit -n 1 -r 1  fib_cy(100000000)
print('Some C types:')
%timeit -n 1 -r 1  fib_cy2(100000000)
print('All C types:')
%timeit -n 1 -r 1  fib_cy3(100000000)
Baseline Python:
1 loop, best of 1: 4.24 µs per loop
No C types:
1 loop, best of 1: 2.56 µs per loop
Some C types:
1 loop, best of 1: 2.08 µs per loop
All C types:
1 loop, best of 1: 1.1 µs per loop

So that a reasonable speedup (4-5x), just from adding c types.

Exercise

Can you convert this into something that runs faster in Cython? http://cython.readthedocs.io/en/latest/src/reference/language_basics.html might help here. (I managed a ~50x speedup! But maybe you can do even better?)

In [11]:
def primes_py(kmax):
    p = [0]*1000
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result

My answer is at the end of the notebook. Don't peek until you're ready!

Numpy and Cython

  • Where Cython really shines is in combination with numpy. You stick with numpy for general most things, but drop into highly optimized Cython for "hotspots" or places where it's much clearer.

Let's see how the "typed memoryview" approach works:

In [12]:
def ap2b_py(a ,b):
    return a + 2*b
In [13]:
%%cython
import numpy as np


def ap2b_cy(a , b):
    # they start out as regular numpy arrays, so they know their own length
    
    cdef int i
    cdef int n = len(a)
    
    newarr = np.empty(n)
    
    cdef double[:] a_memview = a
    cdef double[:] b_memview = b
    cdef double[:] new_memview = newarr
    
    for i in range(n):
        new_memview[i] = a_memview[i] + 2.0 * b_memview[i]
        
    return newarr
In [14]:
import numpy as np
a, b = np.random.randn(2, 10000000)

%timeit ap2b_py(a, b)
%timeit ap2b_cy(a, b)
10 loops, best of 3: 62.8 ms per loop
10 loops, best of 3: 36.4 ms per loop

This ~ factor-of-2 speed improvement also comes at a factor-of-2 memory improvement, as there is no intermediate numpy array.

Some more complex (but perhaps a bit more practical) examples of optimized Cython

Note that some of these are not the memoryview form described above. But they give you a flavor of the sort of performance improvements you might get

  • One final tip for working with numpy and Cython. Consider adding this to cython functions:
import cython

@cython.cdivision(True) 
@cython.wraparound(False)
@cython.boundscheck(False)
@cython.nonecheck(False)
def my_cy_func(...):
    ...

We won't dwell on the details of this - if you care you can read more at http://cython.readthedocs.io/en/latest/src/reference/compilation.html#compiler-directives. Just know that they will often speed things up a bit more, but at the price of a higher chance of crashing python.

Interfacing with C code

  • Cython also gives you the capability to call C directly. The main trick is that you need to include something like this in the Cython file:
    cdef extern from "myheaders.h":
      double c_function(int arg1, double arg2)
    you can then just use c_function(a, b) in a Cython function and it will work.
  • The (mental) complexity scales roughly with the complexity of the C code. E.g., functions with scalar arguments are easy, double * often works (even with numpy arrays), but double **: here there be dragons.

Other options

Cython is not the only way to speed numerical code up in python.

And there are probably more. But we only have so much time!

For the remaining time...

Explore! If you have a C code of your own, perhaps you can try writing a Python interface to it. If you don't, you could work through some of the earlier examples, or try a couple astronomy-centric exercises - you might start by writing it in Python and then adapting it to Cython...:

  • Write a Cython function to simulate a ground-based image of a nearby galaxy, given a star catalog (in the real case this would be from e.g. HST or JWST, but you could just make a random set of x/y positions, e.g. do x, y = np.random.randn(2, n)). It's probably easiest to assume Gaussian seeing, but you can always try whatever you favorite seeing kernel is. (There's a solution to this at the very end taken from my own science notebooks.)
  • Write a Cython function that convolves an image (a 2D numpy array) with a small kernel (another 2D numpy array, say 5x5). (You'd probably want to use astropy.convolution for this in a real science situation... But that is, itself implemented in Cython, so you could look at astropy's code for ideas)

Answer to Exercise

In [15]:
%%cython

def primes_cy(int kmax):
    cdef int n, k, i
    cdef int p[1000]
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
In [16]:
%timeit primes_py(1000)
%timeit primes_cy(1000)
10 loops, best of 3: 86 ms per loop
100 loops, best of 3: 1.96 ms per loop

Possible answer to "simulate galaxy" exercise

(This is just one of many ways to do it... But I had to do this myself once, so here's how I did it:)

In [ ]:
%%cython

import cython
import numpy as np
from libc.math cimport exp

cdef double PI = 3.141592653589793

@cython.cdivision(True) 
cdef inline double ngauss2d(double x, double y, double sig):
    cdef double tsigsq = 2.0*sig*sig
    return exp(-(x*x + y*y)/tsigsq)/(tsigsq * PI)

@cython.wraparound(False)
@cython.boundscheck(False)
@cython.nonecheck(False)
def convolve(data, xgrid, ygrid, fluxes, cenx, ceny, fwhm):
    """
    Convolves a star catalog with Gaussian seeing to produce an artificial
    ground-based/distant image.
    
    Parameters
    ----------
    data: 2D array
        An array to add the simulated image to
    xgrid: 2D array
        The x-values of each pixel in `data` - must be the same shape as `data`.
    ygrid: 2D array
        The y-values of each pixel in `data` - must be the same shape as `data`.
    fluxes: 1D array
        The flux in each star
    cenx: 1D array
        The x-position of the center of each star - must be the same shape as `fluxes`.
    ceny: 1D array
        The y-position of the center of each star - must be the same shape as `fluxes`.
    fwhm: float
        The FWHM of the "seeing". E.g., the gaussian convolution kernel. 
        (In the same units as `xgrid`, `ygrid`, `cenx`, and `ceny`).
    """
    from math import log
    import time
    
    cdef double[:, :] data_view = data
    cdef double[:, :] xgrid_view = xgrid
    cdef double[:, :] ygrid_view = ygrid
    cdef double[:] fluxes_view = fluxes
    cdef double[:] cenx_view = cenx
    cdef double[:] ceny_view = ceny
    
    cdef int I, J, S
    cdef double sig = fwhm/(2*(2*log(2))**0.5)
    cdef double starflux, starx, stary, dx, dy
    
    I = data.shape[0]
    J = data.shape[1]
    S = fluxes.shape[0]
    
    if xgrid.shape != (I, J) or ygrid.shape != (I, J):
        raise ValueError("xgrid or ygrid don't match data!")
        
    if cenx.shape != (S,) or ceny.shape != (S,):
        raise ValueError("cenx or ceny don't match fluxes!")
    
    for s in range(S):
        starflux = fluxes_view[s]
        starx = cenx_view[s]
        stary = ceny_view[s]
        
        for i in range(I):
            for j in range(J):
                dx = starx + xgrid_view[i, j]
                dy = stary + ygrid_view[i, j]
                data_view[i,j] += starflux * ngauss2d(dx, dy, sig)