Sunday, December 8, 2013

Writing Python extension modules in C

In this short guide, we will look at how to develop an extension module for Python in C. The need for one can arise in a couple of ways.

You may need a Python module that has to be super-fast. Don't be misled by this statement: Python itself can be used to write fast algorithms and modules. But let's be a little realistic about this. Python isn't a compiled language like C; it's an interpreted language. Therefore, a hardcore algorithm written in C will usually outperform the same algorithm written in Python. In that case, you might want to write that part in C and still use Python's ease to develop the rest of the application quickly.

Another common reason to want an extension module is that you already have a library written in C that fulfills your program's requirements. You can then use that code, with a little modification to adapt it to the Python API, so that Python code can call that library's functions. After all, a Python extension module is nothing more than a C library. On Unix/Linux, dynamic libraries are Shared Object (.so) files, whereas on Windows they are usually referred to as Dynamic Link Libraries (.dll).

In this guide we will build a sample module in C and invoke it from Python. For this exercise, you need the Python interpreter along with its header files. On Linux, you can install the Python development package using,
$ sudo apt-get install python-dev
On Windows, the header files should come with the binary installer package itself. Developing an extension module for Python involves three steps, namely,
  1. A set of C functions to be invoked from Python
  2. A table mapping Python method names to those C functions
  3. An initialization function
Let me demonstrate with a sample module called mathC, which is going to have an add, a complex add and a method to find the natural logarithm. Again, don't be misled: you don't need to write an extension module yourself to find the logarithm of a value in Python. This is merely for demonstration purposes. The C program looks like the following.
#include <Python.h>
#include <math.h>

static PyObject *mathC_add(PyObject *self, PyObject *args) {
    double d1, d2 ;
    if(!PyArg_ParseTuple(args, "dd", &d1, &d2)) {
        return  NULL ;
    }

    return Py_BuildValue("d", d1 + d2) ;
}

static PyObject *mathC_addComplex(PyObject *self, PyObject *args, PyObject *kw) {
    char *kwlist[] = { "real1", "complex1", "real2", "complex2", NULL } ;
    double r1, r2=0, c1=0, c2=0 ;
    if(!PyArg_ParseTupleAndKeywords(args, kw, "d|ddd", kwlist, &r1, &c1, &r2, &c2)) {
        return  NULL ;
    }

    return Py_BuildValue("(dd)", (r1 + r2), (c1 + c2)) ;
}

static PyObject *mathC_log(PyObject *self, PyObject *args) {
    double d1 ;
    if(!PyArg_ParseTuple(args, "d", &d1)) {
        return  NULL ;
    }

    return Py_BuildValue("d", log(d1)) ;
}

static PyMethodDef mathC_methods[] = {
    { "add", (PyCFunction) mathC_add, METH_VARARGS, NULL },
    { "addComplex", (PyCFunction) mathC_addComplex, METH_VARARGS | METH_KEYWORDS, NULL },
    { "log", (PyCFunction) mathC_log, METH_VARARGS, NULL },
    { NULL, NULL, 0, NULL }
};

PyMODINIT_FUNC initmathC() {
    Py_InitModule3("mathC", mathC_methods, "My mathC extension module");
}
Read the code carefully. First of all, we need to include Python.h before anything else. It exposes the functions we need to communicate back and forth between Python and C. Next come three functions corresponding to the three methods that we want to invoke from Python. Every function we want to invoke must have one of the following three prototypes in C.
PyObject *funct(PyObject *self, PyObject *args);
PyObject *funcWithKeywords(PyObject *self, PyObject *args, PyObject *kw);
PyObject *functWithNoArgs(PyObject *self, PyObject *unused);  /* second argument is always NULL */
Function names can be anything we like, but it's normally recommended to follow a convention like {moduleName}_{nameDescribingPurpose}. That's why I have three functions: mathC_add, mathC_addComplex and mathC_log. Normally, functions take the first form above, as mathC_add does. Such a function accepts any number of positional arguments from Python, and they arrive in args as a Python tuple.

Let's get into the implementation of mathC_add. I have declared two double variables, d1 and d2, to hold the two operands to be added. Next, we need to parse the arguments out of the tuple args. For that, the Python API provides a function named PyArg_ParseTuple. It takes the PyObject * parameter args as its first argument. The second argument is a C-style format string, but note that we don't use the % sign here, only the characters: 'd' for double, 'i' for int, 's' for strings (char *) and so on. The remaining arguments are the addresses of the variables to fill. The function returns 0 when parsing fails (and sets a Python exception); we check for that and return NULL from the C function so that the Python interpreter can raise the exception describing the error. Once the tuple has been parsed successfully, we can do our operation, which here is adding the two variables. The value returned from C must be a Python object. For this, the Python API provides a function called Py_BuildValue, which takes a format string for the return value followed by the actual value(s). Easy, huh?
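To make the parsing step concrete, here is a rough Python model of what PyArg_ParseTuple(args, "dd", &d1, &d2) does for mathC_add. It is only a sketch: parse_doubles and add are hypothetical names of mine, only the 'd' specifier is modelled, and the real function's conversion rules and error messages differ in detail.

```python
def parse_doubles(args, fmt):
    """Rough model of PyArg_ParseTuple for format strings made only of 'd'.

    Returns a tuple of floats, or raises TypeError, which is what the
    Python caller sees after the C function returns NULL.
    """
    if any(ch != "d" for ch in fmt):
        raise ValueError("this sketch only models the 'd' specifier")
    if len(args) != len(fmt):
        raise TypeError("function takes exactly %d arguments (%d given)"
                        % (len(fmt), len(args)))
    values = []
    for arg in args:
        # Assumption of this sketch: 'd' accepts numbers but rejects strings.
        if isinstance(arg, (str, bytes)):
            raise TypeError("a float is required")
        values.append(float(arg))
    return tuple(values)


def add(*args):
    """Pure-Python stand-in for mathC_add."""
    d1, d2 = parse_doubles(args, "dd")
    return d1 + d2


print(add(2, 0.5))  # 2.5
```

Calling add(1.0) or add("1", 2) raises TypeError here, which mirrors the NULL return path in the C code.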

OK. Now let's look at the second function, mathC_addComplex. This function is supposed to add two complex numbers. It takes four arguments, in order: the first complex number's real part, then its imaginary part (named complex1 in the code), and then the second complex number's real part and imaginary part (complex2). Note that this function uses the second prototype above. This is to support Python's keyword arguments. The acceptable keywords need to be specified in a NULL-terminated array like kwlist. With it, Python code can pass arguments using the keywords real1, complex1, real2 and complex2. Then I've declared four variables to capture the values. I have initialized r2, c1 and c2 to zero and left r1 uninitialized. r1 needs no default because the Python programmer must pass at least one value, the real part of the first complex number. That enforcement actually happens in the format string given to PyArg_ParseTupleAndKeywords, which is d|ddd. All the specifiers to the left of the | are required and all the specifiers to its right are optional; optional variables that the caller doesn't supply keep their initial values. Note that PyArg_ParseTupleAndKeywords is different from PyArg_ParseTuple: it also needs the keyword arguments and the keyword list. The PyObject * kw is passed as the second argument and kwlist as the fourth. Finally, we return a tuple. For that, see the format string in the return statement: (dd) says the tuple contains two values, both doubles. The last function, mathC_log, is self-explanatory. The only difference is that it uses the C math library's log function, which computes the natural logarithm. Voila, all three functions are now ready to serve as extensions to Python. We are done with the first step of building an extension module.
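If the d|ddd format string feels abstract, it may help to see the Python signature it roughly corresponds to. The function below is a pure-Python stand-in I wrote for illustration (add_complex is a hypothetical name, not part of the module); it is a sketch of the calling convention, not of how the C API works internally.

```python
def add_complex(real1, complex1=0.0, real2=0.0, complex2=0.0):
    """Pure-Python stand-in for mathC_addComplex (format string "d|ddd").

    real1 is required; the other three fall back to 0.0, just as c1, r2
    and c2 keep their initial values in the C code when not supplied.
    """
    return (float(real1) + float(real2), float(complex1) + float(complex2))


print(add_complex(1.5))                  # (1.5, 0.0)
print(add_complex(1.5, 2.0, real2=0.5))  # (2.0, 2.0)
```

Calling add_complex() with no arguments raises TypeError, which is the same outcome the C version produces when the required real1 is missing.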

The second step is to specify the mapping table. It is an array, which can have any name we like, whose elements are of type PyMethodDef, provided by the Python API. This is a structure of the following form.
 struct PyMethodDef {
  char *ml_name;
  PyCFunction ml_meth;
  int ml_flags;
  char *ml_doc;
 };
ml_name is the method name that we are going to use from Python. ml_meth is a pointer to the corresponding C function. ml_flags identifies which of the three prototypes the C function uses. Acceptable values are METH_VARARGS for the first prototype, METH_VARARGS | METH_KEYWORDS for the second and METH_NOARGS for the third. The final char * member is the docstring of the method, which Python exposes to the user and which can be NULL. So to specify our second function, we need an entry like the following.
{ "addComplex", (PyCFunction) mathC_addComplex, METH_VARARGS | METH_KEYWORDS, NULL }
The mathC_methods array above lists all three entries, followed by an all-NULL sentinel entry that terminates the table. Now the second step is done.

The last and final step involves exporting our initialization function, so that when the module is imported, the Python interpreter knows what methods are available. In Python 2, this initialization function must be named init{moduleName}. Since our module is mathC, the function is named initmathC. Note the PyMODINIT_FUNC macro. It's a platform-independent way to export the function so that it can be called from outside once we build this code into a dynamic library (.so or .dll). To initialize the module we use the function called Py_InitModule3 as follows.
Py_InitModule3("mathC", mathC_methods, "My mathC extension module");
This function accepts three arguments: the module name, the method table and a docstring for the module. Phew!! All the steps are done.

OK, we are not done yet though. Next we need to build the library and test it by importing it. We can build the library in two ways. The first is the classic way of building. On Linux/Unix we can build with the following gcc command. Of course, you need to have gcc, the C compiler, installed for that. If you don't have the required tools installed, see one of my previous posts, "Installing Developer Tools in Linux".
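$ gcc -shared -I/usr/include/python2.7 mathC.c -o mathC.so -fPIC
On Windows, with the Visual C++ compiler installed, you can build the library with the following command. Note the .pyd output name: since Python 2.5, an extension module on Windows must have the .pyd extension (it is still an ordinary DLL) for import to find it.
cl /LD /FemathC.pyd /IC:\Python27\include mathC.c C:\Python27\libs\python27.lib
Note that these commands assume Python 2.7, since the code above uses the Python 2 initialization API (initmathC and Py_InitModule3), which Python 3 does not provide. Change the paths to wherever the Python header files and libraries reside on your system.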

Next, to import this module into Python code, the library needs to be either in the directory where your Python code is or in one of the sys.path directories. You can manually copy the library file to one of those directories.
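If you would rather not copy the file around, another option is to extend sys.path at run time before importing. The snippet below is a small sketch of that idea; add_to_search_path is a helper name I made up, and "build" is only a placeholder for wherever your compiled mathC library actually lives.

```python
import os
import sys


def add_to_search_path(directory):
    """Put `directory` at the front of sys.path unless it is already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Hypothetical location of the compiled mathC.so / mathC.pyd.
build_dir = add_to_search_path("build")
print(build_dir in sys.path)  # True
```

After this, an import mathC statement in the same process will also search that directory.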

But the second method can save you from all this manual work. It uses a Python setup script with the help of the distutils package. If you haven't done the above, you can proceed with the following setup script in a file called setup.py.
from distutils.core import setup, Extension
setup(name='mathC', version='1.0', ext_modules=[Extension('mathC', ['mathC.c'])])
It's all self explanatory. You can now install the package using the following command.
$ python setup.py install
On Unix, you may need root access for this. On Windows that's generally not a problem.
Now all is done. Let's check our extension module in Python. If you have not placed the library in one of the sys.path directories (such as site-packages), you may have to navigate to the directory where the library resides. If you have copied it there or installed it with setup.py as above, all is fine.
[shazni@wso2-ThinkPad-T530 cmath]$ python
Python 2.7.4 (default, Sep 26 2013, 03:20:26) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mathC
>>> mathC.add(5.4, 2.3)
7.7
>>> mathC.log(15.4)
2.7343675094195836
>>> mathC.addComplex(1.1, 1.2, 1.3, 1.4)
(2.4000000000000004, 2.5999999999999996)
>>> mathC.addComplex(1.1)
(1.1, 0.0)
>>> mathC.addComplex(real1=1.1, real2=1.2, complex1=1.3, complex2=1.4)
(2.3, 2.7)
Great!!! Everything works as expected. Hope you enjoyed learning how to develop an extension module. This is a great technique, which can make beautiful Python even more beautiful.
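If you want to repeat the interactive checks above without typing them each time, they can be put into a small script. This is just a sketch: check_module is a hypothetical helper of mine, the tolerance is an arbitrary choice, and the script only reports that mathC is missing if the module has not been built or installed for the interpreter you run it with.

```python
import math


def check_module(mod, tol=1e-9):
    """Return the names of the checks that fail for a mathC-like module."""
    checks = {
        "add": abs(mod.add(5.4, 2.3) - 7.7) <= tol,
        "log": abs(mod.log(15.4) - math.log(15.4)) <= tol,
        "addComplex default": mod.addComplex(1.1) == (1.1, 0.0),
        "addComplex keywords": all(
            abs(got - want) <= tol
            for got, want in zip(
                mod.addComplex(real1=1.1, real2=1.2, complex1=1.3, complex2=1.4),
                (2.3, 2.7),
            )
        ),
    }
    return sorted(name for name, ok in checks.items() if not ok)


try:
    import mathC  # the extension module built above
except ImportError:
    print("mathC is not importable; build or install it first")
else:
    print(check_module(mathC) or "all checks passed")
```

An empty list from check_module means the module agrees with the session shown earlier.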
