Python C API & PyMVA
Understanding Python bindings and PyMVA in ROOT
Using Python’s easy interface is fun!
Developing that interface is just as much fun! Or is it?
It’s often not. Writing efficient and robust modules in Python is hard, and it’s easy to see why. While a lot of modules can be implemented in pure Python, they often lag in performance. If all you care about is performance, one option is to simply write your code in a lower-level language like C++ and use it from other C++ modules. But it’s really painful to go back to doing everything in C++ when your cool machine learning friends are using magical Python libraries (I mean PyTorch :p). After using Python for a while, you know how simple, powerful, and flexible it is for machine learning work. There is little reason to go back.
For High Energy Physics, the go-to framework for big data analysis has been CERN’s ROOT framework. ROOT is a massive C++ library that even predates the STL in some areas. ROOT has a sub-module, TMVA, which houses the machine learning framework specifically developed for HEP data.
But when you are a Python developer with a C or C++ library, you’d like to use that from Python too!
Python Bindings
Python bindings allow you to call functions and pass data from Python to C or C++, letting you take advantage of the strengths of both languages. Many of the scientific packages you may have used in Python have their performance-critical parts written in C or C++. A few examples are NumPy, pandas, and PyTorch. NumPy, for example, gets much of its performance by running its inner loops in compiled code over carefully managed (and reused) memory, so calling it from Python avoids the interpreter and per-object overhead of writing the same functions in pure Python.
There are many slick tools and solutions for developing Python bindings, and if you are interested, I’d recommend reading this awesome blog by Henry Schreiner explaining various binding tools in detail.
The Python API is incorporated in a C source file by including the header “Python.h”.
#include <Python.h>
This pulls in everything needed to extend Python with C. Everything in Python is an object and is represented here as a PyObject (in practice, handled through a PyObject* pointer). That includes int, str, list, dict, None, and so on.
All user-visible symbols defined by Python.h have a prefix of Py or PY, except those defined in standard header files.
Py_Initialize();
This initializes the Python interpreter. It has to be called before almost any other C API function can be used, so it is required for everything we do with our TMVA PyTorch Interface.
When a C API function fails, it sets an exception condition and returns an error value (usually a NULL pointer, or -1 for functions that return an int). The exception information is kept inside the interpreter, in its per-thread state; if PyErr_Occurred() returns NULL, no exception has occurred.
In PyMVA, though, we are interested not only in making C functions callable from Python; the reverse direction, calling Python from C, is just as useful.
Calling Python functions from C
This is especially the case for the TMVA PyTorch Interface, which needs the user to supply PyTorch-based functions. Other uses are easy to imagine.
Calling a Python function is relatively easy. First, the Python program must somehow pass you the Python function object. You should provide a function (or some other interface) to do this. When that function is called, it is good practice to check the object with PyCallable_Check(), then save a pointer to the Python function object (taking a reference with Py_INCREF() so it stays alive) in a global variable, or wherever you see fit.
Later, when it is time to call the function, you call one of the following C functions, depending on the use:
- PyObject_CallObject(): Call a callable Python object callable, with arguments given by the tuple args. If no arguments are needed, then args can be NULL.
- PyObject_CallFunction(): Call a callable Python object callable, with a variable number of C arguments. The C arguments are described using a Py_BuildValue() style format string. The format can be NULL, indicating that no arguments are provided.
- PyObject_CallMethod(): Call the method named name of object obj with a variable number of C arguments. The C arguments are described by a Py_BuildValue() format string that should produce a tuple.
- PyObject_CallFunctionObjArgs(): Call a callable Python object callable, with a variable number of PyObject* arguments. The arguments are provided as a variable number of parameters followed by NULL.
- PyObject_CallMethodObjArgs(): Call a method of the Python object obj, where the name of the method is given as a Python string object in name. It is called with a variable number of PyObject* arguments. The arguments are provided as a variable number of parameters followed by NULL.
Each of these functions returns a pointer to a new Python object, which is the return value of the Python call, or NULL if the call raised an exception.
ROOT PyMVA
PyMVA is a set of Python-based plugins for the TMVA package. It consists of a set of classes that hook into TMVA and allow new classification and regression methods to be built on external frameworks and modules.
PyMethodBase is the virtual base class for all PyMVA methods. It handles initialization of the Python interpreter, serialization and deserialization, and has utilities for executing Python code in TMVA using the Python C API discussed above.
PyMethodBase is the backbone for all PyMVA methods. Our PyTorch Interface in TMVA (under development) is built on top of PyMVA, and on PyMethodBase in particular!