Python Garbage Collection

15 Jul 2015

In this post we’ll explore how garbage collection works in two of the Python implementations:

  • CPython - Reference counting
  • PyPy - Incremental min mark

#Motivation

Most of the time we’re working on business logic using some language, in our case Python, using language provided abstractions without thinking about the layers that are below.

businnes

In this blog post we’re interested in knowing more about how Python manages the memory life cycle that are used in our programs. As far as we know, in languages like C/C++, the programmer is the one in charge of dealing with memory, allocating with the precise size and after using it, freeing that memory. Since it’s a manual work, memory leaks, a resource that is not released after using it, are a common problem in those languages. There are patterns in C++, like RAII, to solve this problems, as well as other techniques like uniqueptr and sharedptr, but still it’s a common problem.

###Ownership

1
2
3
4
5
6
int * func ( void )
{
    int * num = malloc (10 * sizeof ( int ));
    /* ... */
    return num ;
}

There can be other problems like dangling pointers, we’re allocating memory on the [stack] and returning the reference that is invalidated just after the function reach the end. So that memory address is invalid so we’ll get a SEGFAULT or erroneous data.

###Dangling pointers

1
2
3
4
5
6
int * func ( void )
{
    int num = 1234;
    /* ... */
    return &num ;
} // end of function scope, so num is garbage now.

On a high level perspective those languages seems to be very difficult to handle, but depending on the problem, having full control of memory is imperative, think about an embedded system environment or a very demanding application like a videogame, performance is a must. In those environments being able of define the memory layout taking into account caches and so on is key.

We’re lucky and Python interpreter does that job for us so we don’t have to manually manage memory. That’s because Python has a garbage collector.

What is Garbage Collection?

The first mention of garbage collection was done in 1960 by John McCarthy, the creator of LISP, in his paper Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I where he describes the formalism aroud LISP, and as a side note he defines the term. We already called this process “garbage collection”, but I guess I chickened out of using it in the paper—or else the Research Laboratory of Electronics grammar ladies wouldn’t let me. Funny, right? academia and industry ended up using garbage collection as the formal term.

Academics define Garbage collection as:

Garbage collection is automatic memory management. While the mutator runs , it routinely allocates memory from the heap. If more memory than available is needed, the collector reclaims unused memory and returns it to the heap.

And there are other terms that we need to define as well.

Mutator:

The part of a running program which executes application code.

Basically our running program.

Heap

A data structure in which objects may be allocated or deallocated in any order.

Collector

The part of a running program responsible of garbage collection.

There is a lot of research around garbage collection, one good resource if you’re interested in going deeper is The Garbage Collection Handbook.

I would like to stress that there are some trade-offs when we’re running a garbage collected language, we can summarize those trade-offs as:

  • Additional resources consumption.
  • Performance impacts.
  • Unpredictability on when the GC is performed (depending on the algorithm).

Once we know the basics we can start studying how all this works in Python.

CPython Implementation - Reference Counting

CPython implements Reference Counting algorithm. It consists in having a counter in each object that tracks the number of references to it held by other objects. In CPython this counter is placed in PyObject struct with ob_refcnt name.

1
2
3
4
5
    typedef struct _object {
      _PyObject_HEAD_EXTRA
      Py_ssize_t ob_refcnt;
      struct _typeobject *ob_type;
    } PyObject;

Knowing how many references an object has, the algorithm is very straightforward. Each time than an object creates a reference, ob_refcnt gets incremented by 1.

So, for example:

1
2
foo = Foo()
my_list = []

on line 1 foo ob_refcnt is 1 just because the module has a reference to all the objects that live in it, as well as my_list ob_refct is 1.

refcnt1

1
2
3
foo = Foo()
my_list = []
my_list.append(foo)

Once foo is appended to my_list foo refcnt is equals to 2 because now, foo has two incoming references one is held by __main__ module and the other one by my_list.

refcnt2

We can observe how this happen under the cover in C.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
static int
app1(PyListObject *self, PyObject *v)
{
    Py_ssize_t n = PyList_GET_SIZE(self);

    assert (v != NULL);
    if (n == PY_SSIZE_T_MAX) {
        PyErr_SetString(PyExc_OverflowError,
            "cannot add more objects to list");
        return -1;
    }

    if (list_resize(self, n+1) == -1)
        return -1;

    Py_INCREF(v);
    PyList_SET_ITEM(self, n, v);
    return 0;
}

This is the a function that given a list self and an element v it places the object v in the last position. Observe line 16, PY_INCREF(v) that’s the place where we keep track of the new reference by my_list.

#define Py_INCREF(op) (                         \
    _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
    ((PyObject*)(op))->ob_refcnt++)

As you can see, Py_INCREF macro is a simple counter increment.

Once an object is not used anymore, list implementation should do the inverse operation and decrement ob_refcnt counter by one.

foo = Foo()
my_list = []
my_list.append(foo)
my_list[0] = None

If we change the element in the position of foo in my_list we end up with the same picture as before. Both foo and my_list ob_refcnt have 1 as value. We reduced by 1 the counter on foo once we changed the value of my_list[0].

refcnt1

That’s because list implementation takes care of managing the reference count. If we take a look into the C implementation of SetItem, we can observe on line 13 that Py_XDECREF is called, that macro decrements by 1 ob_refcnt on v and if this value reaches 0, the object is freed because no one is interested anymore in it.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
int
PyList_SetItem(PyObject *op, Py_ssize_t i,
               PyObject *newitem)
{
    PyObject *olditem;
    PyObject **p;
    .
    .
    .
    p = ((PyListObject *)op) -> ob_item + i;
    olditem = *p;
    *p = newitem;
    Py_XDECREF(olditem);
    return 0;
}

This is PY_DECREF macro implementation, it just decrements the counter and if it reaches 0, it deallocates the object.

1
2
3
4
5
6
7
8
9
#define Py_DECREF(op)                                   \
    do {                                                \
        PyObject *_py_decref_tmp = (PyObject *)(op);    \
        if (_Py_DEC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
        --(_py_decref_tmp)->ob_refcnt != 0)             \
            _Py_CHECK_REFCNT(_py_decref_tmp)            \
        else                                            \
        _Py_Dealloc(_py_decref_tmp);                    \
    } while (0)

It seems to be pretty straightforward, but if we stop a little and try to think, there is still some problems to tackle. What happens if there is a cycle?

foo = Foo()
my_list = []
my_list.append(foo)
foo.list = my_list

In this chunk of code we have a typical example of a cycle.

cycle1

foo has a reference to my_list and vice versa. Now, let’s imagine that our module __main__ don’t hold a reference to those objects anymore.

1
2
del foo
del my_list

cycle2

Seems like a dead end 😱. Those two objects will never be freed, so we end up having one of those memory leaks.

This is one of the trade offs of reference counting, on common cases we get direct deallocation without GC pauses and it’s a simple algorithm. But we need to tackle cycles.

How does python deal with cycles on garbage collection?

It implements an algorithm to handle cycles. How does it work? Every time that we instantiate an object its reference is appended into a global double linked list that holds all the references of living objects. There are 3 lists, each list contains objects that belongs to one generation, from younger to older objects. Those lists have this shape.

PyGC_Head

1
2
3
4
5
6
7
8
typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    double dummy;  /* force worst-case alignment */
} PyGC_Head;

Just a double linked list. But this list will be very helpful to deal with cycles as we’ll see soon. As was told before, each time a new object is allocated, garbage collector should be aware of this new kid in town, basically it’s added to PyGC_Head younger generation.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
PyObject *
PyTuple_New(Py_ssize_t size)
{
    PyTupleObject *op;
    Py_ssize_t i;
    if (size < 0) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (size < PyTuple_MAXSAVESIZE && (op = free_list[size]) != NULL) {
        free_list[size] = (PyTupleObject *) op->ob_item[0];
        numfree[size]--;
    {
        /* Check for overflow */
        if ((size_t)size > ((size_t)PY_SSIZE_T_MAX - sizeof(PyTupleObject) -
                    sizeof(PyObject *)) / sizeof(PyObject *)) {
            return PyErr_NoMemory();
        }
        op = PyObject_GC_NewVar(PyTupleObject, &PyTuple_Type, size);
        if (op == NULL)
            return NULL;
    }
    for (i=0; i < size; i++)
        op->ob_item[i] = NULL;
        .
        .
        .
    _PyObject_GC_TRACK(op);
    return (PyObject *) op;
}

Here for example a new list is being allocated, notice the call _PyObject_GC_TRACK(op); in line number 21, this is how the garbage collector tracks the new objects. When a cycle detection is fired? Every time that _PyObject_GC_Malloc(nbytes):Modules/gcmodule.c:1718 is performed, the garbage collector checks if the number of objects in the youngest generation is bigger greater than the threshold, if that condition is true a cycle detection starts.

The algorithm is done generation by generation and only if the number of objects that live in that generation is greater than the threshold.

The firs step is to update the actual value of ob_refcnt from the PyObject on PyGC_Head gc_refs variable.

generations

So we end up with this picture, all the objects of the first generation has gc_refs variable updated.

The next step is iterate over all the objects of the generation and using tp_traverse function that is implemented by each type and encapsulates how to visit all the referred objects that a object has and takes a callback to perform some action. Some tp_traverse functions will look like:

1
2
3
4
5
6
7
8
9
10
class dict(object):
    def tp_traverse(self, fn):
        for k, v in self.items():
            fn(k)
            fn(v)

class list(object):
    def tp_traverse(self, fn):
        for v in self:
            fn(v)

This is to illustrate how it works, but those functions are implemented in C.

And how does the callback look like? What it does is decrement by one the number of references only on the objects that have a value higher than 0. So for example if we have this cycle

cycle1

foo only has a reference to my_list so when the algorithm is traversing it decrements by one ob_refcnt of my_list.

So we end up breaking the cycle.

cycle1 cycle1 cycle1

cycle1 cycle1