Rhope Blog

Blog Home Downloads Documentation RSS

Taking Out the Garbage

by Mike Pavone

There are two main approaches to automatic memory management: reference counting and garbage collection. Reference counting is probably the simpler of the two the implement. Each object gets a reference counter. When a new reference to the object is created the counter is incremented and when a reference is destroyed the counter is decremented. When the counter reaches zero the object is freed. Garbage collection is a little more complicated. Basically, unreferenced objects are allowed to pile up for a while until some threshold is reached and then the garbage collector goes through the stack and globals looking for live objects (i.e. ones that are still pointed to directly or indirectly by a stack or global variable) which are kept. Everything else is freed. The exact mechanism by which that is accomplished varies somewhat based on the type of garbage collector used, but that's the basic idea.

As always, when there's more than one way to implement a feature there are some tradeoffs involved. The three biggest problems with reference counting are performance, cyclic references and the space overhead of the counter itself. There's not much to say about the performance issue. Updating a counter every time you copy or destroy a pointer adds a significant amount of overhead, especially when it needs to be done atomically. Similarly the space overhead is a simple to understand problem. Cyclic references are a little more complicated. The simplest instance of this is when an object contains a pointer to itself. In such a case, the reference count of the object will never reach zero unless the pointer to itself is manually removed. For that simple case there is a simple solution, but an object can contain a pointer to another object that points back to the first or their could be a whole chain of objects that eventually point back to the beginning. You can either put the burden of managing the problem on the shoulders of the user of the language via weak references and/or having them manually break cycles or you use a garbage collector.

Garbage collectors aren't without their flaws either. To my knowledge, the types of garbage collectors that have the best performance characteristics pause the execution of the entire program to run the garbage collector and collection is done serially. {http://en.wikipedia.org/wiki/Amdahls_law Amdahl's Law} tells us that the maximum speedup from adding more cores/processors is limited by the percentage of the computation that is serial and garbage collection is often not an insignificant percentage. There are parallel garbage collectors, but my understanding is that they are currently considerably slower than the state of the art in non-parallel collectors. Another smaller problem with garbage collection is that in order to be fast, you typically have to let it accumulate a significant amount of garbage between collections which requires the set of allocated memory to be quite a bit larger than the actual memory being used. The final major problem is that the garbage collector is really only take into account memory resources in use when deciding when to do a collection, but things like file handles and other I/O related resources are often held by objects as well. In a garbage collected environment, it might be quite a long time between when an object goes out of scope and when its destructor is finally called. This can lead to all sorts of problems.

Currently, Rhope uses reference counting. This is largely due to the fact that I didn't know the first thing about implementing a garbage collector when I started work on the language, but in some ways it fits the language well. At the moment, it's not really possible to create cyclic references as most operations are either completely functional or don't take arguments that could have references to other objects. The one exception to this is the GUI API, but even there it's not likely to be an issue in most circumstances so the biggest issue with reference counting isn't a major issue for Rhope. Reference counting brings a couple of benefits too one being that it allows the language to make updates in place in some situations and the other is that I/O objects don't have to be explicitly freed/closed.

The problem is at some point I want to add the ability for a user to define mutable objects. Once I do that I can't really ignore the problem of cyclic references anymore. For the most part I'm okay with the tradeoffs of switching to a garbage collector, but I'm not happy about the idea of having to explicitly free I/O resources. I did come across one research paper about a {http://www.cs.umass.edu/~emery/classes/cmpsci691s-fall2004/papers/ulterior-reference-counting.pdf hybrid reference counter/generational collector} that looks interesting, but I haven't looked at it closely enough to see if it does anything to solve the I/O object problem. Fortunately I have a while to look for a good solution as I want to get at least a basic native code compiler working before I worry about mutable user defined objects.

comment by Mike Pavone

Seems I messed up the link syntax, oops. I really need to add an edit and/or preview post feature to this blog.

Username: Password:
Don't have an account? Register now!