I, Hacker

Hungry hungry macros 
« Back to blog

Partially Destructive Garbage Collection

While lying in bed and trying to sleep last night, I had an idea I'd
like some feedback on. The idea is partially destructive garbage
collection, which allows the runtime to handle caching, instead of the
programmer. With PDGC, the runtime is allowed to destroy an object
and keep only the relevant data for recreation around. If you're
running a photo management application, for instance, you may want to
keep JPEGs and thumbnail bitmaps in memory, while not keeping the full
bitmap around. In theory, this is a problem that PDGC can solve for
you.
 
At compile-time, the compiler can look at your class and determine
whether or not your object's creation has side-effects, or depends on
factors outside of its control (e.g. IO), and make a determination if
it's partially destroyable. If it is, then the compiler embeds code
into the constructor (and other relevant methods*) to save the data
needed to recreate the object. At run-time, the frequency of access,
object creation time, etc will be tracked, and the PDGC can make the
decision to destroy the object or keep it around. If the object is
destroyed, it doesn't entirely go away; rather, it's replaced with a
stand-in object that stores the relevant data for recreation.
 
Further ideas:

  • You could allow the PDGC to only destroy certain object properties, rather than the full object
  • A more general approach to language-level caching would allow the objects to be saved to disk and loaded lazily. This is something I plan to explore as part of my Renraku research kernel, the details of which I'll be posting on soon.


 
General open questions:

  • What research has been done on this previously? What were the findings?
  • How much of this belongs in the compiler and how much in the runtime? This obviously comes down to the implementation, but I believe there needs to be research into good, general approaches.


 
Implementation questions:

  • Should this be involved with the GC at all? In my mind, they're related, but in a specific implementation they may well be completely separated.
  • What factors are used in determining whether or not an object persists? Factors I can think of are: frequency of access, cost of object recreation, size, amount of free memory, amount of free swap space.
  • Should this be behind-the-scenes compiler magic, or should it be more user controllable? E.g. should the user be able to decide if an object is destroyable at a given point? Should the user be able to override the object recreation portion, rather than letting the constructor do its thing?


 
I'd love to hear your take on it.
 
Happy Hacking,
- Cody Brocious (Daeken)

Loading mentions Retweet

Comments (8)

Jul 02, 2009
brendanjohnston said...
What about Weak References:

http://en.wikipedia.org/wiki/Weak_reference
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/ref/WeakReference.html

It seems like you are proposing a layer on top of them. Perhaps with something powerful like Clojure you could do it as a library.

Jul 02, 2009
Cody Brocious said...
Weak references certainly can be used to implement this, although I'm personally thinking about implementing it on top of my own weak refs with finer grained control. Not sure about Java's implementation (never used it), but most don't allow you to have good control over the lifetime of objects.

Thanks for your input :)

Jul 02, 2009
nothingmuch said...
This can be transposed into a simple caching problem, I don't think language level support is necessary.

In Perl using Moose (that's the background I come from, but it should translate well to other dynamic languages), I would implement this as a lazily defined attribute containing a weak reference, and have the builder method store the resulting object in a shared cache before reconstructing it.

The cache would be responsible for the garbage collection policy, whatever it may be, the class's implementation takes care of constructing/fetching this data. Since it's a weak reference, once the cache expires this data the reference in the object is cleared and memory is reclaimed.

From an API POV there is no way to know whether or not the lazy attribute is merely returning the ready data, or locating/reconstructing it.

So basically what you end up with is memoization with a bounded cache and an expiry policy, where the parameterizable key is the object (or its implicit surrogate key in the form of constructor parameters).

The syntactic sugar that Moose provides makes it pretty easy to get the boilerplate for the reconstruction taken care of, and the shared cache provides fine grained control that would be hard if this functionality was intrinsic in the language.

For a more transparent approach you can transform the metaclass, converting simple lazy attributes into weak references which store the result of the vivification in a cache. The code of the class does not need to be aware of this change.

With respect to GC integration, the memory allocator could signal to the cache by means of a hook API that there is no more reclaimable memory, at which point a cache cleanup can be made to attempt to reclaim space (instead of allocating more memory). I don't think there is a need for any more language support than that (and even this can probably be better done with just good cache configuration).

Lastly, this reminded me of another project making interesting memory tradeoffs: http://code.google.com/p/compcache/

Jul 02, 2009
Cody Brocious said...
I agree that this will not be as good as a well designed caching system, but that's not really the goal. The goal is to provide a Good Enough (TM) system of caching (particularly the invalidation side of things) and allow a bit of tunability. GCs rarely give you better performance than finely tuned manual memory management, but they're easy. Same thing here -- I just want a system that automatically handles a general caching system, and you can go from there.
Jul 03, 2009
nothingmuch said...
Based on your reply and the discussion on HN it seems like you'd want to infer the construction logic from an imperative constructor. That's is a lot harder than using a simple LRU cache.

Furthremore, by using immutable data (e.g. purely functional objects) this can be made even simpler.

Implementing a garbage collector that detects live but wasteful data in addition to dead data, especially if you want to make good choices about which objects you evict, especially if you're extending a iterative or worse, a concurrent garbage collector is quite a feat.

I don't think there's much of a case for implementing this as a builtin feature in a language, it'll be less tunable, more complicated, and harder to implement.

A cache doesn't need to be "well designed" or even a "system", it can be just very simple too. Similarly to what you're describing it can also be handled completely outside of the point of view of the class whose data is being cached by using wrapping.

The difference is that this approach can be localized, whereas what you're describing is global. Without making good choices about where the tradeoff is made this GC could become pathologically slow due to repeated reconstruction of popular data (and the frequency of use doesn't necessarily correlate to the reference count), while not helping with memory contention.

Properly instrumenting in order to provide good results is a lot more difficult when you need to do it for potentially every live object in the system, as opposed to only for the ones you are pretty sure will be worth it.

A Good Enoughâ„¢ solution is usually easier to get by using targetted methodologies as opposed to a sweeping change in behavior. The real question is "good enough for what?". When you have found the problematic data, then you can address it specifically.

This is the same kind of tradeoff as optimizing only the hot path as demonstrated using a profiler, as opposed to prematurely optimizing.

Jul 03, 2009
ironfroggy said...
Have pondered this before. I think you need to look at it from a low-level perspective, first. Should memory managers be adding memdroppable() and memswappable() to teach the OS what you need, what you can deal with being swapped, and what you can have freed? Note that you'd probably need to return some handle from memdroppable, which you can use to check if its still there and to reclaim it when you actually use it.
Jul 03, 2009
nothingmuch said...
madvise()
Oct 29, 2009
Andreas said...
So...what is the kernel architecture of renraku???
Is it a monolithic kernel or a microkernel???

Leave a comment...

 
Got an account with one of these? Login here, or just enter your comment below.
Posterous-login    twitter