Garbage collection (computer science)

In computing, garbage collection (also known as GC) is a form of automatic memory management. The garbage collector or collector attempts to reclaim the memory used by objects that will never be accessed again by the application or mutator. Garbage collection was invented by John McCarthy around 1959 to solve the problems of manual memory management in his recently devised Lisp programming language.

Many computer languages require garbage collection, either as part of the language specification (e.g. Java, Dylan) or effectively for practical implementation (e.g. formal languages like lambda calculus); these are said to be garbage-collected languages. Other languages were designed for use with manual memory management, but have garbage collected implementations (e.g., C, C++). In either case, it is far easier to implement garbage collection as part of the language's compiler and run-time system, but post hoc GC systems exist, including ones that do not require recompilation. The garbage collector will almost always be closely integrated with the memory allocator.

The Symbolics Lisp machine was unique in being the only successful commercial system that had hardware support for garbage collection.

The basic principle of how a garbage collector works is:

  1. Determine what data objects in a program will not be accessed in the future
  2. Reclaim the storage used by those objects

Although in general it's impossible to know the moment an object has been used for the last time, garbage collectors use conservative estimates that allow them to identify when an object could not possibly be referenced in the future. For example, if there are no references to an object in the system, then it can never be referenced again. This is the criterion used by most modern garbage-collection systems, but see Disadvantages of Tracing Garbage Collectors below.

While garbage collection assists the management of memory, the feature is also almost always necessary in order to make a programming language type safe, because it prevents several classes of runtime errors. For example, it prevents dangling pointer errors, where a reference to a deallocated object is used.

Contents

Tracing garbage collectors

Tracing garbage collectors are the most common type of garbage collector. They focus on determining which objects are reachable by the mutator, and then discard all remaining objects.

Reachability of an object

More formally, objects can be reachable in only two ways:

  1. A distinguished set of objects are assumed to be reachable -- these are known as the roots. In a typical system these objects will be the machine registers, the stack, the instruction pointer, the global variables; in other words, everything that a program can reach directly.
  2. Anything referenced from a reachable object is itself reachable. This is referred to as transitivity.

Informally, a reachable object is one that the program could get to by starting at an object it can reach directly, and then following a chain of pointer references.

Basic algorithm

Tracing garbage collectors perform garbage collection cycles. A cycle is started when the collector decides (or is notified) that it needs to reclaim storage, which in particular happens when the system is low on memory. All tracing garbage collectors use some variant of tri-colour marking which works as follows:

  1. Create initial white, grey, and black sets; these sets will be used to maintain progress during the cycle. Initially the white set or condemned set is the set of objects that are candidates for having their memory recycled. The black set is the set of objects that cheaply can be proven to have no references to objects in the white set; in many implementations the black set starts off empty. The grey set is all the remaining objects that may or may not have references to objects in the white set (and elsewhere). These sets partition memory; every object in the system, including the root set, is in precisely one set.
  2. (This step is repeated until the grey set is empty.) Pick an object from the grey set. Blacken this object (move it to the black set), by greying all the white objects it references directly.
  3. When there are no more objects in the grey set, then all the objects remaining in the white set are provably not reachable and the storage occupied by them can be reclaimed.

The tri-colour marking algorithm preserves an important invariant:

No black object points directly to a white object.

Notice how this invariant is preserved: The initial partition has no objects that reference white objects; this can trivially be achieved by making the initial black set empty. Subsequently whenever an object is made black any white objects that it references are made grey, ensuring that the invariant remains true. When the last step of the algorithm is reached, because the tricolour invariant holds, none of the objects in the black set point to any of the objects in the white set (and there are no grey objects) so the white objects must be unreachable from the roots, and the system calls their finalisers and frees their storage.

Some variations on the algorithm do not preserve the tricolour invariant but they use a modified form for which all the important properties hold.

Variants of the basic algorithm

The basic algorithm has several variants.

  • There is the issue of whether the garbage collector moves objects in memory (that is, changes their address) or not: moving or non-moving GC.
  • Some collectors can correctly identify all pointers (references) in an object; these are called "precise" (or "exact"; or "accurate") collectors, the opposite being a "conservative" or "partly conservative" collector. "Conservative" collectors have to assume that any bit pattern in memory is a pointer if (when interpreted as a pointer) it would point into any allocated object. Thus, conservative collectors may have some false negatives, where storage is not released because of accidental fake pointers. However, this is not a big drawback in practice.
  • There is the issue of whether the garbage collector can run interleaved or in parallel with any of the rest of the system: the simplest garbage collectors stop the rest of the system while they perform a cycle; they are non-incremental, whereas incremental collectors interleave their work with units of activity from the rest of the system. Some incremental collectors can run fully parallel in a separate thread; these can theoretically run on a separate CPU, but cache issues make this less practical than it may initially appear. Incremental collectors generally require some form of read barrier or write barrier so they can do necessary work just before the mutator accesses memory; such barriers can be implemented in software (as part of the language) or hardware (by protecting memory pages). Note that incremental collectors still have to scan the root set (strictly the unprotectable set) in one go while all mutator threads are paused; this is known as the flip as the mutator itself changes from grey to black.

Mark and sweep

Tracing collectors can also be divided by considering how the three sets (of white, grey, and black objects) are implemented. A mark-sweep GC maintains a bit (or two) with each object to record whether it is white or black; the grey set is either maintained as a separate list or using another bit. A Copying GC is a moving GC which, in its simplest form, moves all reachable objects to a single memory area, and then reuses all memory outside this area. Copying GC has the important benefit of resisting fragmentation and improving locality of reference.

Generational GC (aka Ephemeral GC)

There is the issue of when to perform a cycle and what objects to place in the initial grey set. A simple collector will always condemn (put in the white set) everything except the root set, and will put the root set in the grey set, leaving the initial black set empty.

Statistically speaking, the objects most recently created in the runtime system are also those most likely to quickly become unreachable (known as infant mortality or the generational hypothesis). A Generational GC divides objects into generations and will generally only place the objects of a subset of all the generations into the initial white set. This can result in faster collection cycles.

Disadvantages of tracing garbage collectors

The primary problems with tracing garbage collection arise from the nature of how it is invoked. A collection cycle can occur at any time and may take an arbitrarily long amount of time. While acceptable in many applications, when timing and quick response are critical, like in real-time computing, it may cause disaster. Incremental garbage collection helps alleviate this problem, but it is still difficult to make hard real-time guarantees.

A more fundamental issue is that garbage collectors violate locality of reference, since they deliberately go out of their way to find bits of memory that haven't been accessed recently. This is an issue because modern computer architectures get more and more of their performance from manifold intricate levels of caching, which depend crucially for their effectiveness on the assumption of locality of reference. In other words, some garbage collection methods may be cache-hostile. However, certain garbage collection methods result in better locality of reference than explicit garbage collection. Generational garbage collection is cache friendly, and copying collectors automatically defragment memory helping to keep related data together.

The other main problem is that a large amount of garbage can build up very quickly, particularly in functional programming languages, before the garbage collector has a chance to collect any of it, resulting in allocation inefficiencies, a temporary bloating in the image size, and a long collection cycle once it occurs; such garbage collection is not prompt.

To see how tracing garbage collectors are not prompt, consider this example:

 loop
     x := new object
     x := 0

With a naïve tracing garbage collector, each time through the loop more and more memory is allocated, until all the memory in the system is consumed and the garbage collector is forcibly invoked. It's clear why this is unacceptable when we realize that this code never needs more than the memory required for one object!

(That's a bad example because it's a case where GC can easily be much faster than explicit memory management. With GC, "new" just increments a pointer in the young generation pool, which is about as fast as stack allocation. Collection time is proportional to number of live objects, not number of allocated objects, so if no new live objects are created by the loop, collection takes very little time. In contrast, explicit memory management requires one call to "new" and "free" per loop iteration, and at least one of those must be slower than the GC version of "new". Still, explicit memory management is probably better if you need hard realtime guarantees. )

Finally, a fault shared by almost all garbage collectors is the conservative assumption that memory is still in use if it is still reachable. In many systems, references are kept long after they cease to be used. A popular research area is to find less conservative ways of collecting objects that will never be used again; that is, to safely collect objects to which references still exist. In systems with standard garbage-collection, freeing of this memory can be accelerated by explicitly destroying pointers, but this forces some of the responsibility of memory management back on the programmer.

As one simple example, consider the command-line arguments passed to the startup function. Typically, the program which uses them will parse these arguments and then perform a command accordingly, never touching the arguments themselves again. But they are still kept in memory, because references to them still exist at the bottom of the call stack, in the entry function. Although these objects occupy minimal memory, similar situations often occur with very large or numerous objects.

(That's a bad example because in programs without GC, the command-line arguments are never freed either, there's no safe way to free them, but in a program with GC, the command-line arguments will sometimes get freed if they become inaccessible, so GC wins in that case. A better example is a symbol table that maps names to values. Nothing in the symbol table can ever be collected until the entire symbol table is inaccessible, so it may still be necessary to do some explicit memory management with reference counting or similar. In some cases, weak references can be used to avoid explicit memory management.)

All of these are often used as arguments against garbage collection and for explicit memory handling, particularly in time-critical applications. However, other forms of garbage collection do not necessarily share all of these faults, although the last is nearly universal, but they may have different faults of their own; all options should be considered when deciding whether to use garbage collection.

Reference counting

In contrast to garbage collection, reference counting is a form of automatic memory management where each object has a count of the number of references to it, and the object's memory is reclaimed when the count reaches zero. Reference counting has some advantages: it is easy to implement and its execution time is more predictable because there is no need to pause the program to look for unreferenced objects (but note that a single overwritten reference can still make arbitrarily large numbers of objects available for recycling). However, the performance of systems using reference counting can be slower than other garbage collection techniques, because of the need to update reference counts whenever a reference is changed. Reference counting also requires some space in each object. Simple reference counting also fails to reclaim the memory used by data structures that have cycles (such as doubly linked lists). See reference counting for more information.

Implementations

Programming languages whose standard implementations include automatic garbage collection:

See also

External links

fr:Ramasse-miettes he:איסוף זבל (תכנות) ja:ガベージコレクション pl:Garbage collection

Navigation

  • Art and Cultures
    • Art (https://academickids.com/encyclopedia/index.php/Art)
    • Architecture (https://academickids.com/encyclopedia/index.php/Architecture)
    • Cultures (https://www.academickids.com/encyclopedia/index.php/Cultures)
    • Music (https://www.academickids.com/encyclopedia/index.php/Music)
    • Musical Instruments (http://academickids.com/encyclopedia/index.php/List_of_musical_instruments)
  • Biographies (http://www.academickids.com/encyclopedia/index.php/Biographies)
  • Clipart (http://www.academickids.com/encyclopedia/index.php/Clipart)
  • Geography (http://www.academickids.com/encyclopedia/index.php/Geography)
    • Countries of the World (http://www.academickids.com/encyclopedia/index.php/Countries)
    • Maps (http://www.academickids.com/encyclopedia/index.php/Maps)
    • Flags (http://www.academickids.com/encyclopedia/index.php/Flags)
    • Continents (http://www.academickids.com/encyclopedia/index.php/Continents)
  • History (http://www.academickids.com/encyclopedia/index.php/History)
    • Ancient Civilizations (http://www.academickids.com/encyclopedia/index.php/Ancient_Civilizations)
    • Industrial Revolution (http://www.academickids.com/encyclopedia/index.php/Industrial_Revolution)
    • Middle Ages (http://www.academickids.com/encyclopedia/index.php/Middle_Ages)
    • Prehistory (http://www.academickids.com/encyclopedia/index.php/Prehistory)
    • Renaissance (http://www.academickids.com/encyclopedia/index.php/Renaissance)
    • Timelines (http://www.academickids.com/encyclopedia/index.php/Timelines)
    • United States (http://www.academickids.com/encyclopedia/index.php/United_States)
    • Wars (http://www.academickids.com/encyclopedia/index.php/Wars)
    • World History (http://www.academickids.com/encyclopedia/index.php/History_of_the_world)
  • Human Body (http://www.academickids.com/encyclopedia/index.php/Human_Body)
  • Mathematics (http://www.academickids.com/encyclopedia/index.php/Mathematics)
  • Reference (http://www.academickids.com/encyclopedia/index.php/Reference)
  • Science (http://www.academickids.com/encyclopedia/index.php/Science)
    • Animals (http://www.academickids.com/encyclopedia/index.php/Animals)
    • Aviation (http://www.academickids.com/encyclopedia/index.php/Aviation)
    • Dinosaurs (http://www.academickids.com/encyclopedia/index.php/Dinosaurs)
    • Earth (http://www.academickids.com/encyclopedia/index.php/Earth)
    • Inventions (http://www.academickids.com/encyclopedia/index.php/Inventions)
    • Physical Science (http://www.academickids.com/encyclopedia/index.php/Physical_Science)
    • Plants (http://www.academickids.com/encyclopedia/index.php/Plants)
    • Scientists (http://www.academickids.com/encyclopedia/index.php/Scientists)
  • Social Studies (http://www.academickids.com/encyclopedia/index.php/Social_Studies)
    • Anthropology (http://www.academickids.com/encyclopedia/index.php/Anthropology)
    • Economics (http://www.academickids.com/encyclopedia/index.php/Economics)
    • Government (http://www.academickids.com/encyclopedia/index.php/Government)
    • Religion (http://www.academickids.com/encyclopedia/index.php/Religion)
    • Holidays (http://www.academickids.com/encyclopedia/index.php/Holidays)
  • Space and Astronomy
    • Solar System (http://www.academickids.com/encyclopedia/index.php/Solar_System)
    • Planets (http://www.academickids.com/encyclopedia/index.php/Planets)
  • Sports (http://www.academickids.com/encyclopedia/index.php/Sports)
  • Timelines (http://www.academickids.com/encyclopedia/index.php/Timelines)
  • Weather (http://www.academickids.com/encyclopedia/index.php/Weather)
  • US States (http://www.academickids.com/encyclopedia/index.php/US_States)

Information

  • Home Page (http://academickids.com/encyclopedia/index.php)
  • Contact Us (http://www.academickids.com/encyclopedia/index.php/Contactus)

  • Clip Art (http://classroomclipart.com)
Toolbox
Personal tools