C++ Virtual Functions
Virtual functions are at the heart of object-oriented programming and runtime polymorphism in C++. Countless programmers rely on them for creating and operating intuitively on large class hierarchies. They are a vital part of the language. But how are they actually implemented by the compiler?
How they are implemented is a common C++ question. The usual answer mentions a pointer to a table of functions. But what exactly does that table contain? Which parts of the implementation are handled at compile time and which at runtime? In this article I’ll take a closer look at what happens behind the scenes when virtual functions are involved.
It’s important to note that the C++ Standard does not specify how virtual functions should be implemented, so it’s entirely up to each compiler how to solve it.
For reference, at the time of writing I’m using the following compiler [1] and architecture:
$ clang++ --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0
Let’s begin with some background.
Polymorphism
C++ effectively supports three types of polymorphism:
- Function overload (compile time)
- Templates (compile time)
- Virtual functions (runtime)
Virtual functions allow for late binding of function calls based on object type. This comes into play when a derived object is addressed via a pointer or a reference to a base class. They effectively enable an inheritable common interface with potentially overridden implementations in the derived classes.
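To make late binding concrete, here is a minimal sketch. The Shape and Circle classes and the Describe function are hypothetical names I made up for illustration; they are not part of any example used elsewhere in this article.

```cpp
#include <cassert>
#include <string>

// Hypothetical hierarchy, used only to illustrate late binding.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string Name() const { return "shape"; }
};

class Circle : public Shape {
public:
    std::string Name() const override { return "circle"; }
};

// The static type of `s` is Shape, but the call is bound at runtime
// to the implementation belonging to the object's dynamic type.
std::string Describe(const Shape& s) {
    return s.Name();
}
```

Passing a Circle to Describe should yield "circle" even though the parameter is declared as a reference to Shape.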
To support this late binding of function calls, the compiler needs to augment the qualifying objects with extra information that makes the calls possible at runtime. To understand this augmentation, let’s first look at how class objects are represented in memory.
Object Memory Layout
Only nonstatic data members are part of an object. Member functions and static data members, despite being part of the class declaration, are “hosted” outside the object. The nonstatic data members are laid out in memory in the order of their declaration:
class Foo {
public:
void SomeFunction();
private:
static const int n { 42 };
int p { 5 };
int q { 7 };
};
Foo f;
f will be represented in memory as:
0: +---+
| p |
4: +---+
| q |
8: +---+
We can confirm this with a debugger:
(lldb) print sizeof(f)
(unsigned long) $4 = 8
(lldb) x/8b &f
0x7fff5fbffb70: 0x05 0x00 0x00 0x00 0x07 0x00 0x00 0x00
| | | |
.................. ...................
p q
Indeed, only the nonstatic data members contribute to the object size. Well, those and any compiler augmentation that may go into it: potential padding of the nonstatic data members, as well as the virtual pointer, vptr.
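The same check can be expressed in code rather than in a debugger. This is a sketch under the assumption that the members are public (so that offsetof applies); PlainPair and PlainPairSize are illustrative names, not something from the original example.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as the Foo example, but with public members so that
// offsetof can be applied (names are illustrative).
struct PlainPair {
    static const int n = 42;  // static: stored outside the object
    void SomeFunction() {}    // member function: not part of the object
    int p = 5;
    int q = 7;
};

// Only p and q are expected to contribute to the object size.
static_assert(sizeof(PlainPair) == 2 * sizeof(int), "no hidden members expected");
static_assert(offsetof(PlainPair, p) == 0, "p is the first member");
static_assert(offsetof(PlainPair, q) == sizeof(int), "q follows p");

std::size_t PlainPairSize() { return sizeof(PlainPair); }
```

If any of the static assertions fails on a given compiler, that in itself tells us the layout differs from the one described here.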
We can check for compiler-added padding by inspecting the object’s memory as before:
class Foo {
public:
char c[3] { 0, 0, 0}; // 3 bytes
int p { 5 }; // 4 bytes
};
Foo f;
(lldb) print sizeof(f)
(unsigned long) $0 = 8
(lldb) x/8b &f
0x7fff5fbffb60: 0x00 0x00 0x00 0x5f 0x05 0x00 0x00 0x00
| | | | | |
.............. .... ...................
c padding p
Here the compiler has added 1 byte of padding after c so that p is aligned on a 4-byte boundary.
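Padding can also be measured from within the program. The sketch below assumes a struct with the same two members as above; Padded, OffsetOfP and PaddingBytes are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

struct Padded {
    char c[3];
    int p;
};

// Offset of p within Padded. With a 4-byte aligned int this is
// expected to be 4, leaving one padding byte after c.
std::size_t OffsetOfP() { return offsetof(Padded, p); }

// Number of bytes in Padded that are not occupied by declared members.
std::size_t PaddingBytes() {
    return sizeof(Padded) - (sizeof(char[3]) + sizeof(int));
}
```

The exact numbers depend on the platform, which is why the code computes them instead of hard-coding them.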
Base class nonstatic data members are contained directly in the derived class object:
class Base {
public:
int x { 3 };
};
class Derived : public Base {
public:
int p { 5 };
int q { 7 };
};
Derived d;
Again, we verify using a debugger:
(lldb) print sizeof(d)
(unsigned long) $0 = 12
(lldb) x/12b &d
0x7fff5fbffb60: 0x03 0x00 0x00 0x00 0x05 0x00 0x00 0x00
0x7fff5fbffb68: 0x07 0x00 0x00 0x00
Base class nonstatic data members are laid out in the derived object exactly as they are in the base class, including any padding:
class Base {
public:
char c[3] { 0, 0, 0 };
int x { 3 };
};
class Derived : public Base {
public:
char d;
};
Derived d;
(lldb) print sizeof(d)
(unsigned long) $1 = 12
(lldb) x/12b &d
0x7fff5fbffb50: 0x00 0x00 0x00 0x00 0x03 0x00 0x00 0x00
| | | | | |
.............. .... ...................
c padding x
0x7fff5fbffb58: 0x00 0x00 0x00 0x00
| | | |
.... ..............
d padding
Here we might expect that the size of a Derived object would be 8 bytes, the total size of the nonstatic data members in the two classes. However, Base has been padded with 1 byte to align x on a 4-byte boundary, and this padding is carried over to the derived class. After adding d the Derived object occupies 9 bytes, which the compiler pads with an additional 3 bytes so that the object size is a multiple of its 4-byte alignment. Hence the final size of 12 bytes, with effectively 4 bytes wasted due to alignment padding.
That may sound insignificant, but imagine that Derived was instead a Particle in a particle system. Imagine further that there were 500,000 particles active in this system; then we’d be wasting 2 MB due to padding. 2 MB might not sound too bad either, but when you consider that the total memory usage in this case is 6 MB and you’re wasting a third of that on padding, you realise that these things add up quickly.
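One common way to claw back such padding is to reorder the members. The following sketch flattens the example into a single struct, which is an assumption on my part, and compares it with a reordered variant. ParticleLoose, ParticleTight and WastedBytes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Members in the order used by the example above, flattened into one struct.
struct ParticleLoose {
    char c[3];
    int x;
    char d;
};

// Same members, with the chars grouped so they can share one word.
struct ParticleTight {
    int x;
    char c[3];
    char d;
};

// Bytes lost to padding across `count` objects of type T whose
// members occupy `payload` bytes.
template <typename T>
constexpr std::size_t WastedBytes(std::size_t count, std::size_t payload) {
    return count * (sizeof(T) - payload);
}
```

On the platform used in this article I would expect the two structs to be 12 and 8 bytes respectively, so WastedBytes<ParticleLoose>(500000, 8) would report the 2 MB mentioned above while the tight variant wastes nothing. Other platforms may differ.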
Of course there’s a good reason for the compiler adding this padding: performance. The CPU’s load and store operations perform best when working with its “natural data size”, which is a word [2].
Now, let’s see what happens when we add a virtual function:
class Foo {
public:
virtual ~Foo();
int p { 5 }; // 4 bytes
};
Foo f;
First let’s check the size of the object.
(lldb) print sizeof(f)
(unsigned long) $0 = 16
Interesting: 16 bytes, yet we only have a 4-byte data member. This implies that the compiler has augmented our object. At this point we can guess with what: a virtual pointer, due to the virtual function being present, and padding. Let’s have a look at the object:
(lldb) x/16b &f
0x7fff5fbffb48: 0x30 0x10 0x00 0x00 0x01 0x00 0x00 0x00
| |
.......................................
virtual pointer
0x7fff5fbffb50: 0x05 0x00 0x00 0x00 0xff 0x7f 0x00 0x00
| | | |
................... ...................
p padding
This memory dump also highlights an important fact: the compiler has inserted the vptr at the start of the object. Why? For performance reasons.
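The size increase can be observed directly by comparing a class with and without a virtual function. This is a sketch; Plain, WithVirtual and VirtualOverhead are names I have introduced for illustration, and the Standard itself does not promise any particular overhead.

```cpp
#include <cassert>
#include <cstddef>

struct Plain {
    int p = 5;
};

struct WithVirtual {
    virtual ~WithVirtual() = default;
    int p = 5;
};

// Extra bytes the implementation added for the polymorphic version:
// typically one pointer plus padding up to pointer alignment.
std::size_t VirtualOverhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

With the compiler and target used in this article I would expect the overhead to be 12 bytes: 8 for the virtual pointer and 4 for padding.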
Let’s take a closer look at the virtual pointer and virtual table layout.
Virtual Pointer and Virtual Table
As soon as a class either derives from a virtual base class or has virtual functions, whether declared directly or inherited, the compiler will synthesize a pointer into the class object. This is the virtual pointer, vptr, and it points to a virtual table, vtable. The compiler will add code to the constructor to initialize it, and to the destructor to reset it to the table of the class whose destructor is running.
The virtual table contains the following:
- Virtual function dispatch information
- Offsets to virtual base class subobjects and to the top of the object
- Object Run-Time Type Information (RTTI)
The set of virtual functions you can invoke on an object is known at compile time and it’s invariant, meaning it can’t change during runtime. Thus the virtual table is set up during compilation. Each virtual function gets assigned a fixed position in the virtual table that remains the same throughout class inheritance.
The compiler will transform a virtual function call:
// Assuming SomeFunction() is a virtual function, this call
ptr->SomeFunction();
// will be transformed into something like this:
(*ptr->__vptr[n])(ptr);
Where n is the associated slot in the virtual table. Note how the pointer itself is passed as the first argument to the function; that corresponds to the this pointer.
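To get a feel for this transformation, we can build a virtual table by hand. The following is a simplified sketch and not what the compiler literally generates: Animal, Slot, the k-prefixed tables and the Call functions are all hypothetical names.

```cpp
#include <cassert>

struct Animal;  // forward declaration for the function pointer type

// One slot per "virtual function"; the index is fixed at compile time.
using Slot = int (*)(const Animal* self);

struct Animal {
    const Slot* vptr;  // plays the role of the compiler-generated vptr
    int legs;
};

int DogSound(const Animal*)  { return 1; }  // stands in for "woof"
int BirdSound(const Animal*) { return 2; }  // stands in for "tweet"
int LegCount(const Animal* self) { return self->legs; }

// Tables are immutable and built once, like compiler-generated vtables.
const Slot kDogVTable[]  = { DogSound,  LegCount };
const Slot kBirdVTable[] = { BirdSound, LegCount };

enum { kSoundSlot = 0, kLegsSlot = 1 };

// Equivalent of ptr->Sound(): load vptr, index the table, pass `this`.
int CallSound(const Animal* ptr) { return (*ptr->vptr[kSoundSlot])(ptr); }
int CallLegs(const Animal* ptr)  { return (*ptr->vptr[kLegsSlot])(ptr); }
```

An object is then created with its table pointer set explicitly, for example Animal dog{kDogVTable, 4}, which mirrors what a constructor does for us behind the scenes.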
The virtual table is ordered based on the function declaration order in the class. For example:
class Foo {
public:
virtual ~Foo();
virtual void SomeFunction();
int p { 5 };
int q { 5 };
};
Foo f;
// Virtual table for f (simplified):
[ 0 ] - ~Foo()
[ 1 ] - SomeFunction()
As always a debugger is our friend:
(lldb) x/4w &f
0x7fff5fbffb68: 0x00001030 0x00000001 0x00000005 0x00000005
| | | | | |
..................... .......... ..........
virtual pointer p q
Let’s look at the virtual table associated with f:
(lldb) x/5a 0x100001030
0x100001030: 0x0000000100000f00 a.out`Foo::~Foo() at foo.cc:8
0x100001038: 0x0000000100000f50 a.out`Foo::~Foo() at foo.cc:8
0x100001040: 0x0000000100000f80 a.out`Foo::SomeFunction() at foo.cc:9
0x100001048: 0x00007fff78b3bb48 vtable for __cxxabiv1::__class_type_info + 16
0x100001050: 0x0000000100000fb0 a.out`typeinfo name for Foo
Oh, interesting. There are two destructors in the virtual table. How come? It turns out that destructors come in pairs: a complete destructor and a deleting destructor. The first one destroys the object without calling delete on it, and the second calls delete on the object after it’s destroyed.
The __cxxabiv1 shown in the table is a compiler-internal namespace, and in clang’s case it is where we find support for dynamic_cast.
However, where’s the RTTI stored? It’s actually stored at a negative index in the virtual table. So the virtual pointer and table setup actually looks like:
f:
+--------+ +---------------+
| __vptr |----+ | offset_to_top | -2
+--------+ | +---------------+
| p | | | RTTI | -1
+--------+ | +---------------+
| q | +---> | ~Foo() | 0
+--------+ +---------------+
| SomeFunction | 1
+---------------+
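The RTTI entry is what the language-level typeid and dynamic_cast facilities rely on. A minimal sketch of how that surfaces in user code follows; the Base and Derived classes and the two helper functions here are illustrative only.

```cpp
#include <cassert>
#include <typeinfo>

class Base {
public:
    virtual ~Base() = default;  // makes the class polymorphic
};

class Derived : public Base {};

// typeid applied to a polymorphic object reports its dynamic type.
bool IsDerived(const Base& b) {
    return typeid(b) == typeid(Derived);
}

// dynamic_cast returns nullptr when the object is not a Derived.
const Derived* AsDerived(const Base* b) {
    return dynamic_cast<const Derived*>(b);
}
```

Both functions only receive a Base, so the answer has to come from information reachable through the object itself at runtime.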
Now that we have a good understanding of how objects are laid out in memory and what the virtual table looks like, let’s see how inheritance influences things.
Single Inheritance
The vptr and vtable behave much as can be expected during single inheritance from a base class with virtual functions. The derived class’s virtual table contains either pointers to the base class functions, or its own if it has overridden them. If the derived class adds virtual functions of its own, they will be added after the base class functions in the virtual table.
Simplified it looks like this:
class Base {
public:
virtual ~Base();
virtual void SomeFunction();
};
class Derived : public Base {
public:
virtual ~Derived();
virtual void AnotherFunction();
};
Derived d;
// Virtual table for d:
[ 0 ] - Derived::~Derived
[ 1 ] - Base::SomeFunction
[ 2 ] - Derived::AnotherFunction
// If 'd' had overridden SomeFunction() it would look like this:
[ 0 ] - Derived::~Derived
[ 1 ] - Derived::SomeFunction
[ 2 ] - Derived::AnotherFunction
This is pretty much as expected. It gets slightly more complicated with multiple inheritance.
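The effect of the two table variants can be observed from the outside. In this sketch the functions return strings so the chosen implementation is visible; Inheriting, Overriding and CallThroughBase are hypothetical names.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string SomeFunction() const { return "Base::SomeFunction"; }
};

// Inherits SomeFunction: its table slot keeps pointing at Base's version.
class Inheriting : public Base {
public:
    virtual std::string AnotherFunction() const { return "Inheriting::AnotherFunction"; }
};

// Overrides SomeFunction: the same slot now points at the new version.
class Overriding : public Base {
public:
    std::string SomeFunction() const override { return "Overriding::SomeFunction"; }
};

// Always calls through the Base interface, whatever the dynamic type is.
std::string CallThroughBase(const Base& b) { return b.SomeFunction(); }
```

The override specifier is optional, but it asks the compiler to verify that the function really does replace a base class slot.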
Multiple Inheritance
Remember how with single inheritance the base data members are contained at the start of the derived object? This effectively means that under single inheritance the Base and Derived parts of the object start at the same address.
This is not the case for subsequent base classes in multiple inheritance. And therein lies the complexity: multiple inheritance requires patching the location of the this pointer, as well as the pointer addressed via subsequent base class subobjects. The vptr and vtable handling also gets more complicated: we’re going to have to store more information, and we’re going to end up with two or more virtual pointers!
Let’s first consider the base object pointer patching. Let’s say we have a class hierarchy like this:
class Base0 {
public:
virtual ~Base0();
};
class Base1 {
public:
virtual ~Base1();
};
class Derived : public Base0, public Base1 {
public:
virtual ~Derived();
};
In memory a Derived object will be laid out like this:
Derived:
+---------+ 0
| Base0 |
+---------+
| Base1 |
+---------+
| Derived |
+---------+ n
That means we can easily do a conversion from Derived to Base0 because the start of Derived and the start of Base0 are at the same address:
Derived* d = new Derived;
Base0* b = d;
However, what happens if we want to assign a pointer to a Derived object to a Base1 pointer, whose subobject is not at the same address? The compiler will add an offset:
// For this:
Derived* d = new Derived;
Base1* b = d;
// the compiler will transform the code to something like this
// (conceptually; the offset is in bytes and a null pointer stays null):
Base1* b = (Base1*)((char*)d + sizeof(Base0));
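The adjustment can be observed without a debugger. The following is a minimal sketch using the same hierarchy; Base1Offset and BackToDerived are helper names I have made up.

```cpp
#include <cassert>
#include <cstddef>

class Base0 { public: virtual ~Base0() = default; };
class Base1 { public: virtual ~Base1() = default; };
class Derived : public Base0, public Base1 {};

// Byte distance between the Derived object and its Base1 subobject.
std::ptrdiff_t Base1Offset(Derived* d) {
    Base1* b1 = d;  // implicit conversion; the compiler adjusts the address
    return reinterpret_cast<char*>(b1) - reinterpret_cast<char*>(d);
}

// Converting back undoes the adjustment.
Derived* BackToDerived(Base1* b1) {
    return static_cast<Derived*>(b1);
}
```

On the platform used in this article I would expect Base1Offset to return 8, the size of the Base0 subobject, though the exact value is up to the implementation.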
A similar patching process also happens on function calls where a virtual function overridden in the derived class is called via a pointer to one of the subsequent base classes. This is the patching of the this pointer, and it’s handled by a thunk.
A thunk is a short code snippet that’s associated with a function. It is called before the function to do any pointer patching required before it transfers control to the actual function. For simplicity we can imagine it looks like this:
// vtable with thunk:
[ 0 ] - __function_thunk
[ 1 ] - function
// then for a virtual function call needing pointer adjustment:
ptr->function();
// becomes:
ptr->__function_thunk(ptr);
__function_thunk:
ptr += offset;
function(ptr);
Both of these transformations are applied at runtime because the type of object being addressed is not known at compile time [3].
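We can’t see the thunk itself from C++, but we can observe its effect. In this sketch a function overridden in Derived is called through a pointer to Base1 and reports the this pointer it received. The Self and SelfViaBase1 functions are hypothetical and exist only for this demonstration.

```cpp
#include <cassert>

class Base0 {
public:
    virtual ~Base0() = default;
    int a = 1;
};

class Base1 {
public:
    virtual ~Base1() = default;
    virtual const void* Self() const { return this; }
};

class Derived : public Base0, public Base1 {
public:
    // Inside the override, `this` must refer to the whole Derived object.
    const void* Self() const override { return this; }
    int d = 9;
};

// Calls the override through a pointer to the second base class.
const void* SelfViaBase1(const Base1* b) { return b->Self(); }
```

Although the caller only holds the address of the Base1 subobject, the override sees the address of the enclosing Derived object, so something has adjusted the pointer along the way.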
Finally, let’s look at what happens in terms of the vptr and vtable setup by examining the multiple inheritance hierarchy defined above:
(lldb) p sizeof(d)
(unsigned long) $0 = 16
(lldb) x/4w &d
0x7fff5fbffb68: 0x00001040 0x00000001 0x00001060 0x00000001
| | | |
..................... .....................
Base0 _vptr Base1 _vptr
The derived class object ends up with a virtual table for each base class that has one. This set is made up of a primary virtual table and one or more secondary virtual tables. Each secondary table has the same layout as the virtual table of the corresponding base class, except that the RTTI is that of the derived class instead of the base.
It looks like this:
+----------------+
| offset_to_top |
Derived: +----------------+
+-------------+ | Derived RTTI |
| _vptr_Base0 |---+ +----------------+
+-------------+ +--> | Base0 virtuals |
| ... | +----------------+
+-------------+ | ... |
| _vptr_Base1 |---+ +----------------+
+-------------+ | | offset_to_top |
| ... | | +----------------+
+-------------+ | | Derived RTTI |
| +----------------+
+--> | Base1 virtuals |
+----------------+
| ... |
+----------------+
The reason we end up with multiple virtual pointers and tables is to support the object address adjustment mentioned above. If we pass a pointer to a Derived object to a function taking a pointer to Base1, we pass in an address that has been adjusted to start at _vptr_Base1. Thus any virtual function calls will map into the correct slot in the virtual table.
This is also the reason we end up with the same content in the virtual tables: better runtime performance. If the entries weren’t duplicated, more runtime pointer adjustments would have to take place. With this setup we just need one adjustment, and then call into the virtual table as normal.
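The offset_to_top entry has a visible counterpart in the language: dynamic_cast to void* returns the address of the most derived object. As far as I understand, implementations following this table layout use offset_to_top for exactly that purpose, but treat that as an assumption. MostDerivedAddress is a hypothetical helper.

```cpp
#include <cassert>

class Base0 { public: virtual ~Base0() = default; };
class Base1 { public: virtual ~Base1() = default; };
class Derived : public Base0, public Base1 {};

// dynamic_cast to void* yields the address of the most derived object,
// whichever base subobject the pointer currently refers to.
const void* MostDerivedAddress(const Base1* b) {
    return dynamic_cast<const void*>(b);
}
```

Given a pointer to the Base1 part of a Derived object, the function returns the address of the Derived object itself.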
Finally, let’s take a look at virtual inheritance.
Virtual Inheritance
Let’s consider the simple case with only one virtual base class:
class Base {
public:
~Base();
int p { 5 };
};
class Derived : virtual public Base {
public:
~Derived();
int q { 7 };
};
Derived d;
As usual, let’s check the size and memory layout:
(lldb) p sizeof(d)
(unsigned long) $0 = 16
(lldb) x/4w &d
0x7fff5fbffb68: 0x00001028 0x00000001 0x00000007 0x00000005
| | | | | |
..................... .......... ..........
virtual pointer q p
Oh, this is interesting. We immediately see that the virtual base nonstatic data members are laid out in memory after the derived class members. This is different from non-virtual inheritance, where the base class members came first. This means that our base and derived parts don’t start at the same address, as they do for non-virtual inheritance. The virtual base subobject is contained directly in Derived, which makes sense since there can only be one copy of a virtual base subobject.
Let’s take a look at the virtual table:
(lldb) x/7a 0x0000000100001028
0x100001028: 0x0000000100001028 VTT for Derived
0x100001030: 0x00007fff7d76cb48 vtable for __cxxabiv1::__class_type_info + 16
0x100001038: 0x0000000100000fa3 a.out`typeinfo name for Base
0x100001040: 0x00007fff7d76cc28 vtable for __cxxabiv1::__vmi_class_type_info + 16
0x100001048: 0x0000000100000f9a a.out`typeinfo name for Derived
0x100001050: 0x0000000100000000 a.out`_mh_execute_header
0x100001058: 0x0000000100001030 typeinfo for Base
Curious, what’s this VTT for Derived? During object construction the object takes the form of the class whose constructor is currently being executed. So during the Base constructor execution the Derived object we’re creating is of type Base. During this construction process the compiler needs to make sure that the virtual pointer points to the correct virtual table. This information is stored in the Virtual Table Table, or VTT, in the form of construction vtables as well as the non-construction virtual tables.
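The fact that an object has the type of the class currently being constructed can be demonstrated with a virtual call from a constructor. This sketch adds a virtual function Name to the hierarchy, which the example above does not have; the seen_in_ctor member and the two free functions are likewise made up for illustration.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    Base() { seen_in_ctor = Name(); }  // virtual call during construction
    virtual ~Base() = default;
    virtual std::string Name() const { return "Base"; }
    std::string seen_in_ctor;
};

class Derived : virtual public Base {
public:
    std::string Name() const override { return "Derived"; }
};

// What the Base constructor observed while a Derived was being built.
std::string NameSeenDuringBaseConstruction() {
    Derived d;
    return d.seen_in_ctor;
}

// What a caller observes once construction has finished.
std::string NameAfterConstruction() {
    Derived d;
    const Base& b = d;
    return b.Name();
}
```

While the Base constructor runs, the call resolves to Base::Name; once construction has completed, the same call through a Base reference resolves to Derived::Name.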
Finally let’s take a look at the classic diamond shaped inheritance graph:
class Root {
public:
virtual ~Root();
int a { 3 };
};
class Left : virtual public Root {
public:
virtual ~Left();
int b { 5 };
};
class Right : virtual public Root {
public:
virtual ~Right();
int c { 7 };
};
class Derived : public Left, public Right {
public:
virtual ~Derived();
int d { 9 };
};
Derived d;
Let’s inspect the memory layout:
(lldb) p sizeof(d)
(unsigned long) $0 = 40
(lldb) x/10w &d
0x7fff5fbffb20: 0x00001028 0x00000001 0x00000005 0x00000000
| | | | | |
..................... .......... ..........
Left __vptr Left: b padding
0x7fff5fbffb30: 0x00001040 0x00000001 0x00000007 0x00000009
| | | | | |
..................... .......... ..........
Right __vptr Right: c Derived: d
0x7fff5fbffb40: 0x00000003 0x00007fff
| | | |
.......... ..........
Root: a padding
Here we again see that there’s only one virtual base object and that it’s contained directly in the Derived object. What does the virtual table look like in this instance? It’s similar to the one for regular multiple inheritance except we have another offset pointer, this time to the virtual base subobject contained in Derived. The virtual table looks like this:
+---------------+
| vbase_offset |
+---------------+
| offset_to_top |
+---------------+
| RTTI |
+---------------+
| Left entries |
+---------------+
| ... |
+---------------+
| vbase_offset |
+---------------+
| offset_to_top |
+---------------+
| RTTI |
+---------------+
| Right entries |
+---------------+
| ... |
+---------------+
| vbase_offset |
+---------------+
| offset_to_top |
+---------------+
| RTTI |
+---------------+
|Derived Entries|
+---------------+
| ... |
+---------------+
That’s quite a lot of information. With this it’s also easy to see the potential memory overhead of supporting large inheritance graphs with virtual base classes, especially if there are a lot of virtual functions.
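That there really is a single shared Root subobject can be checked by comparing addresses. The sketch below repeats the diamond with structs for brevity and adds a non-virtual variant for comparison; the Plain-prefixed types and the two functions are hypothetical.

```cpp
#include <cassert>

struct Root { virtual ~Root() = default; int a = 3; };

// Virtual inheritance: Left and Right share one Root subobject.
struct Left : virtual Root { int b = 5; };
struct Right : virtual Root { int c = 7; };
struct Derived : Left, Right { int d = 9; };

// Non-virtual inheritance for comparison: two separate Root subobjects.
struct PlainLeft : Root { int b = 5; };
struct PlainRight : Root { int c = 7; };
struct PlainDerived : PlainLeft, PlainRight { int d = 9; };

// True when both paths through the diamond reach the same Root.
bool SharesRoot(Derived& d) {
    Root* via_left = static_cast<Left*>(&d);
    Root* via_right = static_cast<Right*>(&d);
    return via_left == via_right;
}

bool PlainSharesRoot(PlainDerived& d) {
    Root* via_left = static_cast<PlainLeft*>(&d);
    Root* via_right = static_cast<PlainRight*>(&d);
    return via_left == via_right;
}
```

With virtual inheritance both paths lead to the same address, while the non-virtual variant ends up with two distinct Root subobjects.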
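Please note that for all the multiple inheritance examples in this article, if Derived had added any virtual functions of its own we would have gotten additional virtual table entries for them as well, as hinted at by the Derived entries in this last diagram. Under the Itanium C++ ABI, which clang follows on this platform, such entries are typically appended to the primary virtual table that Derived shares with its first base class, rather than requiring yet another virtual pointer.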
Summary
Virtual functions are at the heart of designing intuitive class hierarchy interfaces. The implementation support for them is quite intuitive and allows for good runtime performance, at the cost of some memory overhead. When designing these class hierarchies it’s worth considering the object layout to minimize wasting memory due to padding for alignment.
While there is some runtime overhead for invoking virtual functions, don’t assume they are much more expensive than normal function calls without proper profiling.
- A class has one or more virtual pointers and virtual tables when it has virtual functions, or if it has a virtual base class
- The virtual pointer is initialised by the constructor
- The virtual table is constructed during compilation
- Each virtual function has a fixed index into the virtual table
- There may be runtime base pointer offset and this pointer patching
- There can only be one virtual base subobject and it’s contained directly in the most derived class object
- The compiler pads objects for efficient load/store operations
- During construction of a hierarchy with virtual base class(es) the compiler makes use of the virtual table table (VTT) to point the vptr to the correct vtable
[1] See my article Embracing Compiler Errors For Fun And Profit for more details on recommended compiler flags to use.
[2] The size of a word is architecture dependent.
[3] Generally speaking this is the case.