In this article, I’d like to explain how MSVC handles Run-Time Type Information (RTTI). In C++, RTTI is used by dynamic_cast, typeid, and exception handling. I won’t go into the details of each of these use cases; instead, I’ll focus on MSVC’s implementation of RTTI.
For this article, I’m going to assume the x64 architecture. The 32-bit x86 implementation doesn’t differ much from x64 (the main difference is that x86 uses absolute pointers where x64 uses image base relative offsets), but to keep things compact I decided to focus only on the more relevant architecture.
The RTTI data is generated for polymorphic types. This means we need to add at least one virtual function to a struct/class for the compiler to generate the data that we can inspect. The RTTI data is going to be placed in the .rdata section of the application.
Let’s take a look at what happens when we have this example:
struct ParentA
{
    virtual ~ParentA() = default;
};
struct ParentB
{
    virtual ~ParentB() = default;
};
struct SomeClass : ParentA, ParentB
{
    virtual ~SomeClass() = default;
    virtual int getNum() { return 2; }
};
int main()
{
    ParentB* obj = new SomeClass;
    delete obj;
}
By using the undocumented compiler switch /d1reportSingleClassLayoutSomeClass (you can read more about it here), we can inspect SomeClass' layout:
class SomeClass size(16):
        +---
 0      | +--- (base class ParentA)
 0      | | {vfptr}
        | +---
 8      | +--- (base class ParentB)
 8      | | {vfptr}
        | +---
        +---
SomeClass::$vftable@ParentA@:
        | &SomeClass_meta
        |  0
 0      | &SomeClass::{dtor}
 1      | &SomeClass::getNum
SomeClass::$vftable@ParentB@:
        | -8
 0      | &thunk: this-=8; goto SomeClass::{dtor}
As we can see, an instance of SomeClass contains two vfptr entries. A vfptr is a pointer to a vftable, which holds pointers to virtual functions. The first vfptr is shared with the ParentA sub-object, while the second one belongs to the ParentB sub-object.
Here is how the layout looks when represented as a simple diagram:
                               vftable
                           +--------------+
                      +--->|  thunk dtor  | 0
                      |    +--------------+
                      |    |     meta     | -8
                      |    +--------------+
                      |
  SomeClass' instance |        vftable
  +---------------+   |    +--------------+
8 |     vfptr     +---+    |    getNum    | 8
  +---------------+        +--------------+
0 |     vfptr     +------->|     dtor     | 0
  +---------------+        +--------------+
                           |     meta     | -8
                           +--------------+
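If you’d like to check the two sub-objects from plain C++, the sketch below measures where the ParentB sub-object starts inside SomeClass by comparing the object’s address with the address produced by an upcast. It reuses the structs from the example; the helper parentBOffset is hypothetical. On x64 the result should be 8, matching the layout dump, but the exact value is an ABI detail rather than something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ParentA
{
    virtual ~ParentA() = default;
};

struct ParentB
{
    virtual ~ParentB() = default;
};

struct SomeClass : ParentA, ParentB
{
    virtual ~SomeClass() = default;
    virtual int getNum() { return 2; }
};

// Hypothetical helper: byte distance between the complete object and its
// ParentB sub-object. The upcast applies the compiler's own this adjustment.
std::ptrdiff_t parentBOffset(SomeClass& obj)
{
    auto complete = reinterpret_cast<std::uintptr_t>(&obj);
    auto subObject = reinterpret_cast<std::uintptr_t>(static_cast<ParentB*>(&obj));
    return static_cast<std::ptrdiff_t>(subObject - complete);
}
```

The ParentA sub-object, on the other hand, starts at the same address as the complete object, which is why it can share the first vfptr.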
You might be wondering what the meta field below the first vftable entry is. This is where MSVC’s ABI-specific approach to RTTI comes in. The meta field is a pointer to the RTTICompleteObjectLocator which, as the name suggests, helps us locate the complete object. In simple terms, the complete object is the object that we’ve created; in this case, it’s an instance of SomeClass. ParentA and ParentB are sub-objects, which are part of the complete object.
One thing that may be unclear is what exactly this thunk dtor is. It’s a small piece of code that adjusts the this pointer, in this example by subtracting 8, before jumping to the target function, here SomeClass' destructor. This is necessary because SomeClass' member functions expect this to point at the beginning of the SomeClass object. Here is what the thunk dtor looks like in assembly (on x64, MSVC passes this in rcx):
sub rcx,8
jmp SomeClass::`scalar deleting destructor'
The scalar deleting destructor is MSVC’s way of combining the destructor with a call to operator delete. If you’re interested in learning more, you can read about it here.
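To make the thunk’s job more concrete, here is a sketch that performs the same this adjustment by hand: it takes a ParentB* that points into a SomeClass, subtracts the sub-object offset, and compares the result with what static_cast<SomeClass*> produces. The constant 8 is assumed from the x64 layout above, and adjustLikeThunk is a hypothetical name; the real thunk is compiler-generated and jumps straight to the scalar deleting destructor afterwards.

```cpp
#include <cassert>
#include <cstddef>

struct ParentA
{
    virtual ~ParentA() = default;
};

struct ParentB
{
    virtual ~ParentB() = default;
};

struct SomeClass : ParentA, ParentB
{
    virtual ~SomeClass() = default;
    virtual int getNum() { return 2; }
};

// Assumed offset of the ParentB sub-object on x64 (taken from the layout dump).
constexpr std::ptrdiff_t kParentBOffset = 8;

// Hypothetical helper doing what "sub rcx,8" does in the thunk:
// move from the ParentB sub-object back to the start of the SomeClass object.
SomeClass* adjustLikeThunk(ParentB* subObject)
{
    char* raw = reinterpret_cast<char*>(subObject);
    return reinterpret_cast<SomeClass*>(raw - kParentBOffset);
}
```

In real code, static_cast (or a virtual call) is the way to get this adjustment; hard-coding the offset only illustrates what the generated thunk does.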
RTTICompleteObjectLocator
We now know that the meta field points to an RTTICompleteObjectLocator. Let’s take a peek at how this structure is defined in the MSVC headers:
typedef const struct _s_RTTICompleteObjectLocator
{
    unsigned long signature;
    unsigned long offset;
    unsigned long cdOffset;
    int pTypeDescriptor;
    int pClassDescriptor;
    int pSelf;
} _RTTICompleteObjectLocator;
- signature - for x64, it’s set to COL_SIG_REV1, which means that pTypeDescriptor, pClassDescriptor and pSelf are image base relative offsets.
- offset - the offset from the complete object to the current sub-object, i.e. the one whose vftable we took the RTTICompleteObjectLocator from.
- cdOffset - the constructor displacement’s offset. It’s relevant only in particular situations involving virtual inheritance, as part of Microsoft’s specific way of optimizing the data needed to handle some of those cases. I won’t go into details about it in this article, but if you’re curious to know more, I’ve prepared a simple example, and you can also check this document which I found on the internet.
- pTypeDescriptor - the offset from the image base to the complete object’s TypeDescriptor.
- pClassDescriptor - the offset from the image base to the RTTIClassHierarchyDescriptor.
- pSelf - the offset from the image base to the current RTTICompleteObjectLocator. This gives us a simple way to compute the image base, which we can then use to resolve pTypeDescriptor and pClassDescriptor.
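The pSelf trick boils down to one subtraction. The sketch below doesn’t read real MSVC data; it mirrors the locator in a hypothetical CompleteObjectLocator struct (using std::uint32_t, since unsigned long is 32 bits on Windows but not on every platform), places it inside a fake "image" struct, and recovers the base from the locator’s own address. imageBaseFrom, fromRva and FakeImage are illustrative names, not part of any MSVC header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of _RTTICompleteObjectLocator for x64.
// unsigned long is 32 bits on Windows, so std::uint32_t keeps the same layout elsewhere.
struct CompleteObjectLocator
{
    std::uint32_t signature;
    std::uint32_t offset;
    std::uint32_t cdOffset;
    int pTypeDescriptor;  // image base relative
    int pClassDescriptor; // image base relative
    int pSelf;            // image base relative offset of this locator itself
};

// Hypothetical stand-in for a loaded module: some header bytes, then a locator.
struct FakeImage
{
    unsigned char headers[64];
    CompleteObjectLocator col;
};

// Image base = address of the locator minus the locator's own relative offset.
std::uintptr_t imageBaseFrom(const CompleteObjectLocator* col)
{
    return reinterpret_cast<std::uintptr_t>(col) - static_cast<std::uintptr_t>(col->pSelf);
}

// Convert an image base relative offset into an absolute address.
std::uintptr_t fromRva(std::uintptr_t imageBase, int rva)
{
    return imageBase + static_cast<std::uintptr_t>(rva);
}
```

Once the base is known, fromRva(base, col->pTypeDescriptor) and fromRva(base, col->pClassDescriptor) give the addresses of the other two structures.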
To get the full picture, we still need to understand what TypeDescriptor and RTTIClassHierarchyDescriptor are.
TypeDescriptor
Let’s take a look at the TypeDescriptor definition first:
typedef struct TypeDescriptor
{
    const void* pVFTable;
    void* spare;
    char name[];
} TypeDescriptor;
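- pVFTable - points to type_info’s vftable.
- spare - not used by the RTTI data itself; it’s set to nullptr in the binary. (At run time, the CRT may use this slot to cache the undecorated name once type_info::name() has been called.)
- name - contains the type’s mangled name. It’s the same string that raw_name() returns for the corresponding type_info instance.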
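For simple types, the mangled names stored in TypeDescriptor are easy to read by hand: a non-nested, non-template struct shows up as .?AU<name>@@, and a class as .?AV<name>@@. The hypothetical prettyTypeName below handles only that case and returns an empty string for anything else; for anything more complex, type_info::name() on MSVC already returns the undecorated form.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: decodes only the simplest MSVC type descriptor names,
// e.g. ".?AUSomeClass@@" -> "struct SomeClass" and ".?AVFoo@@" -> "class Foo".
// Nested names, templates and non-class types yield an empty string.
std::string prettyTypeName(std::string_view raw)
{
    std::string_view kind;
    if (raw.substr(0, 4) == ".?AU")
        kind = "struct ";
    else if (raw.substr(0, 4) == ".?AV")
        kind = "class ";
    else
        return {};

    // What remains should be the identifier followed by exactly "@@".
    std::string_view rest = raw.substr(4);
    if (rest.size() < 3 || rest.substr(rest.size() - 2) != "@@")
        return {};

    std::string_view ident = rest.substr(0, rest.size() - 2);
    if (ident.find_first_of("@?$") != std::string_view::npos)
        return {};

    return std::string(kind) + std::string(ident);
}
```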
Now is a good time to recap what we’ve gathered so far. We can follow the meta field in the vftable to the RTTICompleteObjectLocator. From there, we can get the TypeDescriptor through the RTTICompleteObjectLocator’s pTypeDescriptor field.
That means we should be able to get the mangled name of the complete object’s type in this overcomplicated way. Yay!
Here is a small extension to the simple program presented earlier:
// Get the meta entry in vftable
_RTTICompleteObjectLocator* col = reinterpret_cast<_RTTICompleteObjectLocator***>(obj)[0][-1];
// Calculate image base by subtracting the RTTICompleteObjectLocator's pSelf offset from RTTICompleteObjectLocator's pointer
uintptr_t imageBase = reinterpret_cast<uintptr_t>(col) - col->pSelf;
// Get the type descriptor by adding TypeDescriptor's offset to the image base
TypeDescriptor* tDesc = reinterpret_cast<TypeDescriptor*>(imageBase + col->pTypeDescriptor);
// At the end, we can get the type's mangled name
const char* colName = tDesc->name;
To get access to definitions such as RTTICompleteObjectLocator or TypeDescriptor, you’ll have to include <ehdata.h> and <rttidata.h>.
RTTIClassHierarchyDescriptor
Here’s what the RTTIClassHierarchyDescriptor definition looks like:
typedef const struct _s_RTTIClassHierarchyDescriptor {
    unsigned long signature;
    unsigned long attributes;
    unsigned long numBaseClasses;
    int pBaseClassArray;
} _RTTIClassHierarchyDescriptor;
- signature - currently always set to 0.
- attributes - a bitfield. Possible values are:
  - CHD_MULTINH - set when the hierarchy contains multiple inheritance.
  - CHD_VIRTINH - set when the hierarchy contains at least one virtual base.
  - CHD_AMBIGUOUS - set when the current type contains an ambiguous base class.
- numBaseClasses - the number of RTTIBaseClassDescriptor entries inside RTTIBaseClassArray (the entry for the type itself included).
- pBaseClassArray - the image base relative offset to RTTIBaseClassArray.
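Because attributes is a plain bitfield, decoding it is just masking. In the sketch below, the flag values 0x1, 0x2 and 0x4 are assumptions about the CHD_* macros; when building with MSVC, use the macros from the headers instead of these constants. describeChd is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed values of the CHD_* flags; check them against the MSVC headers.
constexpr std::uint32_t kChdMultInh = 0x1;   // CHD_MULTINH
constexpr std::uint32_t kChdVirtInh = 0x2;   // CHD_VIRTINH
constexpr std::uint32_t kChdAmbiguous = 0x4; // CHD_AMBIGUOUS

// Hypothetical helper: names of the set flags, joined with '|'.
std::string describeChd(std::uint32_t attributes)
{
    std::string out;
    auto append = [&](std::uint32_t bit, const char* name)
    {
        if ((attributes & bit) == 0)
            return;
        if (!out.empty())
            out += '|';
        out += name;
    };
    append(kChdMultInh, "CHD_MULTINH");
    append(kChdVirtInh, "CHD_VIRTINH");
    append(kChdAmbiguous, "CHD_AMBIGUOUS");
    return out;
}
```

For SomeClass, whose hierarchy only uses multiple inheritance, this would yield just CHD_MULTINH.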
Next, let’s check RTTIBaseClassDescriptor and RTTIBaseClassArray:
typedef const struct _s_RTTIBaseClassArray {
    int arrayOfBaseClassDescriptors[];
} _RTTIBaseClassArray;
arrayOfBaseClassDescriptors holds image base relative offsets to one or more RTTIBaseClassDescriptors. As the name suggests, the array describes the base classes, except for the first element, which describes the complete object’s type itself.
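Walking the hierarchy therefore takes two hops: one from the image base to the array, and one per element to each RTTIBaseClassDescriptor. The sketch below only collects the per-element offsets. To keep it testable without MSVC, it treats an ordinary byte buffer as the image and mirrors the hierarchy descriptor in a hypothetical ClassHierarchyDescriptor struct; baseDescriptorRvas is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mirror of _RTTIClassHierarchyDescriptor (unsigned long is 32-bit on Windows).
struct ClassHierarchyDescriptor
{
    std::uint32_t signature;
    std::uint32_t attributes;
    std::uint32_t numBaseClasses;
    int pBaseClassArray; // image base relative
};

// Collect the image base relative offsets of every RTTIBaseClassDescriptor.
// Element 0 describes the type itself, the following ones its bases.
std::vector<int> baseDescriptorRvas(const unsigned char* imageBase,
                                    const ClassHierarchyDescriptor& chd)
{
    std::vector<int> rvas;
    const unsigned char* arrayStart = imageBase + chd.pBaseClassArray;
    for (std::uint32_t i = 0; i < chd.numBaseClasses; ++i)
    {
        int rva;
        // memcpy avoids alignment and aliasing assumptions about the buffer.
        std::memcpy(&rva, arrayStart + i * sizeof(int), sizeof(int));
        rvas.push_back(rva);
    }
    return rvas;
}
```

Adding each returned offset to the image base would give the address of the corresponding descriptor.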
typedef const struct _s_RTTIBaseClassDescriptor {
    int pTypeDescriptor;
    unsigned long numContainedBases;
    PMD where;
    unsigned long attributes;
    int pClassDescriptor;
} _RTTIBaseClassDescriptor;
- pTypeDescriptor - the image base relative offset to the TypeDescriptor of the type currently described within the class hierarchy.
- numContainedBases - the number of bases of the current type.
- where - an additional, inlined structure of type PMD, which I’ll describe later.
- attributes - a bitfield. Possible values are:
  - BCD_NOTVISIBLE - set when the current base class is not inherited publicly.
  - BCD_AMBIGUOUS - the current base class is ambiguous in the class hierarchy.
  - BCD_PRIVORPROTBASE - the current base class is inherited privately or protectedly.
  - BCD_PRIVORPROTINCOMPOBJ - the current base class is part of a privately or protectedly inherited base class hierarchy.
  - BCD_VBOFCONTOBJ - the current base class is virtually inherited.
  - BCD_NONPOLYMORPHIC - the name suggests that it should be set for a non-polymorphic base class. However, during my research, I wasn’t able to create a scenario where this bit would be set :C.
  - BCD_HASPCHD - indicates that an RTTIClassHierarchyDescriptor is present for the current type and that pClassDescriptor contains a valid offset.
- pClassDescriptor - the image base relative offset to the RTTIClassHierarchyDescriptor for the current type.
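Decoding the base class attributes works the same way as for the hierarchy descriptor. The BCD_* values below (powers of two from 0x01 to 0x40, in the order listed above) are assumptions to verify against the MSVC headers; describeBcd and hasClassDescriptor are hypothetical helpers. The second one encodes the rule stated above: pClassDescriptor should only be followed when BCD_HASPCHD is set.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Flag
{
    std::uint32_t bit;
    const char* name;
};

// Assumed values of the BCD_* flags; check them against the MSVC headers.
constexpr Flag kBcdFlags[] = {
    {0x01, "BCD_NOTVISIBLE"},
    {0x02, "BCD_AMBIGUOUS"},
    {0x04, "BCD_PRIVORPROTBASE"},
    {0x08, "BCD_PRIVORPROTINCOMPOBJ"},
    {0x10, "BCD_VBOFCONTOBJ"},
    {0x20, "BCD_NONPOLYMORPHIC"},
    {0x40, "BCD_HASPCHD"},
};
constexpr std::uint32_t kBcdHasPchd = 0x40;

// Hypothetical helper: names of the set flags, joined with '|'.
std::string describeBcd(std::uint32_t attributes)
{
    std::string out;
    for (const Flag& flag : kBcdFlags)
    {
        if ((attributes & flag.bit) == 0)
            continue;
        if (!out.empty())
            out += '|';
        out += flag.name;
    }
    return out;
}

// Hypothetical helper: pClassDescriptor is only meaningful when BCD_HASPCHD is set.
bool hasClassDescriptor(std::uint32_t attributes)
{
    return (attributes & kBcdHasPchd) != 0;
}
```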
typedef struct PMD
{
    int mdisp;
    int pdisp;
    int vdisp;
} PMD;
- mdisp - the offset to the current sub-object. Without virtual inheritance, it’s relative to the complete object; otherwise, it’s applied on top of the virtual base displacement found through pdisp and vdisp.
- pdisp - the offset to the vbptr. Keep in mind that vbptr is not the same as vfptr: vbptr is a pointer to an additional table called vbtable, which is used with virtual inheritance to locate the virtual bases. If virtual inheritance is not used, pdisp holds -1.
- vdisp - the offset (in bytes) within the vbtable. If virtual inheritance is not used, it holds 0.
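Put together, the three PMD fields yield a single displacement. The sketch below follows the reading of the fields given above (an assumption, not MSVC’s own runtime code): if pdisp is -1, the displacement is simply mdisp; otherwise, it reads the vbptr stored at pdisp, takes the 32-bit entry located vdisp bytes into that vbtable, adds pdisp and that entry, and then applies mdisp. Pmd and pmdToOffset are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of PMD.
struct Pmd
{
    int mdisp; // displacement of the sub-object
    int pdisp; // offset of the vbptr, or -1 without virtual inheritance
    int vdisp; // byte offset of the entry inside the vbtable
};

// Offset of the described sub-object from the complete object at 'complete'.
std::ptrdiff_t pmdToOffset(const unsigned char* complete, const Pmd& pmd)
{
    std::ptrdiff_t off = 0;
    if (pmd.pdisp >= 0)
    {
        // Read the vbptr stored at complete + pdisp...
        const unsigned char* vbtable;
        std::memcpy(&vbtable, complete + pmd.pdisp, sizeof(vbtable));
        // ...then the 32-bit displacement stored vdisp bytes into the vbtable.
        int virtualBaseOffset;
        std::memcpy(&virtualBaseOffset, vbtable + pmd.vdisp, sizeof(virtualBaseOffset));
        off = pmd.pdisp + virtualBaseOffset;
    }
    return off + pmd.mdisp;
}
```

For the non-virtual example, (0, -1, 0) gives 0 for ParentA and (8, -1, 0) gives 8 for ParentB.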
Whew, that’s a lot of information. It might be hard to grasp all the connections from the struct definitions alone, so I’ve prepared this diagram depicting our simple example:
                                                                            SomeClass'
                                                                            RTTIBaseClassDescriptor
                                                                            +------------------+
        +-------------------------------------------------------------------| pClassDescriptor |
        |                                                                   +------------------+-------------+    SomeClass'
        |                                                                   | attributes       | BCD_HASPCHD |    TypeDescriptor
        |                                                                   +------------------+-------------+    +----------+-----------------+
        v                                                                   | where            | (0, -1, 0)  |    | name     | .?AUSomeClass@@ |
SomeClass'                        SomeClass'                                +------------------+-------------+    +----------+-----------------+
RTTIClassHierarchyDescriptor      RTTIBaseClassArray                        |numContainedBases | 2           |    | spare    |
+---------------+                 +-----------------------------+           +------------------+-------------+    +----------+
|pBaseClassArray|---------------->| arrayOfBaseClassDescriptors |---------->| pTypeDescriptor  |----------------->| pVFTable |
+---------------+-------------+   +-----------------------------+           +------------------+                  +----------+
|numBaseClasses | 3           |   |             ...             |-----+
+---------------+-------------+   +-----------------------------+     |     ParentA in SomeClass
| attributes    | CHD_MULTINH |   |             ...             |--+  |     RTTIBaseClassDescriptor
+---------------+-------------+   +-----------------------------+  |  |     +------------------+
| signature     | 0           |                                    |  |     | pClassDescriptor |
+---------------+-------------+                                    |  |     +------------------+-------------+    ParentA's
                                                                   |  |     | attributes       | BCD_HASPCHD |    TypeDescriptor
                                                                   |  |     +------------------+-------------+    +----------+-----------------+
                                                                   |  |     | where            | (0, -1, 0)  |    | name     | .?AUParentA@@   |
                                                                   |  |     +------------------+-------------+    +----------+-----------------+
                                                                   |  |     |numContainedBases | 0           |    | spare    |
                                                                   |  |     +------------------+-------------+    +----------+
                                                                   |  +---->| pTypeDescriptor  |----------------->| pVFTable |
                                                                   |        +------------------+                  +----------+
                                                                   |
                                                                   |        ParentB in SomeClass
                                                                   |        RTTIBaseClassDescriptor
                                                                   |        +------------------+
                                                                   |        | pClassDescriptor |
                                                                   |        +------------------+-------------+    ParentB's
                                                                   |        | attributes       | BCD_HASPCHD |    TypeDescriptor
                                                                   |        +------------------+-------------+    +----------+-----------------+
                                                                   +-------->| where            | (8, -1, 0)  |    | name     | .?AUParentB@@   |
                                                                            +------------------+-------------+    +----------+-----------------+
                                                                            |numContainedBases | 0           |    | spare    |
                                                                            +------------------+-------------+    +----------+
                                                                            | pTypeDescriptor  |----------------->| pVFTable |
                                                                            +------------------+                  +----------+
By looking at SomeClass' RTTIClassHierarchyDescriptor, we can see that it describes three classes, and its attributes tell us that we’re dealing with multiple inheritance. The RTTIBaseClassArray starts with our complete object, which has the 2 bases described next: ParentA at offset 0 and ParentB at offset 8. Each entry also gives us access to a TypeDescriptor and, through it, to the mangled name.
Important note
You might assume that a pointer to a polymorphic type’s instance always points at a vfptr as its first entry. However, that’s not always the case with MSVC. When virtual inheritance is used and the derived class doesn’t introduce any new virtual functions, the complete object doesn’t get a vfptr of its own; it shares the one belonging to the virtual base class.
Consider this simple example:
struct VParent
{
    virtual ~VParent() = default;
    int a = 0xDEADC0DE;
};
struct VSomeClass : virtual VParent
{
    virtual ~VSomeClass() = default;
    int b = 0xBADDCAFE;
};
There’s only one virtual function, the destructor, which is overridden by the derived class. VSomeClass doesn’t introduce any new virtual functions. The layout of VSomeClass is presented below:
class VSomeClass size(32):
        +---
 0      | {vbptr}
 8      | b
        | <alignment member> (size=4)
        +---
        +--- (virtual base VParent)
16      | {vfptr}
24      | a
        | <alignment member> (size=4)
        +---
Here’s the same layout represented as a simple diagram:
       VSomeClass
   +--------+-------+
24 |   a    |   -   |
   +--------+-------+
16 |     vfptr      |
   +----------------+
 8 |   b    |   -   |
   +--------+-------+
 0 |     vbptr      |
   +----------------+
As shown above, the first entry is vbptr, not vfptr. Code that assumes vfptr is always the first entry would break here. Where the vfptr ends up depends on the layout of the whole object.
Does that mean we can’t write generic code to access RTTI? As it happens, we can do a little trick with the vbtable.
The first entry of the vbtable is the offset from the vbptr back to the beginning of the object containing it; for VSomeClass, that’s the complete object. After it comes a series of offsets from the vbptr to each virtual base.
   VSomeClass' vbtable (values on the right)
             ...
   +---------------------+
   |  virtualBaseOffset  | = 16
   +---------------------+
   |      topOffset      | = 0
   +---------------------+
If topOffset is 0, we’re dealing with the case where the first entry is a vbptr. Adding the first virtual base offset to the current pointer then takes us to the virtual base, where we have access to the vfptr.
Otherwise, if the value is non-zero, we’ve actually read the beginning of the first vftable entry, i.e. a virtual function pointer, so the pointer already points at a vfptr.
Keep in mind that this only works with polymorphic types. Here’s how it looks in code:
#pragma warning (push)
#pragma warning (disable:4200) // Allow a zero-sized array at the end of a struct
struct vbtable
{
    int topOffset;
    int virtualBaseOffsets[];
};
#pragma warning (pop)
int main()
{
    ...
    // Some check to see if the type is polymorphic
    ...
    // Address that may need adjusting so that it points at the vfptr
    uintptr_t objAddr = reinterpret_cast<uintptr_t>(obj);
    // Assume that the first entry is a vbptr pointing to a vbtable
    vbtable* virtualBaseTable = reinterpret_cast<vbtable**>(obj)[0];
    // If topOffset equals 0, add the first virtual base offset; otherwise, we already point at a vfptr.
    if (virtualBaseTable->topOffset == 0)
        objAddr += virtualBaseTable->virtualBaseOffsets[0];
    ...
}
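The heuristic can also be tried out without an MSVC build. The sketch below fakes both situations in plain memory: an object that starts with a vbptr whose vbtable is {0, 16}, shaped like VSomeClass, and an object that starts with a vfptr whose first vftable slot is non-zero. findVfptr is a hypothetical helper that returns the address where the vfptr should be; like the snippet above, it assumes a polymorphic type, and it also assumes a little-endian target, where the int read from a vftable slot is the low half of a function pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper implementing the heuristic: if the first pointer-sized
// field leads to a table whose first int is 0, treat it as a vbptr and jump to
// the first virtual base; otherwise assume the first field already is a vfptr.
const unsigned char* findVfptr(const unsigned char* obj)
{
    const unsigned char* table;
    std::memcpy(&table, obj, sizeof(table));

    int topOffset;
    std::memcpy(&topOffset, table, sizeof(topOffset));
    if (topOffset != 0)
        return obj; // the first field is a vfptr

    int firstVirtualBase;
    std::memcpy(&firstVirtualBase, table + sizeof(int), sizeof(firstVirtualBase));
    return obj + firstVirtualBase;
}
```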
Final words
By default, the compiler generates RTTI data. We can tell it not to by using the /GR- compiler switch.
When RTTI is disabled, the meta field contains a nullptr.
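Any code that walks these structures should therefore check the meta slot before dereferencing it. The sketch below assumes the object starts with a vfptr (so it doesn’t cover the virtual inheritance case described earlier) and fakes two vftables by hand: one with a locator and one without, as /GR- would leave it. getLocator is a hypothetical helper, and CompleteObjectLocator is only declared because the sketch never looks inside it.

```cpp
#include <cassert>

// Opaque stand-in for _RTTICompleteObjectLocator; only its address matters here.
struct CompleteObjectLocator;

// Read the meta slot just before the first vftable entry.
// Returns nullptr when no RTTI was generated (the meta slot holds a null pointer).
const CompleteObjectLocator* getLocator(const void* obj)
{
    const void* const* vfptr = *static_cast<const void* const* const*>(obj);
    return static_cast<const CompleteObjectLocator*>(vfptr[-1]);
}
```

A caller would stop as soon as getLocator returns nullptr instead of following pTypeDescriptor or pClassDescriptor.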
With all that knowledge, we’re able to get some basic information about polymorphic types in x64 applications built with MSVC. This might be useful during reverse engineering, or simply for a better understanding of what the compiler has to generate to properly support dynamic_cast, typeid, and exception handling.