I recently found myself needing reference counting semantics for arbitrary objects in kernel space. C++ libraries include std::shared_ptr
and friends, but we don’t have the luxury of using that in a kernel. Thankfully, building a discount reference counting smart pointer isn’t actually all that hard.
In fact, the full re-implementations of shared_ptr
, weak_ptr
and enable_shared_from_this
take up only a few hundred lines of reasonably well commented template code in the kernel.1 The behavior and use is virtually identical to the stdlib equivalents, just with different names (rt::SharedPtr
) and fewer fancy optimizations. This ended up actually being less code than all of the different reference counting mechanisms different types of objects contained.
Design
The basic idea behind how std::shared_ptr
works is that each object allocation that is represented by a smart pointer is associated with an information block that holds reference counts and a deleter that invokes the object’s destructor and deallocates its memory. Any shared pointers that are subsequently created (by copying the original pointer, for example) will share that same information block.
Inside that block are two reference counters: the strong reference count (\(R_{\text{strong}}\)) and weak reference count (\(R_{\text{weak}}\).) Additionally, a deleter object, which implements operator()()
that will by default, just invoke operator delete
. However, for other allocators – in the kernel, sometimes custom slab allocators are used – this behavior can be customized.
Essentially, \(R_{\text{strong}}\) serves as the ref count for the pointee, while \(R_{\text{weak}}\) can be thought of as a ref count for the information block itself. It is valid for the strong ref count to be zero, leading to deleter invocation, while the weak count is nonzero keeping the info block alive. In fact, this is exactly how weak pointers are implemented.2
Initialization
When the smart pointer is created for the first time, we set both \(R_{\text{strong}}\) and \(R_{\text{weak}}\) to 1. Additionally, the pointer to the object is stored inside the pointer.
Copying
Any time an additional pointer is created from another pointer (by copying or assignment) \(R_{\text{strong}}\) is incremented. It’s important that this increment operation is atomic, so that other threads see the effects correctly. This is accomplished with the clang atomic builtins (think __atomic_add_fetch()
and friends) and on amd64 amounts to nothing more than the lock
prefix on the addition. This is essentially free for cache line aligned values.
Deallocation
Since the smart pointer is an object itself, it has its own destructor that gets invoked when it goes out of scope. In that destructor, \(R_{\text{strong}}\) is atomically decremented and tested for zero. Once it reaches zero, we’ll invoke the deleter so the object itself is destroyed.
At this time, we’ll also atomically decrement \(R_{\text{weak}}\); if that also reaches zero, the information block is deallocated. This is why the counter starts off as 1, not 0, when the smart pointer is initially created. It can be thought of as a reference counter for the information block itself. Each shared pointer could increment this count by one, but instead there is a single +1 for all shared pointers.
This weak ref count is used to implement weak pointers; they don’t prevent the object from being deallocated, but they keep the information block around so that the weak pointer knows the actual object was deallocated.
Implementation
The overall design of reference counting pointers in C++ is relatively simple. I’ll quickly go through some key functions and some decisions behind them in this implementation. Keep in mind this implementation is meant to be used in a kernel, where we don’t have the STL or any C++ runtime support.
Structures
To make things work, there’s a few different structures at work beyond the user visible SharedPtr<T>
.
Deleters
A common theme throughout the STL is the concept of deleters, which are small classes that encapsulate the specifics of memory management. This is particularly important in a kernel, as many allocations do not come off the shared kernel heap, but private object-specific allocators.
1
2
3
4
5
6
template<class U>
struct DefaultDeleter {
void operator()(U *obj) const {
delete obj;
}
};
Above is the default deleter implementation, which simply invokes operator delete
on the pointee.
Info Block
Reference counts for an allocation are stored in an info block, which is allocated the first time a smart pointer is created.
1
2
3
4
5
6
7
struct InfoBlock {
size_t useCount = 1;
size_t weakCount = 1;
virtual ~InfoBlock() = default;
virtual void destroy() = 0;
};
However, we need to store more than just reference counts: each info block has an associated deleter. To properly support polymorphic types, inside the SharedPtr
class, we define the real info block that is allocated when a pointer is created:
1
2
3
4
5
6
7
8
9
10
11
12
13
template<class T>
class SharedPtr {
template<class U, class Deleter>
struct InfoBlockImpl: public InfoBlock {
U *ptr;
Deleter deleter;
InfoBlockImpl(U *_ptr, Deleter _deleter) : ptr(_ptr), deleter(_deleter) {}
virtual void destroy() override {
this->deleter(this->ptr);
}
};
};
This struct is templated on type U
, which is a secondary type that is related by inheritance to type T
that the SharedPtr
points to, as well as the type of deleter. To illustrate this, imagine a simple class hierarchy like the following:
graph TB;
A[Pland]
B[Vegetal]
C[Fruit]
A-->B
A-->C
It’s totally valid to assign a pointer to Pland
with an object of instance Vegetal
. However, these objects have different destructors, and may even be allocated from different pools. While the differing allocation pools are an extreme example, it should be obvious that a deleter for Pland
should not be invoked for Vegetal
, and even though Vegetal
and Fruit
are both subclasses of Pland
, their internal layouts may be different and require different deleters.
Functions
Beyond the above structures, we need a little bit of actual code to make the pointers do… pointer things. But it’s surprisingly simple.
Construction
The simplest, most basic case for creating a SharedPtr
is to create a new instance for an already allocated object, using either the default deleter or a custom deleter. This boils down to allocating a new instance of the info block, which automatically initializes its ref counts to 1.
Since this is the first time the pointer is created, we don’t have to use any atomic operations since it’s not possible for another thread to be accessing this instance before the constructor returns.
1
2
3
4
5
6
7
8
9
10
11
template<class U, class Deleter>
SharedPtr(U *_ptr, Deleter d) : info(new InfoBlockImpl<U, Deleter>(_ptr, d)), ptr(_ptr) {
}
template<class U>
explicit SharedPtr(U *_ptr) : info(new InfoBlockImpl<U, DefaultDeleter<U>>(_ptr, DefaultDeleter<U>())), ptr(_ptr) {
}
~SharedPtr() {
this->decrementStrongRefs();
}
A quick note about the explicit
keyword: this prevents the given constructor from being used during an implicit conversion or as a copy constructor. This means that creating a SharedPtr
from a raw pointer cannot happen “by accident.” Since additional metadata is not stored in the pointed to object, but alongside it, this would result in multiple info blocks for the same object in memory, leading to possible memory corruption and use-after-free bugs.
Lastly, the destructor simply decrements \(R_{\text{strong}}\). This is what makes these pointers so simple to use: they follow the RAII paradigm and extend it to raw pointers. Simply letting the SharedPtr
instance go out of scope will drop its reference and deallocate the object if needed.
Reference Counting
The remainder of the pointer implementation boils down to appropriately placed increment/decrement calls. Two internal methods are provided to do both of these as atomic operations:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
void incrementStrongRefs() {
if(this->info) {
__atomic_add_fetch(&this->info->useCount, 1, __ATOMIC_RELAXED);
}
}
void decrementStrongRefs() {
if(this->info) {
if(!__atomic_sub_fetch(&this->info->useCount, 1, __ATOMIC_RELEASE)) {
this->info->destroy();
this->ptr = nullptr;
if(!__atomic_sub_fetch(&this->info->weakCount, 1, __ATOMIC_RELEASE)) {
delete this->info;
this->info = nullptr;
}
}
}
}
To simplify the code presented, some error checking (to do with integer overflows) is omitted. Improper implementation of these methods can result in security bugs.
Incrementing is relatively straight forward; if an info block exists in the pointer (it has been assigned to or it was created with an object) it’s simply atomically incremented.
More involved is decrementing the ref count. First, we’ll decrement \(R_{\text{strong}}\) atomically. If the result of this is zero, we invoke the deleter and invalidate the pointee inside the SharedPtr
instance. Next we’ll decrement \(R_{\text{weak}}\) and deallocate the info block if that is zero.
Assignment
To realize the potential of these smart reference counting pointers, we have to support copying and assigning them. Thankfully, this is reasonably simple: all that’s needed is to decrement the old info block’s \(R_{\text{strong}}\), copy the source’s pointee and info block pointers, then increment \(R_{\text{strong}}\).
1
2
3
4
5
6
7
8
SharedPtr(const SharedPtr &ptr) : info(ptr.info), ptr(ptr.ptr) {
this->incrementStrongRefs();
}
template<class U>
SharedPtr(const SharedPtr<U> &ptr) : info(ptr.info), ptr(ptr.ptr) {
this->incrementStrongRefs();
}
Assignment operators are implemented essentially the same as the copy constructors, except that there’s a possibility the smart pointer being assigned to is valid and points to an object.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SharedPtr& operator=(const SharedPtr &ptr) {
if(this != &ptr) {
this->decrementStrongRefs();
this->info = ptr.info;
this->ptr = ptr.ptr;
this->incrementStrongRefs();
}
return *this;
}
SharedPtr& operator=(std::nullptr_t) {
this->decrementStrongRefs();
this->ptr = nullptr;
this->info = nullptr;
return *this;
}
A special overloaded version of assignment to nullptr_t
is implemented so a pointer can be forced to drop its reference before going out of scope by assigning nullptr
to it. This is the same behavior as std::shared_ptr<T>.reset()
exposes.
Syntax Sugar
Lastly, we need to implement a few remaining operators that allow the SharedPtr
instance to behave as if it were a raw pointer to T
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
bool operator==(const SharedPtr &ptr) const {
return (this->ptr == ptr.ptr);
}
explicit operator void*() const {
return this->ptr;
}
explicit operator bool() const {
return (this->ptr != nullptr);
}
T* operator->() const {
return this->ptr;
}
T& operator*() const {
return *this->ptr;
}
There are two conversion operators provided: one to bool
allows very simple checking of a pointer to nullptr
, and one to void *
that permits casting the SharedPtr
to void *
for display and debugging purposes.
Conclusion
Proper generic reference counting smart pointers are far simpler to implement than I anticipated. They are much easier to use if there’s an arbitrary number of objects that must support reference counting.
While this implementation is missing several features of the STL’s std::shared_ptr
template, it does what it needs to for use in kernel space. It is not as optimized (a separate allocation is made for the info block and the pointee) and likely falls apart with more complex inheritance situations.
This implementation has support for weak pointers, and for converting a plain this
pointer back into a smart pointer, with some special support from the class. The implementations of both of these are relatively trivial and should be easily understandable given the background on the SharedPtr
internals provided above.
Overall, this was a fun little adventure in understanding how the C++ library features I’ve taken for granted over the years are implemented internally. As with all of the kush-os code, the smart pointer implementation is available under the ISC license.
In more general terms, I’m hoping to keep up the pace of work on the OS for the next few weeks – while I still have zero responsibilities and alarm clocks are nothing but a distant memory. I recently finished the initial amd64 port (and essentially a rewrite of most of the kernel) and have a few bugs to resolve before I’m back to a fully functional kernel that just needs more userspace implemented. I’m hopeful that I’ll have something pretty to show off by the time I start my big boi(tm) job in a few weeks.
In this post, I’ll focus only on the
shared_ptr
knockoff part of that code. ↩The kernel does have a
rt::WeakPtr
class that does exactly that. Its implementation is not described in this article, but a quick look at the code should hopefully be sufficient. ↩