# Building std::shared_ptr

I recently found myself needing reference counting semantics for arbitrary objects in kernel space. C++ libraries include std::shared_ptr and friends, but we don’t have the luxury of using that in a kernel. Thankfully, building a discount reference counting smart pointer isn’t actually all that hard.

In fact, the full re-implementations of shared_ptr, weak_ptr and enable_shared_from_this take up only a few hundred lines of reasonably well commented template code in the kernel.1 The behavior and use is virtually identical to the stdlib equivalents, just with different names (rt::SharedPtr) and fewer fancy optimizations. This ended up actually being less code than all of the different reference counting mechanisms different types of objects contained.

# Design

The basic idea behind how std::shared_ptr works is that each object allocation that is represented by a smart pointer is associated with an information block that holds reference counts and a deleter that invokes the object’s destructor and deallocates its memory. Any shared pointers that are subsequently created (by copying the original pointer, for example) will share that same information block.

Inside that block are two reference counters: the strong reference count ($R_{\text{strong}}$) and weak reference count ($R_{\text{weak}}$.) Additionally, a deleter object, which implements operator()() that will by default, just invoke operator delete. However, for other allocators – in the kernel, sometimes custom slab allocators are used – this behavior can be customized.

Essentially, $R_{\text{strong}}$ serves as the ref count for the pointee, while $R_{\text{weak}}$ can be thought of as a ref count for the information block itself. It is valid for the strong ref count to be zero, leading to deleter invocation, while the weak count is nonzero keeping the info block alive. In fact, this is exactly how weak pointers are implemented.2

## Initialization

When the smart pointer is created for the first time, we set both $R_{\text{strong}}$ and $R_{\text{weak}}$ to 1. Additionally, the pointer to the object is stored inside the pointer.

## Copying

Any time an additional pointer is created from another pointer (by copying or assignment) $R_{\text{strong}}$ is incremented. It’s important that this increment operation is atomic, so that other threads see the effects correctly. This is accomplished with the clang atomic builtins (think __atomic_add_fetch() and friends) and on amd64 amounts to nothing more than the lock prefix on the addition. This is essentially free for cache line aligned values.

## Deallocation

Since the smart pointer is an object itself, it has its own destructor that gets invoked when it goes out of scope. In that destructor, $R_{\text{strong}}$ is atomically decremented and tested for zero. Once it reaches zero, we’ll invoke the deleter so the object itself is destroyed.

At this time, we’ll also atomically decrement $R_{\text{weak}}$; if that also reaches zero, the information block is deallocated. This is why the counter starts off as 1, not 0, when the smart pointer is initially created. It can be thought of as a reference counter for the information block itself. Each shared pointer could increment this count by one, but instead there is a single +1 for all shared pointers.

This weak ref count is used to implement weak pointers; they don’t prevent the object from being deallocated, but they keep the information block around so that the weak pointer knows the actual object was deallocated.

# Implementation

The overall design of reference counting pointers in C++ is relatively simple. I’ll quickly go through some key functions and some decisions behind them in this implementation. Keep in mind this implementation is meant to be used in a kernel, where we don’t have the STL or any C++ runtime support.

## Structures

To make things work, there’s a few different structures at work beyond the user visible SharedPtr<T>.

### Deleters

A common theme throughout the STL is the concept of deleters, which are small classes that encapsulate the specifics of memory management. This is particularly important in a kernel, as many allocations do not come off the shared kernel heap, but private object-specific allocators.

Above is the default deleter implementation, which simply invokes operator delete on the pointee.

### Info Block

Reference counts for an allocation are stored in an info block, which is allocated the first time a smart pointer is created.

However, we need to store more than just reference counts: each info block has an associated deleter. To properly support polymorphic types, inside the SharedPtr class, we define the real info block that is allocated when a pointer is created:

This struct is templated on type U, which is a secondary type that is related by inheritance to type T that the SharedPtr points to, as well as the type of deleter. To illustrate this, imagine a simple class hierarchy like the following:

%% {init: {'theme': 'base', 'themeVariables': { 'darkMode': true, 'background': '#222222' }}} %%

graph TB;
A[Pland]
B[Vegetal]
C[Fruit]
A-->B
A-->C


It’s totally valid to assign a pointer to Pland with an object of instance Vegetal. However, these objects have different destructors, and may even be allocated from different pools. While the differing allocation pools are an extreme example, it should be obvious that a deleter for Pland should not be invoked for Vegetal, and even though Vegetal and Fruit are both subclasses of Pland, their internal layouts may be different and require different deleters.

## Functions

Beyond the above structures, we need a little bit of actual code to make the pointers do… pointer things. But it’s surprisingly simple.

### Construction

The simplest, most basic case for creating a SharedPtr is to create a new instance for an already allocated object, using either the default deleter or a custom deleter. This boils down to allocating a new instance of the info block, which automatically initializes its ref counts to 1.

Since this is the first time the pointer is created, we don’t have to use any atomic operations since it’s not possible for another thread to be accessing this instance before the constructor returns.

A quick note about the explicit keyword: this prevents the given constructor from being used during an implicit conversion or as a copy constructor. This means that creating a SharedPtr from a raw pointer cannot happen “by accident.” Since additional metadata is not stored in the pointed to object, but alongside it, this would result in multiple info blocks for the same object in memory, leading to possible memory corruption and use-after-free bugs.

Lastly, the destructor simply decrements $R_{\text{strong}}$. This is what makes these pointers so simple to use: they follow the RAII paradigm and extend it to raw pointers. Simply letting the SharedPtr instance go out of scope will drop its reference and deallocate the object if needed.

### Reference Counting

The remainder of the pointer implementation boils down to appropriately placed increment/decrement calls. Two internal methods are provided to do both of these as atomic operations:

To simplify the code presented, some error checking (to do with integer overflows) is omitted. Improper implementation of these methods can result in security bugs.

Incrementing is relatively straight forward; if an info block exists in the pointer (it has been assigned to or it was created with an object) it’s simply atomically incremented.

More involved is decrementing the ref count. First, we’ll decrement $R_{\text{strong}}$ atomically. If the result of this is zero, we invoke the deleter and invalidate the pointee inside the SharedPtr instance. Next we’ll decrement $R_{\text{weak}}$ and deallocate the info block if that is zero.

### Assignment

To realize the potential of these smart reference counting pointers, we have to support copying and assigning them. Thankfully, this is reasonably simple: all that’s needed is to decrement the old info block’s $R_{\text{strong}}$, copy the source’s pointee and info block pointers, then increment $R_{\text{strong}}$.

Assignment operators are implemented essentially the same as the copy constructors, except that there’s a possibility the smart pointer being assigned to is valid and points to an object.

A special overloaded version of assignment to nullptr_t is implemented so a pointer can be forced to drop its reference before going out of scope by assigning nullptr to it. This is the same behavior as std::shared_ptr<T>.reset() exposes.

### Syntax Sugar

Lastly, we need to implement a few remaining operators that allow the SharedPtr instance to behave as if it were a raw pointer to T.

There are two conversion operators provided: one to bool allows very simple checking of a pointer to nullptr, and one to void * that permits casting the SharedPtr to void * for display and debugging purposes.

# Conclusion

Proper generic reference counting smart pointers are far simpler to implement than I anticipated. They are much easier to use if there’s an arbitrary number of objects that must support reference counting.

While this implementation is missing several features of the STL’s std::shared_ptr template, it does what it needs to for use in kernel space. It is not as optimized (a separate allocation is made for the info block and the pointee) and likely falls apart with more complex inheritance situations.

This implementation has support for weak pointers, and for converting a plain this pointer back into a smart pointer, with some special support from the class. The implementations of both of these are relatively trivial and should be easily understandable given the background on the SharedPtr internals provided above.

Overall, this was a fun little adventure in understanding how the C++ library features I’ve taken for granted over the years are implemented internally. As with all of the kush-os code, the smart pointer implementation is available under the ISC license.

In more general terms, I’m hoping to keep up the pace of work on the OS for the next few weeks – while I still have zero responsibilities and alarm clocks are nothing but a distant memory. I recently finished the initial amd64 port (and essentially a rewrite of most of the kernel) and have a few bugs to resolve before I’m back to a fully functional kernel that just needs more userspace implemented. I’m hopeful that I’ll have something pretty to show off by the time I start my big boi(tm) job in a few weeks.

1. In this post, I’ll focus only on the shared_ptr knockoff part of that code.

2. The kernel does have a rt::WeakPtr class that does exactly that. Its implementation is not described in this article, but a quick look at the code should hopefully be sufficient.