Blast Processing Retrospectives

The Sega Mega Drive/Genesis is what started it all for me over a decade ago: it was my journey into reverse engineering and bare metal programming.

Since I’ve been feeling a bit ✨ lost ✨ recently, it seems only logical to try and revisit some of my earliest work and see what I can do with that hardware with a bit more knowledge and experience[1]. This began with me archiving some old hard drives, and in the process discovering the source code to a sound driver I was working on many years ago, but apparently abandoned due to insurmountable bugs[2].

Background

The Sega Mega Drive[3] was Sega’s entry in the fourth generation of home video game consoles. It’s where the Sonic the Hedgehog franchise got its start, and hacking those games is indeed what brought me to the Mega Drive.

Hardware

It featured a Motorola 68000 (running at ~7.6MHz) with a whopping 64K of RAM available for use by game software, and a mind-bogglingly large 4MB of address space reserved for cartridges. On top of the 68k, the console included a Zilog Z80 (running at ~3.5MHz) with 8K of dedicated RAM. The Z80 is there for Master System backwards compatibility, though Mega Drive software can make use of it as a dedicated sound processor.

Additionally, the console features a custom video processor (VDP for short) that supports two tile layers, up to 80 sprites, with 4 palette lines of 16 colors each. The hardware only has support for horizontal line scrolling, and vertical scrolling by 16-pixel columns: no SNES-like Mode 7 features are provided. Notably, there’s no framebuffer or bitmap support.

To generate sound, we’ve got a YM2612 FM synthesis chip that provides up to six channels of FM sound, one of which can be replaced with a rudimentary DAC for PCM playback; and 4 more channels (three square waves and one noise channel) provided by an SN76489 PSG, again included for Master System compatibility.

For some more in-depth information about the hardware, consider checking out Rodrigo Copetti’s Mega Drive Architecture article.

Software

In the way of software, there’s… really absolutely nothing. Cartridges contain ROM, which is mapped directly into the 68k address space. The cartridge contains the CPU vector table and thus is the first thing that executes on startup after reset[4].
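To make the "vector table in the cartridge" part concrete, here is a minimal sketch that can be tried on a PC against a ROM dump. It relies on the standard 68000 reset behavior (the first two longwords of the address space are the initial stack pointer and the initial program counter); the names `read_be32`, `reset_vectors_t`, and `parse_reset_vectors` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value, the way the 68000 sees ROM contents. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper type: the two vectors the CPU consumes at reset. */
typedef struct {
    uint32_t initial_sp;  /* vector 0: initial supervisor stack pointer */
    uint32_t initial_pc;  /* vector 1: address of the first instruction */
} reset_vectors_t;

/* Pull the reset vectors out of the first 8 bytes of a ROM image. */
static reset_vectors_t parse_reset_vectors(const uint8_t *rom, size_t len) {
    reset_vectors_t v;
    assert(rom != NULL && len >= 8);  /* need at least two longwords */
    v.initial_sp = read_be32(rom);
    v.initial_pc = read_be32(rom + 4);
    return v;
}
```

Since the cartridge supplies both values, the game decides where its own stack lives and where execution begins; nothing else gets a say (TMSS aside).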

This means there is no BIOS, no firmware, and no library of system routines that games can use. They run directly on bare metal, with no memory protection or any help. Not that you’d want the overhead of firmware or even an OS: there simply aren’t enough clock cycles.

Developing Homebrew

Way back when I was still doing Mega Drive stuff, my development tooling consisted of… a 68k assembler. And that’s it! No fancy toolchains or makefiles: everything was built from a single assembler file that included other assembler files and binaries to produce the final ROM.

Apparently back then I had a serious disdain for anything that wasn’t hand-written assembly. Looking back, that was an absolutely ridiculous hill to die on: at least today, clang produces assembly that’s almost as good as handwritten code.

Toolchains

Since the Mega Drive has a 68000, I was able to use my 68komputer toolchain effectively unmodified. This gives me the choice between relatively modern clang and gcc versions, so C++20 and later are supported, and the compilers are relatively good at producing optimized, efficient code.

Beyond that, I’m just using a collection of assembly routines (for stuff like DMA, compression, palette effects, etc.) that I had written over the years. Gluing these to C/C++ is relatively easy by writing some small wrappers with inline assembly, and relying on the compiler to shuffle registers around:

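#include <stddef.h>
#include <stdint.h>

// Thin C wrapper around the hand-written Load_Tiles assembly routine.
static inline void WriteToVRAM(const void *in, const uint16_t vramAddr,
        const size_t numTiles) {
    // Pin each argument to the register Load_Tiles expects it in.
    register const void *_in asm("a0") = in;
    register size_t _addr asm("d0") = vramAddr;
    register size_t _count asm("d1") = numTiles;

    __asm__ __volatile__(
    "\n\tjsr        Load_Tiles"
    "\n"
    // "+" marks each operand as both read and modified by the routine,
    // so a separate input list would be redundant.
    : "+a" (_in),  "+d"(_count), "+d"(_addr)
    : /* no input-only operands */
    : /* clobber list from subroutine; "memory" because it reads *in */
      "d2", "d3", "a1", "cc", "memory"
    );
}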

There is also SGDK, which is targeted towards plain C use. I don’t have any first-hand experience with it, but it seems to be what most people are using these days: it certainly looks quite comprehensive.

Emulation & Testing

I think the last emulator I used when I was still doing this stuff was Kega Fusion because it was one of the few emulators with a Mac port. It was accurate enough[5] to play games, but it really wasn’t suited for development. These days, thankfully, things have changed for the better. Not only are there emulators available now with debugging features, but their accuracy as a whole has improved significantly; most of these even run on macOS natively.

For really involved and heavy debugging, I’ve been using the Exodus emulator in wine. This emulator focuses on accuracy before anything else, so a beefy machine helps: ironically enough, it was borderline unusable under Windows 11 on my Mac Pro, with a 16-core Xeon; but runs just fine on my M1 MacBook Pro[6].

[Figure: The Exodus emulator, running under wine, with various debugging tools open.]

For general testing, I’ve been using BlastEm as it’s just as accurate, has some basic debugging features (including a gdb stub!) and most importantly, actually runs natively on my Mac[7].

Sprite Multiplexing

Sprite multiplexing is a technique that’s pretty much as old as computers with hardware sprites. As the name implies, a single hardware sprite is rendered multiple times (hence, multiplexing) on screen in different locations, without requiring additional hardware resources.

Thanks to the flexibility of the Mega Drive video processor, implementing this effect isn’t too bad. There are a few “gotchas” I ran into while throwing this together, but working through them is well worth it, as this technique allows multiplying 3 physical sprites into almost 80 virtual sprites on-screen. While sprite multiplexing does allow rendering more sprites than the 80 global limit of the VDP, it does not allow rendering more than 20 sprites per scanline; any excess sprites will still be dropped.

One of the key pieces to pulling off sprite multiplexing is being able to reliably and accurately update sprites at a precise point during frame rendering: this is also known as racing the beam[8].

The VDP provides a few ways to figure out what part of the screen is being rendered: first, the CPU could sit in a loop and continuously poll the H/V counter[9]. This indicates the exact pixel that’s being rendered, though it’s a bit inaccurate because the pixel clock is faster than the 68k clock, so the low bits read as garbage. This approach works, but you waste loads of CPU cycles.
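A rough sketch of what that polling loop looks like, and why it burns cycles. The HV counter is a 16-bit value with the line (V) counter in the high byte and the H counter in the low byte; on hardware it is read from address 0xC00008. So that this can be tried on a PC, the read goes through a function pointer, and `fake_hv_read` is a made-up stand-in for the register. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the line number (V counter) from a 16-bit HV counter value. */
static inline uint8_t hv_line(uint16_t hv) { return (uint8_t)(hv >> 8); }

/* Hypothetical accessor; on hardware this would read the word at 0xC00008. */
typedef uint16_t (*hv_read_fn)(void);

/* Busy-wait until the beam reaches `line`; returns the number of polls
 * that did nothing but wait. `max_polls` keeps the sketch from hanging. */
static unsigned wait_for_line(hv_read_fn read_hv, uint8_t line,
                              unsigned max_polls) {
    unsigned polls = 0;
    assert(read_hv != 0);
    while (polls < max_polls && hv_line(read_hv()) < line) {
        polls++;
    }
    return polls;
}

/* Stand-in for the hardware register when running off-target:
 * pretends the beam advances one line every 4 reads. */
static unsigned fake_reads;
static uint16_t fake_hv_read(void) {
    uint16_t line = (uint16_t)(fake_reads / 4);
    uint16_t hv = (uint16_t)((line << 8) | (fake_reads & 0xFF));
    fake_reads++;
    return hv;
}
```

Every iteration counted by `polls` is time the 68k could have spent on game logic, which is the motivation for using an interrupt instead.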

Horizontal Interrupts

Instead, the VDP also provides a horizontal interrupt (or HInt), which can be configured to trigger every N lines. During the horizontal blanking interval when the line counter hits zero, the VDP will assert an interrupt to the 68k. This is precisely what we want, as it leaves the CPU free to execute other logic the rest of the time.
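For reference, here is a small sketch of the register values involved in setting this up. VDP registers are written by sending a 16-bit word of the form 0x8000 | (register << 8) | value to the control port; register 0x0A holds the HInt line counter, and bit 4 of mode register 1 (register 0) enables the interrupt. A counter value of 0 means every line, so "every N lines" is N - 1. The constant and function names are my own, and the 0x04 base value for mode register 1 in the usage note is an assumption about the rest of the setup.

```c
#include <assert.h>
#include <stdint.h>

enum {
    VDP_REG_MODE1         = 0x00, /* mode register 1 */
    VDP_REG_HINT_COUNTER  = 0x0A, /* HInt line counter reload value */
    VDP_MODE1_HINT_ENABLE = 0x10  /* HInt enable bit in mode register 1 */
};

/* Build the 16-bit word that sets VDP register `reg` to `value`. */
static uint16_t vdp_reg_cmd(uint8_t reg, uint8_t value) {
    assert(reg < 0x18);  /* the VDP has 24 registers */
    return (uint16_t)(0x8000u | ((unsigned)reg << 8) | value);
}

/* Counter reload value for "interrupt every n lines" (0 = every line). */
static uint8_t hint_counter_for_every(unsigned n) {
    assert(n >= 1 && n <= 256);
    return (uint8_t)(n - 1);
}
```

For a per-line interrupt, the two words to send would be `vdp_reg_cmd(VDP_REG_HINT_COUNTER, hint_counter_for_every(1))` and `vdp_reg_cmd(VDP_REG_MODE1, 0x04 | VDP_MODE1_HINT_ENABLE)`.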

[Figure: Effect of changing the background color during horizontal interrupts.]

One minor issue with the horizontal interrupt is apparent from the screenshot above: the interrupt doesn’t start executing on the 68k until the subsequent line has started rendering. (You can tell this is happening because the color doesn’t change until a few pixels into the line.)

There are ways to mitigate this, the most essential being to ensure that the interrupt handler does as little work as possible before the time-critical write (saving and restoring as few registers as it can, using the fastest and most compact instruction encodings, etc.). In this case, precise synchronization to the start of a line really isn’t necessary, but if it is, spin in the interrupt handler, reading from the HV counter, until the end of the current line (that is, the start of the next one) is reached.

Sprite Attribute Table

The sprite attribute table is stored in VRAM, and read by the VDP every frame (and during the horizontal blanking period) to render sprites. This table contains each sprite’s screen position, as well as the base palette and pattern index. Each sprite entry occupies 8 bytes in the table, with a maximum of 80 sprites supported.
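As a sketch of how one of those 8-byte entries is laid out, the helper below packs a sprite into four 16-bit words: Y position, size and link, attributes (palette line and pattern index, plus priority and flip bits that are left at zero here), and X position. Positions are offset by 128, so sprite coordinate (128, 128) is the top-left corner of the screen. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte sprite attribute table entry, as four 16-bit words. */
typedef struct {
    uint16_t y;          /* vertical position (screen y + 128) */
    uint16_t size_link;  /* size in tiles, and link to the next sprite */
    uint16_t attr;       /* priority, palette line, flips, pattern index */
    uint16_t x;          /* horizontal position (screen x + 128) */
} sat_entry_t;

/* Pack a sprite entry; priority and flip bits are left cleared. */
static sat_entry_t sat_pack(unsigned screen_x, unsigned screen_y,
                            unsigned w_tiles, unsigned h_tiles,
                            unsigned link, unsigned palette,
                            unsigned pattern) {
    sat_entry_t e;
    assert(w_tiles >= 1 && w_tiles <= 4 && h_tiles >= 1 && h_tiles <= 4);
    assert(link < 80 && palette < 4 && pattern < 0x800);
    e.y = (uint16_t)((screen_y + 128u) & 0x3FF);
    e.size_link = (uint16_t)(((w_tiles - 1u) << 10) |
                             ((h_tiles - 1u) << 8) | link);
    e.attr = (uint16_t)((palette << 13) | pattern);
    e.x = (uint16_t)((screen_x + 128u) & 0x1FF);
    return e;
}
```

Keeping the entry as four words (rather than bytes or bitfields) sidesteps endianness questions: each word is written to the VDP data port as-is.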

[Figure: Sprite table viewer from an emulator, showing clearly that only three sprite slots are used.]

With that in mind, actually doing the sprite multiplexing is rather simple, once we’ve established the VDP is rendering the correct line: just write to the VRAM, updating the sprite table as you would normally, and the VDP does the rest. Neat!
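"Just write to the VRAM" boils down to sending the VDP a 32-bit command that sets the write address, followed by the data words. Below is a sketch of how that command is formed for a given sprite table slot. The command layout (0x40000000 for a VRAM write, low 14 address bits shifted into the upper word, top two address bits in the bottom) is the standard one; the sprite table base of 0xF800 used in the usage note is an assumption, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit control port command for "VRAM write at addr". */
static uint32_t vram_write_cmd(uint16_t addr) {
    return 0x40000000u |
           ((uint32_t)(addr & 0x3FFF) << 16) |  /* address bits 13..0 */
           (uint32_t)(addr >> 14);              /* address bits 15..14 */
}

/* VRAM address of sprite entry `index`, given the table's base address. */
static uint16_t sat_entry_addr(uint16_t sat_base, unsigned index) {
    assert(index < 80);  /* at most 80 entries, 8 bytes each */
    return (uint16_t)(sat_base + index * 8u);
}
```

With the table at 0xF800, `vram_write_cmd(sat_entry_addr(0xF800, 0))` gives 0x78000003; the interrupt handler would write that to the control port and then push the updated words through the data port.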

VRAM Bandwidth

Every scanline is divided into several VRAM access slots[10], roughly one slot every two pixels. Most of these are used by the VDP to render the screen, but any excess can be used by the 68k to access VRAM. At 40-column (320 pixel) horizontal resolution, a total of 18 bytes can be written to VRAM. (In 32-column mode, it’s 16 bytes instead.)

18 bytes per line is not a whole lot: it’s just enough to update the position, palette, and base pattern for three sprites. Try to write any more than this and the 68k will be halted by the VDP until it can process the write, wasting precious CPU cycles[11].
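The arithmetic behind "three sprites" can be spelled out as a tiny sketch. It assumes each update rewrites three 16-bit words per sprite (Y position, the palette/pattern word, and X position) for 6 bytes total, which is my reading of the numbers above rather than something measured; the names are hypothetical.

```c
#include <assert.h>

enum {
    H40_BYTES_PER_LINE      = 18, /* 40-column mode, from the text above */
    H32_BYTES_PER_LINE      = 16, /* 32-column mode, from the text above */
    BYTES_PER_SPRITE_UPDATE = 6   /* assumed: Y word + attr word + X word */
};

/* Whole sprite updates that fit in one line's worth of free slots. */
static unsigned sprites_per_line(unsigned bytes_per_line,
                                 unsigned bytes_per_sprite) {
    assert(bytes_per_sprite > 0);
    return bytes_per_line / bytes_per_sprite;
}

/* Scanlines needed to rewrite `count` sprites at that rate (rounded up). */
static unsigned lines_needed(unsigned count, unsigned per_line) {
    assert(per_line > 0);
    return (count + per_line - 1) / per_line;
}
```

In 40-column mode this works out to exactly 3 sprites per line with nothing to spare; in 32-column mode only 2 fit, with 4 bytes left over.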

While we can theoretically update 3 sprites during a single scanline here, those 18 bytes of access slots include the horizontal blanking interval, during which time the VDP is also accessing sprite data to build its internal sprite line cache. This means that somewhere between 2 and 3 sprite entries will be completely written, possibly even with a torn write for a 16-bit word. The VDP doesn’t really seem too disturbed by garbage sprite data, but it means that sprites might flicker slightly or have their top/bottom cut off[12].

An Example

Once I had all of that figured out, I was able to throw together a “demo” (if you want to call it that) with great programmer art to show this effect off. The video below was captured from real hardware[13], but some better emulators should be able to deal with it:

You can see the three “original” sprites in the top left corner of the image; these are just rectangles, with pink dots in the top corners, and white dots in the bottom, to better visualize the fact that the top and bottom of the sprites can get cut off by the effect. All of the subsequent sprites below that are then rendered using this sprite multiplexing technique.

There are certainly still some issues with the implementation: for one, the aforementioned issue with the top and bottom of the sprite getting cut off is blatantly obvious; there are also some issues with the code that’s calculating new sprites’ positions, leading to some overlap that the rendering code can’t do anything about. But hey, it works! Pretty good for someone who hasn’t written code for the Mega Drive in close to a decade, methinks. 😉

Right after I finished working on this effect, I discovered John Harrison’s Raster Scroll page, which describes a game (albeit unreleased – probably why I hadn’t heard of it before) that used this exact technique to generate the visuals of falling snow: precisely what I figured it could be useful for.

Conclusion

That’s it for this time. I might continue doing a few more posts about Mega Drive stuff, particularly the sound driver I alluded to earlier that I’ve been de-crustifying and trying to restore to a functional state.

It’s certainly been interesting building some of these effects out, as well as reviving my development setup (and porting over the toolchain from the 68komputer) to involve testing on real hardware. There’s been a good amount of research and development on emulators in the years since I last was active in this scene, which made my life much easier.

For now, though, writing neat little self-contained effects like this is probably as far as I’m going to go: I don’t need yet more projects. You can download a ROM with this effect if you’d like to see it for yourself.

An Aside

Perhaps you’ve noticed that it’s been almost a year since the last time I wrote something here. I had a whole series of things planned for last fall that never ended up materializing.

Well, long story short, I got a bit burnt out and decided the logical reasonable solution to that was to buy a house. While that did solve my problems[citation needed] of being bored or having spare money ever again, anyone who’s ever owned a home (or knows someone who does) can attest to the fact that every interaction with said home somehow produces anywhere between 37 and 912051489 additional to-do list items, so that’s been occupying the bulk of my time.

On top of all that, I decided I should probably spend less time in front of computer screens. Having the house has been nice as there’s no shortage of projects, but I’d much rather finish those projects than document them – though I imagine “Today I spent six hours wall mounting a single shelf because the walls are wonky and I have no idea what I’m doing” would make for riveting reading.

And after seeing my Seattle Kraken make it to the playoffs[14], I’ve been inspired to learn to play hockey[15] because it looks like way too much fun, I now have health and dental insurance, and I could certainly use a few fewer hours of sitting on my ass every week. (Those have instead been replaced by several hours of falling on my ass trying to learn to skate, but I guess that’s a start?)

In short, I’m alive, (probably?) doing better than last year, and (ideally) there’ll be some more stuff to follow soon. Or maybe it’ll be another 9 months. Who knows? I sure as hell don’t.

  1. Plus, anyone who knows me can vouch for the fact that I am unreasonably horny for the Motorola 68000 that’s in this thing. As if 68komputer wasn’t a big enough giveaway… 

  2. So I obviously picked that sound driver back up, got re-acquainted with Z80 assembler, fixed many of those bugs, and started writing some tools. I’m hoping to get this into shape to release it so folks can do stuff with it – I feel like way too much work went into this thing for it to disappear for another 10 years on one of my hard drives. 

  3. It was also known as the Sega Genesis in North America. 

  4. A notable exception is for later consoles, which feature the Trademark Security System (TMSS) on board. This is a small ROM inside the chipset that runs before the cartridge and ensures the ROM has the SEGA text string at a particular location. This was a rather misguided attempt at curtailing releases of unlicensed third-party games for the platform. 

  5. Hardware accuracy was not a focus of early emulators, because the Mega Drive has some rather odd timing interactions that the vast majority of software simply doesn’t care about. However, most early emulators also didn’t emulate basic things such as address errors that could break homebrew horribly on real hardware. 

  6. I’m genuinely blown away every day by this M1 machine; it’s effectively replaced my Mac Pro for all of my daily work, all while using a fraction of the power and being just as performant, if not significantly more so, as this case shows. The irony of Windows x86 software running far better on an ARM Mac than Windows is certainly not lost on me… 

  7. Technically I’m using a pre-built x86 binary; while the source code is available, it seems to JIT 68k and Z80 code, which makes porting to ARM a little less straightforward. Rosetta 2 is super fast, so the performance is still great. 

  8. Because old CRT displays literally had an electron gun inside that scanned the image from top left to bottom right, line by line; hence, your code on the CPU is racing against this beam. 

  9. I think the HV counter was originally meant to be used by light gun games; controllers can raise a level 2 interrupt (on the 68k, a lower priority than the level 4 HInt or the level 6 VBlank) that the game would use to read the latched HV counter value. 

  10. Each of these slots can be used to read or write 8 bits of data. The undocumented 128K VRAM mode doubles this to 16 bits. Having this supported on a real console would be immense: all VRAM bandwidth doubles, including DMA bandwidth. 

  11. This is why effectively every game of the era deferred VRAM updates until vertical blanking. Significantly more bandwidth is available during this time (~4KByte/frame for NTSC consoles) and when coupled with DMA, allows for much quicker transfers than during active display. Some games take this a step further and disable the display early, letterboxing the displayed content but gaining additional bandwidth. 

  12. A simple workaround here is to just… not use the top and bottom line of the 8x8 pattern tiles for the sprite. 

  13. My setup consists of an early Japanese model 1 Mega Drive + Mega CD, fed through a RetroTink 5X Pro and into a BlackMagic Design UltraStudio Recorder 3G Thunderbolt capture box. 

  14. The Kraken are absolutely a master class in doing sports team marketing right. I went from knowing nothing about hockey two years ago to now being a season ticket holder and learning to play. How crazy is that? (I just wish my wallet were thicker…) 

  15. “Trist don’t get into the most expensive hobbies ever” challenge (impossible version) 

This post is licensed under CC BY 4.0.