It's 100% decompiled to C, but not fully labelled yet. That means there are lots of auto-generated names all over the place. It would be interesting to see someone try to port it now though.
Would LLMs be good at labelling, or would the risk of false-positives just waste more time than it saved?
I wish someone ran a proper study. In my experience it helps flag patterns you may not be immediately familiar with, like CRC functions/tables. It also does a good job where no thinking is required, like when you have partial information: "for(unk=0; unk<unk2; unk++) { unk3=players[unk]... }" - you know what the names are, you just need to do the boring part. For completely unknown things, it may get more interesting. But I know I'd like to at least see the suggestions. It's long, boring work to decompile things fully.
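To make the "no thinking required" case concrete, here's a minimal sketch of the kind of mechanical rename I mean - all the names here (Player, players[], numPlayers, heal_all_players) are invented for illustration, not taken from this repo:

    /* Decompiler output might look like:
     *     for (unk = 0; unk < unk2; unk++) { unk3 = players[unk]; ... }
     * and the boring-but-obvious renaming pass just turns it into: */
    typedef struct Player { int health; } Player;

    #define MAX_PLAYERS 8
    static Player *players[MAX_PLAYERS];
    static int numPlayers;

    void heal_all_players(void) {
        for (int playerIdx = 0; playerIdx < numPlayers; playerIdx++) {
            Player *currentPlayer = players[playerIdx];
            currentPlayer->health = 100;
        }
    }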
Seems like it would be pretty straightforward to fine-tune an LLM on code + asm pairs to help facilitate reverse engineering.
Better add IR too, and all the optimized variants of the ASM for the specified code, etc. It's not as straightforward, and it also depends on the platform; CISC is generally more wacky than RISC, I suppose.
Also, a lot of what's in ROMs is about I/O to components in the device, so you can disassemble and decompile all you want, but without the right specifications and context you cannot say what the code does.
So it would also need the full specifications of the hardware platform the code runs on, and in this case perhaps even of the hardware in the cartridge (I've heard those sometimes contain their own chips, etc.).
I'd say for 'regular application code' that runs within an OS it might be easier, but you still need to provide a lot of context from the actual execution environment to reason properly about what the code actually does. (What does INT 80 run and possibly return, anyway? That code is outside of your target binary.)
> I wish someone ran a proper study
There are several scientific publications on this. But I don't think the latest models are available as convenient plugins for IDA or Ghidra. Guessing variable and function names is considered relatively easy nowadays; types and structures are the challenge now.
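As a toy illustration of why types and structures are the hard part (the Actor struct, field name and offset below are invented for the example, not recovered from any real binary):

    /* Before: raw decompiler output addresses fields by offset. */
    int func_80023F10(void *arg0) {
        return *(short *)((char *)arg0 + 0x24) > 0;
    }

    /* After: the same logic once the structure layout has been recovered. */
    typedef struct Actor {
        char  pad[0x24];
        short health;          /* offset 0x24 */
    } Actor;

    int Actor_IsAlive(Actor *actor) {
        return actor->health > 0;
    }

Guessing that offset 0x24 is a 16-bit health field is the part that still takes real analysis; renaming the function afterwards is the easy bit.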
Even comments would be useful ("this function might be doing x, or maybe y")
I use the following IDA pro MCP plugin for this: https://github.com/mrexodia/ida-pro-mcp
I have never used it, but I think the GhidrAssist plugin does that (and more).
The task sounds similar to descriptions in the API space. People figured LLMs would be awesome at annotating API specs with descriptions that are so often missing. Truth is, everyone is realising it’s a bit the opposite: the LLMs are “holding it wrong”, making a best guess at what the interfaces do without the slightly deeper analysis that's needed. So instead, you want humans writing good descriptions specifically so the LLM can make good choices as to how to piece things together.
It’s possible you could set it off on the labelling task, but anecdotally in my experience it will fail when you need to look a couple levels deep into the code to see how functions play with each other. And again, imo, the big risk is getting a label that _looks_ right, but is actually pretty misleadingly wrong.
With regards to API specs, if you have an LLM have a swing at it, is it adding value or is it a box ticking exercise because some tool or organization wants you to document everything in a certain way?
If it's easy to generate documentation, and / or if documentation is autogenerated, people are also less likely to actually read it. Worse, if that comment is then used with another LLM to generate code, it could do it even wronger.
I think that at this stage, all of the programming best practices will find a new justification: LLMs. That is, a well-documented API will give better results when an LLM takes a swing at it than a poorly documented one. Same with code and programming languages: use straightforward, non-magic code for better results. This was always true of course, but for some reason people have pushed that into the background or think of it as a box-ticking exercise.
My take on AI-for docs is - it’s good. But you need to have a human review.
It’s a lot easier to have someone who knows the code well review a paragraph of text than to ask them to write that paragraph.
Good comments make the code much easier for LLMs to use, as well. Especially in the case where the LLM generated docs would be subtly misunderstanding the purpose.
LLM-assisted reverse engineering is definitely a hard problem but very worthwhile if someone can crack it. I hope at least some "prompt engineers" are trying to make progress.
In my limited experience (my use case was a decompiled minified jar that I just wanted to peek around in), LLM's are absolutely fantastic at it.
As with any LLM output, of course it won't be 100% perfect, and you shouldn't treat the output as truthful "data". But I could absolutely use it to make sense of things that at first sight were gibberish, with the original next to it.
With things like Ghidra now freely available, "100% decompiled to C" really isn't that high of a bar anymore.
Usually it means it has bit-perfect recompilation, which can take a lot of work.
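In practice, "matching" means the build output is byte-identical to an original dump. A minimal sketch of that final check (the file names here are hypothetical; real projects typically just compare checksums in their build scripts):

    #include <stdio.h>

    /* Returns 1 if the two files are byte-for-byte identical. */
    static int files_match(const char *built, const char *original) {
        FILE *a = fopen(built, "rb");
        FILE *b = fopen(original, "rb");
        int ca, cb, match = 0;
        if (a && b) {
            do {
                ca = fgetc(a);
                cb = fgetc(b);
            } while (ca == cb && ca != EOF);
            match = (ca == cb);    /* both hit EOF at the same time */
        }
        if (a) fclose(a);
        if (b) fclose(b);
        return match;
    }

    int main(void) {
        puts(files_match("build/game.z64", "baserom.z64") ? "OK" : "DIFF");
        return 0;
    }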
I think this sort of scenario is where an AI comb-through of the code is going to add value - get an AI to guess the likely names of variables/functions based on usage and context, etc. It'd make a great starting point for manually labelling each variable correctly.
I’d argue using AI to provide that input means it's no longer a creative work, and puts the output in the realm of being non-transformative. Those are two of the main reasons these projects aren't considered to violate copyright, so losing them brings back all of the legal risk that comes with decompiling.
I'm using Ghidra with Claude-Code to reverse engineer an old transportation-sim game, and Claude is very good at figuring out what a function does and what it and its variables should be named.
This is very wrong. Ghidra might decompile to some C code, but this is a completely different, and very high bar. This is 100% matching C code that compiles to the exact binary the game shipped with. This means it's able to be modified, tweaked and upgraded. Ghidra is helpful to do some research, but it won't compile.
Given the source code for the build engine, which was ported to the N64 and used to make the game, is freely available for non-commercial use, could it be used to map some of the function and variable names?
> build engine, which was ported to the N64 and used to make the game
I don't think that's what they did. Looking at some gameplay footage on YouTube, it's a third-person game with a full 3D player model, not flat sprites, and the level geometry seems to be proper 3D without the Build engine's distortions when looking up and down. I think they built or used a different engine designed to take advantage of the N64's graphics hardware.
This might have some limited uses, but you'd need to know how it was optimised, and perhaps also build the Build engine for the same target and then decompile it to see what it looks like after that kind of treatment.
Perhaps with a bit of luck you'd get some useful markers/functions mapped, though; it's not unheard of.
The problem in my mind (didn't test it, of course) is that the decompiled version comes from a different ISA than Build usually compiles to, so in my mind it would look totally different. (You don't have the ported sources, I suppose, only the originals.)
Gillou68310 looks to have been a one person army for 99% of it, what an impressive show of dedication.
The Legend of Zelda: Twilight Princess has been getting farther along as well https://decomp.dev/zeldaret/tp
While we're here I'll give a shoutout to the Castlevania: Symphony of the Night decomp which is coming along pretty well too (still lots to do)
https://github.com/Xeeynamo/sotn-decomp
Always good to see SotN mentioned (and nice to see a familiar handle).
There are so many cool things that have been built for this decomp, probably my favorite being the dups tool (tools/dups) for finding duplicate or near duplicate code.
Zero Hour was one of the must-haves of the Nintendo 64 era and one of the few good games in the latter part of the Duke Nukem series. Despite its challenging platforming and a few soul-crushing levels, the game had consistently rich settings and managed to recreate that Duke 3D charm.
The recent Perfect Dark port was incredible and I hope this decomp gets the same treatment.
Why Duke Nukem: Zero Hour of all games?
It's a bit of a lost gem. Unlike the Playstation games, which are Tomb Raider clones and aren't well regarded, Zero Hour is based on the Build engine like the original Duke Nukem 3D was, and while it doesn't hold up to that standard, it's arguably the best of the non-3D Realms Duke Nukem games. Unfortunately they changed the perspective to third person (with a half-finished first-person mode as a cheat) and it controls poorly. With the source available, that can now be fixed.
Why is the first person mode considered half finished?
No deal-breakers, but the lack of first-person viewmodels, the really narrow FOV and an aggressive joystick acceleration curve make it unpleasant to play. It isn't too bad on emulators that hack in keyboard and mouse, but this port is a good opportunity to polish it up further in much the same way that the Perfect Dark port did.
Excellent explanation, thank you
Good question, but I wish they had a screenshot there so I could send this to my school buddies. Last time we played this, everything was still a simple chaotic heaven :)
Would really like to know what makes a person (or group of people) invest the time and energy to do this? Is there a group of hobbyist gamers who work on titles they love? Is it about digital conservation?
Nostalgia: a sentimental longing or wistful affection for the past, typically for a period or place with happy personal associations.
I've spent a lot of time reverse-engineering vintage synthesizer firmware (which is a bit simpler than modern games). I did complete end-to-end annotations of these two vintage synth ROMs:
- https://github.com/ajxs/yamaha_dx7_rom_disassembly
- https://github.com/ajxs/yamaha_dx9_rom_disassembly
It started because I was just curious about how these devices actually worked. In the end I learned a lot of invaluable skills that really broadened my horizons as an engineer. I got a chance to talk to a handful of incredibly smart people too. The actual work can be a lot of fun. It's like piecing together a really large and technical jigsaw puzzle. In my case, it also led to me being able to release a fun firmware mod: https://github.com/ajxs/yamaha_dx97
In case anyone is curious about how I worked, I wrote a bit of a tutorial article: https://ajxs.me/blog/Introduction_to_Reverse-Engineering_Vin...
It can be a bit analogous to archaeology too. Even though in my case the DX7 is only 42 years old, that was an aeon ago in computing terms. You gain a bit of insight into how different engineers used to design and build things. Even though development for the N64 is fairly recent, from memory the console had some interesting constraints that made development tricky.
> the console had some interesting constraints that made development tricky
The ones that come to mind are the tiny 4KB texture cache, high memory latency (thanks Rambus), and inefficient RCP microcode. The N64 could have been so much more with a few architectural tweaks but developers liked the Playstation much better on account of its simplicity despite it being technically inferior in most respects.
>developers liked the Playstation much better on account of its simplicity despite it being technically inferior in most respects.
That statement is surprising; as a kid I remember the PlayStation as obviously graphically superior. I’m not doubting you, but what explains the gap between technical capability and user perception?
The N64's processor had triple the clock speed of the Playstation's, on top of having more RAM (up to 8MB versus the Playstation's 3MB). Its graphics subsystem could also do perspective-correct texture mapping and push more polygons per second, and it had a hardware FPU, which the Playstation notably lacked. It's pretty widely acknowledged that the N64's Achilles heel was its small texture cache, which caused developers to use lower-resolution textures than they otherwise would, smoothed over with heavy anti-aliasing. This results in the characteristic smeary look of N64 games versus the Playstation's wobbly, pixelated aesthetic. You probably thought the PS1 looked better because of the more detailed textures.
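For anyone wondering what "perspective-correct texture mapping" actually buys you: the PS1's GPU interpolates texture coordinates linearly in screen space (affine), while the N64 divides by depth per pixel. A rough sketch of the difference along one scanline, with an invented Vertex struct and function names purely for illustration:

    typedef struct { float u, v, w; } Vertex;   /* w: post-projection depth divisor */

    /* Affine (PS1-style): cheap, but textures swim on triangles spanning depth. */
    void affine_uv(const Vertex *a, const Vertex *b, float t, float *u, float *v) {
        *u = a->u + t * (b->u - a->u);
        *v = a->v + t * (b->v - a->v);
    }

    /* Perspective-correct (N64-style): interpolate u/w, v/w, 1/w, divide per pixel. */
    void perspective_uv(const Vertex *a, const Vertex *b, float t, float *u, float *v) {
        float inv_w = 1.0f / a->w + t * (1.0f / b->w - 1.0f / a->w);
        float u_w   = a->u / a->w + t * (b->u / b->w - a->u / a->w);
        float v_w   = a->v / a->w + t * (b->v / b->w - a->v / a->w);
        *u = u_w / inv_w;
        *v = v_w / inv_w;
    }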
I've no doubt (as a thoroughly amateur video game historian) that with a few small tweaks Nintendo would have eaten Sony's lunch that generation. In that alternate universe, Sega would have had better developer support for the Saturn and done crazy stuff with their super-wacky architecture too, but I digress...
That’s interesting! I thought the answer was going to be related to CD vs cartridge capacity.
It also sounds crazy that games like Tekken 3 could run on 3MB of RAM total, when just the current music track feels like it could take that much space.
Many PSX games had the music on the CD as standard audio, so they didn't require much, if any, effort from the console to play.
I'm the person who reimplemented Cosmo's Cosmic Adventure (DOS, 1992) and my original reasoning was a desire to know how it was able to do some of the graphical tricks it did on such underpowered hardware (it could run on an IBM AT). The game wasn't anything special by any metric, but it was an important piece of my childhood and I felt an attachment to it. I also learned a hell of a lot about the PC platform, the C ecosystem from the 80s, and my own tastes as an engineer.
https://github.com/smitelli/cosmore
https://cosmodoc.org/
I guess you’ve never kicked ass and chewed bubble gum
It's hard to do when you're all out of gum
Maybe they just really love the game. This is a form of tribute.
I too have a beloved video game from my childhood: Mega Man Battle Network 2. That game changed my life. I learned English and became a programmer because of it. I have two physical copies of it in my collection, one of them factory sealed.
Sometimes I open the game in IDA and try to reverse engineer bits and pieces of it. I just want to understand the game. I don't have the time, the dedication or even the low level programming knowledge that these badass folks in the ROM hacking community have, but I still try it.
I gave a talk at Game On Expo about decompiling Castlevania: Symphony of the Night (https://github.com/xeeynamo/sotn-decomp ) earlier this year and talked a little bit about exactly this. Almost everyone who works on it loves the game. After that, motivation varies: some want to see ports, some want to mod, some want to learn everything they can, some want to preserve. Along with those, I also like the challenge (not unlike sudoku).
Doing it long enough requires learning compiler history and theory, understanding business and engineering pressures of making the game, and occasionally reveals why parts of the game work the way they do.
I stream working on SotN and am happy to answer any questions in chat if you’re interested in learning more - https://m.twitch.tv/madeupofwires/home
In addition to those categories, speedrunning glitch hunters tend to gravitate to participating in these projects as well. E.g. the Twilight Princess decomp was started primarily by and for the speedrunning community.
It's also the endgame for romhacking: once a game is fully decompiled, modders can go far beyond what was feasible through prodding the original binary. That can mean much more complicated gameplay mods, but also porting the engine to run natively on modern platforms, removing framerate limits, and so on.
Preservation and ease of modification. New console units are not being made anymore, and the number of old ones is limited; they can break, and their video output formats are incompatible with modern monitors/TVs. There is emulation, but it's not perfect and can be demanding. Decompilations enable people to create native binaries for different platforms. This makes playing the game easier and more accessible.
This is how the text adventure/interactive fiction community started. Some hackers reverse-engineered the Infocom Z-machine and then built new languages and compilers so new games could be created.
There are people who spend hours and hours analyzing bit characters in things like Lord of the Rings (where did the Blue Wizards go? Who is Tom Bombadil?) or Star Wars. This is a similar fan obsession. Remember, "fan" comes from "fanatic".
Same. Is there a project page or anything that explains the context, the reasons, the history behind this? I bet it would be very interesting.
The README is too technical and misses a writeup on the soul of the project: section 1 is the title; section 2 is already talking about Ubuntu and dependencies. Where is the "Why?" section :-)?
Based on the commit history, this has been one person's on-off project for 3 years. My guess is that they like this game and they were curious about how decomps come to fruition - and what better way to find out than to do it?
You climb a mountain because it's there. Different people have different mountains.
It's an interesting challenge, you can improve it or make it do X,Y,Z, you can add speedrunning or competition gaming features, solving puzzles gives a sense of accomplishment, a certain small group gives you social clout, etc.
Very cool! But… GitHub? Do they ever learn? 3, 2, 1… Takedown notice!
A quick inspection of the repo indicates that it doesn’t contain any copyrighted material. They’ve just uploaded the code to perform the decompilation.
Why would they take it down? Decompilations of Nintendo games are on GitHub as well.
"A decompilation of Duke Nukem Zero Hour for N64.
Note: To use this repository, you must already own a copy of the game."
We all now do, of course
Usually these projects only contain a copy of the source code to build the binary. You still need the game assets like the levels and sounds to play the game.
Are LLMs well suited to this kind of reverse engineering?
You can definitely do a lot of relabeling that way. It may also be worth trying a loop of "fix until it matches binary" for separate files... But I haven't seen anyone actually write it up.
There are attempts like this https://github.com/louisgthier/decompai that are related, but not quite the same as this project.
Edit: just gave it a go, and guessing reasonable variable names works extremely well when there's partial information already available. Especially nice for things like naming all the counters and temporaries when you know what you're iterating over (so it's just boring manual and trivial work), but can also figure out the meaning of larger patterns for function names.
> It may also be worth trying a loop of "fix until it matches binary" for separate files... But I haven't seen anyone actually write it up.
I might be misunderstanding what you mean, but there is decomp-permuter, which I think does just this. It rewrites the code in various heuristic and subtle ways to attempt to find a match at the function level.
https://github.com/simonlindholm/decomp-permuter
Ah yeah, I forgot about this one! Something like that. LLM will be less restricted in choices, but permuter will get you compatible/valid code.
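For what it's worth, the loop being described is conceptually tiny - something like this sketch, where propose_rewrite.sh stands in for an LLM or permuter pass and all paths/flags are placeholders (a real setup would use the original toolchain and compare only the extracted code section, not the whole object file):

    #include <stdio.h>
    #include <stdlib.h>

    /* Compile the candidate source and byte-compare it against the target object. */
    static int matches_target(const char *src) {
        char cmd[256];
        snprintf(cmd, sizeof cmd, "cc -c -O2 -o candidate.o %s", src);
        if (system(cmd) != 0)
            return 0;                              /* didn't even compile */
        return system("cmp -s candidate.o target.o") == 0;
    }

    int main(void) {
        for (int attempt = 0; attempt < 100; attempt++) {
            /* Emit a new rewrite of the function into candidate.c. */
            if (system("./propose_rewrite.sh > candidate.c") != 0)
                break;
            if (matches_target("candidate.c")) {
                puts("match found");
                return 0;
            }
        }
        puts("no match");
        return 1;
    }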
Fairly well. They aren't perfect, but they save a lot of time.
They are also downright superhuman at recognizing common library functions or spotting well known algorithms, even if badly mangled by compilation and decompilation.
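A contrived example of what that looks like in practice: this is plausible decompiler output with meaningless names (the function name below is invented), but the 0xEDB88320 constant and the shift/xor loop immediately identify it as a standard reflected CRC-32:

    unsigned int func_800A1C40(const unsigned char *arg0, unsigned int arg1) {
        unsigned int var0 = 0xFFFFFFFFu;
        for (unsigned int var1 = 0; var1 < arg1; var1++) {
            var0 ^= arg0[var1];
            for (int var2 = 0; var2 < 8; var2++) {
                /* The polynomial constant is the giveaway. */
                var0 = (var0 >> 1) ^ (0xEDB88320u & (0u - (var0 & 1u)));
            }
        }
        return ~var0;
    }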
Really? Are there any articles about that? That sounds really interesting.
That's mostly my own experience. The results of reverse engineering can be sexy, but the work itself sure isn't, so we aren't exactly swimming in high quality writeups.
But given how much of RE is just going into unfamiliar places and learning unfamiliar things? And how much of it is repetitive but-not-easy-to-automate tasks? I fully expect LLMs to proliferate.
I’m not sure if EFF has made any statements about it, but I would be concerned with the copyright aspects. Decompilation works (legally) because a new creative work is being produced.
It would be easy to argue LLMs are producing derivative, non-transformative works. AI companies have not made that any easier by paying publishers huge royalties to license training data.
I personally would stay away from it to avoid the risk, but I’d imagine if it became “easy” enough to produce a matching decomp with an LLM, we would get a more specific legal precedent pretty quickly.
I'm using an agent to port a game. I have the source. It's not going well. Lots of rabbit holes that are self-inflicted because the LLM doesn't want to port a lot of libraries because it's too much work for one round. It does a lot of stubbing and makes assumptions and that breaks the whole thing.
How did you approach it? Some specific harness? Planning?
I’ve not experimented but I thought they might be valuable for isolated variable / function renaming
Note: To use this repository, you must already own a copy of the game.
And people who play Chinese retro handhelds must dump their own ROMs. Unless they bought a 10000 in 1 version for $10 extra, in which case it's all perfectly legal!
Sure, you must, I guess, but is anyone really going to jail over this piece of ancient history? I find these disclaimers cute.
We do. Don't worry about it.
I used it just fine without one, I think you’re wrong.
I believe you are making a technical statement and the parent poster is making a legal one. You're both right I guess
The parent poster is not making a legal statement. They copied/pasted the first line of the README. I made the clarification that the note is a legal disclaimer, not a technical requirement, so people, including the parent poster, are not confused.
well, you better delete it within 24 hours then!
this is a legal disclaimer lol, not an actual requirement
How so?
Functionally, the README describes that providing a game copy is necessary for creating a build. This would make sense, since unless the sound, image, text, etc. assets are all baked into the code, those would have to come separately.
Legally, it further doesn't make much sense. This is cleaned up (?) and painstakingly byte-matched decompiler output (again based on the README), so it's unfortunately just plain illegal [0], disclaimers notwithstanding.
[0] as always, legality depends on jurisdiction - so as always, if in doubt, consult an actual lawyer
Byte-matched decompilation is not illegal in the US, assuming it's done correctly. Compilers produce the same output (bytes) for effectively infinite inputs (bytes). Figuring out a novel input without having access to the original is a new, protected, creative work with its own copyright.
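A toy illustration of the "many inputs, same output" point: an optimizing compiler will typically emit identical machine code for both of these functions, so recovering a source that matches the shipped binary is not the same thing as recovering the original source.

    /* Two different source forms that a typical optimizing compiler
     * collapses to the same machine code. */
    int sum_for(const int *v, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += v[i];
        return total;
    }

    int sum_while(const int *v, int n) {
        int total = 0;
        int i = 0;
        while (i < n) {
            total += v[i];
            i++;
        }
        return total;
    }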
https://en.wikipedia.org/wiki/Clean-room_design
TLDR:
legal: decomp -> write spec -> get it signed off -> pass it to someone else for implementation -> distribute
illegal: decomp -> clean up -> distribute
The moment they pulled the binary up in their decompiler of choice, they were legally tainted and could no longer legally publish any related source code.
This is of course then debated, like everything. So all of this is to the extent I'm familiar with the topic.
It is illegal to redistribute this by any definition of the word, and frankly I do not understand how you even think it could possibly be legal, as it defies the very purpose of copyright. Like, you can create copies of a painting as long as you only look at a photo of it, not the real painting? Please.
Don't you need to have purchased the game before you have the right to enjoy it? How else do you prove that you have paid for the license to run the game if you don't have a copy? That was my initial interpretation.
What if I just simply promise to not enjoy it?
Still [eagerly] waiting over here for Duke Nukem Forever!
..since how long? I've lost track (:
Not sure if it's a joke, but in case you missed the release - it's already out: https://store.steampowered.com/agecheck/app/57900/
That release is like the Matrix movie sequels.
There are no Matrix movie sequels I hear you say? ... Indeed.
The Matrix sequels are fantastic, the haters are only hurting themselves by refusing to enjoy them.
I think it (Duke Nukem Forever) wasn't as bad as most people say. It's a quite enjoyable shooter I would say.
As of earlier this year, it's been out longer than the time between it getting announced and released.
oh, this just makes me sad. Has it really been that long?