Doom for Nintendo 64

Discussion in 'Nintendo Game Development' started by jnmartin84, Jul 4, 2014.

  1. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    Because of how much easier it is to transform the column textures (they are stored internally as a set of 1D pixel arrays instead of a 2D texture) I am working on my first hardware pass doing textured columns and solid colored spans. Will be ugly for a while.
     
  2. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    RDP HARDWARE RENDERED SOLID COLUMNS AND SPANS. Working on textures, having some technical difficulties. :)
    rdp1.jpg rdp2.jpg
     
    kammedo likes this.
  3. weinerschnitzel

    weinerschnitzel Spirited Member

    Joined:
    Sep 23, 2012
    Messages:
    153
    Likes Received:
    13
    Oh wow! Speed ups and hardware acceleration in the works? Awesome! I'm glad I'm not too late to enjoy the show!

    It would be really cool to see audio executed by RSP microcode someday. Nonetheless, homebrew that utilizes the RDP is cool too!
     
  4. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    Got textures working on columns. Looks like this will be more of a curiosity than anything. Performance is terrible. PSX Doom did a lot of things to make it more feasible, like rendering at 256x224 for one thing.
     
  5. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    Software rendered version is already about as fast as you can get. I'm not messing with the RSP.
     
  6. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
  7. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    Any level structure utilizing transparent patches slows things down to an excruciating crawl. So do areas with lots of openings onto larger outdoor areas, or anything else that significantly increases the number of spans or columns. Lighter scenes draw about 800 quads; heavier ones can head toward 1600.
     
  8. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    I went down a bunch of different rabbit holes over the past 18 months or so.

    Most of my wild ideas didn't pan out. Networking via USB was a bust since I never even got my PC-only prototype to work.

    The RDP rendering was a little different. I did make some headway with performance improvements (by not acquiring and releasing a lock on the RDP and by not doing a SYNC_PIPE before each individual quad) but could never quite figure out how to scale the column textures correctly without first software texture mapping the original column into a texture before sending it to the RDP. At that point it just seems saner to do it all in software, so I stopped going down that route.

    That is not to say I haven't made some significant improvements recently however...
     
  9. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    I took a week or two to review the VR4300 documentation and get intimately familiar with its performance characteristics, especially the behaviors that hurt instruction throughput (pipeline stalls and avoidable trips to memory).

    GCC's code generation back end for MIPS is pretty good but it isn't perfect and it turns out that it generates sub-optimal code in a lot of cases regardless of the optimization settings you pass it.

    I already had results from prior profiling and knew where the hottest call sites were in the Doom engine when running with the software renderer.

    Improvements were available by wringing out code changes in R_DrawColumn, R_DrawSpan, FixedMul and FixedDiv.

    FixedMul and FixedDiv were spilling to memory even though they only use their input parameters and constants that fit into "IMMED"-type instructions to compute their return values. I was able to rewrite them so that they never execute a single LW/SW instruction. The prolog/epilog don't need to spill/restore or touch the stack pointer.
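    For reference, a minimal C sketch of what these two routines compute, assuming the usual Doom 16.16 fixed-point conventions (FRACBITS of 16 and a saturating divide). This is portable C for illustration only, not the hand-written MIPS assembly described above, and the widening to 64 bits is one assumed way to express it.

```c
#include <stdint.h>
#include <stdlib.h>

#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

typedef int32_t fixed_t;

/* 16.16 multiply: widen to 64 bits, multiply, then drop the extra
 * 16 fraction bits. Right-shifting a negative value is
 * implementation-defined in C; GCC performs an arithmetic shift. */
static inline fixed_t FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> FRACBITS);
}

/* 16.16 divide. If the quotient would not fit in 32 bits (which
 * includes b == 0), saturate to the largest or smallest value
 * according to the sign of the result, as stock Doom does. */
static inline fixed_t FixedDiv(fixed_t a, fixed_t b)
{
    int64_t abs_a = llabs((int64_t)a);
    int64_t abs_b = llabs((int64_t)b);

    if ((abs_a >> 14) >= abs_b)
        return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;

    /* Multiply by FRACUNIT rather than shifting left, because
     * left-shifting a negative value is undefined behavior in C. */
    return (fixed_t)(((int64_t)a * FRACUNIT) / b);
}
```

    Both functions depend only on their two arguments and on small constants, which is why nothing in them should need to touch the stack.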

    In the case of R_DrawColumn and R_DrawSpan, neither function takes in any arguments or returns a value.

    I was able to write them from scratch in MIPS assembly without touching a single callee-saved register (that is, to only use temporary registers, the argument registers and the return value registers). GCC was unable to emit similar code. I was able to entirely remove the prolog/epilog code that touched the stack pointer and loaded/stored registers to/from memory on each and every call (up to 2,000 calls total per frame from my profiling). The code I wrote also has about 20 fewer instructions in the body of the inner texture-mapping loop in each function compared to the best assembly output GCC would produce regardless of the setting of "-O."

    In the case of both functions I also took time to make sure they were scheduled as optimally as possible to avoid pipeline hazards/stalls.
    Code:
    lw $t1, 0($t0)       # load a word from memory into $t1
    add $t2, $t1, $a0    # uses $t1 in the very next instruction
    
    is an example of code that introduces a bubble into the pipeline: the value loaded by the LW ($t1 in the example above) is not available until the LW's MEM stage completes, which is too late for the EX stage of the ADD that immediately follows, regardless of register forwarding in the pipeline.

    I was able to re-order EVERY INSTANCE of the LW/xxx instruction pairs in my hand-rolled R_DrawColumn and R_DrawSpan functions to avoid this hazard. Given that these instruction pairs showed up in the inner texture-mapping loop of each function, removing these hazards was huge for instruction throughput.
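    The reordering can be approximated at the C level. The sketch below is a software-pipelined column loop in which the texel for the next pixel is loaded before the current pixel is written, so no load result is consumed by the operation that directly follows it. The function names, the explicit parameters (the real R_DrawColumn reads globals and takes no arguments), the 128-texel column height and the pitch parameter are all assumptions for illustration. A compiler is free to reschedule this code, so it only shows the shape of the dependency, not the actual assembly.

```c
#include <stdint.h>

#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

typedef int32_t fixed_t;

/* Straightforward loop: each texel is loaded and used immediately. */
void draw_column_simple(uint8_t *dest, int count,
                        const uint8_t *source, const uint8_t *colormap,
                        fixed_t frac, fixed_t fracstep, int pitch)
{
    for (int i = 0; i < count; i++) {
        *dest = colormap[source[(frac >> FRACBITS) & 127]];
        dest += pitch;
        frac += fracstep;
    }
}

/* Software-pipelined loop: the load for the NEXT pixel is issued
 * first, then the current pixel (loaded one iteration earlier) is
 * written. Exactly `count` texels are read and `count` pixels are
 * written, the same as in the simple version. */
void draw_column_pipelined(uint8_t *dest, int count,
                           const uint8_t *source, const uint8_t *colormap,
                           fixed_t frac, fixed_t fracstep, int pitch)
{
    if (count <= 0)
        return;

    uint8_t texel = source[(frac >> FRACBITS) & 127];
    frac += fracstep;

    while (--count > 0) {
        uint8_t next = source[(frac >> FRACBITS) & 127]; /* early load */
        frac += fracstep;          /* independent work fills the gap */
        *dest = colormap[texel];   /* uses the value loaded earlier */
        dest += pitch;
        texel = next;
    }

    *dest = colormap[texel];       /* final pixel */
}
```

    Both versions produce identical output; the second only changes the order in which loads and their consumers appear.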

    I was able to find useful instructions to put in the delay slot of the branches in both functions in almost every case.

    Putting these improvements in place gave a significant rendering performance boost.

    The output in high detail mode is now as fast / smooth as it is in low detail mode.

    One final improvement to be made to the software renderer, in the category of somewhat low-hanging fruit, is to modify it so that it outputs directly to a 16bpp framebuffer instead of updating the Doom-internal 8bpp framebuffer and then having to blit it to the N64 CFB at the end of each frame. That would save 76,800 byte reads each frame / 2,688,000 byte reads each second (76,800 bytes per frame, one frame per update, 35 updates per second). In other words, rendering to the CFB / changing Doom to a true color renderer would save about 2.7 MB / sec in memory reads. That would probably have a significant positive performance impact. Just a guess though. ;-)
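    A rough C sketch of the idea, under stated assumptions: a 320x240 screen (inferred from the 76,800-byte figure), a 16-bit RGBA 5/5/5/1 framebuffer, and hypothetical helper names that are not taken from the 64Doom source. The palette is converted to 16-bit once, and each pixel is then written straight to the 16bpp buffer through that table, so the separate end-of-frame pass over the 8bpp buffer disappears.

```c
#include <stdint.h>

enum {
    SCREEN_W = 320,   /* assumed: 320 * 240 = 76,800 pixels */
    SCREEN_H = 240,
    TICRATE  = 35     /* updates per second */
};

/* Bytes read back from the 8bpp buffer by the end-of-frame blit. */
enum {
    BLIT_READS_PER_FRAME  = SCREEN_W * SCREEN_H,
    BLIT_READS_PER_SECOND = BLIT_READS_PER_FRAME * TICRATE
};

/* Pack 8-bit RGB into 16-bit RGBA 5/5/5/1 with alpha set to 1. */
static inline uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) |
                      ((b >> 3) << 1) | 1);
}

/* Convert a 256-entry RGB palette (3 bytes per entry) once,
 * for example whenever the game changes palettes. */
void build_palette16(uint16_t pal16[256], const uint8_t rgb[256 * 3])
{
    for (int i = 0; i < 256; i++)
        pal16[i] = pack_rgba5551(rgb[i * 3], rgb[i * 3 + 1], rgb[i * 3 + 2]);
}

/* Write one palette-indexed pixel directly into the 16bpp buffer. */
static inline void put_pixel16(uint16_t *fb, int x, int y,
                               uint8_t index, const uint16_t *pal16)
{
    fb[y * SCREEN_W + x] = pal16[index];
}
```

    With these constants, BLIT_READS_PER_FRAME is 76,800 and BLIT_READS_PER_SECOND is 2,688,000, matching the figures above.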

    A lot of this new assembly code is in the 64Doom GitHub already.

    I am going to do a new push of the whole code base in the next week or two.
     
    Last edited: Oct 13, 2017
    Borman likes this.
  10. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    Turns out it makes a significant difference.

    Newest version is much faster and smoother, but it is unstable and raises exceptions in certain rendering situations. Not sure why yet.
     
  11. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
  12. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
  13. thebigman1106

    thebigman1106 Robust Member

    Joined:
    Aug 1, 2010
    Messages:
    211
    Likes Received:
    65
    Wow that looks amazing. Keep up the great work.
     
    jnmartin84 likes this.
  14. Borman

    Borman Digital Games Curator

    Joined:
    Mar 24, 2005
    Messages:
    9,564
    Likes Received:
    2,221
    I can do a new video when I'm back in town if you'd like.
     
  15. udkultimate

    udkultimate Rapidly Rising Member

    Joined:
    Nov 10, 2016
    Messages:
    90
    Likes Received:
    78
    WOW man, this is amazing. Some time ago I saw a port of Doom for Nintendo 64 on YouTube; however, I never knew you had a GitHub page or that you had released the full source code. Thanks and congratulations for this amazing work!

    Now just some questions:

    Is it possible to compile any Doom 1 Total Conversion Mod for this Doom 64 (like Resident Evil Mod, Sonic, Donkey Kong, and so on)?

    So I can use the traditional PC Tools to create new content for DOOM and compile it for N64?

    If this is possible, that means we have the first free game engine toolkit to develop homebrew games for N64!!!

    WOW, WOW, this is amazing bro!!!

    Cheers.
     
  16. udkultimate

    udkultimate Rapidly Rising Member

    Joined:
    Nov 10, 2016
    Messages:
    90
    Likes Received:
    78
    I mean mods like this one. Is it possible to compile them for the Doom 64 port you made?

     
  17. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    That would be cool. I will have a new binary up soon once I get the menus and HUD/status bar rendering again.
     
  18. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    I make no claims about mods working. PWAD support is not in the code anymore. Anything that required Dehacked patches would need to be reverse-engineered and applied to the source code and recompiled. I'd say no as far as that stuff goes.
     
  19. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    GitHub code update might take a while longer. Lots of "#if 0" and "#ifdef EXPERIMENTAL_FEATURE_XYZ" blocks and redundant copies of source files to clean up before committing and pushing.
     
  20. jnmartin84

    jnmartin84 Robust Member

    Joined:
    Nov 11, 2013
    Messages:
    236
    Likes Received:
    31
    I forgot to mention in previous recent posts but if anyone is interested in following the various day-to-day / week-to-week musings about and updates I'm making to 64Doom, I created a public Facebook group named (wait for it) 64Doom.

    It recently attracted new membership from someone that was following the project through a YouTube video.

    The group (well, really it is me) is gladly and graciously accepting all new-comers as long as they don't:

    1) Ask me why I'm not working on (anything).

    2) Be overly / overtly negative about the project or anything seen in the group.

    3) SPAM.
     
