Saturday, October 29, 2011

Part Six - Playing hide and seek with pixels

At the beginning of August 2009, I had already progressed quite far. In the previous two months I got to the point where I could run with the Prince through the first level, except that none of the interactive elements of the game worked yet.
And there was one big visual problem. Walking behind the pillars in the level didn't cause the character to be occluded. The prince was simply a set of sprites on top of the level background graphics. Not good enough.


In the Apple II original (and almost all other versions of the game) this issue is solved easily because the Kid is drawn into the same frame buffer as the level graphics. Then afterwards the occluding front pieces of each block are simply drawn again, overdrawing the parts of the player that should be invisible. It's as common a technique as possible.
But since the C64 has a color attribute system where you can only specify colors for each 8x8 block and not per pixel, using the same kind of drawing system would've caused the player to look very weird. The colors used for floors and other nearby parts would bleed into the player image. A problem which is commonly known as "color clash". Look no further than the ZX Spectrum version, which suffers from this problem, albeit they did try to hide it as much as possible.

On the ZX Spectrum, the colors of the torch bleed into the image of the character

I knew that I definitely wanted to avoid this problem on the C64. That's why I initially set out to use sprites, which have their own set of colors and therefore won't clash with the background colors.

Now generally you would solve the visibility of a sprite on the C64 using the sprite/background priority system. The VIC-II chip considers two of the 4 possible color bit-pairs as background and the other two as foreground. Each sprite can be selected to be either above foreground and background, or between foreground and background. Unfortunately this means that the foreground colors (bit pairs 10 and 11) are always drawn on top of sprites, so it's not possible to fully occlude a sprite with all of the possible colors.

The other method typically used to obscure certain parts of a sprite is by overlaying them with another sprite that has a higher priority and contains the same graphics as the background. This however requires that sprite to share some of its colors with the background, which puts a restriction on the colors to use for the actual sprite (two out of three sprite colors are shared).
Another trick is to put one sprite behind the foreground layer and the lower priority sprite in front of the foreground layer. This will perform the same masking but the foreground graphics will hide the sprite that is causing the masking.
But I knew already from the size of the animation images that I'll need a lot of sprites and that using extra sprites for masking would be pretty much impossible. Plus that would've meant even higher memory usage.

After considering all of these options and also taking their complexities into account, I've decided for the one solution that would be slowest, but would yield the most accurate results: Actually modifying the sprite data in memory, for each frame, every time. Ouch. After all, I wanted to have pixel-precise occlusion.
So whenever the character would walk behind a pillar or other occluding object, I would have to mask out those pixels of the sprites and replace them with the transparent color bit-pair.
I was confident that I could optimize the code to the point where it would be fast enough.

To simplify the masking code, I decided to use a whole bitmap (8KB of data) to buffer the mask for the whole screen. I thought that it should be easy to piggyback the drawing of the mask onto the normal drawing code, which I already was familiar with.
I basically just modified the function that draws the front piece images (182F:drawFrontPieceImage) to also draw into the mask bitmap, and added a simple set of images to be used as masks.
The result looked like this:

The mask bitmap for the first screen of the game. Just think of it as a 1-bit depth buffer

The black parts of the image are pixels where the sprites of the Prince should be invisible. Basically using an AND operation of this bitmap with the sprite data would mask out the occluded parts.

I probably went through 6 or 7 different versions of this masking code. In the beginning it was very unoptimized and therefore really slow. One of the reasons why it was so CPU intensive was the fact that the mask data needed to be shifted to align with the sprite data, depending on the X coordinate of the character on screen.

It took a lot of work, but in the end I was quite happy with the performance, but I'm sure someone is going to find some cycles to save to make it even faster. I have specialized the routines for 2, 3, 4, 5 and 6 byte-wide character images, here's an excerpt that performs reading of the mask, shifting and anding for one pixel row of a sprite that contains a two-byte wide image:

            ; get branch offset for shift array
            ldy ImageXOffset
            lda BlockOffsetToShiftBranchOffset2,y
            sta ShiftBranchOffset

            ; read 3 bytes from mask
            ldy #$00
            lda (MaskBitmapPtr),y
            sta CurrentMask
            
            ldy #$08
            lda (MaskBitmapPtr),y
            sta CurrentMask+1

            ldy #$10
            lda (MaskBitmapPtr),y
            sta CurrentMask+2

            ; shift the mask to align with the sprite
            clv
ShiftBranchOffset = *+1
            bvc ShiftBranchOffset
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0
            asl CurrentMask+2
            rol CurrentMask+1
            rol CurrentMask+0

            txa
            tay
            
            ; mask and store the pixel row
            lda CurrentMask
            and (RomSpriteImagePtr),y
            sta (SpriteImagePtr),y

            iny

            lda CurrentMask+1
            and (RomSpriteImagePtr),y
            sta (SpriteImagePtr),y

The right amount of shift operations is performed by modifying the parameter of the branch instruction. Three mask bytes are required temporarily, even though only two will be used to actually mask the sprite data afterwards.

Here's an early version of the masking code in action, a video I captured on August 9th, 2009:


You can see that the masking is not really precise yet (there's a one pixel offset, because the sprites are positioned on the full 320 pixel horizontal grid), and that it gets quite choppy when the player image is at its maximum width during a jump.

Download a C64 .prg of the build from 09-Aug-2009. Like the previous one, this will require a REU to run.

I wanted to press on now, to get to the point where the level is fully playable. So I entered another long phase of reverse-engineering the disassembled code to understand the parts that deal with animating the dynamic blocks, such as torches, gates, spikes, etc.
From reading the document, I saw that this was all a bit complicated. Starting on page 10 there's an explanation of the "redraw buffers". Those are 30 byte buffers, which contain a redraw count for each block on the screen. Typically these are filled with a value of 2 for a block to make sure it gets redrawn in both frame buffers (since the game uses double-buffering).
It didn't take me very long to find the redraw buffers and the code that deals with them, but it did take quite a while to identify each buffer correctly, because they're all a bit similar:

5E3C:HalfBuf
5E5A:RedBuf
5E78:FRedBuf
5E96:FloorBuf
5EB4:WipeBuf
5ED2:MoveBuf
5EF0:ObjBuf
5F0E:WhiteBuf
5F2C:TopBuf

The simplest redraw buffer is RedBuf. Setting an entry for a block in RedBuf causes the whole block to be redrawn. To get a correct redraw it's often necessary to clear the block first, this is done by using WipeBuf with a certain wipe height (which is stored in WhiteBuf). So, for example, a wipe height of $3f (63 pixels) would wipe the whole block. The reference point is the bottom of the block.
So typically a redraw of a block is triggered by both storing a value in WipeBuf and RedBuf:

            lda #$3f
            sta ImageTableEntry     ; ImageTableEntry is used for wipe height and stored in WhiteBuf

            ; Get the current position of the character and 
            ; convert it to a block index in Y
            jsr convertCharXAndYToBlockIndexInY  

            lda #$02
            jsr storeRedBufValue    ; store into RedBuf at offset Y
            jsr storeWipeBufValue   ; store into WipeBuf at offset Y

It doesn't matter if storeWipBufValue is called after storeRedBufValue, since the order of performing the redraws is fixed, and the wipe will always come before the redraw.

In the end there's one function dealing with applying the redraw buffers and it's called for every block from an iteration loop that is very much like iterateScreen:

1293:iterateScreenForRedraw
164B:redrawBlock

So as expected, redrawBlock has a certain fixed order in which it handles all of the redraw buffers. Wipes come first, then normal redraws, and so on until the last buffer, which is FRedBuf, for redrawing front pieces:

;----------------------------------
redrawBlock:
            lda WipeBuf,y
            beq noWipeNeeded

            sec
            sbc #$01
            sta WipeBuf,y

            jsr wipeBlock
            ldy ScreenUpdateBlockIndex

noWipeNeeded:

            lda RedBuf,y
            beq noRedrawNeeded

            sec
            sbc #$01
            sta RedBuf,y

            jsr switchToBgImageList
            jsr drawBlock
            ldy ScreenUpdateBlockIndex

noRedrawNeeded:

            [...]

            lda FRedBuf,y
            beq noFrontPieceRedrawNeeded

            sec
            sbc #$01
            sta FRedBuf,y

            jsr drawFrontPieceImage
            ldy ScreenUpdateBlockIndex

noFrontPieceRedrawNeeded:
            rts
;----------------------------------

Oh it looks quite simple now, but that took me a while. Many wrong leads that caused me to go off in the wrong direction, but thankfully redrawing calls the same basic drawing functions like drawBlock which I already knew from drawing the initial screen, so that helped in understanding how it all works together.

So as soon as I put in working versions of redrawBlock and iterateScreenForRedraw, I was able to add the code that stores values into the redraw buffers. The first one I tried were the torches.
Those are "transitional objects", so they use MoveBuf to redraw themselves. Most of the trans objects have specialized drawing code, e.g. the gate updates itself by drawing a tiled sequence of gate parts, depending on how high the gate currently is raised. The code for drawing the torches is specialized to only draw the flame itself, skipping the rest of the block.

The starting point for the trans object system is this function:

EE00:animateTransObjs

For each active trans object, it maintains the movement direction (e.g. for gates and spikes), the block index and the screen number, and then calls animateTransObj to update it.

F0FE:animateTransObj

Depending on the object id, this function dispatches to one of many different individual ones:

F194:animateExitTransObj
F1EC:animateGateTransObj
F264:animatePressPlateTransObj
F287:animateJawTransObj
F2DF:animateFlaskTransObj
F302:animateSwordTransObj
F31B:animateTorchTransObj
F354:animateSpikesTransObj
F386:animateLooseFloorTransObj

Their main job is to update the BlueSpec value for their block and store into the redraw buffers to make sure they're updated during the next redraw loop.

It is now late August 2009. I had various redraws working, and I started to also set video and color RAM differently for each block, to introduce a bit of color into my screens. Due to many changes some of the drawing was often broken, so the versions from this time were not really playable. But here's how it looked like with some early flame animations in place:

Some colors to make the new torch flame animations a bit prettier
So that's it for this time. In the next part we're gonna look at bit more at the redrawing of gates and spikes, and why I was suddenly forced to reveal my project to the public.

20 comments:

  1. First of all, I express my respect to you for your great masterpiece, Prince of Persia for C64! I am starwindz, one of PoP reverse engineering members(for Mac Levels) and developer of Prince of Persia 1 Total Pack( http://forum.princed.org/viewtopic.php?f=73&t=875 ). I have serveral questions for your splendid work.

    1. Room-To-Room transition screen enhancement
    Blank screen will be shown in a while when the prince moves to the next room. How about changing some program codes as follows? I expect the showing time of blank screen will be very short by using this technique.
    * BEFORE: Clearing current screen -> Setting up next screen -> Showing next screen
    * AFTER: Setting up next screen -> Clearing current screen -> Showing next screen

    2. More information of level data format
    Could you please let me know more information of level data format for PoP C64? I have seen your previous post(Part Four) and have compared PoP.crt to original PoP1 level data file(levels.dat) using some hex editor. I found that the level data sturcture of C64 and PC are not identitcal. Please me know more information of level data such as starting offset, size of each room data and etc something like following format(PC format).

    Length Offset Block Name
    ----------------------------
    720 0 pop1 foretable
    720 720 pop1 backtable
    256 1440 door I
    256 1696 door II
    96 1952 links
    64 2048 unknown I
    3 2112 start position
    3 2115 unknown II
    1 2116 unknown III
    24 2119 guard location
    24 2143 guard direction
    24 2167 unknown IV (a)
    24 2191 unknown IV (b)
    24 2215 guard skill
    24 2239 unknown IV (c)
    24 2263 guard colour
    16 2287 unknown IV (d)
    2 2303 0F 09 (2319)

    3. Permission of distributing modified PoP.crt file
    I have some plan for adding PoP C64 to PoP1 Total Pack. This means that numeriois fan made levels and random level generator(made by me) can be played in PoP C64. What do you think about that?

    ReplyDelete
  2. Wow, 8K for masking data... But it does make code much much simpler...

    I know game is done and it works perfectly, but here is how I do shifting:

    Depending on X-offset you have 4 routines. (shifting is done in multicolors pixels)

    For offset = 0 you do simple lda and sta.

    For offset = 1

    ldx CurrentMask

    lda shifted_1_right_part,x
    ldx CurrentMask+1
    ora shifted_1_left_part,x

    and (RomSpriteImagePtr),y
    sta (SpriteImagePtr),y
    iny

    lda shifted_1_right_part,x
    ldx CurrentMask+2
    ora shifted_1_left_part,x

    and (RomSpriteImagePtr),y
    sta (SpriteImagePtr),y

    For other offsets you just have different piece of code that has "shifted_2" and "shifted_3" tables.
    "shifted" tables are simple left or right parts of 256 bytes shifted by offset that the table is for.

    This one uses X register, but you can keep its value on zero page anyway.

    This method is much faster (especially in case of shifting by 6 bits) and more consistent in time it takes.

    ------------------------------------
    Hey, great read for Sunday morning!
    Thanks for sharing code once again!!!

    ReplyDelete
  3. @starwindz:

    1. I think you don't quite understand how this works. There's no memory available to set up a new screen in advance, and clearing is not what takes the time. It's drawing the new screen, and that's done while the screen is turned off. You could only show the previous screen longer, but the delay would be the same.

    2. The level data is almost exactly the same. The only differences are in the way the guards and their location/skill/color is handled.

    Here's the Apple II data layout:

    B700:BlueType
    B9D0:BlueSpec
    BCA0:LinkLoc
    BDA0:LinkMap
    BEA0:Map
    BF00:Info
    BF40:KidStartScrn
    BF41:KidStartBlock
    BF42:KidStartFace
    BF46:GdStartBlock
    BF5E:GdStartFace
    BF76:GdStartX
    BF8E:GdStartSeqLo
    BFA6:GdStartProg
    BFBE:GdStartSeqHi

    Subtract 0xb700 to get offsets into the normal data for one level. In the .crt not all the levels are directly after each other, there's some alignment padding to prevent the data from crossing a bank boundary. Also there are bank headers in the .crt, so you need to figure that out. There are 3 levels per bank, and they start at 0x000, 0x0900 and 0x1200 in bank 4:0, 5:0, 6:0, 7:0 and 8:0.

    3. You can modify the .crt if you want and distribute it, but change the texts at 0x6882b in the .crt file:
    The name of the game, and the version number and date at 0x688ab. These are in C64 screen codes, not ASCII, so you need to figure that out too. 01 is A, 02 is B and so on.
    But you have to be aware that you might run into problems caused by code that specifically handles spawning of certain things in certain levels. E.g. the shadow man will appear in levels 4, 5 and 6 in specific screens. You need to avoid those screens. The mirror will spawn in level 4, the mouse in level 8 and so on.
    Make sure you test everything properly, there are no checks, so if you overwrite something else in the .crt, the game will most likely crash.

    ReplyDelete
  4. @Vladimir:

    That was obviously the first idea I had, but it takes a lot of extra memory that I don't have, and banking in/out ROM in the inner loop is not going to make it faster either. The code size explodes a bit when doing this, and I have to keep this code in RAM, because the data I'm reading and masking has to be in ROM. In the end, this was the best option under the constraints.

    ReplyDelete
  5. Very interesting Sunday read again.

    ReplyDelete
  6. Talk about tight fit ha ? :)

    If 6x256 bytes were a problem I bet you had to make lots of decisions like that...

    Never mind that - best programming diary this year for sure!

    ReplyDelete
  7. Couldn't you compress the masking tables?

    It seems to me that most lines are identical to the previous line (or some other line). How about having a table of pointers for the start in memory of each row mask? And then not duplicating those row masks?

    And wouldn't that gain you enough memory that you could keep four versions of the mask image?

    (Maybe use relative pointers so you could share the table between the four shifts?)

    Of course I might be spouting nonsense, given that I have never programmed a Commodore 64 or played PoP much.

    ReplyDelete
  8. @peterfirefly:
    No, that actually sounds like a good idea. Might be doable, but the more complex the solution, the longer it will take to implement. And now I'm not gonna bother anymore...

    ReplyDelete
  9. A flaky wifi connection just ate my last comment. This is an attempt to recreate it.

    ---

    It should be pretty easy to compress the mask image on the fly so you can still piggyback on the drawing code.

    For each line, calculate a checksum. You might be able to get away with something as silly as a 1-byte sum or it might be something slightly more "grown up" (but still cheap) like Fletcher's checksum.

    For each of the already encountered unique rows in the mask image, you store their checksum.

    Search through the list of those checksums (backwards is probably slightly faster since most rows are like the previous row) and do byte-by-byte matching when the checksums match. It is probably going to be very rare that more than one row has a matching checksum.

    After you have compressed the mask image, you generate the shifted mask images easily, since they have the same compression pattern.

    Oh, and the row pointers should maybe be relative pointers.

    ReplyDelete
  10. On the other hand, since the mask generation isn't exactly real-time, you could try something simple first: just do a byte-by-byte comparison with the last row. No checksums, no detection of other repetition patterns.

    ReplyDelete
  11. >And now I'm not gonna bother anymore...

    I perfectly understand that :)

    But maybe somebody else will do that if you publish the source code at some point...?

    ReplyDelete
  12. @mrsid
    Unfortunately I could not figure out the location of the each data block where all level data located in the pop.crt. Several questions for this are shown as follows.

    1. The size of each level data block is 2304?
    (I think the total size of Apple II level layout you mentioned before is not 2304. Is it right?.)

    2. The number of level data block is 15? Demo level is not included?

    3. The offset of the first level data block is 0x100d0 in pop.crt?

    4. What are the offsets of 2nd and other level data blocks? For examples, 2nd offset is 0x?????, 3rd is 0x????? and others.
    I have figured out that the first three level blocks are in sequence but I could not find out the location of other levels.
    I could not understand the concept of memory banks since I do not have much C64 hardware knowledge. Sorry for that.)

    By the way I am developing the prototype of PoP:C64 Launcher in order to load custom level data made by many PoP fans.

    Please see the following screenshot. http://i43.tinypic.com/2brzn6.jpg
    Custom level launcher for PoP:C64 could be usefule for many lovers of PoP:C64 you have developed.

    Thank you.

    ReplyDelete
  13. @starwindz:

    1. Yes, 2304 bytes.

    2. 14 level blocks in total, demo level is not included.

    3: Yes.

    4: Read this: http://ist.uwaterloo.ca/~schepers/formats/CRT.TXT
    Each bank starts with a CHIP header. You have to skip 0x500 bytes after the first three levels to get to the next CHIP header, then skip 8KB past that to get to the next CHIP containing level data. There are always 3 levels per CHIP block, except for the last one.

    You should also be able to find each of the levels by searching for the the first few bytes of the PC data.

    I don't have Windows, so I won't be able to run your launcher.

    ReplyDelete
  14. This comment has been removed by the author.

    ReplyDelete
  15. @mrsid

    I think I just figured out the offsets of each level banks by using the method and information you told me.
    * 1st bank: 0x100d0 (3 levels)
    * 2nd bank: 0x140f0 (3 levels)
    * 3rd bank: 0x18110 (3 levels)
    * 4th bank: 0x1c130 (3 levels)
    * 5th bank: 0x20150 (2 levels)
    (Total 14 levels which do not have the potion level.)

    There may be some questions after further development of launcher.
    Regarding launcher, I did not know that you do not have Windows but I will send and share the modified version of each pop.crt files which have custom levels to you and other people.

    Thank you.

    ReplyDelete
  16. OK, here's a quick improvement to the masking code:

    Omit the command sta CurrentMask+2,
    and just keep the value of CurrentMask+2 in accu. Then, replace all those asl CurrentMask+2 commands by simple asl commands. The code gets 10 bytes shorter, and uses 27 fewer cycles (assuming CurrentMask+2 was in zero page).

    ReplyDelete
  17. @mrsid
    PoP:C64 Launcher and custom levels pack have been released on lemon64 forum. Current version is 0.2 and please feedback any comments.

    http://www.lemon64.com/forum/viewtopic.php?t=40024&start=0&postdays=0&postorder=asc&highlight=

    ReplyDelete
  18. @mrsid
    I have three questions.

    1. How to skip the levels by using some kind of a key? I am going to test converted custom levels.

    2. Could you please let me know all the differences between C64(Apple) and PC version have been identified? The compatibility issue may be minimized by some conversion algorithm.

    3. When floors fall down, the game speed is slightly slow down. Is this a bug or do you have some plan to improve it?

    ReplyDelete
  19. One query - the occlusion of the kid when he's on or near a left-facing ledge tile. Whilst walking toward the edge, he has to have priority but when climbing (or falling, probably from the opposite side) that same ledge should then have priority.
    So, do you perform such masking based on the kid's animation frame or something?

    Or does the bounds check + relative positions that the animations have just take care of it anyway?

    ReplyDelete