All Things Delay

By Danetta - 2nd February 2024 - 13:54 PM





What you're about to witness is bleeding-edge technology, full of dirty bandaids and unconfirmed theories.
But if you're interested in all-things-delay and feel that you're mighty enough to improve it then buckle up.
You need some basic understanding of the nature of the delay tho, but you shall find it elsewhere.




Here is an illustration of the delay. The left side is how the host sees the game, and the right side is the off-host view (a client with the delay).



IPB Image



IPB Image



IPB Image



Introduction



As you may have guessed, the most important address is 0xD9F608, more commonly known as the logic tick rate (LTR from here on), which equals 5 by default. The game runs at 30 FPS, so each logic frame "contains" 6 render frames.

If you place a breakpoint somewhere on both client and server in a way that it stops on every frame, you will notice that the client tries to catch up while it still has buffered frames. However, once the buffer runs out, it will skip all the indices up to "6" before rendering another frame. This is the delay in action. The engine is capable of rendering everything on the client side in sync with the server, it just waits for the buffer to fill in order to start doing so. Yes, it is tedious to debug. I literally used two PCs and pressed "step over" on two keyboards like a snake in a matrix. Wow. Such engineering.

Anyway. One peculiar detail is that if you set the logic tick rate to "10", the delay disappears. As the game speed is tied to the logic tick rate and FPS, this results in the game being sped up by a factor of two. But still, very interesting. Previously there were attempts to make BFME run at 60 FPS, but it had the same problem: you needed to slow everything down back to normal speeds, which isn't trivial.

However, I don't think it's a coincidence, not even in a mathematical sense, that there is no delay when running the game at 10 LTR. The devs were setting up the delay-thing specifically for the configuration that they were planning to release the game in. Keep in mind that C&C: Generals (+Zero Hour), which uses a slightly older version of the same engine, doesn't have a delay. (I strongly advise you to look into the C&C: Generals code, there is even a version with symbols; with the knowledge that lies there you may be able to improve the existing fix. Just look into the places that calculate frame sync based on the logic tick rate.) So I think that in general BFME is supposed to run without delay, but the devs added some checks that enable the delay specifically for the default parameters (30 FPS, 5 LTR).
If you look into code long enough, you will notice these details too.

But let's return to the logic tick rate at 0xD9F608. The value is used all over the place. I'm writing from memory, but there should be 30-40 places with direct access. The most common pattern is division of FPS by LTR, 30/5, which results in 6. It is our favourite number now. And most of the time, after the division is performed, the result is directly compared with 6, and some subroutines are usually skipped.
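To make that pattern easier to picture, here is a minimal C++ sketch of "divide FPS by LTR, compare with 6, skip the extra work". All names are hypothetical and the real code is x86 assembly; only the constants 30, 5 and 6 and the jnl-style "not less than 6" comparison come from this article.

```cpp
#include <cstdint>

// Hypothetical names; only 30, 5 and 6 are taken from the article.
constexpr std::int32_t kFps = 30;                // render frames per second
constexpr std::int32_t kExpectedQuotient = 6;    // the value the result is compared against

enum class SyncPath { SkipExtraWork, RunExtraWork };

// Integer-divide FPS by the logic tick rate, then branch the way a
// { cmp ecx, 06 } / { jnl ... } pair would: "not less than 6" skips the extra routines.
constexpr SyncPath choose_sync_path(std::int32_t logic_tick_rate) {
    return (kFps / logic_tick_rate) >= kExpectedQuotient
               ? SyncPath::SkipExtraWork
               : SyncPath::RunExtraWork;
}

static_assert(choose_sync_path(5) == SyncPath::SkipExtraWork, "30 / 5 = 6, branch is taken");
static_assert(choose_sync_path(10) == SyncPath::RunExtraWork, "30 / 10 = 3, branch is not taken");
```

With the default LTR of 5 the comparison always comes out the same way, so the skipped routines never run in a stock game.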



Details




But enough with the theory. Let's dive into details.

CODE
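#include <cstdint>
using u8  = std::uint8_t;
using u32 = std::uint32_t;
static constexpr u32 offset_fix_delay     = 0x00632A9B;
static constexpr u32 offset_fix_animation = 0x00632535;
static constexpr u32 offset_fix_frames    = 0x00ECA400;
// mov eax, 2; jmp 0x00632ABB
static constexpr u8 code_delay[]          = { 0xB8, 0x02, 0x00, 0x00, 0x00, 0xEB, 0x19 };
// idiv dword ptr [0x00ECA400]
static constexpr u8 code_animation[]      = { 0xF7, 0x3D, 0x00, 0xA4, 0xEC, 0x00 };
// replacement divisor (8), read by the patched idiv
static constexpr u8 code_frames[]         = { 0x08 };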


Here is our little patch. Keep in mind, it was written in a hurry, in anticipation of finally beating the delay, so the code is not ideal.
First off, the most important address is 0x00632A9B.

Code there divides FPS by LTR, which results in 30 / 5 = 6.
And then { cmp ecx, 06 }, { jnl 0x632B03 } always has the same result.
Further routines are skipped and we are branching to the code that pushes current logic frame. Or something like that.
There is also some extra counting involved. Like accumulating render frames (up to 6), before full logic frame is prepared. Or in reverse.
Lots of what I write doesn't make sense (and it was more than a year since I worked on it, so it's somewhat hazy), but when you start investigating, you will notice all those division and modulo operations for frames specifically, and it will start clicking for you.

Anyway. If you just change the check at the cmp/jnl part, it won't make any difference. They disabled this part intentionally, because it doesn't make sense to run this code with [30 / 5 = 6] as a parameter: it gives the same result, which is "synchronize with a 6-frame buffer".

In other words:
The check of "30/5" against "6" avoids unnecessary frame computations.
Previously other engineers tried to disable this check so that it always goes to the extra frame computation routine.
However, it wasn't enough, because the result "6" is still used in further computations.
So we needed to change this argument, otherwise it doesn't make sense to call the subroutine.

I didn't really dive into details of computation that happens if we don't jump at { jnl 0x632B03 }.
It produces some extra numbers that go into partial sync, but as I said, it doesn't change anything.
So I made a dirty patch, replacing { F7 3D 08 F6 D9 00 8B } with { B8 02 00 00 00 EB 19 } (code_delay[] above).
So instead of doing the { idiv [00D9F608] } that feeds { cmp ecx, 06 }, { jnl 0x632B03 }, we just do { mov eax, 2 }, { jmp 0x00632ABB }.
Just using the constant and jumping further. Then eax gets incremented (to 3), then some magic happens.
It affects the way the synchronization works, whereas previously just removing the branch didn't change anything.
So now instead of "sync every 6 frames", it works like (or at least it looks like it works this way) "sync every frame".
Or is it "every other frame"? I don't know, but during my tests using two machines on LAN it looked like there was zero delay.
Even when doing frame by frame comparison. However, judging by how the code looks, I don't think I actually configured the delay-buffer correctly.
Instead, it looks like a misconfiguration of the delay-buffer, which results in the engine continuously feeding on the buffer until it is empty.
Which is also what we want, because there are perfectly valid logic frames in the buffer.
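If you want to double-check where the { EB 19 } in that patch lands, the arithmetic can be sketched like this. It only assumes the standard x86 encoding (B8 imm32 is 5 bytes, EB rel8 is 2 bytes, and a short jump is relative to the address of the next instruction); the function name is made up for illustration.

```cpp
#include <cstdint>

// Target of a short jump (opcode EB followed by a signed 8-bit displacement).
// The displacement is counted from the end of the 2-byte jmp instruction.
constexpr std::uint32_t short_jmp_target(std::uint32_t jmp_address, std::uint8_t rel8) {
    return jmp_address + 2u +
           static_cast<std::uint32_t>(static_cast<std::int32_t>(static_cast<std::int8_t>(rel8)));
}

constexpr std::uint32_t kPatchAddress = 0x00632A9B;              // start of code_delay[]
constexpr std::uint32_t kMovLength    = 5;                       // B8 02 00 00 00
constexpr std::uint32_t kJmpAddress   = kPatchAddress + kMovLength;  // 0x00632AA0

static_assert(short_jmp_target(kJmpAddress, 0x19) == 0x00632ABB,
              "EB 19 placed at 0x00632AA0 jumps to 0x00632ABB");
```

So the seven patched bytes are a 5-byte mov at 0x00632A9B plus a 2-byte jmp at 0x00632AA0, and the jump target matches the { jmp 0x00632ABB } quoted above.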



But why?




Long story short, the above part is just blurry memories of previous research, intended to be more like a bedtime story.

To summarize:
1. Look at 0x00632A9B and the code after that long enough.
2. Look at the math there and two closest dynamic dispatch calls before { push "logic" }.
3. Then ideally look at C&C: Generals.
Something is there and it is much simpler than what I described.

Done. No delay. Goblin Builder running in circles around the fortress in perfect sync on both machines. Insane. We did it. It may look like these builders aren't in perfect sync, but that's just the montage not being perfect; they are actually in perfect sync.



IPB Image


IPB Image


IPB Image






Is it done? It is not




But the game barely works now. Specifically, some timings are broken. Remember, we bruteforced the delay-buffer by feeding it incorrect data about frames, but all other parts of the game are still trying to keep the 30 FPS / 5 LTR loop. And it results in broken animations, but also some other things. An easy way to test animations is by ordering a builder to run in a loop (by using the ALT-command). If you do it, and then "fix" the delay at runtime, the builder keeps normal animations forever. Its wheels and legs move normally. However, if you order another builder to do the same, it will have broken animations. As does almost everything else.



IPB Image


IPB Image


IPB Image




Now, it looks like proper changes at 0x00632A9B should be enough. Either I'm wrong and it is not enough, or we didn't make the proper changes.
But it's good enough for now, and if you want to adapt the fix from RotWK to BFME2 and BFME1, it should be easy enough to find the needed offsets in their respective binaries; I believe the code is the same there.

This is where I arrived at the trifecta

Delay
Animations
Speed


I could fix any 2 of those things, but never 3. Again, time for bandaids.
Another offset is at 0x00632535. Nothing interesting here, the same old FPS divided by LTR.
Unfortunately, this is where our tale comes to an end. No research done, pure bruteforce.

First, place a constant 0x08 at address 0xECA400 just for an easier division patch. Yes, dirty. Yes, good enough.
Then at 0x00632535 we are replacing { idiv [00D9F608] (5) } with { idiv [0xECA400] }; as we just put (8) there.
Patched bytes: { F7 3D 08 F6 D9 00 } -> { F7 3D 00 A4 EC 00 }
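For anyone reading those bytes for the first time: { F7 3D } followed by four bytes is the x86 encoding of idiv with an absolute 32-bit memory operand, and the four bytes are the address in little-endian order. A small sketch to decode it (the helper name is hypothetical):

```cpp
#include <array>
#include <cstdint>

// Extracts the little-endian 32-bit address from a 6-byte "F7 3D disp32"
// instruction, i.e. idiv dword ptr [disp32].
constexpr std::uint32_t idiv_operand_address(const std::array<std::uint8_t, 6>& code) {
    return static_cast<std::uint32_t>(code[2])
         | static_cast<std::uint32_t>(code[3]) << 8
         | static_cast<std::uint32_t>(code[4]) << 16
         | static_cast<std::uint32_t>(code[5]) << 24;
}

constexpr std::array<std::uint8_t, 6> kOriginalIdiv{0xF7, 0x3D, 0x08, 0xF6, 0xD9, 0x00};
constexpr std::array<std::uint8_t, 6> kPatchedIdiv{0xF7, 0x3D, 0x00, 0xA4, 0xEC, 0x00};

static_assert(idiv_operand_address(kOriginalIdiv) == 0x00D9F608, "original reads the LTR");
static_assert(idiv_operand_address(kPatchedIdiv) == 0x00ECA400, "patched reads our constant");
```

Only the operand address changes; the instruction itself stays the same, which is why the patch is just a 4-byte pointer swap inside a 6-byte instruction.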

Essentially, we are replacing division by 5 with division by 8. And 30/8=3 (it is an integer division, so 3.75 gets truncated). And 3 is our magic number (instead of 6) that fixes our animations. Well, they aren't quite right. They aren't in sync. If you ask me, it looks like we skip every other frame. Our magic patch at 0x00632A9B is probably to blame. But even if animations are not quite correct, they now work and are good enough for competitive gaming. And there is no delay.

Interestingly, having no delay is sometimes worse, because a 1-second buffer is a lifesaver during multiplayer matches with bad PCs or bad internet. And there is no such buffer now. So, unless the delayfix solution is somehow perfected, it would be handy to have some sort of flag introduced into the game. Delay configuration doesn't result in "Out of Sync" errors, so every client could enable and disable the delayfix as they desire, possibly even at any point during a multiplayer game. This shouldn't be too hard to implement.
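The truncation matters here, so a tiny sketch of the arithmetic may help. The function name is made up; the only inputs are the 30 FPS and the divisors mentioned in this article.

```cpp
// Truncating integer division, as idiv produces for positive operands.
constexpr int frames_per_logic_tick(int fps, int divisor) {
    return fps / divisor;
}

static_assert(frames_per_logic_tick(30, 5) == 6, "stock configuration");
static_assert(frames_per_logic_tick(30, 8) == 3, "patched divisor: 3.75 truncates to 3");
static_assert(frames_per_logic_tick(30, 10) == 3, "LTR 10 gives the same quotient");
```

Note that 8 and 10 both give 3 here. Whether that is related to the delay disappearing at 10 LTR is not something I can confirm from this alone, but it is at least consistent with 3 being the number the animation code is happy with.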



Now, to finalize




1. Replace bytes at offset 0x00632A9B with { 0xB8, 0x02, 0x00, 0x00, 0x00, 0xEB, 0x19 };
2. Replace bytes at offset 0x00632535 with { 0xF7, 0x3D, 0x00, 0xA4, 0xEC, 0x00 };
3. Replace bytes at offset 0x00ECA400 with { 0x08 };

If you do it at runtime, then do step 3 before step 2, otherwise a division by zero is incoming. Or replace it with proper code that doesn't put magic bytes at the end of the binary. This should also give you enough info to produce the same fix for BFME1 and BFME2.
Otherwise, with some work you can make the RotWK game.dat play nicely with BFME2, as some patches already do as far as I'm aware.
And if you want to continue the investigation, concentrate your efforts at 0x00632A9B and don't forget about the C&C: Generals binary, as it has no delay.
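The ordering from the steps above can be sketched as follows. This is only an illustration under assumptions: `image` is a hypothetical byte buffer standing in for the game's mapped code, `image_base` is whatever address its first byte corresponds to, and the names `Patch`, `delayfix_patches` and `apply_patches` are made up. Writing into a live process needs platform-specific memory access that is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Patch {
    std::uint32_t address;            // address as quoted in this article
    std::vector<std::uint8_t> bytes;  // replacement bytes
};

// Listed in a safe order for a running game: the divisor constant goes in
// first, so the redirected idiv never reads a zero.
inline std::vector<Patch> delayfix_patches() {
    return {
        {0x00ECA400, {0x08}},                                      // step 3: divisor
        {0x00632535, {0xF7, 0x3D, 0x00, 0xA4, 0xEC, 0x00}},        // step 2: animation idiv
        {0x00632A9B, {0xB8, 0x02, 0x00, 0x00, 0x00, 0xEB, 0x19}},  // step 1: delay
    };
}

// Checks every patch against the buffer bounds before writing anything,
// so a failure never leaves the image half-patched.
inline bool apply_patches(std::vector<std::uint8_t>& image,
                          std::uint32_t image_base,
                          const std::vector<Patch>& patches) {
    for (const Patch& p : patches) {
        if (p.address < image_base) return false;
        const std::size_t offset = p.address - image_base;
        if (offset > image.size() || p.bytes.size() > image.size() - offset) return false;
    }
    for (const Patch& p : patches) {
        const std::size_t offset = p.address - image_base;
        std::copy(p.bytes.begin(), p.bytes.end(),
                  image.begin() + static_cast<std::ptrdiff_t>(offset));
    }
    return true;
}
```

A per-client on/off flag could reuse the same table in reverse: write back the original bytes quoted earlier for 0x00632A9B and 0x00632535, and leave the constant at 0x00ECA400 alone.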



Afterword




I know the delayfix isn't perfect, but it's the best I could do. Really. It's dreamy even. Romantic. I will probably never forget the time when I looked into those bytes for many months and cursed Electronic Arts. Even in its current state, I never thought anything would come into existence. It was a monumental challenge, and the delayfix that we have is the result of pure luck and "some" determination. You're a wonderful community and I'm thankful for being a part of it.

I have the opportunity to finally thank everyone involved. All the reverse-engineers that investigated the delay for many years before me, together with me, and after me. I couldn't have done it without you.



Special Thanks











Mjstral, Solas, mkow, tomsons26, ezimmermann, iamtheliquor, SpecialGuest, Excelsior, Talos, nameless, Mr.Smokk, Maru, MayShadowFax, Eternal, Setochka and many many others. Some for their direct involvement and some for entertainment along the way. And of course Brabox, for countless amounts of paper scribbled with various graphs. And some more graphs. And some more paper. You were the first to try and properly describe what the hell is happening within the delay guts. Thank you RotWK 2.02 community, and the whole BFME community. Also saying hello to those pesky little trees around Men's Fortress in BFME2.

Thank you and good luck! And may the delay (not) be with you!

Download game.dat here
- This one is based on 2.02, so it includes armorfix; for other versions please patch it yourself or reach out to me
- Please remember that this comes with the trade-off of broken animations!


~Danetta