frame pacing: analysis of the game loop


Frame Pacing Analysis

The graph above was generated with Visual Studio’s performance profiler on a custom
game engine. It shows the time taken to render each frame, in milliseconds.
Of the 7282 frames rendered during this collection period, only 36 did not
take exactly 16667μs (the 60 FPS target). Of those 36, the largest deviation
was only 88μs.
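Summary numbers like these are straightforward to recompute from a list of per-frame times. Here is a minimal sketch, assuming the frame times have already been exported as integer microseconds. `pacing_stats` and `PacingStats` are hypothetical names, not part of any profiler API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Summary of how far a capture strays from a fixed frame-time target.
struct PacingStats {
    std::size_t off_target = 0;    // frames whose time differs from the target
    int64_t max_deviation_us = 0;  // largest |frame_time - target| seen
};

// Count off-target frames and the worst deviation, in microseconds.
PacingStats pacing_stats(const std::vector<int64_t>& frame_times_us,
                         int64_t target_us) {
    assert(target_us > 0);
    PacingStats stats;
    for (int64_t t : frame_times_us) {
        int64_t dev = std::abs(t - target_us);
        if (dev != 0) {
            ++stats.off_target;
            if (dev > stats.max_deviation_us) {
                stats.max_deviation_us = dev;
            }
        }
    }
    return stats;
}
```

For example, frames of 16667, 16755 and 16600μs against a 16667μs target give two off-target frames with a worst deviation of 88μs.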

Frame pacing is the consistency of frame times across the frames your engine renders.
If you want your game to feel “smooth” or “fluid”, you should aim for consistent
frame pacing like the graph above. In this article, I’ll walk you through the
low-level details so that you can implement this in your own engine.

High Resolution Timing

On Windows, you should already be using QueryPerformanceCounter and
QueryPerformanceFrequency. Processors can change their clock frequency depending
on power state, so raw cycle counts aren’t a reliable clock; these APIs account for
that, and on Windows 7 and later they typically provide time stamps with 1μs
resolution or better. Example usage of these APIs (without caching the frequency):

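#include <windows.h>
#include <cstdint>

// Returns the current time in microseconds.
int64_t timer_query_us() {
    LARGE_INTEGER tick, freq;
    QueryPerformanceFrequency(&freq); // counts per second, fixed at boot
    QueryPerformanceCounter(&tick);
    // Split into seconds + remainder so the multiply can't overflow int64_t.
    int64_t sec = tick.QuadPart / freq.QuadPart;
    int64_t rem = tick.QuadPart % freq.QuadPart;
    return sec * 1000000ll + (rem * 1000000ll) / freq.QuadPart;
}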
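The order of operations in the tick-to-microsecond conversion matters. Multiplying the raw counter value by 1,000,000 before dividing overflows int64_t once the counter passes roughly 9.2 × 10^12 ticks. Assuming a 10 MHz counter (a common value on recent Windows versions, but always query it rather than rely on it), that is about 10.7 days of uptime. Here is a minimal portable sketch of the split conversion so it can be checked without Windows. `ticks_to_us` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Convert a raw counter value to microseconds without overflowing.
// Whole seconds and the leftover ticks are scaled separately, so the largest
// intermediate product is (freq - 1) * 1000000 rather than ticks * 1000000.
int64_t ticks_to_us(int64_t ticks, int64_t freq) {
    assert(freq > 0 && ticks >= 0);
    int64_t whole_sec = ticks / freq;
    int64_t rem_ticks = ticks % freq;  // always < freq
    return whole_sec * 1000000 + (rem_ticks * 1000000) / freq;
}
```

With a 10 MHz frequency, 12345678 ticks (1.2345678 s) converts to 1234567μs, and 10^13 ticks (about 11.6 days) still converts cleanly to 10^12μs, where the naive multiply would have overflowed.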

Now that we can measure time accurately, the next step is controlling it.

Burning Time

While most modern games probably put spare CPU cycles to good use, if you have
time left over before your next frame, you’ll need to wait it out. The simplest,
but least efficient, method is the busy loop:

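// Spins until us_target microseconds have passed since us_prev.
void timer_limit(int64_t us_prev, int64_t us_target) {
    for (;;) {
        int64_t us_now = timer_query_us();
        int64_t us_diff = (us_now - us_prev);
        if (us_diff >= us_target) {
            break;
        }

        YieldProcessor(); // CPU spin-wait hint while burning cycles
    }
}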

The added twist here is YieldProcessor, which on x86 compiles down to the _mm_pause
intrinsic to help with power efficiency even while we’re burning cycles. But we can
do better than this by using an actual Sleep. As the Sleep documentation suggests,
if we want predictable behavior across all power states, we need to request a finer
timer resolution with timeBeginPeriod.

Sleeping Time

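#include <windows.h>
#include <mmsystem.h> // timeBeginPeriod/timeEndPeriod (link winmm.lib)
#include <cstdint>

// Frame limiter: sleeps for most of the frame, then spins for the remainder.
struct timer_t {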
    timer_t();
    ~timer_t();

    void limit(int64_t us_target);
    int64_t query();

protected:
    void sleep(uint32_t ms);

    LARGE_INTEGER freq;
    int64_t us_start;
    int64_t us_prev;
};

timer_t::timer_t() {
    timeBeginPeriod(1);
    QueryPerformanceFrequency(&freq);

    us_start = query();
    us_prev = us_start;
}

timer_t::~timer_t() {
    timeEndPeriod(1);
}

void timer_t::limit(int64_t us_target) {
    int64_t us_now = query();

    if (us_target > 0) {
        for (;;) {
            us_now = query();

            int64_t us_diff = (us_now - us_prev);
            if (us_diff >= us_target) {
                break;
            }

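            // Sleep only has ~1ms resolution, so leave a 2ms margin and
            // spin through the rest of the frame.
            int64_t us_sleep = (us_target - us_diff);
            if (us_sleep > 2000) {
                uint32_t ms = (uint32_t)((us_sleep - 2000) / 1000);
                sleep(ms);
            } else {
                sleep(0); // spins with YieldProcessor
            }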
        }
    }

    us_prev = us_now;
}

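int64_t timer_t::query() {
    LARGE_INTEGER tick;
    QueryPerformanceCounter(&tick);
    // Split into seconds + remainder so the multiply can't overflow int64_t.
    int64_t sec = tick.QuadPart / freq.QuadPart;
    int64_t rem = tick.QuadPart % freq.QuadPart;
    return sec * 1000000ll + (rem * 1000000ll) / freq.QuadPart;
}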

void timer_t::sleep(uint32_t ms) {
    if (ms == 0) {
        YieldProcessor();
    } else {
        Sleep(ms);
    }
}

This combines everything discussed so far. The timer is wrapped in a class so that
it can set and restore the timer resolution (every timeBeginPeriod call must be
matched by a timeEndPeriod call, and on Windows this has historically been a
system-wide setting), and so that the frequency, which is fixed at system boot, is
cached once instead of being queried with an extra API call on every query.

Since Sleep only has a resolution of roughly 1ms, I reserve a 2ms buffer at the end
of the target frame time that is spent busy looping with YieldProcessor instead. The
result is consistent frame pacing with low CPU usage for games that have cycles to spare.
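The same sleep-then-spin idea can be expressed portably. This is a sketch using std::chrono::steady_clock and std::this_thread instead of the Windows APIs, so its timing precision depends on the platform's scheduler; `FrameLimiter` is a hypothetical name. One deliberate difference: the fine-grained wait calls std::this_thread::yield, which hands the time slice back to the OS, rather than issuing a PAUSE instruction like YieldProcessor.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Portable analogue of the sleep-then-spin limiter. Assumes sleep_for can
// overshoot, so it stops sleeping `spin_margin` before the deadline.
class FrameLimiter {
public:
    using clock = std::chrono::steady_clock;

    explicit FrameLimiter(
        std::chrono::microseconds spin_margin = std::chrono::microseconds(2000))
        : margin_(spin_margin), prev_(clock::now()) {}

    // Block until `target` has elapsed since the previous call (or construction).
    void limit(std::chrono::microseconds target) {
        assert(target.count() >= 0);
        const auto deadline = prev_ + target;
        for (;;) {
            const auto now = clock::now();
            if (now >= deadline) {
                prev_ = now;  // the next frame is measured from wake-up
                return;
            }
            const auto remaining = deadline - now;
            if (remaining > margin_) {
                // Coarse wait: sleep through the bulk, keeping the margin.
                std::this_thread::sleep_for(remaining - margin_);
            } else {
                // Fine wait: yield the time slice and re-check the clock.
                std::this_thread::yield();
            }
        }
    }

private:
    std::chrono::microseconds margin_;
    clock::time_point prev_;
};
```

In a game loop you would construct one `FrameLimiter` at startup and call `limit(std::chrono::microseconds(16667))` at the end of each frame.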

Vertical Sync

Controlling frame time with V-sync mostly works. In my experience, most frames
land within 500μs of the monitor’s refresh interval. The variation isn’t very
noticeable as long as you don’t consistently spike above the refresh interval with
long-running tasks: with double buffering (the standard setup on PC), a frame that
misses a vblank has to wait for the next one. But if your application is running
under the Windows compositor, using V-sync can add some unfortunate latency to your
game:

Mode         V-Sync   Latency
Composited   Off      22.32ms
Composited   On       49.10ms
Exclusive    Off       3.42ms
Exclusive    On       66.38ms

The data for the table are averages derived from PresentMon reports on a 60Hz monitor.
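The double-buffering caveat from the V-sync discussion can be put in numbers. This sketch uses a simplified model (an assumption that ignores render-ahead queues and the compositor) in which a finished frame waits for the next vblank, so its on-screen interval is the frame time rounded up to a whole number of refresh intervals. `vsync_interval_us` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// On-screen interval of a frame under double-buffered V-sync, assuming the
// frame is shown at the first vblank at or after it finishes rendering.
int64_t vsync_interval_us(int64_t frame_time_us, int64_t refresh_us) {
    assert(frame_time_us > 0 && refresh_us > 0);
    // Round up to a whole number of refresh intervals (integer ceiling).
    int64_t vblanks = (frame_time_us + refresh_us - 1) / refresh_us;
    return vblanks * refresh_us;
}
```

At 60Hz (16667μs), a 15000μs frame is shown for 16667μs, but a frame that takes just 1μs too long, 16668μs, stays on screen for 33334μs. That doubled interval is the hitch players notice.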

The Game Loop

With a suitable method for both measuring and controlling the frame rate, we
can provide the game loop with correct timing information at a consistent pace.
As a preface: don’t store time in a float. A float has only a 24-bit significand,
so the smallest time step it can represent grows as the elapsed time grows. If you
need to work in seconds, use a double. This is something that even commercial
engines like Unity still don’t get right.
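To put numbers on the float warning: the gap between adjacent float values doubles each time the value crosses a power of two. This sketch uses std::nextafter to measure that gap at a given time in seconds. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between `seconds` and the next larger representable float.
float float_step_at(float seconds) {
    assert(seconds >= 0.0f);
    return std::nextafter(seconds, std::numeric_limits<float>::infinity()) - seconds;
}

// The same gap for a double.
double double_step_at(double seconds) {
    assert(seconds >= 0.0);
    return std::nextafter(seconds, std::numeric_limits<double>::infinity()) - seconds;
}
```

At 16384 seconds (about 4.5 hours of play), a float can only advance in steps of 0.001953125s, roughly 2ms. At 65536 seconds (about 18 hours) the step is 0.0078125s, close to half a 60Hz frame. A double at 16384 seconds still resolves steps of a few picoseconds.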

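void game_loop() {
    // startup...
    // (assumes `running`, `timer` (a timer_t) and `game` are defined elsewhere)

    constexpr int64_t tick_rate = 16667; // fixed timestep in microseconds (60Hz)
    constexpr double dt_fixed = tick_rate / 1000000.0;

    // QPC values start from an arbitrary point (typically system boot), so
    // measure from the start of the loop; otherwise tick 0 is long past.
    const int64_t us_start = timer.query();

    int64_t tick_frame = 0;
    int64_t us_prev = 0;

    while (running) {
        int64_t us_now = timer.query() - us_start;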

        // update input related subsystems...
        // window.update();
        // gamepad.update();
        // etc...

        // determine the tick we're allowed to run up to
        int64_t tick_last = us_now / tick_rate;

        for (; tick_frame < tick_last; ++tick_frame) {
            // update game state that's dependent on tickrate, like physics
            game.fixed_update(dt_fixed);
        }

        // calculate real delta time
        int64_t us_delta = us_now - us_prev;
        us_prev = us_now;

        double dt = us_delta / 1000000.0;

        // update game state that isn't tied to tickrate
        game.update(dt);

        // calculate the interpolation factor
        int64_t us_interp = us_now - (tick_frame * tick_rate);
        double interp = us_interp / (double)tick_rate;

        // render the world - be careful...
        game.render(dt, interp);

        // render the frame...
        // render.present();

        timer.limit(tick_rate); // wait out the rest of the 16667μs frame
    }

    // cleanup...
}

Unlike the previous code samples, this one is a little more “abstract”. It implements
a game loop with a mixed variable/fixed timestep, like Unity’s Update/FixedUpdate
split. There are good reasons to run physics on a fixed timestep, but also to leave
elements that don’t depend on frame timing (like the GUI) on a variable timestep for
performance.

The first and last calls in your game loop should deal with timing information; you
can even merge them into one function if you want. That way, variation in how long
the input-related subsystems take to update won’t affect the timing information for
your game loop.

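The next point of interest is deriving the tick count from the real query time instead
of an accumulator. This may seem like a minor point, but it helps prevent “tick drift”
during longer play sessions: the number of fixed updates is always computed from the
absolute elapsed time, so per-frame rounding errors have no chance to build up.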
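The difference is easiest to see with one specific kind of accumulator: one that sums each frame's delta as float seconds. The sketch below is an assumption about how such an accumulator might be written, not code from any particular engine. It simulates an hour of frames that each take exactly 16667μs and counts the fixed ticks owed both ways:

```cpp
#include <cassert>
#include <cstdint>

// Ticks owed after `frames` frames of exactly `frame_us` microseconds,
// derived from the absolute integer time (the game loop's approach).
int64_t ticks_from_time(int64_t frames, int64_t frame_us, int64_t tick_us) {
    assert(frames >= 0 && tick_us > 0);
    return (frames * frame_us) / tick_us;
}

// The same count from an accumulator that sums each frame's delta as float
// seconds. Every addition rounds to the float grid, and the error compounds.
int64_t ticks_from_float_accumulator(int64_t frames, int64_t frame_us,
                                     int64_t tick_us) {
    assert(frames >= 0 && tick_us > 0);
    const float dt = frame_us / 1000000.0f;
    const float dt_fixed = tick_us / 1000000.0f;
    float elapsed = 0.0f;
    for (int64_t i = 0; i < frames; ++i) {
        elapsed += dt;
    }
    return static_cast<int64_t>(elapsed / dt_fixed);
}
```

For one hour at 60 FPS (216000 frames), the integer version returns exactly 216000 ticks. The float version disagrees, because once the running total reaches thousands of seconds each added delta is rounded to a coarse grid.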

As with any fixed timestep implementation, you should interpolate between the game’s
previous state and current state for physically based entities. This is critical for
making your game entities look smooth while they move around.
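Here is a minimal sketch of that interpolation for a single entity, assuming each entity keeps a copy of its state from the previous fixed update. The `Vec2` type and `interpolate` function are hypothetical. The `interp` argument is the factor the game loop computes as `us_interp / tick_rate`:

```cpp
#include <cassert>

// Hypothetical 2D position for a physics-driven entity.
struct Vec2 {
    double x, y;
};

// Blend the previous and current fixed-update states by the loop's
// interpolation factor (0 = previous tick, 1 = current tick).
Vec2 interpolate(const Vec2& prev, const Vec2& curr, double interp) {
    assert(interp >= 0.0 && interp <= 1.0);
    return {prev.x + (curr.x - prev.x) * interp,
            prev.y + (curr.y - prev.y) * interp};
}
```

Inside `fixed_update` you would copy the current state into the previous one before stepping physics, then have `render` draw `interpolate(prev, curr, interp)`.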

Fixed Update Rate

The target frame time really matters on monitors that are only 60Hz. You should
choose a fixed update rate whose frame time matches your refresh rate, like 16667μs.
For example, you should avoid targeting a frame time of 10000μs (a faster fixed
update rate): the fixed update will frequently tick twice during a single frame,
but not consistently. Consistency is key.
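To see the inconsistency concretely, this sketch replays the game loop's tick rule (`tick_last = us_now / tick_rate`) over frames that each take exactly 16667μs, an idealized assumption, and records how many fixed updates each frame runs. `ticks_per_frame` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fixed updates run on each of `frames` frames when every frame takes exactly
// `frame_us`, using the game loop's rule: tick_last = us_now / tick_rate.
std::vector<int64_t> ticks_per_frame(int64_t frame_us, int64_t tick_rate,
                                     int frames) {
    assert(frame_us > 0 && tick_rate > 0 && frames >= 0);
    std::vector<int64_t> counts;
    int64_t tick_frame = 0;  // next tick to run
    for (int i = 1; i <= frames; ++i) {
        int64_t us_now = i * frame_us;
        int64_t tick_last = us_now / tick_rate;
        counts.push_back(tick_last - tick_frame);  // updates this frame
        tick_frame = tick_last;
    }
    return counts;
}
```

With a 16667μs tick, every frame runs exactly one fixed update. With a 10000μs tick, the first six frames run 1, 2, 2, 1, 2, 2 updates, so simulated motion advances unevenly from one displayed frame to the next.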

Closing Thoughts

Control the rate at which frames render in your engine. Picking the target frame time
for both rendering and your fixed updates is important, and you should profile your
application to make sure it actually hits those targets.

This is something that consumers care about, and will feel while playing your game.