View on GitHub

FOnline Engine

Flexible cross-platform isometric game engine

Exception Safety & Engine-Invariant Stability

How the engine stays in a consistent state when an exception is thrown, and the rules to keep it that way. The concern this doc answers: an exception thrown partway through a multi-step state mutation (entity create/register/destroy/invalidate, cross-entity links, persistence) must never leave the running process in a half-mutated state whose only recovery is a restart.

1. Out-of-memory is not a recoverable error

Engine memory comes from two paths that terminate the process on exhaustion instead of throwing:

Consequence — do not write rollbacks for allocation. A code path whose only failure mode is an allocation cannot leave a half-mutated world: it either completes or the process dies deterministically at the failing allocation (which is the fail-fast outcome we want). Treat std::bad_alloc as “can’t happen”; never add scope_exit/scope_fail undo logic justified solely by a possible container/string/refcount allocation between two mutations.

(The throwing global operator new still exists for a bare new T / a std::allocator; engine code should not rely on it for OOM behavior — prefer SafeAlloc/engine containers so OOM terminates rather than propagates.)

2. What can actually be thrown, and where it is caught

After §1, the exceptions that can propagate through a running server are:

The server runs all gameplay work as jobs on a WorkerPool. WorkerPool::WorkerEntry (Source/Server/WorkerPool.cpp) wraps each job body in try { job() } catch (const std::exception&) { ReportExceptionAndContinue (log only) } catch (...) { FO_UNKNOWN_EXCEPTION -> terminate }. So a std::exception from a job does not crash the server — it is logged, the job’s SyncContext is released (locks dropped), and the worker proceeds. Whatever half-state existed at the throw point stays live in the world. This is exactly why the rules below matter.

Scripted-event fan-out (Game.OnX.Fire(...), entity->OnX.Fire(...)) is noexcept: Entity::FireEvent swallows per-callback exceptions and converts a throwing/StopChain callback into a chain stop. So OnX.Fire itself never propagates — but a handler’s side effects (it may destroy or relocate entities) do persist, and the engine re-validates after firing.

3. The entity-lifecycle contract (create / destroy)

Entity lifecycle is deliberately not “transactional rollback”. The contract, encoded by the tests in Source/Tests/Test_EntityLifecycle.cpp and Source/Tests/Test_ServerMapOperations.cpp, is:

Create (CritterManager::CreateCritterOnMap, ItemManager::CreateItem, MapManager::CreateLocation/CreateMap, ServerEngine::CreateCritter/LoadCritter):

Destroy (DestroyCritter/DestroyItem/DestroyLocation/DestroyMap/DestroyCustomEntity):

3.1 Destruction-loop convergence (why a teardown can’t spin forever)

Each DestroyX runs a teardown loop for (size_t prev_deps = std::numeric_limits<size_t>::max(); <holder still has children/links>;) { try { detach steps } catch { ReportExceptionAndContinue } /* inline progress guard, §3.1 */ }, converging by emptying the holder’s child collections / links. Eight such loops exist: ItemManager::DestroyItem, CritterManager::DestroyCritter + DestroyInventory, MapManager::DestroyMapContent + DestroyMapInternal + the DestroyLocation inner-entity loop, and EntityManager::DestroyInnerEntities + the DestroyCustomEntity inner-entity loop.

A loop can fail to converge for three reasons:

  1. Re-entrant re-add (the realistic cause). The teardown fires Finish events and recursively destroys children (more events) and transfers critters (more events). A script handler reacting to a child’s destruction can add a new child to the holder being destroyed — e.g. an OnItemFinish handler that puts a fresh item back on the same critter/map/container, or an inner-entity-destroy handler that creates a new inner entity. If it re-adds on every child-destroy, the collection refills as fast as it drains.
  2. Persistent-throw detach step. A detach step throws on every iteration (caught, logged, retried), so the collection never shrinks — a genuine bug in that step.
  3. No-progress logic bug. A detach step succeeds but does not reduce the loop condition.

Exit mechanism

Primary — block re-add during destruction (eliminates cause 1), echeloned across tiers (§5). A re-entrant re-add originates from a script handler (Finish/destroy events run scripts), so the same “you may not add a child to a destroying entity” rule is enforced at two depths — caught as early as possible, and re-checked deeper as a backstop:

With these, a destroying holder cannot gain new children, so the teardown loop is monotone. Note: only adding children is forbidden — modifying a destroying entity (writing its properties) stays allowed, because a Finish/destroy handler legitimately mutates the entity it is cleaning up (and the engine passes the destroying entity into its own handlers).

Backstop — terminate on genuine non-convergence (causes 2 and 3). Each teardown loop carries a local size_t prev_deps (seeded to std::numeric_limits<size_t>::max()) and, at the end of every pass, recomputes the entity’s current remaining-dependency count (e.g. cr->GetInvItems().size() + cr->GetInnerEntitiesCount() + …) and asserts a strict monotonic decrease:

for (size_t prev_deps = std::numeric_limits<size_t>::max(); cr->HasItems() || cr->HasInnerEntities() || ;) {
    try { /* tear off one layer */ } catch (const std::exception& ex) { ReportExceptionAndContinue(ex); }

    const size_t remaining_deps = cr->GetInvItems().size() + cr->GetInnerEntitiesCount() + ;
    FO_STRONG_ASSERT(remaining_deps < prev_deps, "Critter destruction made no progress", cr->GetId(), remaining_deps, prev_deps);
    prev_deps = remaining_deps;
}

If a pass does not reduce the count (the loop stalled, or a re-add grew it), FO_STRONG_ASSERT terminates. The loop’s boolean condition is exactly equivalent to remaining_deps != 0, so a real drain ends the loop before the next check — a slow-but-progressing teardown is never falsely killed; only a stall is. This is written inline at each destruction loop (no shared helper class) so the dependency count is defined right next to the loop it guards. The earlier throw-on-overflow escaped to the WorkerPool catch and left a half-destroyed, un-finalized “undead” entity in the registry until restart; terminating surfaces the underlying bug deterministically. (Every such progress-guarded loop is a destruction loop, where non-progress is always corruption.)

The reaction is FO_STRONG_ASSERT (terminate), not stop-and-throw (SetDestroying(false) + throw): by the time the loop spins, the entity’s Finish event has already fired, so a “recovered” entity would be alive-but-finalized — a worse corruption than a deterministic crash on the real bug.

Net guarantee: a destruction either completes, or the process terminates deterministically on a real bug — it never silently leaves an undead entity.

4. Post-mutation invariants → FO_STRONG_ASSERT

This is the unexpected-and-unhandled tier (§5). A verify placed after an irreversible mutation that checks an invariant which “can only be false if the world is already corrupt” is FO_STRONG_ASSERT (deterministic ReportExceptionAndExit), not a throwable verify: there is nothing left to handle, and a throwable verify would just be caught by the WorkerPool and keep running on corrupt state. (The same invariant may still be re-checked earlier, where it is recoverable, as a throw or FO_VERIFY_AND_THROW — that is the echeloned model in §5, not a contradiction.)

Such invariants must abort in every build. FO_STRONG_ASSERT (Source/Essentials/ExceptionHandling.h) is unconditional — it always evaluates its condition and calls ReportExceptionAndExit on failure, regardless of build profile — so it is the correct tool for load-bearing post-mutation invariants.

Examples converted to FO_STRONG_ASSERT (all post-mutation, can’t-happen-with-correct-code):

The legitimate non-invariant throws are left as FO_VERIFY_AND_THROW: e.g. EntityManager::RegisterEntity’s global-registry insert (a duplicate id from a corrupt persisted record is handled gracefully on the load path via is_error), and EntityLock::Acquire’s EntityLockWaitAbortedException (the legitimate shutdown-abort path).

5. Error tiers & choosing the disposition

Three tiers, classified by whether an error is expected and whether it is handled. All three stay active in every build, including release — we accept that our own bugs can surface only under real, loaded production conditions, beyond what development and testing catch, so we never compile out the lower tiers.

Tier When Mechanism
Expected error A condition we anticipate and design around — bad script arguments, malformed/untrusted input, a target legitimately full/absent/forbidden. Not an invariant violation. throw a domain exception (ScriptException, DataBaseException, …). Caught upstream and turned into a normal failure result; no side effect when it fires before any mutation.
Unexpected but handled An invariant violation — “can’t happen if the code is correct” — that we nonetheless re-check and catch rather than trust blindly. Assert semantics, not expectation. The FO_VERIFY_* family (FO_VERIFY_AND_THROW, FO_VERIFY_AND_CONTINUE, FO_VERIFY_AND_RETURN, FO_VERIFY_AND_RETURN_VALUE). All report the violation (log + stack trace) and are never compiled out; the variant only chooses the post-report control flow — see “Picking the variant” below.
Unexpected and unhandled An invariant violation reached where it is too late to recover — continuing would run on already-corrupt state and no caller can sensibly handle it. FO_STRONG_ASSERT — deterministic ReportExceptionAndExit (terminate). Also always-on.

Picking the FO_VERIFY_* variant. The tier is the same for all of them — an invariant violation, always reported — only the continuation after the report differs, and it is dictated by the control-flow context, not preference. In every case you get the report; what matters is how execution proceeds:

Rule of thumb: if you are (or might be) inside a noexcept region you cannot use FO_VERIFY_AND_THROW — pick CONTINUE / RETURN / RETURN_VALUE so the report is emitted and the caller is left in a defined state. (FO_STRONG_ASSERT is also valid from a noexcept region, when the right response is to terminate rather than continue.)

Echeloned (defense-in-depth) detection. The same underlying error may be caught by more than one tier, at different depths — and that is intentional: the path from a high-level entry point to the low-level mutation varies, so we catch it as early as possible without removing the deeper backstops. Catching an error earlier is always better than later, and there can never be “too much” error protection — so the tiers are additive, not either/or. Canonical example, “a script adds a child to an entity that is being destroyed”:

  1. throw at the top — the FO_SCRIPT_API add-method rejects with ScriptException (expected script misuse, caught early).
  2. FO_VERIFY_AND_THROW below — the internal mutation method (CritterManager::AddItemToCritter, Map::AddCritter, Entity::AddInnerEntity, …) re-checks the same IsDestroying invariant, in case a call slipped past the top guard.
  3. FO_STRONG_ASSERT at the bottom — if a re-add nonetheless stalled a teardown loop (a pass that removes no dependency), the loop’s inline strict-monotonic-decrease check (§3.1) terminates (too late to undo).

Per-situation guidance:

Persistence note: DB writes (DbStorage.Insert/Update/Delete) are enqueue-only and never perform synchronous backend I/O; a backend failure is handled asynchronously (recovery op-log + reconnect + panic-shutdown with replay on restart). So a DB write cannot throw a backend error mid-mutation — no write-through rollback is needed. See Source/Server/DataBase.cpp.

6. Primitives

7. Tests

The contracts above are pinned by Source/Tests/Test_EntityLifecycle.cpp (init/finish-event destruction, load/unload events, registry collections) and Source/Tests/Test_ServerMapOperations.cpp (the full map critter in/out/init × may-destroy / may-move-away / may-destroy-map matrix). When changing any lifecycle or invariant behavior, run LF_UnitTests and extend these suites rather than weakening an assertion.