Exception Safety & Engine-Invariant Stability
How the engine stays in a consistent state when an exception is thrown, and the rules to keep it that way. The concern this doc answers: an exception thrown partway through a multi-step state mutation (entity create/register/destroy/invalidate, cross-entity links, persistence) must never leave the running process in a half-mutated state whose only recovery is a restart.
1. Out-of-memory is not a recoverable error
Engine memory comes from two paths that terminate the process on exhaustion instead of throwing:
SafeAlloc::MakeRefCounted/MakeRaw/MakeUnique/MakeRawArr(Source/Essentials/MemorySystem.h): nothrownew, then drain a fixed backup pool and retry, thenReportAndExit. Every entity (Item/Critter/Map/Location/CustomEntity/…) is allocated this way.SafeAllocator<T>(MemorySystem.h) backs every engine container alias —vector,small_vector,unordered_map/unordered_set,map/set,list,deque,string/stringstream(Source/Essentials/Containers.h). Itsallocate()is nothrow + backup-pool retry +ReportAndExit. Soemplace,emplace_back,insert,resize,reserve, rehash, complex-property storage growth, and string growth on engine containers cannot throwstd::bad_alloc— they terminate at the allocation site.
Consequence — do not write rollbacks for allocation. A code path whose only failure mode is an allocation cannot leave a half-mutated world: it either completes or the process dies deterministically at the failing allocation (which is the fail-fast outcome we want). Treat std::bad_alloc as “can’t happen”; never add scope_exit/scope_fail undo logic justified solely by a possible container/string/refcount allocation between two mutations.
(The throwing global operator new still exists for a bare new T / a std::allocator; engine code should not rely on it for OOM behavior — prefer SafeAlloc/engine containers so OOM terminates rather than propagates.)
2. What can actually be thrown, and where it is caught
After §1, the exceptions that can propagate through a running server are:
VerificationExceptionfromFO_VERIFY_AND_THROW(cond, …)/ the scriptverify(…)macro.- Engine exceptions:
EntitySyncException,DataBaseException,GenericException, manager exceptions, etc. - Exceptions raised from an AngelScript callback that runs directly in native code (init scripts via
CallInit), and the post-event re-validationFO_VERIFY_AND_THROWchecks.
The server runs all gameplay work as jobs on a WorkerPool. WorkerPool::WorkerEntry (Source/Server/WorkerPool.cpp) wraps each job body in try { job() } catch (const std::exception&) { ReportExceptionAndContinue (log only) } catch (...) { FO_UNKNOWN_EXCEPTION -> terminate }. So a std::exception from a job does not crash the server — it is logged, the job’s SyncContext is released (locks dropped), and the worker proceeds. Whatever half-state existed at the throw point stays live in the world. This is exactly why the rules below matter.
Scripted-event fan-out (Game.OnX.Fire(...), entity->OnX.Fire(...)) is noexcept: Entity::FireEvent swallows per-callback exceptions and converts a throwing/StopChain callback into a chain stop. So OnX.Fire itself never propagates — but a handler’s side effects (it may destroy or relocate entities) do persist, and the engine re-validates after firing.
3. The entity-lifecycle contract (create / destroy)
Entity lifecycle is deliberately not “transactional rollback”. The contract, encoded by the tests in Source/Tests/Test_EntityLifecycle.cpp and Source/Tests/Test_ServerMapOperations.cpp, is:
Create (CritterManager::CreateCritterOnMap, ItemManager::CreateItem, MapManager::CreateLocation/CreateMap, ServerEngine::CreateCritter/LoadCritter):
- The entity is allocated and registered first, then placed, then its init script and entry events run.
- A lifecycle event (or init script) may legitimately destroy the new entity, relocate it (e.g. an
OnMapCritterInhandler attaches and transfers the critter to another map), or leave it where created. - The create function throws as a signal that the requested operation did not complete nominally — it does not roll the entity back. The entity is left in whatever valid state the events produced:
- event destroyed it → the entity is gone and the registry is back to baseline (
ItemInitEventMayDestroyItem,CritterInitEventMayDestroyCritter,LocationInitEventMayDestroyLocation); - event relocated it → the entity survives at its new location (
MapAddCritterEventMayMoveCritterAwayThrows,MapAddCritterInitEventMayMoveCritterAwayThrowsassert the critter is alive on the destination map after the throw); - load event destroyed a loaded critter →
LoadCritterthrows and the critter is gone (CritterLoadEventMayDestroyLoadedCritterThrows).
Because “relocated and surviving” is indistinguishable from “leaked” at the throw site, a blanket create-time rollback is incorrect — it would destroy entities the design intends to keep. Do not add one.
- event destroyed it → the entity is gone and the registry is back to baseline (
Destroy (DestroyCritter/DestroyItem/DestroyLocation/DestroyMap/DestroyCustomEntity):
MarkAsDestroying()latches first; a redundant call early-returns.- The finish event fires, then the environment tear-off runs inside a
for (size_t prev_deps = std::numeric_limits<size_t>::max(); …) { try { … } catch (ReportExceptionAndContinue); /* progress guard */ }loop so a misbehaving teardown step is retried until the entity is fully detached (or the inline progress guard, §3.1, terminates on non-convergence), and a finish-event handler cannot take over or block the destruction (ItemFinishEventCannotTakeOverItemDestruction, and the critter/location variants). - Snapshot iteration sets with
copy_hold_ref(...)and re-checkIsDestroyed()after each event before continuing to use a retained reference.
3.1 Destruction-loop convergence (why a teardown can’t spin forever)
Each DestroyX runs a teardown loop for (size_t prev_deps = std::numeric_limits<size_t>::max(); <holder still has children/links>;) { try { detach steps } catch { ReportExceptionAndContinue } /* inline progress guard, §3.1 */ }, converging by emptying the holder’s child collections / links. Eight such loops exist: ItemManager::DestroyItem, CritterManager::DestroyCritter + DestroyInventory, MapManager::DestroyMapContent + DestroyMapInternal + the DestroyLocation inner-entity loop, and EntityManager::DestroyInnerEntities + the DestroyCustomEntity inner-entity loop.
A loop can fail to converge for three reasons:
- Re-entrant re-add (the realistic cause). The teardown fires Finish events and recursively destroys children (more events) and transfers critters (more events). A script handler reacting to a child’s destruction can add a new child to the holder being destroyed — e.g. an
OnItemFinishhandler that puts a fresh item back on the same critter/map/container, or an inner-entity-destroy handler that creates a new inner entity. If it re-adds on every child-destroy, the collection refills as fast as it drains. - Persistent-throw detach step. A detach step throws on every iteration (caught, logged, retried), so the collection never shrinks — a genuine bug in that step.
- No-progress logic bug. A detach step succeeds but does not reduce the loop condition.
Exit mechanism
Primary — block re-add during destruction (eliminates cause 1), echeloned across tiers (§5). A re-entrant re-add originates from a script handler (Finish/destroy events run scripts), so the same “you may not add a child to a destroying entity” rule is enforced at two depths — caught as early as possible, and re-checked deeper as a backstop:
- Top —
throw(expected script misuse): everyFO_SCRIPT_APIadd-method (Server_Map_AddItem/Server_Map_AddCritter,Server_Critter_AddItem/Server_Critter_AttachToCritter,Server_Item_AddItem,Server_Location_AddMap) rejects an add to anIsDestroyingentity with aScriptException. - Below —
FO_VERIFY_AND_THROW(invariant re-check): the internal mutation methods (Entity::AddInnerEntity,CritterManager::AddItemToCritter,Map::AddCritter/Map::SetItem,Item::SetItemToContainer,Critter::AttachToCritter,Location::AddMap) re-check the sameIsDestroying/IsDestroyedinvariant in case a path slipped past the top guard.
With these, a destroying holder cannot gain new children, so the teardown loop is monotone. Note: only adding children is forbidden — modifying a destroying entity (writing its properties) stays allowed, because a Finish/destroy handler legitimately mutates the entity it is cleaning up (and the engine passes the destroying entity into its own handlers).
Backstop — terminate on genuine non-convergence (causes 2 and 3). Each teardown loop carries a local size_t prev_deps (seeded to std::numeric_limits<size_t>::max()) and, at the end of every pass, recomputes the entity’s current remaining-dependency count (e.g. cr->GetInvItems().size() + cr->GetInnerEntitiesCount() + …) and asserts a strict monotonic decrease:
for (size_t prev_deps = std::numeric_limits<size_t>::max(); cr->HasItems() || cr->HasInnerEntities() || …;) {
try { /* tear off one layer */ } catch (const std::exception& ex) { ReportExceptionAndContinue(ex); }
const size_t remaining_deps = cr->GetInvItems().size() + cr->GetInnerEntitiesCount() + …;
FO_STRONG_ASSERT(remaining_deps < prev_deps, "Critter destruction made no progress", cr->GetId(), remaining_deps, prev_deps);
prev_deps = remaining_deps;
}
If a pass does not reduce the count (the loop stalled, or a re-add grew it), FO_STRONG_ASSERT terminates. The loop’s boolean condition is exactly equivalent to remaining_deps != 0, so a real drain ends the loop before the next check — a slow-but-progressing teardown is never falsely killed; only a stall is. This is written inline at each destruction loop (no shared helper class) so the dependency count is defined right next to the loop it guards. The earlier throw-on-overflow escaped to the WorkerPool catch and left a half-destroyed, un-finalized “undead” entity in the registry until restart; terminating surfaces the underlying bug deterministically. (Every such progress-guarded loop is a destruction loop, where non-progress is always corruption.)
The reaction is FO_STRONG_ASSERT (terminate), not stop-and-throw (SetDestroying(false) + throw): by the time the loop spins, the entity’s Finish event has already fired, so a “recovered” entity would be alive-but-finalized — a worse corruption than a deterministic crash on the real bug.
Net guarantee: a destruction either completes, or the process terminates deterministically on a real bug — it never silently leaves an undead entity.
4. Post-mutation invariants → FO_STRONG_ASSERT
This is the unexpected-and-unhandled tier (§5). A verify placed after an irreversible mutation that checks an invariant which “can only be false if the world is already corrupt” is FO_STRONG_ASSERT (deterministic ReportExceptionAndExit), not a throwable verify: there is nothing left to handle, and a throwable verify would just be caught by the WorkerPool and keep running on corrupt state. (The same invariant may still be re-checked earlier, where it is recoverable, as a throw or FO_VERIFY_AND_THROW — that is the echeloned model in §5, not a contradiction.)
Such invariants must abort in every build. FO_STRONG_ASSERT (Source/Essentials/ExceptionHandling.h) is unconditional — it always evaluates its condition and calls ReportExceptionAndExit on failure, regardless of build profile — so it is the correct tool for load-bearing post-mutation invariants.
Examples converted to FO_STRONG_ASSERT (all post-mutation, can’t-happen-with-correct-code):
EntityManagertyped-registry duplicate checks andUnregister*/global lookup checks (forward registration implies the typed/global maps agree).Crittervisibility-graph symmetry (AddVisibleCritterreverse insert,RemoveVisibleCritter/ClearVisibleEnititesreverse lookups: a forward link implies its reverse).EntitySyncpost-grant lock invariants (EntityLock::Acquire: a non-aborted waiter must wakeGRANTEDand own the lock).AngelScriptContextManager::ResumeSpecificContextpre-resume invariants (a suspended context must have its extended data / scheduler).
The legitimate non-invariant throws are left as FO_VERIFY_AND_THROW: e.g. EntityManager::RegisterEntity’s global-registry insert (a duplicate id from a corrupt persisted record is handled gracefully on the load path via is_error), and EntityLock::Acquire’s EntityLockWaitAbortedException (the legitimate shutdown-abort path).
5. Error tiers & choosing the disposition
Three tiers, classified by whether an error is expected and whether it is handled. All three stay active in every build, including release — we accept that our own bugs can surface only under real, loaded production conditions, beyond what development and testing catch, so we never compile out the lower tiers.
| Tier | When | Mechanism |
|---|---|---|
| Expected error | A condition we anticipate and design around — bad script arguments, malformed/untrusted input, a target legitimately full/absent/forbidden. Not an invariant violation. | throw a domain exception (ScriptException, DataBaseException, …). Caught upstream and turned into a normal failure result; no side effect when it fires before any mutation. |
| Unexpected but handled | An invariant violation — “can’t happen if the code is correct” — that we nonetheless re-check and catch rather than trust blindly. Assert semantics, not expectation. | The FO_VERIFY_* family (FO_VERIFY_AND_THROW, FO_VERIFY_AND_CONTINUE, FO_VERIFY_AND_RETURN, FO_VERIFY_AND_RETURN_VALUE). All report the violation (log + stack trace) and are never compiled out; the variant only chooses the post-report control flow — see “Picking the variant” below. |
| Unexpected and unhandled | An invariant violation reached where it is too late to recover — continuing would run on already-corrupt state and no caller can sensibly handle it. | FO_STRONG_ASSERT — deterministic ReportExceptionAndExit (terminate). Also always-on. |
Picking the FO_VERIFY_* variant. The tier is the same for all of them — an invariant violation, always reported — only the continuation after the report differs, and it is dictated by the control-flow context, not preference. In every case you get the report; what matters is how execution proceeds:
FO_VERIFY_AND_THROW— throwsVerificationException. Use only where an exception may legally propagate and be caught by some upper layer (a normal exception-permitting flow: a WorkerPool job, a script-call boundary, a destroy loop’stry). The current unit of work unwinds.FO_VERIFY_AND_CONTINUE— reports and keeps executing at the same point. Use in anoexceptcontext (a destructor, ascope_exit/scope_failbody, anoexceptcallback, any function on anoexceptpath — throwing there wouldstd::terminate), or in a loop where you want to skip a bad element and keep iterating.FO_VERIFY_AND_RETURN— reports and returns from avoidfunction. Use in anoexcept/no-throwvoidfunction that must bail out cleanly.FO_VERIFY_AND_RETURN_VALUE(expr, fallback, …)— reports and returns a safe fallback. Use in anoexcept/no-throw value-returning function (e.g. a getter that must yield something defined).
Rule of thumb: if you are (or might be) inside a noexcept region you cannot use FO_VERIFY_AND_THROW — pick CONTINUE / RETURN / RETURN_VALUE so the report is emitted and the caller is left in a defined state. (FO_STRONG_ASSERT is also valid from a noexcept region, when the right response is to terminate rather than continue.)
Echeloned (defense-in-depth) detection. The same underlying error may be caught by more than one tier, at different depths — and that is intentional: the path from a high-level entry point to the low-level mutation varies, so we catch it as early as possible without removing the deeper backstops. Catching an error earlier is always better than later, and there can never be “too much” error protection — so the tiers are additive, not either/or. Canonical example, “a script adds a child to an entity that is being destroyed”:
throwat the top — theFO_SCRIPT_APIadd-method rejects withScriptException(expected script misuse, caught early).FO_VERIFY_AND_THROWbelow — the internal mutation method (CritterManager::AddItemToCritter,Map::AddCritter,Entity::AddInnerEntity, …) re-checks the sameIsDestroyinginvariant, in case a call slipped past the top guard.FO_STRONG_ASSERTat the bottom — if a re-add nonetheless stalled a teardown loop (a pass that removes no dependency), the loop’s inline strict-monotonic-decrease check (§3.1) terminates (too late to undo).
Per-situation guidance:
- Expected runtime failure (bad args/input, target full, forbidden call) →
throwthe domain exception (e.g.DbStorage.Insert’s empty-document guard, theNetOutBufferoutgoing-message verifies, theFO_SCRIPT_APIargument andIsDestroyingchecks). NeverFO_STRONG_ASSERT— it has no side effect yet and crashing all players for a recoverable mistake (or one connection) is wrong. - Validate top-level (script / RPC / client-writable-property) arguments at the boundary, before the value can reach a deep
numeric_castor a low-levelFO_VERIFY_*. AFO_SCRIPT_APImethod’s args and a///@ RemoteCallpayload are script/client-controlled (a tampered client can send anything anint/ipos/mposcan hold). Range-check them in the export method andthrow ScriptException(range, bounds, non-negative) so the failure surfaces at the right altitude with a clear message — rather than a deepnumeric_cast<unsigned/smaller>throwing an opaqueOverflowException/UnderflowException, or astd::string::resize/container op throwinglength_error, far from the call site. The deep cast/verify stays as the echeloned backstop. (Audited 2026-06-16: no script-reachableFO_STRONG_ASSERTexists — the engine never terminates on bad script input — but several args reached deep wrong-tier throws; those boundary guards were added. The property get/set boundary was confirmed fully guarded.) - Invariant re-check you can still reject →
FO_VERIFY_AND_THROW(the internal add-to-destroying guards; any “this should already hold” check before a mutation that a caller can abort). - Invariant reached too late to recover →
FO_STRONG_ASSERT(post-mutation registry/graph/lock corruption; non-convergent teardown). - Throw-as-signal (no rollback) — the operation may legitimately not complete because an event/script changed the world; throw to inform the caller and leave the entity in the valid state the events produced (the lifecycle contract, §3).
- Commit-or-rollback (
scope_fail/scope_exit) — only when a step that genuinely can fail at runtime would otherwise leave two representations out of sync with no design intent to keep the partial state, and there is no allocation-only excuse (§1). The rollback body must benoexcept(safe_call). Confirm with a failing test first. - Reorder (validate-first, mutate-last) — do all fallible/validating work before the first irreversible mutation; insert into the authoritative store (with its duplicate guard) before populating any derived cache. (Example:
ProtoManager::AddProto.)
Persistence note: DB writes (DbStorage.Insert/Update/Delete) are enqueue-only and never perform synchronous backend I/O; a backend failure is handled asynchronously (recovery op-log + reconnect + panic-shutdown with replay on restart). So a DB write cannot throw a backend error mid-mutation — no write-through rollback is needed. See Source/Server/DataBase.cpp.
6. Primitives
scope_exit/scope_fail/scope_success(Source/Essentials/BasicCore.h): RAII guards.scope_failruns only on stack unwinding (an in-flight exception) and static-asserts its callback isnoexcept.safe_call(callable, …)(Source/Essentials/CommonHelpers.h): invoke a callable, swallowing any exception — use to make a rollback/teardown bodynoexcept.copy_hold_ref(container): snapshot a server-entity collection by ref-count so elements survive a loop that fires re-entrant events.- Inline teardown-progress guard (no shared class): seed
size_t prev_deps = std::numeric_limits<size_t>::max(), and at the end of each destruction-loop passFO_STRONG_ASSERT(remaining_deps < prev_deps, …)thenprev_deps = remaining_deps. Detects non-convergence by progress, not by an iteration cap, so a slow-but-progressing teardown is never falsely killed (§3.1).
7. Tests
The contracts above are pinned by Source/Tests/Test_EntityLifecycle.cpp (init/finish-event destruction, load/unload events, registry collections) and Source/Tests/Test_ServerMapOperations.cpp (the full map critter in/out/init × may-destroy / may-move-away / may-destroy-map matrix). When changing any lifecycle or invariant behavior, run LF_UnitTests and extend these suites rather than weakening an assertion.