Book notes – Data-oriented design

Data-oriented design: software engineering for limited resources and short schedules (Goodreads)

  • The author didn’t know of any big AAA game project fully built using DoD, but it is already widely used in performance-critical areas like particles and big simulations;
  • Employee turnover is probably one of the biggest risks of not “documenting” the problem domain through OOP: since the original design is not easily inferable from inspecting DoD code, external documentation is required;
  • After profiling and identifying where most execution time is being spent in a project, it is worth remembering that memory access (especially cache misses) is one of the most expensive things a processor can do (another is contending on mutexes);
  • Decoupling the data from the processing and keeping the data in low-complexity arrays (arrays of structs if the fields are always used together, arrays of primitive types if not) already makes memory access much cheaper in time;
  • Separating hot data (frequently accessed or updated) from cold data (mostly constant) is also very beneficial for memory reads;
  • Instead of conditions/branching (which can waste processing and memory reads on misprediction), the author proposes using presence in an array as the indication that a specific processing step is needed (existential processing, reminiscent of job systems);
  • Enums and polymorphism can be treated in a similar way;
  • Instead of following barely predictable invocation trees across objects, you loop over your arrays in a defined order during the time reserved for your main update;
  • To get the best results, data transforms should be idempotent/deterministic (always the same output for the same inputs) and write their outputs to a different place from the inputs;
  • Knowing what data is going to be produced and what is going to be read at each step allows the code to be massively parallelized without locks (and/or by using SIMD);
  • Idempotent transforms make understanding and testing a specific part of the code much more straightforward;
  • The author separates data transforms into maps (one input, one or more outputs), generators (no inputs, one or more outputs) and reduces/filters (multiple inputs, the same number of outputs or fewer);
  • Database normalization is a great way of organizing the data so there is no redundancy, and also making data relationships flat;
  • Column-oriented DBMS and structs of arrays are good references for how to implement these ideas;
  • Related talk: CppCon 2014: Mike Acton “Data-Oriented Design and C++”.
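
The hot/cold split and struct-of-arrays layout mentioned above can be sketched roughly as follows (a minimal illustration with hypothetical names, not code from the book): the fields touched every frame live in their own parallel arrays, so the update loop streams through contiguous memory without dragging cold fields into the cache.

```cpp
#include <vector>
#include <cstddef>

// Hot data only: positions and velocities, laid out as a struct of arrays.
// Cold data (names, spawn configs, etc.) would live in a separate structure.
struct ParticlesHot {
    std::vector<float> px, py;   // positions
    std::vector<float> vx, vy;   // velocities
};

// One tight loop over the hot arrays: contiguous, predictable reads.
void integrate(ParticlesHot& p, float dt) {
    for (std::size_t i = 0; i < p.px.size(); ++i) {
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
    }
}
```

Compare this with an array of full particle objects, where every iteration would also pull each particle's cold fields through the cache even though the update never reads them.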
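Existential processing, as described in the notes above, can be sketched like this (hypothetical names, my own illustration): rather than testing a flag on every entity, the entities that need a given step are kept in their own array, so membership in the array replaces the branch.

```cpp
#include <vector>

struct Health { int id; int hp; };

// No `if (is_burning)` inside the loop: everything in `burning`
// is, by construction, an entity that takes fire damage this frame.
void apply_fire_damage(std::vector<Health>& burning, int damage) {
    for (auto& h : burning) h.hp -= damage;
}
```

Enums and polymorphism get the same treatment: one array per variant, processed by variant-specific loops, instead of a per-element switch or virtual call.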
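A transform in the "map" category above might look like this minimal sketch (my own example, not from the book): deterministic, reading only the input buffer and writing only a separate output buffer, which is what makes the step easy to test in isolation and safe to parallelize without locks.

```cpp
#include <vector>
#include <cstddef>

// Map transform: one input stream, one output stream, no shared state.
// Outputs go to a distinct buffer rather than overwriting the inputs.
std::vector<float> scale(const std::vector<float>& in, float k) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] * k;   // reads only `in`, writes only `out`
    return out;
}
```

Because no iteration reads another iteration's output, the loop body could be split across threads or vectorized with SIMD without changing the result.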
