On improving game saving routines


Heyo everyone!

The recent 1.06 patch addressed save games 'going missing' on PS4. Let's now also improve save file consistency on PC!

Current procedure:

Here's a brief outline of how saving a game on PC currently works. In the following pseudo-code, I'm using two variables, save_path and save_path_temp, which are just the locations of the current save file and another temporary file.

It's also important to know that the game only ever tries to load the save_path file, but never the save_path_temp file.

if "Compressed save data fits into shared buffer":
    1.1. Delete possibly existing temporary save file at save_path_temp
    1.2. Write the new save state from the shared buffer to save_path_temp
    1.3. Delete the old save file at save_path
    1.4. Copy file at save_path_temp to save_path
    1.5. Delete file at save_path_temp
    1.6. Done
else:
    2.1. Compress the save data into a new buffer
    2.2. Overwrite the current save by directly writing the buffer to save_path
    2.3. Done

As long as nothing unexpected happens, this code works perfectly.

Failure states:

This is where we get to the "What happens if something goes terribly wrong?" part of this analysis. You'll see that if the game crashes while it is saving its state to a file, that save file is very likely to be lost.

  • If the game crashes between 1.1 and 1.2, the new game state will not be saved, but the current save file has not been touched yet and can still be loaded.
    So all is fine here.
  • If the game crashes at 1.3 or 1.4, we're in some deep trouble, but the user may be able to manually recover their save file:
    The current save game has been deleted, but there is a perfectly valid save file at save_path_temp. Unfortunately, the loading routine won't attempt to load from that file. This means that the slot that the save game previously occupied will now appear to be empty. The user may be able to recover their save file by manually copying the new save file from save_path_temp to save_path.
  • If the game crashes at 2.1, we're in the same position as in 1.1. We'll lose the progress since the last time the game saved, but that's fortunately all that happens.
  • If the game crashes at 2.2, however, we've also lost our save file - and this time, there is no way to recover that file. The file at save_path will now contain an incomplete save state that cannot be loaded, and we never bothered to write to save_path_temp in the first place.
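
For the recoverable case (a crash at 1.3 that leaves no file at save_path), the manual fix can be sketched in a few lines. Python is used purely for illustration, and recover_from_temp is a hypothetical helper, not something the game ships. It deliberately does nothing if save_path already exists, so a half-copied file left behind by a crash in the middle of 1.4 would have to be deleted by hand first.

```python
import os
import shutil


def recover_from_temp(save_path, save_path_temp):
    """Restore save_path from save_path_temp if only the temp file survived.

    Returns True if a file was restored, False otherwise.
    """
    if not os.path.exists(save_path) and os.path.exists(save_path_temp):
        shutil.copyfile(save_path_temp, save_path)
        return True
    return False
```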

Proposed alternative:

Here's my take on how this procedure could be made simpler and more crash resistant without requiring any more computation or IO:

1. Write the save game to a buffer and hold on to it.
   It makes no difference whether this is the internal shared
   buffer or a newly created buffer.
2. If save_path_old exists, delete that file.
3. Copy the current save file from save_path to save_path_old
4. Delete the file at save_path
5. Write the buffer to a new file at save_path
6. Optionally delete the file at save_path_old

and also modify the save file loading procedure as such:

1. Try loading from save_path
2. If this fails, try loading from save_path_old
3. If this also fails, both save files are corrupted: Give up and notify the user
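
The two procedures above could look roughly like this. It's a minimal sketch in Python for illustration only: save_game, load_game and the parse callback are made-up names, parse is assumed to raise ValueError on data it cannot read, and the check for a missing save_path (the very first save) is my own addition, not part of the steps listed above.

```python
import os
import shutil


def save_game(buffer, save_path, save_path_old):
    """Write buffer to save_path, keeping the previous save as a fallback."""
    # Step 2: drop any stale backup.
    if os.path.exists(save_path_old):
        os.remove(save_path_old)
    # Steps 3 and 4: back up the current save, then delete it.
    # (Skipped on the very first save, when there is nothing to back up.)
    if os.path.exists(save_path):
        shutil.copyfile(save_path, save_path_old)
        os.remove(save_path)
    # Step 5: write the new state.
    with open(save_path, "wb") as f:
        f.write(buffer)
    # Step 6 (deleting save_path_old) is optional and left out here.


def load_game(save_path, save_path_old, parse):
    """Try save_path first, then save_path_old; raise if both fail."""
    for path in (save_path, save_path_old):
        try:
            with open(path, "rb") as f:
                return parse(f.read())
        except (OSError, ValueError):
            continue  # missing or unreadable: fall through to the next file
    raise RuntimeError("both save files are missing or corrupted")
```

The only change on the loading side is the loop over two paths, so the extra cost is paid only when the primary file is already unusable.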

Analysis:

This new procedure is more crash resistant because it always keeps at least one valid copy of the save file around.

  • If the game crashes at step 1, there's not much we can do. We could not create a new save state, but fortunately, we can still load the old game state from save_path.
  • If the game crashes at step 2 or 3, we're in the same situation as in step 1. We may also have lost an older save file at save_path_old, but this should not worry us as there is still a valid save file at save_path, which we can load as usual.
  • If the game crashes at step 4 or 5, we will attempt to load from save_path first, which will fail because that file is either missing or not yet fully written. Fortunately, we've copied the old save file to save_path_old in step 3, which we will be able to load from.
  • If the game crashes at step 6, we may or may not have lost save_path_old. This does not really matter because we've already written the new game state to save_path, so everything is still fine.
  • I'd recommend against including step 6 anyways. Keeping save_path_old around doubles the disk space required for save files, but it further reduces the risk of people losing their save files due to the file being corrupted.

As you can see, there is no longer a point in time during which the user can completely lose their save file.
Instead, they will always be able to load from where the game saved prior to the crash.
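
One way to check this claim is to inject a fake crash at each step and see what the loader gets back. The sketch below does that in Python under some loud assumptions: a crash is modelled as an exception (a real crash or power cut can behave worse), the END marker is a made-up stand-in for "this file is complete", and all function names are hypothetical.

```python
import os
import shutil
import tempfile

END = b"<END>"  # hypothetical end marker; a file without it counts as incomplete


class SimulatedCrash(Exception):
    """Stands in for the game dying mid-save."""


def save_with_crash(buffer, save_path, save_path_old, crash_at):
    """Steps 2 to 5 of the proposed procedure; aborts at the point named by crash_at."""
    def maybe_crash(point):
        if point == crash_at:
            raise SimulatedCrash(point)

    maybe_crash("before 2")
    if os.path.exists(save_path_old):
        os.remove(save_path_old)
    maybe_crash("before 3")
    shutil.copyfile(save_path, save_path_old)
    maybe_crash("before 4")
    os.remove(save_path)
    maybe_crash("before 5")
    half = len(buffer) // 2
    with open(save_path, "wb") as f:
        f.write(buffer[:half])       # only the first half reaches the file
        maybe_crash("during 5")
        f.write(buffer[half:])


def load(save_path, save_path_old):
    """Modified loader: return the first file that ends with END, else None."""
    for path in (save_path, save_path_old):
        try:
            with open(path, "rb") as f:
                data = f.read()
        except OSError:
            continue
        if data.endswith(END):
            return data
    return None


def surviving_states(old, new):
    """Map each crash point to whatever the loader returns afterwards."""
    results = {}
    for point in ("before 2", "before 3", "before 4", "before 5", "during 5", None):
        with tempfile.TemporaryDirectory() as d:
            save_path = os.path.join(d, "save")
            save_path_old = os.path.join(d, "save_old")
            with open(save_path, "wb") as f:
                f.write(old)
            try:
                save_with_crash(new, save_path, save_path_old, point)
            except SimulatedCrash:
                pass
            results[point] = load(save_path, save_path_old)
    return results
```

Calling surviving_states(b"state-1" + END, b"state-2" + END) should give the old state for every injected crash and the new state for the uninterrupted run (crash point None), which matches the bullet points above.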

Further work:

These are items the current save game system already relies upon, but which could also be improved in the future. I've ordered them from most important to least important.

  • Caching. Of course, we need to make sure to always flush all our buffers - but that's not all. The operating system can also cache write operations, which it will later execute at its leisure. Flushing our user level caches is required for making the saving routine resistant to game crashes. If you guys also manage to flush the OS caches, the game saving routine should also be orders of magnitude more resistant to system crashes or sudden power losses while the game is saving to disk.
  • Both saving routines rely on the output of the game state serializer to be valid. If the save data that's stored in the buffer is invalid, we'll write a save file we can no longer load from. If there's ample time, it may be a nice idea to build a small verifier that attempts to catch these errors before the corrupted data is written to disk.
  • This is probably a bit over-cautious: Writing and copying to disk can introduce bit flips or other errors. If a disk stops responding for too long, for example, the operating system may silently discard pending write operations. The game could detect this and attempt to recover by calculating a hash of the file we've just written / copied and comparing that to the hash of the current buffer or file we're trying to copy. Again, this is probably over the top, but nevertheless worth considering.
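
The first and last items can be sketched together: flush the user level buffer, ask the OS to flush its cache for that file, then re-read the file and compare hashes. This is Python for illustration only, and write_and_verify is a hypothetical name. Note the caveat: a read right after the write may well be served from the OS cache, so a matching hash shows the file contents are what we intended, not that they have physically reached the disk.

```python
import hashlib
import os


def write_and_verify(path, buffer):
    """Write buffer to path, flush it, then re-read the file and compare hashes.

    Returns True if the file content matches the buffer.
    """
    with open(path, "wb") as f:
        f.write(buffer)
        f.flush()              # empty the user level buffer
        os.fsync(f.fileno())   # ask the OS to flush its cache for this file
    expected = hashlib.sha256(buffer).digest()
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).digest()
    return actual == expected
```

If the check fails, the old save at save_path_old is still there to fall back on, so the game could retry the write or warn the user.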

Hopefully, this thread will make its way to the staff member responsible for this subsystem. Thanks!

Towards a future without lost save files!

  • 3 weeks later...

How did you determine this information @StrangerFromTheInternet?

If you want a member of staff to notice something in this forum, you have to use the @ symbol and then their forum name. The moderators of the forums tend to not read every post by every person in the forums...

I do, by going to activity and then reading all the new posts since the last time I was on the forums, but unfortunately I've never been asked to be a moderator....
