1a59a1bb99
Observed problem: persisted index state (in the root bolt) would contain index snapshots which pointed to index files that did not exist. Debugging this uncovered two main problems:

1. At the end of persisting a snapshot, the persister creates a new index snapshot with the SAME epoch as the current root, only replacing the in-memory segments with the new disk-based ones. This is problematic because reference counting an index segment down to zero triggers "eligible for deletion", and eligible for deletion is keyed by epoch. So having two separate instances going by the same epoch means one of them reaches zero references before the other, and we wrongly conclude it is eligible for deletion, when in fact the "other" instance with the same epoch is actually still in use.

To address this problem, we have modified the behavior of the persister. Now, upon completion of persistence, ONLY if new files were actually created do we proceed to introduce a new snapshot, AND this new snapshot now gets its own brand-new epoch. BOTH of these are important: since the persister now also introduces a new epoch, it will see this epoch again in the future AND be expected to persist it. That is OK (mostly harmless), but we cannot allow it to form a loop. Checking that new files were actually introduced is what short-circuits the potential loop: the new epoch introduced by the persister, if seen again, will not have any new segments that actually need persisting to disk, and the cycle is stopped. (A sketch of this logic follows below.)

2. The implementation of NumSnapshotsToKeep, and the related code to delete old snapshots from the root bolt, also contained problems. Specifically, the determination of which snapshots to keep vs. delete did not consider which ones were actually persisted. So, let's say you had set NumSnapshotsToKeep to 3: if the introducer gets 3 snapshots ahead of the persister, the three snapshots we choose to keep can all be in-memory ones, and we then wrongly delete all of the snapshots from the root bolt. But it gets worse: at this instant in time, we have files on disk that nothing in the root bolt points to, so we also go ahead and delete those files, even though they were still referenced by the in-memory snapshots. Now, even if those snapshots later get persisted to disk, they simply hold references to non-existent files. Opening up one of these indexes results in lost data (often everything).

To address this problem, we made a large change to the way this section of code operates. First, we now start with the list of all epochs actually persisted in the root bolt. Second, we set aside NumSnapshotsToKeep of these snapshots to keep. Third, anything else on the eligibleForRemoval list will be deleted. I suspect this code is slower and less elegant, but I think it is more correct. (A sketch of the selection follows below.) Also, NumSnapshotsToKeep previously defaulted to 0; I have now defaulted it to 1, which feels like saner out-of-the-box behavior (it's debatable whether the original intent was instead for this to count "extra" snapshots to keep, but with the variable named as it is, 1 makes more sense to me).

Other minor changes included in this change:

- The 'nextSnapshotEpoch', 'eligibleForRemoval', and 'ineligibleForRemoval' members of the Scorch struct were moved into the paragraph with 'rootLock' to clarify that you must hold the lock to access them.
- TestBatchRaceBug260 was updated to properly Close() the index; the missing Close() led to occasional test failures.
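For illustration, here is a minimal Go sketch of the persister's post-persist step described in (1). The types and helpers here (Segment, IndexSnapshot, persistInMemorySegments, introduceSnapshot) are hypothetical stand-ins, not the actual scorch identifiers; the point is the two guards: introduce a replacement snapshot only when new files were actually created, and always under a brand-new epoch.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Segment stands in for an index segment, either in-memory or on-disk.
type Segment struct{ onDisk bool }

// IndexSnapshot is a point-in-time view of the index, identified by epoch.
type IndexSnapshot struct {
	epoch    uint64
	segments []*Segment
}

type Scorch struct {
	nextSnapshotEpoch uint64 // guarded by rootLock in the real code
}

// persistInMemorySegments stands in for the real work of flushing in-memory
// segments to disk files. It reports whether any new files were created and
// returns the resulting all-on-disk segment list.
func persistInMemorySegments(snap *IndexSnapshot) (bool, []*Segment, error) {
	newFiles := false
	out := make([]*Segment, 0, len(snap.segments))
	for _, seg := range snap.segments {
		if !seg.onDisk {
			newFiles = true
			seg = &Segment{onDisk: true}
		}
		out = append(out, seg)
	}
	return newFiles, out, nil
}

func (s *Scorch) introduceSnapshot(snap *IndexSnapshot) error {
	fmt.Printf("introduced snapshot epoch=%d\n", snap.epoch)
	return nil
}

// persistSnapshot flushes any in-memory segments of snap to disk and, ONLY
// if that actually created new files, introduces a replacement snapshot
// under a brand-new epoch.
func (s *Scorch) persistSnapshot(snap *IndexSnapshot) error {
	newFiles, diskSegments, err := persistInMemorySegments(snap)
	if err != nil {
		return err
	}
	if !newFiles {
		// Nothing new was written; introducing a snapshot here would
		// re-trigger persistence of the same data and form a loop.
		return nil
	}
	return s.introduceSnapshot(&IndexSnapshot{
		// A fresh epoch, never snap.epoch: "eligible for deletion" is
		// keyed by epoch, so two live snapshots must not share one.
		epoch:    atomic.AddUint64(&s.nextSnapshotEpoch, 1),
		segments: diskSegments,
	})
}

func main() {
	s := &Scorch{nextSnapshotEpoch: 42}
	snap := &IndexSnapshot{epoch: 42, segments: []*Segment{{onDisk: false}}}
	_ = s.persistSnapshot(snap) // creates files -> new epoch 43 introduced
	snap2 := &IndexSnapshot{epoch: 43, segments: []*Segment{{onDisk: true}}}
	_ = s.persistSnapshot(snap2) // no new files -> loop short-circuited
}
```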
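And a similarly hypothetical sketch of the snapshot-removal selection described in (2). epochsToRemove is an invented helper, not the real function; it shows the keep set being computed from the persisted epochs only, so in-memory-only snapshots can never crowd persisted ones out of the keep quota.

```go
package main

import (
	"fmt"
	"sort"
)

// epochsToRemove decides which epochs may be deleted from the root bolt.
// persistedEpochs lists every epoch actually present in the root bolt;
// eligibleForRemoval lists epochs whose reference count reached zero.
// The newest numSnapshotsToKeep PERSISTED epochs are always kept; anything
// else on the eligible list may be removed.
func epochsToRemove(persistedEpochs, eligibleForRemoval []uint64, numSnapshotsToKeep int) []uint64 {
	// Sort persisted epochs newest-first and reserve the keep quota.
	sorted := append([]uint64(nil), persistedEpochs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	keep := make(map[uint64]bool, numSnapshotsToKeep)
	for i := 0; i < numSnapshotsToKeep && i < len(sorted); i++ {
		keep[sorted[i]] = true
	}
	// Everything else that is eligible for removal gets deleted.
	var rv []uint64
	for _, epoch := range eligibleForRemoval {
		if !keep[epoch] {
			rv = append(rv, epoch)
		}
	}
	return rv
}

func main() {
	persisted := []uint64{7, 8, 9}      // epochs present in the root bolt
	eligible := []uint64{5, 6, 7, 8, 9} // refcount hit zero for these
	// With NumSnapshotsToKeep=1 (the new default), only epoch 9 survives.
	fmt.Println(epochsToRemove(persisted, eligible, 1)) // [5 6 7 8]
}
```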