Hi, I wanted to share my view on the DocHandle eviction problem and hear your thoughts. If the suggested approach sounds reasonable, I can start working on it this week or early next week.
The problem:
DocHandles are hard-referenced in Repo#handleCache and never destroyed. The longer the program runs, the more "idle" documents stay in memory.
Constraints:
We definitely don't want to unload a document while client code is working with it (holds a reference).
We don't want to unload a document that another peer is actively replicating to us, so that we don't need to load it back into memory on every new message.
There might be >10k handles if objects are small and each resides in its own document; >100k is unlikely (?).
Suggested solution:
Rename DocHandle to InternalDocHandle; all the private logic stays there.
Add a new DocHandle that holds a hard-ref to InternalDocHandle. The public API goes here.
InternalDocHandle holds a weak-ref to the DocHandle that hard-references it. We need this to avoid unloading documents referenced by client code.
Maintain a lastChange timestamp on InternalDocHandle. We need it to avoid unloading documents that are being replicated.
Use setInterval to scan all handles, removing those whose DocHandle weak-ref derefs to undefined and whose last document update was more than a minute (?) ago. The threshold should probably be user-configurable.
The scan can be incremental, yielding after 10k items so it doesn't hold the event loop for too long; we're ok with eventual eviction. We can either continue the scan from where we left off on the next interval trigger, or schedule a macro-task to continue after other callbacks have had a go. I'd probably go with the latter approach.
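To make the suggested solution concrete, here is a minimal sketch of the handle split and the incremental scan. Everything in it is an assumption for illustration: the `publicHandle`, `isReferenced`, `EvictionScanner`, `scanSlice` and `run` names are hypothetical, the cache is modelled as a plain `Map`, and real eviction would also have to tear down synchronizers and storage subscriptions, which is left out.

```typescript
// All names here are illustrative, not the existing automerge-repo API.
type DocumentId = string

class InternalDocHandle {
  readonly documentId: DocumentId
  lastChange: number // ms timestamp of the last local or replicated change
  private publicRef?: WeakRef<DocHandle> // weak back-reference to the public wrapper

  constructor(documentId: DocumentId, now: number) {
    this.documentId = documentId
    this.lastChange = now
  }

  // Hand out the live public wrapper, creating a new one if none exists.
  publicHandle(): DocHandle {
    let handle = this.publicRef?.deref()
    if (handle === undefined) {
      handle = new DocHandle(this)
      this.publicRef = new WeakRef(handle)
    }
    return handle
  }

  // True until the public wrapper has been garbage-collected.
  isReferenced(): boolean {
    return this.publicRef?.deref() !== undefined
  }
}

class DocHandle {
  private readonly internal: InternalDocHandle // hard-ref keeps internals alive
  constructor(internal: InternalDocHandle) {
    this.internal = internal
  }
  get documentId(): DocumentId {
    return this.internal.documentId
  }
}

class EvictionScanner {
  private readonly cache: Map<DocumentId, InternalDocHandle>
  private readonly idleMs: number
  private readonly sliceSize: number
  private cursor?: Iterator<[DocumentId, InternalDocHandle]>

  constructor(cache: Map<DocumentId, InternalDocHandle>, idleMs: number, sliceSize = 10_000) {
    this.cache = cache
    this.idleMs = idleMs
    this.sliceSize = sliceSize
  }

  // Examine at most sliceSize entries; returns true once a full pass is done.
  scanSlice(now: number): boolean {
    const cursor = this.cursor ?? (this.cursor = this.cache.entries())
    for (let i = 0; i < this.sliceSize; i++) {
      const next = cursor.next()
      if (next.done) {
        this.cursor = undefined
        return true
      }
      const [id, handle] = next.value
      if (!handle.isReferenced() && now - handle.lastChange > this.idleMs) {
        this.cache.delete(id) // deleting during Map iteration is safe
      }
    }
    return false
  }

  // One full pass, yielding to the event loop between slices via a macro-task.
  run(): void {
    if (!this.scanSlice(Date.now())) setTimeout(() => this.run(), 0)
  }
}
```

In this sketch the repo would call `setInterval(() => scanner.run(), intervalMs)` and return `internal.publicHandle()` from `find`, so client code never sees InternalDocHandle directly. Because Map iterators are live, a pass that is resumed in a later macro-task also visits entries added in the meantime.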
Alternatives:
Maintain an LRU cache, moving handles in response to change events. When garbage-collecting we can iterate from least to most recently used and stop as soon as the time passed since lastChange no longer exceeds the eviction threshold. Not a bad idea, but we don't gain much if user code holds references to a lot of objects which rarely change and sit in the tail: we'll need to go through all of them first. In addition, with our data set size, reference walking can be slower than a linear array scan even if the latter checks more elements.
Don't introduce InternalDocHandle. Let's say we maintain an LRU cache as in alternative #1, and based on the lastChange timestamp we move handles from the strong-ref activeCache to an inactiveCache: Map<DocumentId, WeakRef<DocHandle>>. On a sync message or find we can look up in both and return an existing handle that is still in use by user code. On change we can bring handles back to activeCache if they weren't there. I don't like this solution because:
It's quite error-prone and unjustifiably complicated: when a handle is moved to inactiveCache we need to ensure that all internal references are cleaned up. When we bring it back to activeCache we need to ensure everything is in place, e.g. that there's an active docSynchronizer.
We'll still need to periodically full-scan inactiveCache and remove entries whose deref resolves to undefined.
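For comparison, the early-stop scan from the first alternative can be sketched as below. This is only an illustration under assumptions: `LruHandleIndex`, `touch` and `collect` are hypothetical names, the LRU order relies on `Map` insertion order rather than a linked list, and the `referenced` flag stands in for the weak-ref check.

```typescript
// Illustrative only; none of these names exist in automerge-repo.
type DocumentId = string

interface LruEntry {
  lastChange: number
  referenced: boolean // stand-in for "the DocHandle weak-ref still derefs"
}

class LruHandleIndex {
  // A Map iterates in insertion order, so re-inserting on every change
  // keeps the least recently changed entry first.
  private readonly entries = new Map<DocumentId, LruEntry>()

  // Record a change and move the entry to the most-recent end.
  touch(id: DocumentId, now: number, referenced = false): void {
    this.entries.delete(id)
    this.entries.set(id, { lastChange: now, referenced })
  }

  // Walk from least to most recently changed; stop at the first entry that
  // is not yet idle, since everything after it is more recent.
  collect(now: number, idleMs: number): { evicted: DocumentId[]; visited: number } {
    const evicted: DocumentId[] = []
    let visited = 0
    for (const [id, entry] of this.entries) {
      visited++
      if (now - entry.lastChange <= idleMs) break
      if (!entry.referenced) {
        this.entries.delete(id)
        evicted.push(id)
      }
    }
    return { evicted, visited }
  }

  get size(): number {
    return this.entries.size
  }
}
```

Idle entries that are still referenced are never removed, so they stay at the old end and are visited again on every pass. That is the case where the early stop saves little.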