Kill leftovers of pre-MVCC read conflicts #320
CI: py3.7 fails with:
I got this failure at least once locally without my patch. I will try to investigate. |
Yeah, there's something flaky about that test. A rerun worked correctly. Because that test is about concurrency, that indicates some underlying concurrency issue. |
LGTM!
@jamadden, thanks for the feedback, the rerun and the LGTM! I'm still investigating, and maybe the failure is indeed related to my patch. I've added the following to debug:
--- a/src/ZODB/mvccadapter.py
+++ b/src/ZODB/mvccadapter.py
@@ -152,6 +152,9 @@ def load(self, oid):
         assert self._start is not None
         r = self._storage.loadBefore(oid, self._start)
         if r is None:
+            print('\n\n\nload %d <%s -> None\n\n' % (u64(oid), self._start.encode('hex')))
+            from zodbtools.zodbdump import zodbdump
+            zodbdump(self._storage, None, None, hashonly=True)
             raise POSException.POSKeyError(oid)
         return r[:2]
and got:
After pack object
The load comes with before
which is earlier. The database view as of that before, after pack, should thus indeed see no data for that object. Previously this logic error in the test was hidden because there is a catch-all for ZODB/src/ZODB/tests/PackableStorage.py Lines 761 to 772 in 22df1fd
My draft conclusion is that it is incorrect to pack with a cut time that is not before the earliest client view; or else, in the presence of such a pack, it is expected that the database can raise POSKeyError for objects that are removed from under the client's feet. Do you agree? (I will try to think about it a bit more) |
I think I agree. Let me put it another way to make sure our understanding is the same: For storages that do not implement MVCC natively, if a storage has clients viewing it as-of time t, and you perform a pack of that storage as-of time t + n, bad things can happen because objects visible at time t might be removed, causing clients to receive Previously, they would have received a The options are:
Thoughts? |
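For readers following along, the scenario above (a client viewing the storage as of time t while a pack runs as of time t + n) can be illustrated with a toy model. This is only a sketch: `ToyStorage`, `load_before` and the integer tids are hypothetical stand-ins, not ZODB APIs, and real storages pack differently.

```python
class ToyStorage:
    """Hypothetical in-memory storage: oid -> list of (tid, data), sorted by tid."""

    def __init__(self):
        self._revs = {}

    def store(self, oid, tid, data):
        revs = self._revs.setdefault(oid, [])
        revs.append((tid, data))
        revs.sort()

    def load_before(self, oid, before):
        """Return (data, tid) of the newest revision with tid < before, else None."""
        older = [r for r in self._revs.get(oid, []) if r[0] < before]
        if not older:
            return None
        tid, data = older[-1]
        return data, tid

    def pack(self, pack_tid):
        """Per object, drop every revision older than the newest one at or before pack_tid."""
        for oid, revs in self._revs.items():
            keep_from = 0
            for i, (tid, _data) in enumerate(revs):
                if tid <= pack_tid:
                    keep_from = i
            self._revs[oid] = revs[keep_from:]


storage = ToyStorage()
storage.store("obj", 10, "revA")   # txn1, visible to a client whose view is before=15
storage.store("obj", 20, "revB")   # txn2, committed after that client's view

seen_before_pack = storage.load_before("obj", 15)
storage.pack(25)                   # pack point is "in the future" of the client
seen_after_pack = storage.load_before("obj", 15)
```

In this model the client at `before=15` first sees `revA`, and after the pack it sees nothing at all, which is the situation where the adapter has to choose which exception to raise.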
ae7e656
to
fe3d054
Compare
@jamadden, thanks for feedback. Our understandings are the same. Yes, in an ideal world we could enforce "don't pack before clients view", but this cannot be easily done with storages like ZEO and NEO, which can be used by several clients simultaneously, and with ZODB not providing any kind of notification to storages that a client is using them at such and such state. As much as I don't like it, I guess from a backward-compatibility point of view the only way for us is option "2". I've amended the patch to return to raising (*) for the reference, here is a similar place in RelStorage: https://github.com/zodb/relstorage/blob/466168a3/src/relstorage/cache/storage_cache.py#L306-L326
diff --git a/m1.txt b/m2.txt
index 70876d70e..86251022c 100644
--- a/m1.txt
+++ b/m2.txt
@@ -1,4 +1,4 @@
-commit ae7e6564269577330d6cfb58193ed3c4e48b74f9
+commit fe3d0542c3b4f9a5e08ef1e907fd3ab06241406a
Author: Kirill Smelkov <[email protected]>
Date: Tue Jul 21 13:13:28 2020 +0300
@@ -13,22 +13,29 @@ Date: Tue Jul 21 13:13:28 2020 +0300
This means no more ReadConflictErrors, each transaction is guaranteed to be
able to load any object as it was when the transaction begun.
- So today the only way to get a ReadConflictError should be at commit time
- for an object that was requested to stay unchanged via
- checkCurrentSerialInTransaction.
+ So today the only way to get a ReadConflictError should be
- However MVCCAdapterInstance.load(), instead of reporting "no data", was
- still raising ReadConflictError for a deleted or not-yet-created object.
- If an object is deleted and later requested to be loaded, it should be
- "key not found in database", i.e. POSKeyError, not ReadConflictError.
- Fix it.
+ 1) at commit time for an object that was requested to stay unchanged
+ via checkCurrentSerialInTransaction, and
- Adjust docstring of ReadConflictError accordingly to explicitly describe
- that this error can only happen at commit time for objects requested to
- be current.
+ 2) at plain access time, if a pack running simultaneously to current
+ transaction, removes object revision that we try to load.
+
+ The second point is a bit unfortunate, since when load discovers that
+ object was deleted or not yet created, it is logically more clean to
+ raise POSKeyError. However due to backward compatibility we still want
+ to raise ReadConflictError in this case - please see comments added to
+ MVCCAdapter for details.
+
+ Anyway, let's remove leftovers of handling regular read-conflicts from
+ pre-MVCC era:
+
+ Adjust docstring of ReadConflictError to explicitly describe that this
+ error can only happen at commit time for objects requested to be
+ current.
There were also leftover code, comment and test bits in Connection,
- interfaces, transact, testmvcc and testZODB, that are corrected/removed
+ interfaces, testmvcc and testZODB, that are corrected/removed
correspondingly. testZODB actually had ReadConflictTests that was
completely deactivated: commit b0f992fd ("Removed the mvcc option..."; 2007)
moved read-conflict-on-access related tests out of ZODBTests, but did not
@@ -61,7 +68,8 @@ Date: Tue Jul 21 13:13:28 2020 +0300
AttributeError: 'module' object has no attribute 'utils'
Since today ZODB always uses MVCC and there is no way to get
- ReadConflictError on access, those tests should be also gone together
- with old pre-MVCC way of handling concurrency.
+ ReadConflictError on concurrent plain read/write access, those tests
+ should be also gone together with old pre-MVCC way of handling
+ concurrency.
/cc @jimfulton
diff --git a/src/ZODB/POSException.py b/src/ZODB/POSException.py
index 1a833b15d..ed84af776 100644
--- a/src/ZODB/POSException.py
+++ b/src/ZODB/POSException.py
@@ -149,6 +149,12 @@ class ReadConflictError(ConflictError):
An object was requested to stay not modified via
checkCurrentSerialInTransaction, and at commit time was found to be
changed by another transaction (eg. another thread or process).
+
+ Note: for backward compatibility ReadConflictError is also raised on
+ plain object access if
+
+ - object is found to be removed, and
+ - there is possibility that database pack was running simultaneously.
"""
def __init__(self, message=None, object=None, serials=None, **kw):
if message is None:
diff --git a/src/ZODB/mvccadapter.py b/src/ZODB/mvccadapter.py
index 4cd7d723d..dc1f77da2 100644
--- a/src/ZODB/mvccadapter.py
+++ b/src/ZODB/mvccadapter.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
"""Adapt IStorage objects to IMVCCStorage
This is a largely internal implementation of ZODB, especially DB and
@@ -9,7 +10,7 @@
import zope.interface
from . import interfaces, serialize, POSException
-from .utils import p64, u64, Lock
+from .utils import p64, u64, Lock, oid_repr, tid_repr
class Base(object):
@@ -152,7 +153,31 @@ def load(self, oid):
assert self._start is not None
r = self._storage.loadBefore(oid, self._start)
if r is None:
- raise POSException.POSKeyError(oid)
+ # object was deleted or not-yet-created.
+ # raise ReadConflictError - not - POSKeyError due to backward
+ # compatibility: a pack(t+δ) could be running simultaneously to our
+ # transaction that observes database as of t state. Such pack,
+ # because it packs the storage from a "future-to-us" point of view,
+ # can remove object revisions that we can try to load, for example:
+ #
+ # txn1 <-- t
+ # obj.revA
+ #
+ # txn2 <-- t+δ
+ # obj.revB
+ #
+ # for such case we want user transaction to be restarted - not
+ # failed - by raising ConflictError subclass.
+ #
+ # XXX we don't detect for pack to be actually running - just assume
+ # the worst. It would be good if storage could provide information
+ # whether pack is/was actually running and its details, take that
+ # into account, and raise ReadConflictError only in the presence of
+ # database being simultaneously updated from back of its log.
+ raise POSException.ReadConflictError(
+ "load %s @%s: object deleted, likely by simultaneous pack" %
+ (oid_repr(oid), tid_repr(p64(u64(self._start)-1))))
+
return r[:2]
def prefetch(self, oids):
diff --git a/src/ZODB/tests/testmvcc.py b/src/ZODB/tests/testmvcc.py
index 70152d1b6..d8f13c8ca 100644
--- a/src/ZODB/tests/testmvcc.py
+++ b/src/ZODB/tests/testmvcc.py
@@ -386,7 +386,7 @@
We'll reuse the code from the example above, except that there will
only be a single revision of "b." As a result, the attempt to
-activate "b" will result in a POSKeyError.
+activate "b" will result in a ReadConflictError.
>>> ts = TestStorage()
>>> db = DB(ts)
@@ -413,7 +413,7 @@
>>> r1["b"]._p_activate() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
-POSKeyError: ...
+ReadConflictError: ...
>>> db.close()
"""
@@ -427,7 +427,7 @@
(re.compile("b('.*?')"), r"\1"),
# Python 3 adds module name to exceptions.
(re.compile("ZODB.POSException.ConflictError"), r"ConflictError"),
- (re.compile("ZODB.POSException.POSKeyError"), r"POSKeyError"),
+ (re.compile("ZODB.POSException.ReadConflictError"), r"ReadConflictError"),
])
def test_suite():
diff --git a/src/ZODB/transact.py b/src/ZODB/transact.py
index 1ffd20ba1..a2927d794 100644
--- a/src/ZODB/transact.py
+++ b/src/ZODB/transact.py
@@ -13,7 +13,7 @@
##############################################################################
"""Tools to simplify transactions within applications."""
-from ZODB.POSException import ConflictError
+from ZODB.POSException import ReadConflictError, ConflictError
import transaction
def _commit(note):
@@ -40,7 +40,16 @@ def g(*args, **kwargs):
n = retries
while n:
n -= 1
- r = f(*args, **kwargs)
+ try:
+ r = f(*args, **kwargs)
+ except ReadConflictError as msg:
+ # the only way ReadConflictError can happen here is due to
+ # simultaneous pack removing objects revision that f could try
+ # to load.
+ transaction.abort()
+ if not n:
+ raise
+ continue
try:
_commit(note)
except ConflictError as msg: |
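The retry-on-access idea in the transact.py hunk above can be sketched in isolation. This is a simplified illustration, not the code from the patch: `ReadConflictError` here is a local stand-in class, and `run_with_retries` and its `abort` parameter are hypothetical names (in real code the abort step would be `transaction.abort()`).

```python
class ReadConflictError(Exception):
    """Local stand-in for ZODB.POSException.ReadConflictError."""


def run_with_retries(f, retries=3, abort=lambda: None):
    """Call f(); on ReadConflictError abort and retry, re-raising on the last attempt."""
    for attempt in range(retries):
        try:
            return f()
        except ReadConflictError:
            abort()
            if attempt == retries - 1:
                raise


# Usage: a function that fails twice (as if a pack removed a revision) and then succeeds.
calls = {"n": 0, "aborts": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ReadConflictError("object deleted, likely by simultaneous pack")
    return "ok"

def count_abort():
    calls["aborts"] += 1

result = run_with_retries(flaky, retries=3, abort=count_abort)
```

The point of raising a ConflictError subclass is exactly this: the caller restarts the transaction instead of treating the missing object as a hard failure.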
The other option is to push This would free MVCCStorage from thankless job of guessing whether pack was running or not, and provide possibility to give always correct results with good semantic. |
Technically, yes. I question whether that's an actual use case though (who packs to "now"?). Still, the cost of backwards compatibility isn't very high, and "technically correct is the best kind of correct". So I'm fine with it. Thanks for tracking that down. |
@jamadden, thanks for feedback.
Did you mean "is very high" here? For the reference, after reverting to old behaviour, I'm having trouble for my actual use case which hit the need to remove read conflicts. Now I'm thinking in a way to extend drivers to report last packtime, in case of deletion compare before to that packtime, and report ReadConflict only if before is earlier than packtime, POSKeyError otherwise. Do you think that would be a good way? |
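A minimal sketch of the packtime comparison proposed above, under the assumption that a driver could report its last pack time. Nothing here exists in ZODB: `error_for_missing_revision` and `last_pack_time` are hypothetical names, the two exception classes are local stand-ins, and times are modelled as plain comparable numbers.

```python
class POSKeyError(KeyError):
    """Local stand-in for ZODB.POSException.POSKeyError."""


class ReadConflictError(Exception):
    """Local stand-in for ZODB.POSException.ReadConflictError."""


def error_for_missing_revision(before, last_pack_time):
    """Pick the exception class for a load that found no data.

    If the view (before) is earlier than the last pack time, the revision
    may have been removed by that pack, so the transaction should be retried.
    Otherwise the object is simply not there.
    """
    if last_pack_time is not None and before < last_pack_time:
        return ReadConflictError
    return POSKeyError


choice_view_older_than_pack = error_for_missing_revision(before=15, last_pack_time=25)
choice_view_newer_than_pack = error_for_missing_revision(before=30, last_pack_time=25)
choice_never_packed = error_for_missing_revision(before=15, last_pack_time=None)
```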
No, I meant "isn't". It's the difference between raising one exception and raising another one.
Not really, no. To me, that sounds like way too much work for what doesn't really seem like a practical problem. |
I see. If so, and if possible, I prefer to raise POSKeyError for correctness. In this case would the following approach be practically fine with you?
--- a/src/ZODB/DB.py
+++ b/src/ZODB/DB.py
@@ -830,9 +830,24 @@ def pack(self, t=None, days=0):
the number of days to subtract from t or from the current
time if t is not specified.
"""
+ hour = 3600
if t is None:
t = time.time()
- t -= days * 86400
+ t -= days * 24 * hour
+
+ # don't allow packtime to start close to head of database log.
+ # this way we practically avoid scenario when concurrent pack can remove
+ # object revisions that "current" connections, started with database view
+ # being a bit earlier than packtime, could try to load.
+ #
+ # If such transaction will impractically last longer than 1 hour, the
+ # worst that could happen is that, in the presense of simultaneous pack,
+ # read for object with removed revision will return POSKeyError instead
+ # of ReadConflictError.
+ head = self.storage.lastTransaction()
+ thead = TimeStamp(head).timeTime()
+ t = min(t, thead - 1*hour)
+
try:
self.storage.pack(t, self.references)
except:
If yes, I will rework the patch back, and amend the test correspondingly (likely adding
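The arithmetic of the clamp proposed in the DB.pack diff above can be checked in isolation. This is a sketch only: `effective_pack_time` is a hypothetical helper, `head_time` stands for the timestamp of the last transaction already converted to seconds, and the one-hour margin is the value from the proposal.

```python
import time

HOUR = 3600  # seconds


def effective_pack_time(head_time, t=None, days=0, margin=HOUR, now=time.time):
    """Compute the pack time as in the proposal: never closer than `margin` to the head."""
    if t is None:
        t = now()
    t -= days * 24 * HOUR
    return min(t, head_time - margin)


# Packing "to now" gets pulled back one hour behind the database head.
clamped = effective_pack_time(head_time=10000, t=10000)
# A pack time that is already far in the past is left untouched.
unchanged = effective_pack_time(head_time=10000, t=1000)
# `days` is subtracted first, then the clamp is applied.
with_days = effective_pack_time(head_time=100000, t=100000, days=1)
```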
I appreciate the thought behind having the DB mutate the parameters the user gives to try to make them "safe" (for FileStorage). But I feel that's a different proposal, and it's not one that I would support. Limitations in one storage engine shouldn't dictate the DB's behaviour in this manner. If someone is packing a FileStorage to now while simultaneously running transactions against it, that either needs to be fixed in FileStorage (tricky), or documented as "don't do that". But more to the point, I would be seriously surprised if anyone packs a production system to now while it's running with open transactions and expects reasonable results. If they're trying to pack like that, and are relying on the old behaviour to get application retry to happen, then their transaction is going to take a potentially huge leap forward into a very different state, and it still might fail the same way again, or maybe not; it's inconsistent and unpredictable. In short, I just don't think that's a use case worth worrying about because it's already pretty broken. |
My 2 cents (and hi Kirill !).
...and that such pack would reach any object relevant to ongoing transactions before these transactions are over. [EDIT]: made above sentence grammatically better. I mean, the larger the database the longer it takes to pack and the smaller the proportion of objects in active use at any given time. So it will likely take time for the pack to reach any object which is in-use at pack beginning, so the less likely it becomes that such object is still in-use by the time its history gets pruned. While not all production databases are large, the incentive for packing should grow with database size so smaller databases should be less at risk just because nobody tries to pack them. Of course, none of this is a very strong guarantee (ex: maybe the application is spending its time modifying the root object on a large database, which will likely be the first to get its history chopped), but these to me point in the direction of these issues being most relevant to toy environments (small database aggressively packed), for which I would personally not tend to optimize if it costs anything for larger environments/more reasonable uses.
I have a faint memory from when I implemented the initial pack support for NEO, but I believe this is already kind of present: if an object got its history trimmed by pack, the oldest kept A packed creation-undone object is IIRC not detectable after the fact, but in this case my argument above seems even stronger:
If the snapshot of the original transaction is still active after all those operations (3 and 4 being likely slow) I would tend to think there are more dubious practices going on than just aggressive packing. |
FWIW, with the work we did at ZC, it was very workable to pack a large database while it was running, which we did weekly. We did the GC analysis reading data from a ZRS secondary, and we could do this at our leisure without impacting the primary. When we were done we committed a single transaction that generated a pile of delete records, that were used in subsequent packs. We always packed to some time in the past, which was useful for auditing and manual recovery from oopsies. :) Packing to some time in the past also mitigated the risk of affecting recent transactions. FTR, packing never violated transactional integrity unless an application had a bug like holding the only reference to a persistent object in memory. |
fe3d054
to
0d499a3
Compare
@jamadden, @vpelletier, @jimfulton, thanks for feedback (hi Vincent, glad to see you around ZODB Jim). First of all I apologize for the delay with replying - last few days were a bit crazy on my side. Anyway after thinking about all this overnight:
Kirill |
@jamadden wrote:
Currently, it's indeed unlikely because it's always good to keep some history in case of recent application regression (fixing data can be easier when old data is easily available). But with NEO, there's an upcoming new use case. We'll soon rework pack and extend it with partial pack. The idea is to minimize the history size of some parts of the ZODB, i.e. packing will be automatic and the cut point will be quite recent. I haven't started to think more about the implementation but an appropriate error when accessing data before pack point might help. |
@jamadden, this patch was already approved, but it was before I reworked it to actually only remove things without changing semantic. Do you still think that it is ok for it to go in? To me it should be ok, given that here we only remove unused bits and improve For the reference: the change of semantic was moved to separate pull requests: #322 and #323. Thanks beforehand for feedback, |
LGTM. This is now a minimal internal change whose only visible external effect is a slightly more descriptive error message.
In the early days, before MVCC was introduced, ZODB used to raise
ReadConflictError on access to object that was simultaneously changed by
another client in concurrent transaction. However, as
doc/articles/ZODB-overview.rst says

    Since Zope 2.8 ZODB has implemented **Multi Version Concurrency Control**.
    This means no more ReadConflictErrors, each transaction is guaranteed to be
    able to load any object as it was when the transaction begun.

So today the only way to get a ReadConflictError should be

  1) at commit time for an object that was requested to stay unchanged
     via checkCurrentSerialInTransaction, and

  2) at plain access time, if a pack running simultaneously to current
     transaction, removes object revision that we try to load.

The second point is a bit unfortunate, since when load discovers that
object was deleted or not yet created, it is logically more clean to
raise POSKeyError. However due to backward compatibility we still want
to raise ReadConflictError in this case - please see comments added to
MVCCAdapter for details.

Anyway, let's remove leftovers of handling regular read-conflicts from
pre-MVCC era:

Adjust docstring of ReadConflictError to explicitly describe that this
error can only happen at commit time for objects requested to be
current, or at plain access if pack is running simultaneously under
connection foot.

There were also leftover code, comment and test bits in Connection,
interfaces, testmvcc and testZODB, that are corrected/removed
correspondingly. testZODB actually had ReadConflictTests that was
completely deactivated: commit b0f992f ("Removed the mvcc option..."; 2007)
moved read-conflict-on-access related tests out of ZODBTests, but did not
activated moved parts at all, because as that commit says when MVCC is
always on unconditionally, there is no on-access conflicts:

    Removed the mvcc option. Everybody wants mvcc and removing us lets us
    simplify the code a little. (We'll be able to simplify more when we stop
    supporting versions.)

Today, if I try to manually activate that ReadConflictTests via

    @@ -637,6 +637,7 @@ def __init__(self, poisonedjar):
     def test_suite():
         return unittest.TestSuite((
             unittest.makeSuite(ZODBTests, 'check'),
    +        unittest.makeSuite(ReadConflictTests, 'check'),
             ))

     if __name__ == "__main__":

it fails in dumb way showing that this tests were unmaintained for ages:

    Error in test checkReadConflict (ZODB.tests.testZODB.ReadConflictTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 320, in run
        self.setUp()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/testZODB.py", line 451, in setUp
        ZODB.tests.utils.TestCase.setUp(self)
    AttributeError: 'module' object has no attribute 'utils'

Since today ZODB always uses MVCC and there is no way to get
ReadConflictError on concurrent plain read/write access, those tests
should be also gone together with old pre-MVCC way of handling
concurrency.

/cc @jimfulton
/reviewed-on zopefoundation#320
/reviewed-by @jamadden
0d499a3
to
28a0b0c
Compare
@jamadden, thanks for feedback and LGTM. I've incorporated your suggestion into the patch. Let's wait for CI to run and then merge it if all is ok.
interdiff:
--- a/COMMIT_MSG.a
+++ b/COMMIT_MSG.b
@@ -73,3 +73,5 @@ Date: Tue Jul 21 13:13:28 2020 +0300
concurrency.
/cc @jimfulton
+ /reviewed-on https://github.com/zopefoundation/ZODB/pull/320
+ /reviewed-by @jamadden
--- a/src/ZODB/mvccadapter.py
+++ b/src/ZODB/mvccadapter.py
@@ -176,7 +176,7 @@ def load(self, oid):
# database being simultaneously updated from back of its log.
raise POSException.ReadConflictError(
"load %s @%s: object deleted, likely by simultaneous pack" %
- (oid_repr(oid), tid_repr(p64(u64(self._start)-1))))
+ (oid_repr(oid), tid_repr(p64(u64(self._start) - 1))))
return r[:2] |
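As a side note on the `p64(u64(self._start) - 1)` expression in the error message: a load "before" a given tid sees exactly the state "at" the tid one less than it, so the message reports that "at" value. A small sketch with struct-based stand-ins follows; the local `p64` and `u64` here assume the 8-byte big-endian tid encoding, and `at_from_before` and `hex_tid` are hypothetical helper names, not ZODB functions.

```python
import struct
from binascii import hexlify


def p64(value):
    """Pack an integer into an 8-byte big-endian string (stand-in for ZODB.utils.p64)."""
    return struct.pack(">Q", value)


def u64(raw):
    """Unpack an 8-byte big-endian string into an integer (stand-in for ZODB.utils.u64)."""
    return struct.unpack(">Q", raw)[0]


def at_from_before(before):
    """Convert an exclusive 'before' tid into the inclusive 'at' tid of the same view."""
    return p64(u64(before) - 1)


def hex_tid(raw):
    """Hypothetical formatting helper, only for display in this sketch."""
    return hexlify(raw).decode("ascii")


before = p64(0x0100)
at = at_from_before(before)
at_hex = hex_tid(at)
```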
If you wouldn't force-push, you wouldn't need to manually paste that "interdiff" and things would be a lot easier for reviewers, both now and in the future when browsing the history of the repository. (Basically I'm asking: please don't force-push. Just push new commits for new changes.) I forgot to mention that a change note would be appreciated. While this is mostly internal, it does change an exception representation, and that could impact doctests. |
@jamadden, I see. Ok, I will try, from now on, to push up new fixup commits and squash the result when merging. The short-term history is indeed handy while reviewing patch iterations (that's what interdiff and/or separate fixup pushes are for). However the short-term history is of negative value (imho) when the change is landed into repository into its main history - if someone is navigating through master history, or learning/navigating the codebase via Regarding change note: if possible, could we please delay it a bit till after going through at least #322 ? Then it will be something like
Is it ok? |
( applied the patch to master ) |
In the early days, before MVCC was introduced, ZODB used to raise
ReadConflictError on access to object that was simultaneously changed by
another client in concurrent transaction. However, as
doc/articles/ZODB-overview.rst says
So today the only way to get a ReadConflictError should be at commit time
for an object that was requested to stay unchanged via
checkCurrentSerialInTransaction.
However MVCCAdapterInstance.load(), instead of reporting "no data", was
still raising ReadConflictError for a deleted or not-yet-created object.
If an object is deleted and later requested to be loaded, it should be
"key not found in database", i.e. POSKeyError, not ReadConflictError.
Fix it.
Adjust docstring of ReadConflictError accordingly to explicitly describe
that this error can only happen at commit time for objects requested to
be current.
There were also leftover code, comment and test bits in Connection,
interfaces, transact, testmvcc and testZODB, that are corrected/removed
correspondingly. testZODB actually had ReadConflictTests that was
completely deactivated: commit b0f992f ("Removed the mvcc option..."; 2007)
moved read-conflict-on-access related tests out of ZODBTests, but did not
activated moved parts at all, because as that commit says when MVCC is
always on unconditionally, there is no on-access conflicts:
Today, if I try to manually activate that ReadConflictTests via
it fails in dumb way showing that this tests were unmaintained for ages:
Since today ZODB always uses MVCC and there is no way to get
ReadConflictError on access, those tests should be also gone together
with old pre-MVCC way of handling concurrency.
/cc @jimfulton