Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jepsen: G-nonadjacent-item #13318

Open
snaury opened this issue Jan 13, 2025 · 6 comments
Open

Jepsen: G-nonadjacent-item #13318

snaury opened this issue Jan 13, 2025 · 6 comments
Assignees
Labels
area/datashard Issues related to datashard tablets (relational table partitions) bug Something isn't working

Comments

@snaury
Copy link
Member

snaury commented Jan 13, 2025

20250110T200005.800+0300

G-nonadjacent-item #0
Let:
  T1 = {:index 139718, :time 193500112399, :type :ok, :process 101, :f :txn,
        :value [[:r 96 [8 20 32 37 38 51 54 55 70 77 78 84 86 87 88 90 97 122 126 127 154 155 158 174 188 193 197 209 200 226 227 246 256 262 278 299 309 314 315 321 325 326 334 343 351 354 355 359 368 362 363 369 370 382 383 395 410 425 428 449 450 452 471 481 495 507 510 514 517 521 535 544 552 583 584 593 597 633]]
                [:r 96 [8 20 32 37 38 51 54 55 70 77 78 84 86 87 88 90 97 122 126 127 154 155 158 174 188 193 197 209 200 226 227 246 256 262 278 299 309 314 315 321 325 326 334 343 351 354 355 359 368 362 363 369 370 382 383 395 410 425 428 449 450 452 471 481 495 507 510 514 517 521 535 544 552 583 584 593 597 633]]],
        :modified-txn [[:commit nil
                        [:batch nil [[:r 96 nil]
                                     [:r 96 nil]]]]],
        :debug-info [[{:tablet 72075186224210175, :node 50004, :gen 2, :op "read", :read-id 0, :read-version "v1736528601701/18446744073709551615", :snapshot-repeatable true}
                      {:tablet 72075186224210175, :node 50004, :gen 2, :op "read", :read-id 0, :read-version "v1736528601701/18446744073709551615", :snapshot-repeatable true}]]}
  T2 = {:index 139760, :time 193662048610, :type :ok, :process 153, :f :txn,
        :value [[:append 92 384]
                [:append 96 643]
                [:r 96 [8 20 32 37 38 51 54 55 70 77 78 84 86 87 88 90 97 122 126 127 154 155 158 174 188 193 197 209 200 226 227 246 256 262 278 299 309 314 315 321 325 326 334 343 351 354 355 359 368 362 363 369 370 382 383 395 410 425 428 449 450 452 471 481 495 507 510 514 517 521 535 544 552 583 584 593 597 633 643]]
                [:r 96 [8 20 32 37 38 51 54 55 70 77 78 84 86 87 88 90 97 122 126 127 154 155 158 174 188 193 197 209 200 226 227 246 256 262 278 299 309 314 315 321 325 326 334 343 351 354 355 359 368 362 363 369 370 382 383 395 410 425 428 449 450 452 471 481 495 507 510 514 517 521 535 544 552 583 584 593 597 633 643]]],
        :modified-txn [[:batch nil [[:append 92 384]
                                    [:append 96 643]]]
                       [:commit nil [:batch nil [[:r 96 nil]
                                                 [:r 96 nil]]]]],
        :debug-info [[{:tablet 72075186224210175, :gen 2, :read-version "v1736528601701/18446744073709551615", :op "read", :node 50004, :read-id 1, :snapshot-repeatable true, :lock-tx-id 844425153304948}]
                     [{:tablet 72075186224210175, :gen 2, :op "kqp-tx", :node 50004, :snapshot "v1736528601701/18446744073709551615", :tx-id 844425153304997, :snapshot-repeatable true, :lock-tx-id 844425153304948}
                      {:tablet 72075186224210175, :gen 2, :read-version "v1736528601701/18446744073709551615", :op "read", :node 50004, :read-id 0, :snapshot-repeatable true, :lock-tx-id 844425153304948}
                      {:tablet 72075186224210175, :gen 2, :read-version "v1736528601701/18446744073709551615", :op "read", :node 50004, :read-id 0, :snapshot-repeatable true, :lock-tx-id 844425153304948}
                      {:tablet 72075186224210175, :node 50004, :gen 2, :op "kqp-tx", :tx-id 844425153305075, :version "v1736528601741/18446744073709551615"}]]}
  T3 = {:index 139812, :time 193800571315, :type :ok, :process 248, :f :txn,
        :value [[:r 96 [8 20 32 37 38 51 54 55 70 77 78 84 86 87 88 90 97 122 126 127 154 155 158 174 188 193 197 209 200 226 227 246 256 262 278 299 309 314 315 321 325 326 334 343 351 354 355 359 368 362 363 369 370 382 383 395 410 425 428 449 450 452 471 481 495 507 510 514 517 521 535 544 552 583 584 593 597 633 643]]
                [:r 8 [1 27 42 44 45 49 58 79 85 89 90 104 120 136 139 140 144 146 155 169 176 200 202 214 225 245 258 263 281 287 297 298 304 310 318 328 331 335 341 344 358 365 369 382 387 390 394 400 409 411 414 426 431 433 437 443 446 447 456 457 463 470 475 480 493 494 499 505 496 510 513 515 531 539 540 553 562 582 584 590 596 598 606 616 623 626 632 634 638 642 644 647 648 641 654]]],
        :modified-txn [[:commit nil
                        [:batch nil [[:r 96 nil]
                                     [:r 8 nil]]]]],
        :debug-info [[{:tablet 72075186224209701, :node 50004, :gen 12, :op "read", :read-id 0, :read-version "v1736528601840/18446744073709551615", :snapshot-repeatable true}
                      {:tablet 72075186224210175, :node 50004, :gen 2, :op "read", :read-id 0, :read-version "v1736528601840/18446744073709551615", :snapshot-repeatable true}]]}
  T4 = {:index 139722, :time 193504731955, :type :ok, :process 23, :f :txn,
        :value [[:append 8 655]
                [:append 77 350]],
        :modified-txn [[:batch nil [[:append 8 655]
                                    [:append 77 350]]]],
        :debug-info [[{:tablet 72075186224209701, :gen 12, :read-version "v1736528601530/18446744073709551615", :op "read", :node 50004, :read-id 1, :snapshot-repeatable true, :lock-tx-id 562950176977329}
                      {:tablet 72075186224210157, :gen 3, :read-version "v1736528601530/18446744073709551615", :op "read", :node 50006, :read-id 2, :snapshot-repeatable true, :lock-tx-id 562950176977329}]
                     [{:tablet 72075186224209701, :node 50004, :gen 12, :op "kqp-tx", :tx-id 562950176977431, :step 1736528601611, :version "v1736528601611/562950176977431"}
                      {:tablet 72075186224210157, :node 50006, :gen 3, :op "kqp-tx", :tx-id 562950176977431, :step 1736528601611, :version "v1736528601611/562950176977431"}]]}
  T5 = {:index 139738, :time 193520146000, :type :ok, :process 271, :f :txn,
        :value [[:r 94 [3 9 16 17 18 30 37 53 60 64 71 83 89 91 92 102 105 109 123 125 127 135 139 141 144 145 152 154 157 163 173 183 184 192 190 209 198 216 215 224 225 243 244 245 242 250 255 261 262 264 270 272 276 277 279 287 292 295 304 310 327 333 334 345 341 348 361 369 372 383 371 390 391 393 402 411 421 424 426 431 441 445 452 455 464 466 486 491 500 505 510 523 524 528 535 543 544 556 559 567 572 578 594 595 600 601 602 603 608 618 621 626 627 640 642 547 674 672 673 677 685 699 695 703 697 712 722 727 734 741 742 753 758 761 762 774 778 780 783 792 793 800 814 824 825 830 831 856 857 859 865 878 881 896 903]]
                [:r 77 [4 7 17 18 20 21 23 25 28 29 30 34 37 39 42 44 46 48 53 54 55 63 67 71 78 80 84 86 102 115 117 118 143 145 148 150 153 160 161 172 189 202 217 230 233 240 244 247 249 258 269 273 278 282 289 291 292 294 296 298 313 321 327 328 329 331 334 343 344 350]]
                [:r 96 [8 20 32 37 38 51 54 55 70 77 78 84 86 87 88 90 97 122 126 127 154 155 158 174 188 193 197 209 200 226 227 246 256 262 278 299 309 314 315 321 325 326 334 343 351 354 355 359 368 362 363 369 370 382 383 395 410 425 428 449 450 452 471 481 495 507 510 514 517 521 535 544 552 583 584 593 597]]],
        :modified-txn [[:commit nil
                        [:batch nil [[:r 94 nil]
                                     [:r 77 nil]
                                     [:r 96 nil]]]]],
        :debug-info [[{:tablet 72075186224210157, :node 50006, :gen 3, :op "read", :read-id 0, :read-version "v1736528601690/18446744073709551615", :snapshot-repeatable true}
                      {:tablet 72075186224210175, :node 50004, :gen 2, :op "read", :read-id 0, :read-version "v1736528601690/18446744073709551615", :snapshot-repeatable true}
                      {:tablet 72075186224210175, :node 50004, :gen 2, :op "read", :read-id 0, :read-version "v1736528601690/18446744073709551615", :snapshot-repeatable true}]]}
  T6 = {:index 139714, :time 193489156392, :type :ok, :process 253, :f :txn,
        :value [[:append 91 787]
                [:append 94 915]
                [:append 96 633]],
        :modified-txn [[:batch nil [[:append 91 787]
                                    [:append 94 915]
                                    [:append 96 633]]]],
        :debug-info [[{:tablet 72075186224210175, :gen 2, :read-version "v1736528601630/18446744073709551615", :op "read", :node 50004, :read-id 1, :snapshot-repeatable true, :lock-tx-id 562950176977472}]
                     [{:tablet 72075186224210175, :node 50004, :gen 2, :op "kqp-tx", :tx-id 562950176977659, :version "v1736528601691/0"}]]}

Then:
  - T1 < T2, because T1 did not observe T2's append of 643 to 96.
  - T2 < T3, because T3 observed T2's append of 643 to key 96.
  - T3 < T4, because T3 did not observe T4's append of 655 to 8.
  - T4 < T5, because T5 observed T4's append of 350 to key 77.
  - T5 < T6, because T5 did not observe T6's append of 915 to 94.
  - However, T6 < T1, because T1 observed T6's append of 633 to key 96: a contradiction!
@snaury snaury added bug Something isn't working area/datashard Issues related to datashard tablets (relational table partitions) labels Jan 13, 2025
@snaury snaury self-assigned this Jan 13, 2025
@snaury
Copy link
Member Author

snaury commented Jan 13, 2025

When sorting by versions the order should be T4 -> T5 -> T6 -> T1 -> T2 -> T3. However T3 did not observe T4's append of 655 to 8 for some reason. My guess is that was :tablet 72075186224209701 which was in :gen 12 both during append and commit (distributed tx at v1736528601611/562950176977431) and a repeatable read (at v1736528601840/18446744073709551615). Note: this test was with volatile txs enabled.

@snaury
Copy link
Member Author

snaury commented Jan 13, 2025

The full log shows that the max read for key 8 was [:r 8 [1 27 42 44 45 49 58 79 85 89 90 104 120 136 139 140 144 146 155 169 176 200 202 214 225 245 258 263 281 287 297 298 304 310 318 328 331 335 341 344 358 365 369 382 387 390 394 400 409 411 414 426 431 433 437 443 446 447 456 457 463 470 475 480 493 494 499 505 496 510 513 515 531 539 540 553 562 582 584 590 596 598 606 616 623 626 632 634 638 642 644 647 648 641 654 655 656 663 672 674 679 682 685 687 703 706 723 730 741 742 744 749 760 762 768 776 789 807 809 811 812 824 829 831 846 851 855 871 877 885 889 894 900 903 908 913 916 924 928 950 952 955 962 968 979 980 990]] and clearly shows 655 was eventually visible.

Must be some kind of out-of-order execution bug. :-/

@snaury
Copy link
Member Author

snaury commented Jan 13, 2025

The first time 655 was observed was in this op:

{:index 140552, :time 195197752467, :type :ok, :process 284, :f :txn,
 :value [[:r 91 [6 11 17 12 24 31 38 39 49 60 66 67 61 83 93 105 113 126 135 145 147 152 159 171 161 175 187 195 193 200 204 208 209 210 211 217 221 232 235 237 240 241 243 246 251 256 259 255 267 269 270 280 283 292 297 303 306 313 319 322 335 344 340 356 362 363 364 370 371 373 374 381 382 385 390 395 394 401 410 415 416 435 448 457 467 478 479 489 483 494 500 512 520 526 532 542 550 552 556 559 569 570 572 574 581 582 584 589 592 594 600 603 607 609 622 628 633 640 647 653 671 674 684 687 698 703 710 712 719 485 742 746 755 758 762 775 778 779 787 788 789 799 798 800 816 823 830]]
         [:r 97 [1 29]]
         [:r 8 [1 27 42 44 45 49 58 79 85 89 90 104 120 136 139 140 144 146 155 169 176 200 202 214 225 245 258 263 281 287 297 298 304 310 318 328 331 335 341 344 358 365 369 382 387 390 394 400 409 411 414 426 431 433 437 443 446 447 456 457 463 470 475 480 493 494 499 505 496 510 513 515 531 539 540 553 562 582 584 590 596 598 606 616 623 626 632 634 638 642 644 647 648 641 654 655 656]]],
 :modified-txn [[:batch nil [[:r 91 nil]
                             [:r 97 nil]
                             [:r 8 nil]]]],
 :debug-info [[{:tablet 72075186224210175, :gen 2, :read-version "v1736528603490/18446744073709551615", :op "read", :node 50004, :read-id 0, :snapshot-repeatable true, :lock-tx-id 844425153306246}
               {:tablet 72075186224209701, :gen 12, :read-version "v1736528603490/18446744073709551615", :op "read", :node 50004, :read-id 0, :snapshot-repeatable true, :lock-tx-id 844425153306246}
               {:tablet 72075186224210175, :gen 2, :read-version "v1736528603490/18446744073709551615", :op "read", :node 50004, :read-id 0, :snapshot-repeatable true, :lock-tx-id 844425153306246}]
              [{:tablet 72075186224209701, :gen 12, :readonly true, :op "kqp-tx", :node 50004, :snapshot "v1736528603490/18446744073709551615", :tx-id 844425153306359, :snapshot-repeatable true, :version "v1736528603490/18446744073709551615"}
               {:tablet 72075186224210175, :gen 2, :readonly true, :op "kqp-tx", :node 50004, :snapshot "v1736528603490/18446744073709551615", :tx-id 844425153306359, :snapshot-repeatable true, :version "v1736528603490/18446744073709551615"}]]}

Shard 72075186224209701 was still in :gen 12 at that time, so no restarts.

@snaury
Copy link
Member Author

snaury commented Jan 13, 2025

Oh, and T4 didn't seem to use write locks. The read happened at ts 2025-01-10 20:03:21.530 MSK setting lock 562950176977329 then commit happened at ts 2025-01-10 20:03:21.611 MSK. Lock was set and checked/committed in the same generation without restarts, so state migration is unlikely to be the culprit. The incorrect read happened at ts 2025-01-10 20:03:21.840 MSK, and it should have waited for the tx 562950176977431 to complete, or discover changes in tx 562950176977431 during iteration and wait for it to be confirmed, but it clearly didn't happen for some reason.

@snaury
Copy link
Member Author

snaury commented Jan 13, 2025

There's also this op that received reply a bit earlier:

{:index 139752, :time 193590471301, :type :ok, :process 151, :f :txn,
 :value [[:append 8 656]],
 :modified-txn [[:batch nil [[:append 8 656]]]],
 :debug-info [[{:tablet 72075186224209701, :gen 12, :read-version "v1736528601991/18446744073709551615", :op "read", :node 50004, :read-id 1, :snapshot-repeatable true, :lock-tx-id 562950176977803}]
              [{:tablet 72075186224209701, :node 50004, :gen 12, :op "kqp-tx", :tx-id 562950176977810, :version "v1736528602000/18446744073709551615"}]]}

The read was using a snapshot 1736528601991 which is greater than T3's 1736528601840, and it obviously observed 655, otherwise it wouldn't have appended the value 656 after 655. This doesn't really say anything though.

@snaury
Copy link
Member Author

snaury commented Jan 13, 2025

One idea was that UnprotectedReadEdge managed to become very large before the reboot by accepting a lot of transactions, which didn't commit, causing a newer generation to migrate to this large UnprotectedReadEdge before transactions have been observed. Unfortunately the tx we're interested in was planned for 20:03:21.611 MSK, and gen 12 started at 20:02:58.813 MSK, so gen 11 couldn't have possibly accepted this tx because it was clearly planned long after gen 12 has started. This means the problematic read should have been added to waiting iterators by having a snapshot greater than UnreadableEdge, and would have later paused by dependencies. Very strange. :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datashard Issues related to datashard tablets (relational table partitions) bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant