Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table.update_schema() continues to commit when exist with an exception #1505

Closed
1 of 3 tasks
jiakai-li opened this issue Jan 10, 2025 · 3 comments
Closed
1 of 3 tasks

Comments

@jiakai-li
Copy link
Contributor

jiakai-li commented Jan 10, 2025

Apache Iceberg version

0.8.1 (latest release)

Please describe the bug 🐞

While working on the issue of #1493 I noticed when running below code, the field of field_should_not_exist will still be added to the table.

from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import IntegerType, NestedField, StringType

WAREHOUSE_PATH = "/workspaces/iceberg-python/warehouse"
catalog = SqlCatalog(
    "default",
    uri=f"sqlite:///{WAREHOUSE_PATH}/pyiceberg_catalog.db", warehouse=f"file://{WAREHOUSE_PATH}",
)
catalog.create_namespace_if_not_exists("default")

try:
    catalog.drop_table("default.test")
except:
    pass

schema = Schema(
    NestedField(1, "field1", StringType(), required=False)
)

table = catalog.create_table("default.test", schema)

with table.update_schema() as update:
    update.add_column("field_should_not_exist", IntegerType())
    raise Exception("Error!")

The behaviour is a bit confuse to me. Due to an exception is raised within the context manager, I would expect the transaction to be aborted. But seems commit is currently the only option to exit the context manager:

class UpdateTableMetadata(ABC, Generic[U]):
    _transaction: Transaction

    def __init__(self, transaction: Transaction) -> None:
        self._transaction = transaction

    @abstractmethod
    def _commit(self) -> UpdatesAndRequirements: ...

    def commit(self) -> None:
        self._transaction._apply(*self._commit())

    def __exit__(self, _: Any, value: Any, traceback: Any) -> None:
        """Close and commit the change."""
        self.commit()  # <----- we currently ignore the exc_type

    def __enter__(self) -> U:
        """Update the table."""
        return self  # type: ignore

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@jiakai-li jiakai-li changed the title table.update_schema() commits when context manager exist with an error table.update_schema() still commits when exist with an error Jan 10, 2025
@jiakai-li jiakai-li changed the title table.update_schema() still commits when exist with an error table.update_schema() still commits when exist with an exception Jan 10, 2025
@jiakai-li jiakai-li changed the title table.update_schema() still commits when exist with an exception table.update_schema() continues to commit when exist with an exception Jan 10, 2025
@kevinjqliu
Copy link
Contributor

kevinjqliu commented Jan 10, 2025

Hey @jiakai-li thanks for raising this issue!
The root cause is the same as the one i raised in #1253 and #1497

Specifically, when using UpdateSchema as a "transaction" and a context manager, I expect the transaction to be atomic

with table.update_schema() as update:
    update.add_column("field_should_not_exist", IntegerType())
    raise Exception("Error!")

This example shows that the first add_column is processed, even though the transaction errored and should not apply any updates.

The core issue is from setting Transaction's autocommit=True in UpdateSchema (source)
This will cause every update to be "applied"/"flushed" as soon as possible (source)

As mentioned in #1253, i think we should get rid of autocommit ASAP, as there are new table update PRs trying to use it. (#1500 #1285)

@jiakai-li
Copy link
Contributor Author

Thank you @kevinjqliu , I'm closing this issue to avoid duplicates then. Let's keep the discussion at #1253

@kevinjqliu
Copy link
Contributor

I like the example you have above! Do you mind transferring it to #1253?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants