Add time-related types to the storage and transactional interfaces. #2437

Merged
merged 17 commits into from
Jan 8, 2025

Conversation

Torch3333
Contributor

@Torch3333 Torch3333 commented Dec 23, 2024

Description

This implements the CRUD interface for the DATE, TIME, TIMESTAMP, and TIMESTAMPTZ types.

The limitations are:

  • On Oracle, ScalarDB TIMESTAMPTZ cannot be used as a partition or clustering key because its storage type, TIMESTAMP WITH TIME ZONE, cannot be part of a primary key. Using it as a secondary index is possible.
  • On Cassandra, the TIMESTAMP type is currently not supported, but support may be added in a future ScalarDB version.

Remaining work:

  • Add integration tests for the two-phase commit interface
  • Support the importing table feature

Related issues and/or PRs

Changes made

Added support for DateColumn, TimeColumn, TimestampColumn, and TimestampTZColumn to the CRUD operations of the storage and transactional interfaces.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

Sorry for the huge number of changes. Most of them concern constructors or builder methods, but I commented on the areas that require more careful review.

This PR will be merged into the feature branch add_time_related_types

Release notes

N/A

@Torch3333 Torch3333 force-pushed the add_time_related_types_CRUD branch from 1360d38 to be37237 Compare December 23, 2024 06:55
@Torch3333 Torch3333 self-assigned this Dec 23, 2024
@Torch3333 Torch3333 force-pushed the add_time_related_types_CRUD branch from 0e70eb5 to 57cbb31 Compare December 23, 2024 09:08
@Torch3333 Torch3333 changed the title Add time related types crud Add time-related types to the storage and transactional interfaces. Dec 23, 2024
@@ -45,4 +50,25 @@ protected boolean isParallelDdlSupported() {
}
return super.isParallelDdlSupported();
}

@Override
protected Stream<Arguments> provideColumnsForCNFConditionsTest() {
Contributor Author

Modified the test DistributedStorageCrossPartitionScanIntegrationTestBase.scan_WithConjunctiveNormalFormConditionsShouldReturnProperResult for Oracle, SQLServer, and SQLite to reduce the size of the condition

getColumns().entrySet().stream()
.collect(
Collectors.toMap(
Entry::getKey, e -> ScalarDbUtils.toValue(e.getValue())))));
Contributor Author

Because the Value<T> type is deprecated, we do not add Value<T> implementations for the time-related types.
I removed all code from src/main that uses Value<?> objects. Some unit and integration tests still use it extensively; that will need to be taken care of later on.

@@ -690,6 +690,42 @@ public enum CoreError implements ScalarDbError {
""),
DATA_LOADER_ERROR_METHOD_NULL_ARGUMENT(
Category.USER_ERROR, "0151", "Method null argument not allowed", "", ""),
OUT_OF_RANGE_COLUMN_VALUE_FOR_DATE(
Contributor Author

Add some error codes.

Collaborator

@josh-wong Could you please take a look at these error messages?

* calendar system, such as 16:15:30, and can be expressed with microsecond precision.
*/
@Immutable
public class TimeColumn implements Column<LocalTime> {
Contributor Author

Add the TimeColumn type. Its value is backed by a LocalTime object.
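
Since a LocalTime carries nanoseconds while the Javadoc above mentions microsecond precision, here is a minimal sketch of how a value could be checked against, or truncated to, that precision. The class and method names are hypothetical and are not taken from this PR; how ScalarDB itself handles finer-grained values is not shown here.

```java
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

// Illustrative only: names below are assumptions, not part of the PR.
final class MicrosecondPrecisionSketch {

  /** Returns true when the value has no sub-microsecond component. */
  static boolean fitsMicrosecondPrecision(LocalTime time) {
    return time.getNano() % 1_000 == 0;
  }

  /** Drops any sub-microsecond component from the value. */
  static LocalTime truncateToMicros(LocalTime time) {
    return time.truncatedTo(ChronoUnit.MICROS);
  }

  public static void main(String[] args) {
    LocalTime fine = LocalTime.of(16, 15, 30, 123_456_789);
    System.out.println(fitsMicrosecondPrecision(fine));
    System.out.println(truncateToMicros(fine));
  }
}
```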

* calendar system, such as 2007-12-03
*/
@Immutable
public class DateColumn implements Column<LocalDate> {
Contributor Author

Add the DateColumn type. Its value is backed by a LocalDate object.

Comment on lines 375 to 376
return LocalDateTime.of(
config.getOracleTimeColumnDefaultDateComponent(), column.getTimeValue());
Contributor Author

Encode any TIME value in Oracle with the default date component set in the configuration.
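
As a rough illustration of the encoding described above, the sketch below pairs a LocalTime with a fixed date to build the stored value and drops the date again when reading. The default date of 1970-01-01, the decoding step, and all names are assumptions made for this example; in the PR the date comes from config.getOracleTimeColumnDefaultDateComponent().

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Illustrative only: names and the default date are assumptions.
final class OracleTimeEncodingSketch {

  // Hypothetical default date component; the real value comes from configuration.
  static final LocalDate DEFAULT_DATE = LocalDate.of(1970, 1, 1);

  /** Combines the TIME value with the default date, mirroring the quoted snippet. */
  static LocalDateTime encodeTime(LocalTime time, LocalDate defaultDate) {
    return LocalDateTime.of(defaultDate, time);
  }

  /** Assumed inverse: ignore the date component when reading the value back. */
  static LocalTime decodeTime(LocalDateTime stored) {
    return stored.toLocalTime();
  }

  public static void main(String[] args) {
    LocalDateTime stored = encodeTime(LocalTime.of(16, 15, 30), DEFAULT_DATE);
    System.out.println(stored);
    System.out.println(decodeTime(stored));
  }
}
```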

@@ -131,4 +143,56 @@ default String getPattern(LikeExpression likeExpression) {
default @Nullable String getSchemaName(String namespace) {
return namespace;
}

default Object encodeDate(DateColumn column) {
Contributor Author

Encoding time-related types for JDBC is not straightforward because some drivers have quirks in how they handle time zones or dates before the transition from the Julian to the Gregorian calendar on October 15th, 1582.
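
To make the calendar issue concrete, the following sketch compares the same year, month, and day in java.time (proleptic Gregorian) and in java.util.GregorianCalendar (hybrid Julian/Gregorian with its default cutover). It uses only the JDK and is not code from this PR; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Illustrative only: shows why legacy date types can shift dates before the cutover.
final class CalendarCutoverSketch {
  private static final long MILLIS_PER_DAY = 86_400_000L;

  /** Epoch day of the given fields in java.util's hybrid Julian/Gregorian calendar. */
  static long hybridEpochDay(int year, int month, int day) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.clear();
    cal.set(year, month - 1, day); // Calendar months are zero-based
    return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
  }

  /** Epoch day of the given fields in java.time's proleptic Gregorian calendar. */
  static long prolepticEpochDay(int year, int month, int day) {
    return LocalDate.of(year, month, day).toEpochDay();
  }

  /** Number of days by which the two calendars disagree for the same fields. */
  static long dayShift(int year, int month, int day) {
    return hybridEpochDay(year, month, day) - prolepticEpochDay(year, month, day);
  }

  public static void main(String[] args) {
    System.out.println(dayShift(1582, 10, 4));  // last day before the cutover
    System.out.println(dayShift(1582, 10, 15)); // first Gregorian day
  }
}
```

A driver that goes through the legacy calendar for such dates can therefore store a different day than the one held by the LocalDate, which is one reason driver-specific encoding may be needed.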

}

@Test
public void
Contributor Author

Added two integration tests, put_WithProblematicDateBecauseOfJulianToGregorianCalendarTransition_ShouldPutCorrectly and put_forTimeRelatedTypesWithVariousJvmTimezone_ShouldPutCorrectly, to cover some corner cases.
See the comments in the tests for more details.
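
The JVM time zone matters because legacy JDBC types interpret local date-times in the default zone. The sketch below is not one of the tests in this PR; it is a small JDK-only illustration, with hypothetical names, of how the same LocalDateTime maps to different instants depending on the default time zone.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.TimeZone;

// Illustrative only: demonstrates default-time-zone sensitivity of java.sql.Timestamp.
final class JvmTimeZoneSketch {

  /** Epoch millis obtained when the value is converted under the given default zone. */
  static long epochMillisUnderZone(LocalDateTime value, String zoneId) {
    TimeZone original = TimeZone.getDefault();
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
      // Timestamp.valueOf interprets the local date-time in the default time zone.
      return Timestamp.valueOf(value).getTime();
    } finally {
      TimeZone.setDefault(original); // always restore the previous default
    }
  }

  public static void main(String[] args) {
    LocalDateTime value = LocalDateTime.of(2024, 1, 1, 0, 0);
    long utc = epochMillisUnderZone(value, "UTC");
    long tokyo = epochMillisUnderZone(value, "Asia/Tokyo");
    System.out.println((utc - tokyo) / 3_600_000L + " hours apart");
  }
}
```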

@@ -31,7 +31,7 @@ subprojects {
commonsDbcp2Version = '2.13.0'
mysqlDriverVersion = '8.4.0'
postgresqlDriverVersion = '42.7.4'
-oracleDriverVersion = '21.16.0.0'
+oracleDriverVersion = '23.6.0.24.10'
Contributor Author

Upgrade the Oracle driver to the latest version to facilitate the handling of dates before October 15th, 1582.

@@ -93,10 +111,32 @@ protected String getNamespaceBaseName() {
}

private void createTables() throws ExecutionException {
TableMetadata.Builder tableMetadata =
Contributor Author

@Torch3333 Torch3333 Dec 23, 2024

Until now, DistributedTransactionIntegrationTestBase only ran tests on a table composed of INT columns.
I updated several tests that cover the most generic use case of each operation so that they exercise all data types.

@Torch3333 Torch3333 marked this pull request as ready for review December 23, 2024 10:16
return super.getPartitionKeyTypes().stream()
.filter(
type -> {
if (JdbcTestUtils.isOracle(rdbEngine) && type == DataType.TIMESTAMPTZ) {
Contributor

I think this logic is a bit complicated and shouldn't be duplicated. How about moving this to a util class to reuse it?

Contributor Author

@Torch3333 Torch3333 Dec 25, 2024

You are right, I created a util method JdbcTestUtils.filterDataTypes(...). Thank you.
adcbfde
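
For readers following the thread, a shared filter of this kind could look roughly like the sketch below. The actual signature of JdbcTestUtils.filterDataTypes(...) is not shown in this conversation, so the enum, the method name, and the boolean parameter here are stand-ins chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a stand-in for the shared test helper, not its real signature.
final class DataTypeFilterSketch {

  /** Local stand-in for the data type enum used by the tests. */
  enum DataType { INT, TEXT, DATE, TIME, TIMESTAMP, TIMESTAMPTZ }

  /** Removes key types that the engine cannot use; on Oracle, that is TIMESTAMPTZ. */
  static List<DataType> excludeUnsupportedKeyTypes(List<DataType> types, boolean isOracle) {
    return types.stream()
        .filter(type -> !(isOracle && type == DataType.TIMESTAMPTZ))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<DataType> all = Arrays.asList(DataType.values());
    System.out.println(excludeUnsupportedKeyTypes(all, true));
    System.out.println(excludeUnsupportedKeyTypes(all, false));
  }
}
```

Keeping the rule in one place means each integration test class only needs to call the helper from its getPartitionKeyTypes() or clustering key equivalent.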


@Override
protected final void finalize() throws Throwable {
// TODO delete this method once https://github.com/scalar-labs/scalardb/pull/2421 is merge
Contributor

#2421 is merged 👍

Contributor Author

@Torch3333 Torch3333 Dec 25, 2024

Thanks, I will do that in a separate PR because rebasing the feature branch add_time_related_types now would mess up this PR I think 🤔

@Override
public DateTimeOffset encodeTimestampTZ(TimestampTZColumn column) {
assert column.getTimestampTZValue() != null;
// When using SQLServer DATETIMEOFFSET data type, we should use the SQLServer JDBC driver's
Contributor

Does this mean using Instant or LocalDateTime fails with SQLServer?

Contributor Author

Yes. For encoding data, only the SQLServer driver's microsoft.sql.DateTimeOffset type worked for the range of data I tested.
Decoding through a java.time.OffsetDateTime object worked and, if my memory is correct, microsoft.sql.DateTimeOffset was OK too.

@Torch3333 Torch3333 requested a review from komamitsu December 25, 2024 06:32
Contributor

@komamitsu komamitsu left a comment

LGTM! 👍

Great work!

@brfrn169 brfrn169 requested a review from josh-wong December 25, 2024 14:19
Collaborator

@brfrn169 brfrn169 left a comment

Overall LGTM. Left several comments. Please take a look when you have time!

core/src/main/java/com/scalar/db/io/Key.java (outdated, resolved)
Comment on lines 13 to 17
/**
* This class provides utility methods for encoding and decoding time related column value for
* DynamoDB, CosmosDB and SQLite
*/
public final class TimeRelatedColumnEncodingUtils {
Collaborator

Could you please move this class under com.scalar.db.util?

@Torch3333 Torch3333 requested a review from brfrn169 December 26, 2024 00:51
Collaborator

@brfrn169 brfrn169 left a comment

LGTM! Thank you!

Member

@josh-wong josh-wong left a comment

I've added some comments and suggestions. PTAL!

Torch3333 and others added 4 commits December 27, 2024 13:15
…scalardb into add_time_related_types_CRUD

# Conflicts:
#	core/src/main/java/com/scalar/db/common/error/CoreError.java
@Torch3333 Torch3333 requested a review from josh-wong December 27, 2024 04:40
Member

@josh-wong josh-wong left a comment

Sorry, there was one minor thing I noticed about the wording, so I've added some suggestions. PTAL!

Contributor

@feeblefakie feeblefakie left a comment

Overall, looking good to me. Thank you!
I left a few comments. PTAL!

* precision.
*/
@Immutable
public class TimestampTZColumn implements Column<Instant> {
Contributor

Suggested change
-public class TimestampTZColumn implements Column<Instant> {
+public class TimestampTzColumn implements Column<Instant> {

Probably? (based on the naming convention in the style guide)

Contributor Author

@Torch3333 Torch3333 Jan 6, 2025

I also hesitated between TimestampTZ and TimestampTz but chose the first one for the following reasons:

  • My understanding of the Google Java style guide differs from yours. Since time zone is written in two words, it recommends capitalizing the T and the Z.
  • Compared to TimestampColumn, TimestampTZColumn stands out more than TimestampTzColumn. So, it is easier to spot the difference between the two types when reading code and writing code using auto-completion.
  • I could only find two Java classes with similar naming, and both use the capitalized version, for example, in Apache Hive and Spark.

What do you think?

Contributor

OK, it makes sense. So, let's keep it as is.

import javax.annotation.Nullable;

/**
* An interface to hide the difference between underlying JDBC SQL engines in SQL dialects, error
* codes, and so on. It's NOT responsible for actually connecting to underlying engines.
*/
-public interface RdbEngineStrategy {
+public interface RdbEngineStrategy<T_DATE, T_TIME, T_TIMESTAMP, T_TIMESTAMPTZ> {
Contributor

I don't have a strong opinion, but having four time-related generics in this interface might be too much since it handles many other data types.
How about moving the time-related type handling to another interface and its implementing classes?
For example, how about creating a time-specific strategy that converts the default type to a DB-specific type?
We could create RdbEngineTimeTypeStrategy, which converts time-related types as necessary, as follows:

preparedStatement.setObject(index++, rdbEngineTimeType.convert(rdbEngine.encodeTimestampTZ(column)));
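
One way to picture this suggestion is the sketch below: a separate generic strategy carries the four type parameters, so the main engine interface stays non-generic. This is only an interpretation of the comment; the interface shape, the method names, and both implementations are hypothetical and may differ from what was eventually committed.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;

// Illustrative only: a hypothetical shape for a time-specific conversion strategy.
final class TimeTypeStrategySketch {

  /** Converts default java.time values into whatever type a given driver expects. */
  interface TimeTypeStrategy<D, T, TS, TSTZ> {
    D convert(LocalDate date);
    T convert(LocalTime time);
    TS convert(LocalDateTime timestamp);
    TSTZ convert(OffsetDateTime timestampTz);
  }

  /** For drivers that accept java.time values directly. */
  static final class PassThroughStrategy
      implements TimeTypeStrategy<LocalDate, LocalTime, LocalDateTime, OffsetDateTime> {
    @Override public LocalDate convert(LocalDate date) { return date; }
    @Override public LocalTime convert(LocalTime time) { return time; }
    @Override public LocalDateTime convert(LocalDateTime timestamp) { return timestamp; }
    @Override public OffsetDateTime convert(OffsetDateTime timestampTz) { return timestampTz; }
  }

  /** Assumed example of a driver that needs java.sql.Date for DATE values. */
  static final class LegacySqlDateStrategy
      implements TimeTypeStrategy<java.sql.Date, LocalTime, LocalDateTime, OffsetDateTime> {
    @Override public java.sql.Date convert(LocalDate date) { return java.sql.Date.valueOf(date); }
    @Override public LocalTime convert(LocalTime time) { return time; }
    @Override public LocalDateTime convert(LocalDateTime timestamp) { return timestamp; }
    @Override public OffsetDateTime convert(OffsetDateTime timestampTz) { return timestampTz; }
  }

  public static void main(String[] args) {
    LocalDate date = LocalDate.of(2007, 12, 3);
    System.out.println(new PassThroughStrategy().convert(date));
    System.out.println(new LegacySqlDateStrategy().convert(date));
  }
}
```

With this split, the statement binding code only calls convert(...) on the value produced by the engine's encode method, as in the one-line example above.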

Contributor Author

@Torch3333 Torch3333 Jan 6, 2025

It sounds good, thank you.
I reverted an earlier commit based on @komamitsu's suggestion in 0eee567, then added your suggestion in fe3be8d.

Member

@josh-wong josh-wong left a comment

LGTM! Thank you!🙇🏻‍♂️

Does this PR need to have labels and projects selected? Or is it not necessary since it'll be merged into an existing branch where other work is being done?

@Torch3333
Contributor Author

@josh-wong

> LGTM! Thank you!🙇🏻‍♂️
>
> Does this PR need to have labels and projects selected? Or is it not necessary since it'll be merged into an existing branch where other work is being done?

Yes, the current PR is merged into a temporary feature branch, so we don't add labels or projects. I will add tags and labels to the PR that merges the feature branch into master.

Contributor

@feeblefakie feeblefakie left a comment

LGTM! Thank you!

@feeblefakie feeblefakie merged commit f851464 into add_time_related_types Jan 8, 2025
48 checks passed
@feeblefakie feeblefakie deleted the add_time_related_types_CRUD branch January 8, 2025 06:03
5 participants