From 998b7da84b0dc4ff7e2cc7afc297188a6adff6ce Mon Sep 17 00:00:00 2001
From: Mahesh Raju Somalaraju
Date: Wed, 11 May 2022 18:38:27 +0530
Subject: [PATCH] Partition related document changes

---
 src/main/webapp/ddl-of-carbondata.html | 57 +++++-----------------
 src/site/markdown/ddl-of-carbondata.md | 67 +++++---------------------
 2 files changed, 22 insertions(+), 102 deletions(-)

diff --git a/src/main/webapp/ddl-of-carbondata.html b/src/main/webapp/ddl-of-carbondata.html
index 6c988e4..7226d87 100644
--- a/src/main/webapp/ddl-of-carbondata.html
+++ b/src/main/webapp/ddl-of-carbondata.html
@@ -272,15 +272,12 @@

PARTITION

  • @@ -1118,8 +1115,6 @@

    PARTITION

    -

    -STANDARD PARTITION

    The partition is similar as spark and hive partition, user can use any column to build partition:

    Create Partition Table

    @@ -1143,14 +1138,20 @@

    STORED AS carbondata

    NOTE: Hive partition is not supported on complex data type columns.

    +

    Show Partitions

    This command gets the Hive partition information of the table

    SHOW PARTITIONS [db_name.]table_name
     

    +Add partition

    +

    This command adds the specified Hive partition

    +
    ALTER TABLE [db_name].table_name ADD PARTITION('new_partition')
    +
    +

    Drop Partition

    -

    This command drops the specified Hive partition only.

    +

    This command drops the specified Hive partition.

    ALTER TABLE table_name DROP [IF EXISTS] PARTITION (part_spec, ...)
     

    Example:

    @@ -1169,42 +1170,6 @@

    SELECT * FROM another_user au WHERE au.country = 'US'; -

    -Show Partitions

    -

    The following command is executed to get the partition information of the table

    -
    SHOW PARTITIONS [db_name.]table_name
    -
    -

    -Add a new partition

    -
    ALTER TABLE [db_name].table_name ADD PARTITION('new_partition')
    -
    -

    -Split a partition

    -
    ALTER TABLE [db_name].table_name SPLIT PARTITION(partition_id) INTO('new_partition1', 'new_partition2'...)
    -
    -

    -Drop a partition

    -

    Only drop partition definition, but keep data

    -
    ALTER TABLE [db_name].table_name DROP PARTITION(partition_id)
    -
    -

    Drop both partition definition and data

    -
    ALTER TABLE [db_name].table_name DROP PARTITION(partition_id) WITH DATA
    -
    -

    NOTE:

    -
      -
    • Hash partition table is not supported for ADD, SPLIT and DROP commands.
    • -
    • Partition Id: in CarbonData like the hive, folders are not used to divide partitions instead partition id is used to replace the task id. It could make use of the characteristic and meanwhile reduce some metadata.
    • -
    -
    SegmentDir/0_batchno0-0-1502703086921.carbonindex
    -          ^
    -SegmentDir/part-0-0_batchno0-0-1502703086921.carbondata
    -                   ^
    -
    -

    Here are some useful tips to improve query performance of carbonData partition table:

    -
      -
    • The partitioned column can be excluded from SORT_COLUMNS, this will let other columns to do the efficient sorting.
    • -
    • When writing SQL on a partition table, try to use filters on the partition column.
    • -

    BUCKETING

    Bucketing feature can be used to distribute/organize the table/partition data into multiple files such
diff --git a/src/site/markdown/ddl-of-carbondata.md b/src/site/markdown/ddl-of-carbondata.md
index 621f38c..97779db 100644
--- a/src/site/markdown/ddl-of-carbondata.md
+++ b/src/site/markdown/ddl-of-carbondata.md
@@ -56,12 +56,11 @@ CarbonData DDL statements are documented here,which includes:
 * [REFRESH TABLE](#refresh-table)
 * [COMMENTS](#table-and-column-comment)
 * [PARTITION](#partition)
- * [STANDARD PARTITION(HIVE)](#standard-partition)
- * [INSERT OVERWRITE PARTITION](#insert-overwrite)
+ * [CREATE PARTITION](#create-partition-table)
 * [SHOW PARTITIONS](#show-partitions)
- * [ADD PARTITION](#add-a-new-partition)
- * [SPLIT PARTITION](#split-a-partition)
- * [DROP PARTITION](#drop-a-partition)
+ * [ADD PARTITION](#add-partition)
+ * [DROP PARTITION](#drop-partition)
+ * [INSERT OVERWRITE PARTITION](#insert-overwrite)
 * [BUCKETING](#bucketing)
 * [CACHE](#cache)
@@ -876,8 +875,6 @@ Users can specify which columns to include and exclude for local dictionary gene
 ## PARTITION
-### STANDARD PARTITION
-
 The partition is similar as spark and hive partition, user can use any column to build partition:
 #### Create Partition Table
@@ -915,10 +912,16 @@ Users can specify which columns to include and exclude for local dictionary gene
 ```
 SHOW PARTITIONS [db_name.]table_name
 ```
+#### Add partition
+ This command adds the specified Hive partition.
+
+ ```
+ ALTER TABLE [db_name].table_name ADD PARTITION('new_partition')
+ ```
 #### Drop Partition
- This command drops the specified Hive partition only.
+ This command drops the specified Hive partition.
 ```
 ALTER TABLE table_name DROP [IF EXISTS] PARTITION (part_spec, ...)
 ```
@@ -946,54 +949,6 @@ Users can specify which columns to include and exclude for local dictionary gene
 WHERE au.country = 'US';
 ```
-
-### Show Partitions
-
- The following command is executed to get the partition information of the table
-
- ```
- SHOW PARTITIONS [db_name.]table_name
- ```
-
-### Add a new partition
-
- ```
- ALTER TABLE [db_name].table_name ADD PARTITION('new_partition')
- ```
-
-### Split a partition
-
- ```
- ALTER TABLE [db_name].table_name SPLIT PARTITION(partition_id) INTO('new_partition1', 'new_partition2'...)
- ```
-
-### Drop a partition
-
- Only drop partition definition, but keep data
- ```
- ALTER TABLE [db_name].table_name DROP PARTITION(partition_id)
- ```
-
- Drop both partition definition and data
- ```
- ALTER TABLE [db_name].table_name DROP PARTITION(partition_id) WITH DATA
- ```
-
- **NOTE:**
- * Hash partition table is not supported for ADD, SPLIT and DROP commands.
- * Partition Id: in CarbonData like the hive, folders are not used to divide partitions instead partition id is used to replace the task id. It could make use of the characteristic and meanwhile reduce some metadata.
-
- ```
- SegmentDir/0_batchno0-0-1502703086921.carbonindex
- ^
- SegmentDir/part-0-0_batchno0-0-1502703086921.carbondata
- ^
- ```
-
- Here are some useful tips to improve query performance of carbonData partition table:
- * The partitioned column can be excluded from SORT_COLUMNS, this will let other columns to do the efficient sorting.
- * When writing SQL on a partition table, try to use filters on the partition column.
-
 ## BUCKETING
 Bucketing feature can be used to distribute/organize the table/partition data into multiple files such