I hope fixed-length keys and values can be considered when designing the format. In many cases, keys and values have a fixed length (such as a u64 id mapped to a file hash), and I believe fixed-length fields could be optimized quite a bit.
I'm not sure fixed lengths can really be optimized in block-based tables. You would save at most 3 bytes per K-V pair, for a lot of added complexity. It could save some decent space for huge data sets, but not in block-based tables, and right now I don't plan on adding other types of tables.
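To make the trade-off concrete, here is a minimal sketch (not this project's actual encoding) assuming the block stores a small varint length prefix before each key and value, as block-based tables commonly do. A fixed-length format would only eliminate those prefixes:

```rust
// Hypothetical sketch: per-pair overhead of length-prefixed vs. fixed-length encoding.

/// Number of bytes a LEB128-style varint needs for `n`.
fn varint_len(mut n: u32) -> usize {
    let mut bytes = 1;
    while n >= 0x80 {
        n >>= 7;
        bytes += 1;
    }
    bytes
}

fn main() {
    // Example from above: u64 key (8 bytes) -> 32-byte file hash value.
    let key_len = 8u32;
    let value_len = 32u32;

    // Length-prefixed pair: varint(key_len) + key + varint(value_len) + value.
    let prefixed = varint_len(key_len) + key_len as usize
        + varint_len(value_len) + value_len as usize;

    // Fixed-length pair: lengths stored once per table, not per pair.
    let fixed = key_len as usize + value_len as usize;

    println!(
        "prefixed = {prefixed} B, fixed = {fixed} B, saved = {} B per pair",
        prefixed - fixed
    );
    // For small keys/values each prefix is a single byte, so the saving
    // is only a couple of bytes per pair.
}
```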
What about compressing it into Parquet format?
Parquet is a column-based format with row groups. There is no notion of columns or rows here, so I'm not sure there is an advantage over packed K-V blocks. I have some interest in implementing an alternative block format that is row-group based. The current blocks are KVKVKVKV, but an alternative Parquet-esque format could be KKKKVVVV, which would allow for better compression, depending on the values.
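A rough sketch of the two layouts (hypothetical, not the actual block encoder; length prefixes omitted for brevity). The grouped layout places similar bytes next to each other, which general-purpose compressors such as LZ4 or zstd can often exploit better, depending on the values:

```rust
// Hypothetical sketch: the same block built in the two layouts discussed above.

/// Interleaved layout: K V K V K V ... (the current block format).
fn build_interleaved(pairs: &[(Vec<u8>, Vec<u8>)]) -> Vec<u8> {
    let mut block = Vec::new();
    for (k, v) in pairs {
        block.extend_from_slice(k);
        block.extend_from_slice(v);
    }
    block
}

/// Grouped layout: K K K K ... V V V V (Parquet-esque).
fn build_grouped(pairs: &[(Vec<u8>, Vec<u8>)]) -> Vec<u8> {
    let mut block = Vec::new();
    for (k, _) in pairs {
        block.extend_from_slice(k);
    }
    for (_, v) in pairs {
        block.extend_from_slice(v);
    }
    block
}

fn main() {
    // u64 ids mapped to 32-byte hashes, as in the example above.
    let pairs: Vec<(Vec<u8>, Vec<u8>)> = (0u64..4)
        .map(|id| (id.to_be_bytes().to_vec(), vec![0xAB; 32]))
        .collect();

    // Same bytes, different arrangement; only compressibility differs.
    assert_eq!(build_interleaved(&pairs).len(), build_grouped(&pairs).len());
}
```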