
import multiple tables at same time - 1 #2191

Open · wants to merge 13 commits into base: main
Conversation

@makalaaneesh (Collaborator) commented on Jan 16, 2025

Describe the changes in this pull request

  • Decouple the batch-producing and batch-submitting logic to allow importing multiple tables at the same time.
  • Refactor the batch-producing logic into a FileBatchProducer (see the sketch after this list).
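
For intuition, here is a minimal Go sketch of the decoupled shape, under assumed names: `Batch`, `submitBatches`, and the channel-based submitter are illustrative stand-ins, not this PR's actual API; only the name FileBatchProducer comes from the PR itself.

```go
package main

import "fmt"

// Batch is an illustrative stand-in for a produced batch of records.
type Batch struct {
	Number      int64
	RecordCount int64
}

// FileBatchProducer produces batches from one table's data file. Here it
// just drains a pre-built slice; the real producer splits the file
// incrementally.
type FileBatchProducer struct {
	table   string
	batches []Batch
	next    int
}

func (p *FileBatchProducer) Done() bool { return p.next >= len(p.batches) }

func (p *FileBatchProducer) NextBatch() Batch {
	b := p.batches[p.next]
	p.next++
	return b
}

// submitBatches round-robins over producers so batches from multiple
// tables are interleaved onto one submission channel, instead of fully
// producing and submitting one table before starting the next.
func submitBatches(producers []*FileBatchProducer, out chan<- Batch) {
	for {
		active := false
		for _, p := range producers {
			if !p.Done() {
				active = true
				out <- p.NextBatch()
			}
		}
		if !active {
			close(out)
			return
		}
	}
}

func main() {
	producers := []*FileBatchProducer{
		{table: "t1", batches: []Batch{{1, 100}, {2, 100}}},
		{table: "t2", batches: []Batch{{1, 50}}},
	}
	out := make(chan Batch)
	go submitBatches(producers, out)
	for b := range out {
		fmt.Printf("submitting batch %d (%d records)\n", b.Number, b.RecordCount)
	}
}
```

The point of the split is that production (file reading and batching) and submission (sending to import workers) no longer share one loop, so several tables can have batches in flight at once.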

Describe if there are any user-facing changes

How was this pull request tested?

Wrote unit tests.
Integration tests to run:

  • resumption tests
  • long-running tests

Does your PR have changes that can cause upgrade issues?

| Component | Breaking changes? |
| --- | --- |
| MetaDB | Yes/No |
| Name registry json | Yes/No |
| Data File Descriptor Json | Yes/No |
| Export Snapshot Status Json | Yes/No |
| Import Data State | Yes/No |
| Export Status Json | Yes/No |
| Data .sql files of tables | Yes/No |
| Export and import data queue | Yes/No |
| Schema Dump | Yes/No |
| AssessmentDB | Yes/No |
| Sizing DB | Yes/No |
| Migration Assessment Report Json | Yes/No |
| Callhome Json | Yes/No |
| YugabyteD Tables | Yes/No |
| TargetDB Metadata Tables | Yes/No |

@makalaaneesh marked this pull request as ready for review on January 20, 2025 at 05:45
@priyanshi-yb (Contributor) left a comment:

A few comments.

Comment on lines 1009 to 1011:

```go
if err != nil {
	utils.ErrExit("preparing for file import: %s", err)
}
```

@priyanshi-yb (Contributor):

Do we need to do this PrepareForFileImport here? We are already doing it in NewFileBatchProducer.

```go
	lastBatchNumber: lastBatchNumber,
	lastOffset:      lastOffset,
	fileFullySplit:  fileFullySplit,
	completed:       completed,
```

@priyanshi-yb (Contributor):

nit: `completed: len(pendingBatches) == 0 && fileFullySplit`

```go
	return nil, err
}
if p.lineFromPreviousBatch != "" {
	err = batchWriter.WriteRecord(p.lineFromPreviousBatch)
```

@priyanshi-yb (Contributor):

Add a comment explaining this lineFromPreviousBatch.
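
For reference, a minimal sketch of the carry-over behavior such a comment would document, assuming (from the field name alone, not confirmed by the PR) that a record read past a batch's size budget is held over to open the next batch; all names besides `lineFromPreviousBatch` are hypothetical:

```go
package main

import "fmt"

// producer is a toy stand-in for the batch producer: when a record read
// during the previous batch exceeded that batch's size budget, it is
// carried over as the first record of the next batch, so no record is
// lost or duplicated across batch boundaries.
type producer struct {
	records               []string
	pos                   int
	lineFromPreviousBatch string
}

const maxBatchBytes = 10 // tiny budget, for demonstration only

func (p *producer) nextBatch() []string {
	var batch []string
	size := 0
	// A carried-over record always opens the new batch.
	if p.lineFromPreviousBatch != "" {
		batch = append(batch, p.lineFromPreviousBatch)
		size += len(p.lineFromPreviousBatch)
		p.lineFromPreviousBatch = ""
	}
	for p.pos < len(p.records) {
		rec := p.records[p.pos]
		p.pos++
		if size+len(rec) > maxBatchBytes && len(batch) > 0 {
			// Record doesn't fit: remember it for the next batch.
			p.lineFromPreviousBatch = rec
			break
		}
		batch = append(batch, rec)
		size += len(rec)
	}
	return batch
}

func main() {
	p := &producer{records: []string{"aaaa", "bbbb", "cccc", "dddd"}}
	for {
		b := p.nextBatch()
		if len(b) == 0 {
			break
		}
		fmt.Println(b) // [aaaa bbbb] then [cccc dddd]
	}
}
```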

```go
}

// 3 batches should be produced.
// While calculating the first batch, the header is also considered.
```

@priyanshi-yb (Contributor):

Oh right: while preparing the first batch we add the header's bytes to the batch's total bytes, but for subsequent batches we don't, since we already have the header and don't include it in the batch's byte count.
I think it's worth testing whether, in cases where the number of columns is huge, the header's bytes can contribute meaningfully to each batch's size and should be included.
Can you please add a TODO where we add the header to the batch file, to fix this if required?
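
As a rough illustration of the suggestion, here is a sketch where the header's bytes are charged to every batch's size budget, not just the first one; `splitIntoBatches`, `maxBatchBytes`, and the CSV shape are all hypothetical, not the PR's actual code:

```go
package main

import "fmt"

const maxBatchBytes = 20 // tiny budget, for demonstration only

// splitIntoBatches charges the header's bytes to every batch, on the
// assumption that each batch file is written with its own copy of the
// header.
func splitIntoBatches(header string, records []string) [][]string {
	var batches [][]string
	var current []string
	size := len(header) // every batch starts with the header's bytes
	for _, rec := range records {
		if size+len(rec) > maxBatchBytes && len(current) > 0 {
			batches = append(batches, current)
			current = nil
			size = len(header) // reset, charging the header again
		}
		current = append(current, rec)
		size += len(rec)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	header := "id,name" // 7 bytes charged to each batch
	records := []string{"1,aa", "2,bb", "3,cc", "4,dd"}
	for i, b := range splitIntoBatches(header, records) {
		fmt.Printf("batch %d: %v\n", i+1, b)
	}
}
```

With a very wide header the per-batch header charge becomes significant, which is exactly the case the comment asks to test.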

```go
assert.NotNil(t, batch1)
assert.Equal(t, int64(2), batch1.RecordCount)

// simulate a crash and recover
```

@priyanshi-yb (Contributor):

Nice test for the recovery situation!
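
For readers skimming the thread, the recovery pattern under test looks roughly like the sketch below: `fileBatchProducer`, `state`, and the constructor are stand-ins, not the PR's actual types; only the testify-style assertions match the snippet above.

```go
package main

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

// state stands in for the on-disk import state that survives a crash.
type state struct{ nextBatch int64 }

type fileBatchProducer struct {
	st    *state
	total int64
}

func newFileBatchProducer(st *state, total int64) *fileBatchProducer {
	return &fileBatchProducer{st: st, total: total}
}

// NextBatchNumber advances the persisted state and returns the next
// batch number, or false when all batches have been produced.
func (p *fileBatchProducer) NextBatchNumber() (int64, bool) {
	if p.st.nextBatch >= p.total {
		return 0, false
	}
	p.st.nextBatch++
	return p.st.nextBatch, true
}

func TestResumeAfterCrash(t *testing.T) {
	st := &state{}
	p1 := newFileBatchProducer(st, 3)
	n, ok := p1.NextBatchNumber()
	assert.True(t, ok)
	assert.Equal(t, int64(1), n)

	// simulate a crash: drop p1, keep only the persisted state
	p2 := newFileBatchProducer(st, 3)
	n, ok = p2.NextBatchNumber()
	assert.True(t, ok)
	assert.Equal(t, int64(2), n) // resumes after batch 1, no re-production
}
```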
