create an arrow utility to materialize ScanResult into record batches #592

Open
zachschuermann opened this issue Dec 12, 2024 · 0 comments · May be fixed by #621

In many places we've duplicated the following Arrow materialization code (copied from delta-sharing's). We should probably just create a utility for it and expose it behind an arrow feature flag.

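// Import paths below are assumptions: they presume the umbrella `arrow` crate
// and delta_kernel's Arrow-backed default engine; adjust to the crates in use.
use arrow::compute::filter_record_batch;
use arrow::datatypes::SchemaRef as ArrowSchemaRef;
use arrow::error::ArrowError;
use arrow::record_batch::{RecordBatch, RecordBatchIterator};
use delta_kernel::engine::arrow_data::ArrowEngineData;
use delta_kernel::{scan::ScanResult, DeltaResult};

/// Lazily converts scan results into Arrow record batches, applying each
/// result's selection mask (if any) so that only selected rows remain.
fn try_create_record_batch_iter(
    results: impl Iterator<Item = DeltaResult<ScanResult>>,
    result_schema: ArrowSchemaRef,
) -> RecordBatchIterator<impl Iterator<Item = Result<RecordBatch, ArrowError>>> {
    let record_batches = results.map(|res| {
        // Compute the mask before `raw_data` is moved out of the scan result.
        let scan_res = res.and_then(|res| Ok((res.full_mask(), res.raw_data?)));
        let (mask, data) = scan_res.map_err(|e| ArrowError::from_external_error(Box::new(e)))?;
        // The engine data must be Arrow-backed to become a RecordBatch.
        let record_batch: RecordBatch = data
            .into_any()
            .downcast::<ArrowEngineData>()
            .map_err(|_| ArrowError::CastError("Couldn't cast to ArrowEngineData".to_string()))?
            .into();
        match mask {
            // Keep only the rows whose mask entry is true.
            Some(mask) => filter_record_batch(&record_batch, &mask.into()),
            // No mask means every row is selected.
            None => Ok(record_batch),
        }
    });
    RecordBatchIterator::new(record_batches, result_schema)
}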
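
For anyone who wants to poke at the shape of this without pulling in arrow or the kernel, here is a minimal std-only sketch of the same map-then-filter pattern. Everything in it (`Batch`, `ScanError`, `FakeScanResult`, `materialize`) is a hypothetical stand-in, not a delta_kernel or arrow type, and it assumes the mask has one entry per row.

```rust
// Stand-in types only: NOT the delta_kernel or arrow APIs.
type Batch = Vec<i64>;

#[derive(Debug, PartialEq)]
struct ScanError(String);

/// Hypothetical stand-in for a scan result: data that may have failed to
/// load, plus an optional row-selection mask (None means "keep every row").
struct FakeScanResult {
    data: Result<Batch, ScanError>,
    mask: Option<Vec<bool>>,
}

/// Lazily turns scan results into filtered batches, propagating errors
/// item by item instead of failing the whole iterator up front.
fn materialize(
    results: impl Iterator<Item = Result<FakeScanResult, ScanError>>,
) -> impl Iterator<Item = Result<Batch, ScanError>> {
    results.map(|res| -> Result<Batch, ScanError> {
        let scan = res?;
        let data = scan.data?;
        Ok(match scan.mask {
            // Assumes mask.len() == data.len(); keep rows marked true.
            Some(mask) => data
                .into_iter()
                .zip(mask)
                .filter_map(|(row, keep)| keep.then_some(row))
                .collect(),
            None => data,
        })
    })
}

fn main() {
    let results = vec![
        Ok(FakeScanResult { data: Ok(vec![1, 2, 3]), mask: Some(vec![true, false, true]) }),
        Ok(FakeScanResult { data: Ok(vec![4, 5]), mask: None }),
        Err(ScanError("scan failed".to_string())),
    ];
    let batches: Vec<_> = materialize(results.into_iter()).collect();
    assert_eq!(batches[0], Ok(vec![1, 3]));
    assert_eq!(batches[1], Ok(vec![4, 5]));
    assert!(batches[2].is_err());
}
```

A shared utility would presumably keep this per-item error propagation, so that one bad scan result surfaces as a single `Err` in the stream rather than aborting iteration.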