Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(enriching): csv file enrichment tables no longer drop the first row #22257

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

B-Schmidt
Copy link

Summary

  • Currently, file enrichment tables loaded from a csv file without headers loose the first row of data in that file
  • Fix this by keeping the first data row in a separate variable and then chaining it into the remaining data rows when parsing columns
  • Provide unit tests for loading a csv file both with and without header row

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

  • Unit tests are provided in the PR
  • Create data.csv
a,1
b,2
  • Create vector.yaml
enrichment_tables:
  data:
    type: file
    file:
      encoding:
        type: csv
        include_headers: false
      path: ./data.csv

sources:
  in:
    type: stdin

transforms:
  remap:
    inputs: [in]
    type: remap
    source: |
      .row = get_enrichment_table_record("data", {"0": .message}) ?? null
      . = {"message": .message, "row": .row}

sinks:
  out:
    type: console
    inputs: [remap]
    encoding:
      codec: json
  • Run vector -c vector.yaml providing the following input:
a
b
c
  • Expected output:
{"message": "a", "row": {"0": "a", "1": "1"}}
{"message": "b", "row": {"0": "b", "1": "2"}}
{"message": "c", "row": null}

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
    • make check-all is a good command to run locally. This check is
      defined here. Some of these
      checks might not be relevant to your PR. For Rust changes, at the very least you should run:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

@B-Schmidt B-Schmidt requested a review from a team as a code owner January 20, 2025 15:42
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @B-Schmidt!

A more idiomatic way would be to use reader.records().peekable() but I can see how that can get complicated with the existing implementation.

@pront pront enabled auto-merge January 22, 2025 20:36
@pront pront changed the title fix(enrichment_tables): Fix csv file enrichment tables without header dropping the first row fix(enriching): Fix csv file enrichment tables without header dropping the first row Jan 22, 2025
@pront pront changed the title fix(enriching): Fix csv file enrichment tables without header dropping the first row fix(enriching): csv file enrichment tables no longer drop the first row Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants