Reading all null values from azure delta table. #121

Open
wsmckenz opened this issue Nov 20, 2024 · 1 comment


@wsmckenz

Hi,

I am trying to read data from a Delta table in Azure storage. The table was created using Databricks (not sure if that matters). Overall it works: I can connect, I can describe the table, and I can select the correct number of rows, but all of the returned values are null. I am using the Node.js client. The code looks like this:

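```typescript
const duckdb = require('duckdb');

// Minimal stand-in for the app's error handler; AUTH_CONFIG (service principal
// credentials) is defined elsewhere in the app.
const errHandler = (err: Error | null) => { if (err) console.error(err); };

const db = new duckdb.Database(':memory:', {
    "access_mode": "READ_WRITE",
    "max_memory": "512MB",
    "threads": "4"
  }, errHandler);

// Install/load the azure extension and the nightly build of the delta extension
db.exec("FORCE INSTALL AZURE; LOAD AZURE; FORCE INSTALL DELTA from core_nightly; LOAD DELTA", errHandler);

// Authenticate against the storage account with a service principal
db.exec(`CREATE SECRET azure_spn ( 
  TYPE AZURE,
  PROVIDER SERVICE_PRINCIPAL,
  TENANT_ID '${AUTH_CONFIG.credentials.tenantID}',
  CLIENT_ID '${AUTH_CONFIG.credentials.clientID}',
  CLIENT_SECRET '${AUTH_CONFIG.credentials.clientSecret}',
  ACCOUNT_NAME 'xxxxxxxxxxxxx'
)`, errHandler);

// The schema comes back correctly...
db.all(`DESCRIBE SELECT * FROM delta_scan('abfss://produced/qep/country')`, (err: Error, res: any) => {
  if (err) { console.error(err); } else console.dir(res); });

// ...and so does the row count, but every value is NULL
db.all(`SELECT * FROM delta_scan('abfss://produced/qep/country')`, (err: Error, res: any) => {
  if (err) { console.error(err); } else console.dir(res); });
```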
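The snippet above issues its `db.exec` and `db.all` calls back to back without waiting for one statement's callback before starting the next. To make the repro deterministic, each callback can be wrapped in a Promise and awaited in turn. This is not expected to explain the NULLs (`DESCRIBE` and the row count already work, so the secret and extensions are clearly in place); it only removes ordering as a variable. A minimal sketch: the `CallbackDb` interface mirrors only the `exec(sql, callback)` and `all(sql, callback)` calls used above, and `execAsync`, `allAsync`, and `runInOrder` are hypothetical helper names, not part of the duckdb package.

```typescript
// Shape of the two callback-style calls used in the repro above; declared here
// so the sketch is self-contained (a duckdb Database object is passed in practice).
interface CallbackDb {
  exec(sql: string, cb: (err: Error | null) => void): void;
  all(sql: string, cb: (err: Error | null, rows: any[]) => void): void;
}

// Resolves once the statement's callback fires; rejects on error.
function execAsync(db: CallbackDb, sql: string): Promise<void> {
  return new Promise((resolve, reject) => {
    db.exec(sql, (err) => (err ? reject(err) : resolve()));
  });
}

// Same idea for queries that return rows.
function allAsync(db: CallbackDb, sql: string): Promise<any[]> {
  return new Promise((resolve, reject) => {
    db.all(sql, (err, rows) => (err ? reject(err) : resolve(rows)));
  });
}

// Runs each setup statement to completion, in order, then the final query.
async function runInOrder(db: CallbackDb, setup: string[], query: string): Promise<any[]> {
  for (const sql of setup) {
    await execAsync(db, sql);
  }
  return allAsync(db, query);
}
```

Usage would look like `const rows = await runInOrder(db, [loadSql, createSecretSql], "SELECT * FROM delta_scan('abfss://produced/qep/country')");`, where `loadSql` and `createSecretSql` stand for the two `exec` strings above. The first failing statement rejects the whole chain, so the query never runs against a half-configured database.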

But the output looks like this:

```
[
  {
    column_name: 'CreatedDate',
    column_type: 'VARCHAR',
    null: 'YES',
    key: null,
    default: null,
    extra: null
  },
  {
    column_name: 'EntryId',
    column_type: 'BIGINT',
    null: 'YES',
    key: null,
    default: null,
    extra: null
  },
  {
    column_name: 'ISOPhoneCode',
    column_type: 'VARCHAR',
    null: 'YES',
    key: null,
    default: null,
    extra: null
  },
  // etc. (more columns)
]
[
  {
    CreatedDate: null,
    EntryId: null,
    ISOPhoneCode: null,
    ModifiedDate: null,
    Country_entryListId: null,
    Country_id: null,
    Country_name: null,
    Country_type: null,
    CreatedBy_name: null,
    GlobalGeography_name: null,
    ModifiedBy_name: null,
    TelephoneCode_name: null
  },
  {
    CreatedDate: null,
    EntryId: null,
    ISOPhoneCode: null,
    ModifiedDate: null,
    Country_entryListId: null,
    Country_id: null,
    Country_name: null,
    Country_type: null,
    CreatedBy_name: null,
    GlobalGeography_name: null,
    ModifiedBy_name: null,
    TelephoneCode_name: null
  },
  {
    CreatedDate: null,
    EntryId: null,
    ISOPhoneCode: null,
    ModifiedDate: null,
    Country_entryListId: null,
    Country_id: null,
    Country_name: null,
    Country_type: null,
    CreatedBy_name: null,
    GlobalGeography_name: null,
    ModifiedBy_name: null,
    TelephoneCode_name: null
  },
  // etc.: correct number of rows, but every column value is NULL
```
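When comparing this result against another reader, it helps to check the symptom programmatically rather than by scanning the dump. A minimal sketch, assuming the rows arrive as plain objects the way `db.all` returns them above; `allNullColumns` is a hypothetical helper, not part of the duckdb package.

```typescript
type Row = Record<string, unknown>;

// Returns the names of columns that are null (or missing) in every row.
// Column names are the union of keys across all rows, in first-seen order.
function allNullColumns(rows: Row[]): string[] {
  const columns = new Set<string>();
  for (const row of rows) {
    for (const key of Object.keys(row)) columns.add(key);
  }
  return Array.from(columns).filter((col) =>
    rows.every((row) => row[col] === null || row[col] === undefined)
  );
}
```

For the output above this would return every column name; a healthy scan should list only columns that are genuinely empty in the source table.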

@samansmink
Collaborator

Hi @wsmckenz, thanks for reporting! Would you mind checking whether delta-rs scans your data correctly?
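delta-rs is a separate reader (a Rust crate with Python bindings), so the check suggested above happens outside this Node.js setup. As a complementary check from the same DuckDB session, not a substitute for the delta-rs test, one can compare `delta_scan` with reading the table's Parquet data files directly through DuckDB's `read_parquet`. If `read_parquet` returns values where `delta_scan` returns NULL, that points at how the Delta metadata is interpreted rather than at Azure access; comparing column names also shows whether the data files use different physical column names than the logical schema. A minimal sketch: `crossCheckQueries` and `columnNameDiff` are hypothetical helpers, and the `/*.parquet` glob assumes an unpartitioned table.

```typescript
type ResultRow = Record<string, unknown>;

// Quotes a value as a SQL string literal, doubling embedded single quotes.
function sqlString(s: string): string {
  return "'" + s.replace(/'/g, "''") + "'";
}

// Builds two queries over the same table location (hypothetical helper).
// delta_scan resolves files and schema through the Delta transaction log;
// read_parquet reads every top-level data file directly, so it may also pick
// up files the log has since removed (assumes an unpartitioned table).
function crossCheckQueries(tablePath: string): { delta: string; parquet: string } {
  const base = tablePath.replace(/\/+$/, "");
  return {
    delta: `SELECT * FROM delta_scan(${sqlString(base)}) LIMIT 5`,
    parquet: `SELECT * FROM read_parquet(${sqlString(base + "/*.parquet")}) LIMIT 5`,
  };
}

// Lists column names that appear in only one of the two result sets.
function columnNameDiff(
  deltaRows: ResultRow[],
  parquetRows: ResultRow[]
): { onlyDelta: string[]; onlyParquet: string[] } {
  const names = (rows: ResultRow[]): Set<string> => {
    const s = new Set<string>();
    for (const r of rows) for (const k of Object.keys(r)) s.add(k);
    return s;
  };
  const d = names(deltaRows);
  const p = names(parquetRows);
  return {
    onlyDelta: Array.from(d).filter((k) => !p.has(k)),
    onlyParquet: Array.from(p).filter((k) => !d.has(k)),
  };
}
```

Both query strings can be run with `db.all` as in the original repro, and the two row arrays passed to `columnNameDiff`. Row counts between the two are not expected to match exactly, for the reason noted in the comments.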
