
[BUG] Cannot read excel files using the V2 API #896

Open
2 tasks done
massazan opened this issue Oct 7, 2024 · 5 comments

Comments


massazan commented Oct 7, 2024

Am I using the newest version of the library?

  • I have made sure that I'm using the latest version of the library.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When using the V2 API with version 0.20.4, the following error occurs: ClassCastException: scala.Some cannot be cast to [Lorg.apache.spark.sql.catalyst.InternalRow;
The error occurs when you omit the end boundary cell in the dataAddress parameter, e.g. "'0'!A5".

The error occurs for both Scala and PySpark.

Expected Behavior

The Spark DataFrameReader should return a DataFrame without errors.

Steps To Reproduce

The error occurs when you omit the end boundary cell in the dataAddress parameter, e.g. "'0'!A5":

```scala
val configs = Map(
  "inferSchema" -> "false",
  "dataAddress" -> "'0'!A5",
  "header" -> "false"
)

// Ensure you're using the spark-excel package
val df = spark.read.format("excel")
  .option("header", configs("header"))
  .option("inferSchema", configs("inferSchema"))
  .option("dataAddress", configs("dataAddress"))
  .load(s3_path)

df.show()
```
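As a point of comparison (not part of the original report): since the failure is described as triggered by the open-ended address, a read with an explicit end boundary in dataAddress should avoid that code path. A minimal sketch, assuming the same options as above; the sheet name and the A5:J1000 bounds are hypothetical placeholders:

```scala
// Hypothetical workaround sketch: supply an explicit end cell so the
// V2 reader is given a closed range instead of an open-ended one.
// "'0'!A5:J1000" is a placeholder; pick bounds that cover your data.
val dfBounded = spark.read.format("excel")
  .option("header", "false")
  .option("inferSchema", "false")
  .option("dataAddress", "'0'!A5:J1000") // explicit end boundary
  .load(s3_path)

dfBounded.show()
```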

Environment

- Spark version: DataBricks Runtime version: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
- Spark-Excel version: com.crealytics:spark-excel_2.12:3.4.1_0.20.4
- OS:
- Cluster environment

Anything else?

API V1 works fine.
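For reference, the V1 reader is selected via the long format name com.crealytics.spark.excel rather than the short "excel" alias used by the V2 API. A minimal sketch of the V1 read with the same options as the repro above (s3_path as in the original snippet):

```scala
// V1 API sketch: same options, but the fully-qualified format name
// routes through the V1 data source implementation.
val dfV1 = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "false")
  .option("inferSchema", "false")
  .option("dataAddress", "'0'!A5")
  .load(s3_path)
```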


github-actions bot commented Oct 7, 2024

Please check these potential duplicates:

@nightscape
Owner

@massazan looks like this one: #808

@massazan
Author

massazan commented Oct 9, 2024

Hi @nightscape, yes, it is the same issue. I tried to install the artifact 3.4.2 as mentioned, but I still got problems with the Databricks Runtime 13.3. I tried Runtime 14.3 LTS and it works.
Are there any plans to solve the problem for Runtime 13.3?

Thanks

@sramesh-nlg

Hi. I am also facing the same issue when trying to read an Excel file from Azure ADLS storage.
Error message:
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2.getPaths$(Lorg/apache/spark/sql/execution/datasources/v2/FileDataSourceV2;Lorg/apache/spark/sql/util/CaseInsensitiveStringMap;

I tried with both 13.3 LTS and 14.3.

Environment

  • Spark version: DataBricks Runtime version: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)
  • Spark-Excel version: com.crealytics:spark-excel_2.13:3.3.4_0.20.4
  • OS:
  • Cluster environment

I have tried with 2.13:3.5.1 as well, but still the same issue.

@nightscape
Owner

@sramesh-nlg you always need to use the version of spark-excel that best matches the Spark version:
https://mvnrepository.com/artifact/com.crealytics/spark-excel

@massazan unfortunately Databricks has a little bit of a habit of breaking API compatibility with the officially released Spark versions...
I don't plan to fix issues with Databricks as I'm not currently using it myself.
We're very open to PRs though 😃
