issues with duplicated or missing names #4

Open
ThomasGro opened this issue Aug 27, 2019 · 2 comments

Comments

@ThomasGro

I want to extract data from a large, deeply nested XML file. I followed the workflow in your example 2 using library(xml2), and it successfully generated a list of terminal nodes and XPaths. I can also run the "xml_dig_df" value-extraction step, but transforming the data into a data frame failed.

purrr::map(dplyr::bind_rows) throws the error "Argument 31 must have names"

Also, if I run the library(XML) workflow, setnames throws an error:

"Can't assign 1 names to a 113576 column data.table"

traceback()
9: stop("Can't assign ", length(old), " names to a ", ncol(x), " column data.table")
8: setnames(x, value)
7: `names<-.data.table`(`*tmp*`, value = fields)
6: `names<-`(`*tmp*`, value = fields)
5: FUN(X[[i]], ...)
4: lapply(terminal_xpaths, xml_to_df, file = "cellosaurus.xml",
       is_xml = FALSE, dig = FALSE)
3: eval(lhs, parent, parent)
2: eval(lhs, parent, parent)
1: lapply(terminal_xpaths, xml_to_df, file = "cellosaurus.xml",
       is_xml = FALSE, dig = FALSE) %>% dplyr::bind_cols()

Is there a way to dig into the nested list of the
purrr::map(dplyr::bind_rows)
output to find out where/what the issue is?

thank you
Thomas
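
One way to locate the offending element is to check the names of every child before binding. The following is only a minimal sketch in base R: `extracted` is a hypothetical stand-in for the nested list returned by the extraction step, and `has_bad_names` is an illustrative helper, not part of the package. It assumes the "must have names" error comes from a child vector or list whose names are missing or empty.

```r
# Hypothetical stand-in for the nested list produced by the extraction step.
extracted <- list(
  a = list(c(x = "1", y = "2"), c(x = "3", y = "4")),
  b = list(c(x = "5"), c("6"))  # second child has no names
)

# TRUE when a vector or list has no names, or has NA / empty names.
has_bad_names <- function(x) {
  nm <- names(x)
  is.null(nm) || anyNA(nm) || any(nm == "")
}

# For each top-level element, record the positions of children with bad names.
bad_positions <- lapply(extracted, function(group) {
  which(vapply(group, has_bad_names, logical(1)))
})

# Keep only the top-level elements that contain at least one problem child.
bad_positions <- Filter(length, bad_positions)
print(bad_positions)
```

Each entry left in `bad_positions` gives the top-level element and the index of the child that would trip the binding step, so it can be inspected directly, e.g. `extracted[["b"]][[2]]`.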

@dantonnoriega
Owner

Resolved by #8, I think? @lecy, did you get similar errors?

@lecy
Contributor

lecy commented Mar 22, 2023

It looks like it might be. I think the pull request just needs to be merged for the updates to take effect, then you can test it out.

The default .name_repair setting in tibble() checks whether names are unique but does not repair them if they aren't. I changed it to make names unique. I suspect that would address the error, but there is no reproducible example, so I'm not 100% sure.

as_tibble( .name_repair = "unique" )

.name_repair argument: treatment of problematic column names:

"minimal": no name repair or checks, beyond basic existence
"unique": make sure names are unique and not empty
"check_unique" (default value): no name repair, but check they are unique
"universal": make the names unique and syntactic
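
To illustrate the difference between the default and "unique", here is a small sketch using the tibble package. The `record` list is a made-up example with a duplicated field name, not data from the issue; it only assumes that duplicated names are what triggers the failure.

```r
library(tibble)

# Hypothetical record with a duplicated field name, as might happen when an
# XML node has repeated child tags.
record <- list(name = "A", name = "B", id = 1L)

# With the default ("check_unique") duplicated names raise an error;
# capture it instead of stopping the script.
default_result <- tryCatch(
  as_tibble(record),
  error = function(e) "error"
)

# With "unique" the duplicated names are repaired so the conversion succeeds.
# suppressMessages() hides the note tibble prints about the renamed columns.
repaired <- suppressMessages(as_tibble(record, .name_repair = "unique"))

print(default_result)
print(names(repaired))
```

Note that the repair changes the column names, so any later step that looks columns up by their original name would need to account for the added suffixes.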
