Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some functions that return vertex or edge sequences ignore the value of the option return.vs.es #1614

Open
stibu81 opened this issue Dec 2, 2024 · 9 comments

Comments

@stibu81
Copy link
Contributor

stibu81 commented Dec 2, 2024

What happens, and what did you expect instead?

Several function that return vertex or edge sequences ignore the value of the option return.vs.es and return igraph.vs or igraph.es objects even if return.vs.es = FALSE. A few examples are demonstrated below.

I used two regex pattern to search of code that converts numeric indices to igraph.vs or igraph.es objects, respectively:

  • "create_(v|e)s" finds uses of create_vs() and create_es() (also when they are used inside lapply() etc.).
  • "(E|V)\([^)]\)\[" finds constructs of the form E(graph)[i + 1]

Most uses of those patterns are already combined with a check for the value of return.vs.es, but there are a few exceptions:

  • head_of()
  • tail_of()
  • as_adj_edge_list()
  • graph.get.isomorphisms.vf2()
  • graph.get.subisomorphisms.vf2()
  • graph.subisomorphic.lad()

I would expect that these functions return (lists of) numeric indices if return.vs.es = FALSE.

To reproduce

library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union

g <- make_tree(6, children = 2)
V(g)$name <- paste0("V", 1:6)

# make sure the option is set to the default
igraph_options(return.vs.es = TRUE)

# in this case, all results are as expected
head_of(g, E(g)[c(1, 4)])
#> + 2/6 vertices, named, from d6aad98:
#> [1] V2 V5
tail_of(g, E(g)[c(1, 4)])
#> + 2/6 vertices, named, from d6aad98:
#> [1] V1 V2
as_adj_edge_list(g)[1]
#> $V1
#> + 2/5 edges from d6aad98 (vertex names):
#> [1] V1->V2 V1->V3
graph.get.isomorphisms.vf2(g, g)[1]
#> [[1]]
#> + 6/6 vertices, named, from d6aad98:
#> [1] V1 V2 V3 V4 V5 V6
graph.get.subisomorphisms.vf2(g, g)[1]
#> [[1]]
#> + 6/6 vertices, named, from d6aad98:
#> [1] V1 V2 V3 V4 V5 V6
graph.subisomorphic.lad(g, g, all.maps = TRUE)$maps[1]
#> [[1]]
#> + 6/6 vertices, named, from d6aad98:
#> [1] V1 V2 V3 V4 V5 V6

# changing the option does not affect the results
igraph_options(return.vs.es = FALSE)

head_of(g, E(g)[c(1, 4)])
#> + 2/6 vertices, named, from d6aad98:
#> [1] V2 V5
tail_of(g, E(g)[c(1, 4)])
#> + 2/6 vertices, named, from d6aad98:
#> [1] V1 V2
as_adj_edge_list(g)[1]
#> $V1
#> + 2/5 edges from d6aad98 (vertex names):
#> [1] V1->V2 V1->V3
graph.get.isomorphisms.vf2(g, g)[1]
#> [[1]]
#> + 6/6 vertices, named, from d6aad98:
#> [1] V1 V2 V3 V4 V5 V6
graph.get.subisomorphisms.vf2(g, g)[1]
#> [[1]]
#> + 6/6 vertices, named, from d6aad98:
#> [1] V1 V2 V3 V4 V5 V6
graph.subisomorphic.lad(g, g, all.maps = TRUE)$maps[1]
#> [[1]]
#> + 6/6 vertices, named, from d6aad98:
#> [1] V1 V2 V3 V4 V5 V6

Created on 2024-12-02 with reprex v2.1.1

System information

R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=de_CH.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=de_CH.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Zurich
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] igraph_2.1.2

loaded via a namespace (and not attached):
[1] compiler_4.4.2 magrittr_2.0.3 cli_3.6.3 tools_4.4.2 rstudioapi_0.17.1
[6] lifecycle_1.0.4 pkgconfig_2.0.3 rlang_1.1.4

@stibu81
Copy link
Contributor Author

stibu81 commented Dec 2, 2024

I think I know how to fix this and am happy to do it. But I would appreciate a confirmation that for all the functions that I listed, the expected behaviour is to return numeric indices when return.vs.es = FALSE.

@maelle
Copy link
Contributor

maelle commented Dec 3, 2024

Thank you @stibu81 for the report!

@szhorvat any opinion?

@stibu81
Copy link
Contributor Author

stibu81 commented Dec 3, 2024

This might have similarly unpleasant consequences as #1606 so maybe it should be discussed and planned together with that topic.

@szhorvat
Copy link
Member

@szhorvat any opinion?

I agree with the principle of updating these functions to respect return.vs.es, but keep in mind that I rarely use R so there might be problematic consequences I am not thinking of.

I'd give these three an extra careful look:

  • head_of()

  • tail_of()

  • as_adj_edge_list()

I think this is ultimately an R programming / API decision that should be made by you and @krlmlr.

@krlmlr
Copy link
Contributor

krlmlr commented Jan 9, 2025

Agree that we want to be consistent here, but return.vs.es is something we may want to get rid of in the medium term.

You can achieve the desired behavior with as.integer(), see below. If it's slow, we may want to work on changing the default implementation by speeding up V(g) (where the most work is done) or with ALTREP. The goal of an ALTREP solution would be:

  • x <- head_of(g, E(g)[c(1, 4)]) is fast
  • as.integer(x) is still fast
  • igraph:::create_vs() is only called when x is printed or when the vertex data is used otherwise
options(conflicts.policy = list(warn = FALSE))
library(igraph)

g <- make_tree(6, children = 2)
V(g)$name <- paste0("V", 1:6)

# make sure the option is set to the default
igraph_options(return.vs.es = TRUE)

# in this case, all results are as expected
head_of(g, E(g)[c(1, 4)])
#> + 2/6 vertices, named, from 414f5c7:
#> [1] V2 V5
tail_of(g, E(g)[c(1, 4)])
#> + 2/6 vertices, named, from 414f5c7:
#> [1] V1 V2
as_adj_edge_list(g)[1]
#> $V1
#> + 2/5 edges from 414f5c7 (vertex names):
#> [1] V1->V2 V1->V3
graph.get.isomorphisms.vf2(g, g)[1]
#> [[1]]
#> + 6/6 vertices, named, from 414f5c7:
#> [1] V1 V2 V3 V4 V5 V6
graph.get.subisomorphisms.vf2(g, g)[1]
#> [[1]]
#> + 6/6 vertices, named, from 414f5c7:
#> [1] V1 V2 V3 V4 V5 V6
graph.subisomorphic.lad(g, g, all.maps = TRUE)$maps[1]
#> [[1]]
#> + 6/6 vertices, named, from 414f5c7:
#> [1] V1 V2 V3 V4 V5 V6

as.integer(head_of(g, E(g)[c(1, 4)]))
#> [1] 2 5
as.integer(tail_of(g, E(g)[c(1, 4)]))
#> [1] 1 2
as.integer(as_adj_edge_list(g)[[1]])
#> [1] 1 2
as.integer(graph.get.isomorphisms.vf2(g, g)[[1]])
#> [1] 1 2 3 4 5 6
as.integer(graph.get.subisomorphisms.vf2(g, g)[[1]])
#> [1] 1 2 3 4 5 6
as.integer(graph.subisomorphic.lad(g, g, all.maps = TRUE)$maps[[1]])
#> [1] 1 2 3 4 5 6

Created on 2025-01-09 with reprex v2.1.1

@szhorvat
Copy link
Member

szhorvat commented Jan 9, 2025

but return.vs.es is something we may want to get rid of in the medium term.

This option might have been added to allow better performance. I don't know if there is still a significant performance difference, but I seem to recall that at one point someone opened an issue claiming that there was one.

@schochastics
Copy link
Contributor

Just to confirm, this option still has quite some performance issues.

library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
set.seed(42)
g <- sample_gnp(500, 0.15, directed = FALSE)
names <- glue::glue("V{1:500}")
V(g)$name <- names
f_false <- function(g){
  igraph_options(return.vs.es = FALSE)
  max_cliques(g)
}

f_true <- function(g){
  igraph_options(return.vs.es = TRUE)
  max_cliques(g)
}

bench::mark(
  check = FALSE,
  iterations = 1,
  f_false(g),
  f_true(g)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 f_false(g)     33ms     33ms     30.3      684KB     30.3
#> 2 f_true(g)     268ms    268ms      3.73     923KB     48.4

Created on 2025-01-09 with reprex v2.1.1

the culprit is here

rigraph/R/cliques.R

Lines 270 to 272 in 1499403

if (igraph_opt("return.vs.es")) {
res <- lapply(res, unsafe_create_vs, graph = graph, verts = V(graph))
}

I am afraid there is no easy way to speed this up, besides what @krlmlr suggested.

@krlmlr
Copy link
Contributor

krlmlr commented Jan 13, 2025

A bit more detail on the data returned here.

The output is a list of 37k objects, all of the same structure. The difference is in the names, the class, and some auxiliary information. Creating the latter two should be cheap (in particular if done in a loop), I suspect that extracting the "names" attribute is what's expensive here. If we store the names in an ALTREP vector that is computed only on demand, we might achieve similar performance.

options(conflicts.policy = list(warn = FALSE))
library(igraph)

set.seed(42)
g <- sample_gnp(500, 0.15, directed = FALSE)
names <- glue::glue("V{1:500}")
V(g)$name <- names

igraph_options(return.vs.es = TRUE)
named <- max_cliques(g)
length(named)
#> [1] 37152
names(named[[1]])
#> [1] "V464" "V493" "V236" "V129"
constructive::construct(named[[1]])
#> c(V464 = 464L, V493 = 493L, V236 = 236L, V129 = 129L) |>
#>   structure(
#>     class = "igraph.vs",
#>     env = rlang::new_weakref(constructive::.env("0x10b8841c0")),
#>     graph = "fd9a2703-33fd-4f77-89ab-15f9cb70c245"
#>   )

igraph_options(return.vs.es = FALSE)
pure <- max_cliques(g)
length(pure)
#> [1] 37152
names(pure[[1]])
#> NULL
constructive::construct(pure[[1]])
#> c(464, 493, 236, 129)

waldo::compare(named[[1]], pure[[1]])
#> `old` is an S3 object of class <igraph.vs>, an integer vector
#> `new` is a double vector (464, 493, 236, 129)
waldo::compare(unclass(named[[1]]), pure[[1]])
#> `old` is an integer vector (464, 493, 236, 129)
#> `new` is a double vector (464, 493, 236, 129)
waldo::compare(unclass(named[[1]]), as.integer(pure[[1]]))
#> `names(old)` is a character vector ('V464', 'V493', 'V236', 'V129')
#> `names(new)` is absent
#> 
#> `attr(old, 'env')` is a weak reference
#> `attr(new, 'env')` is absent
#> 
#> `attr(old, 'graph')` is a character vector ('fd9a2703-33fd-4f77-89ab-15f9cb70c245')
#> `attr(new, 'graph')` is absent

Created on 2025-01-13 with reprex v2.1.1

@stibu81
Copy link
Contributor Author

stibu81 commented Jan 14, 2025

I just wanted to quickly confirm that it was performance that made me look into return.vs.es. I use igraph to prepare a specific data structure from a tree graph in a shiny app, so this needs to be fast, because the user is waiting on the result. I did several things to speed up the performance by about a factor 1000 and this option was very important in this, even though I cannot remember how much of the performance gain it caused. Simply removing this option without improving the performance considerably would make igraph unusable for my use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants