Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

function name streamlining #2

Open
BPJandree opened this issue Oct 29, 2020 · 2 comments
Open

function name streamlining #2

BPJandree opened this issue Oct 29, 2020 · 2 comments

Comments

@BPJandree
Copy link

intro

In the package, I see that currently

  1. There are functions to interact with the catalog which are in catalog_entries.R. This seems to be mostly on the "request" side of things. Maybe some simplifications can be made, but I have not yet carefully reviewed this.
  2. There are some functions in datasets.R. I don't immediatly see a common theme across the functions in here but the create functions seems to be important and most of my comments below have to do with this "push" side of the package.
  3. There are external_resources.R, I have not yet carefully looked at this.
  4. There are seperate functions that wrap around the create function, most of my comments are on this. All of these functions follow type_add syntax.

On naming convention, classes and dispatching methods

Suggestion 1: naming of functions that share a common goal

Looking at the code, the current naming convention of "add" methods (that wrap around create) follows:

script_add
image_add
# etc

My suggestion is to adopt the following:

add.script
add.image
# etc

This allows creating a generic add method that dispatches different functions depending on the "class type" of the supplied input (the metadata scheme). I will explain this simplification in more detail below.

Suggestion 2: Harmonized use of a single main function - wrapper versus dispatch

Not suggested: wrapper with if-else approach

In the current package we have different type_add functions. We could write one main add function with a type= field (this is essentially taking current situation, and adding one wrapper function with a lot of if-else statements):

  # different type_add functions that have the same goal (in this case to "print" something") but a slightly different execution (they print something different)
  a_add <-function(x){
    print("a")
  }
  b_add <- function(x){
    print("b")
  }

 # main wrapper that uses the add_type function based on a "type" field and if-else statements
  add_ifelse <- function(x, type){
    if(type=="a"){
        a_add(x)
    }
    if(type=="b"){
        b_add(x)
    }
  }

 # test the code:
  add_ifelse("metadata type a", type="a")
  add_ifelse("metadata type b", type="b")

The main add_ifelse function takes inputs x and type, checks the type field using an if statement, and passes on x to the appropriate type_add(x) function. For the user, this is slightly easier than having to figure out him/herself which type_add function to use out of a possibly long list of variations. In particular, in the current sitaution, each time a new metadata scheme gets added, a new type_add function need to be added to the package and the user manual grows and the user needs to write new code that calls these different functions. With a single wrapper, everything stays the same for the user who only makes use of the main add_ifelse function written above.

The problem with this "if-else" approach is that the complexity of the add_ifelse grows over time with more and more statements needing to be added. It would be nice if R could take care of this "type-checking" automatically. With the current approach, it is also easy to make mistakes like the following:

add_ifelse("metadata type a", type="b")

Which would apply the b_add function to a metadata scheme of type "a". This is likely a situation that should not be possible at all and lead to API errors that can be difficult to debugg.

Suggested: dispatch methods using classes

Using the function naming convention add.type I proposed, the code above could be simplified and made idiot-proof as follows:

# main add function that dispatches based on class type:
  add <- function(x)  {
    UseMethod('add')  
  }

# methods for different class types
  add.a <-function(x){print("a")}
  add.b <-function(x){print("b")}

# test it:
  x=1 # create some object
  oldClass(x)<-"a" # assign class "a" to it

  # this automatically dispatches add.a because x is of type "a"
  add(x)

  # now create something of type "b" 
  y=1
  oldClass(y) <-"b"
  # our main add function will now dispatch add.b because y is of class "b"
  add(y)

of course the oldClass() <- parts can be automated inside some function so that the user does not need to this. For example, if certain metadata schemes have unique fields that distinguish them from others, this could also be inferred in a "smart" way. For example, if reports are always documented by metadata scheme "document", and reports always carry a field "report number" then this is easy to detect. I am not familiar enough with the schemes yet to make smart suggestions. Maybe each scheme simply has a field that says which type of metadata scheme it is?

Suggestion 3: Simplified documentation with dispatchable methods

When using the suggested dispatch appraoch, the package description can also be drastically simplified. In particular, since the user only makes use of the add function we only need to document the "add" function in the manual. Effectively, instead of documenting each individual type_adde function in the package we can simply choose to "hide" them as follows. In the current package, each type_add function is followed by

#' @export

which exports the function to the namespace fo the package. Since the add.type functions will automatically be dispatched for objects of classes "type", the user does not need to make explicit use of any of the add.type functions. Hence, we can remove the #' @export line after each add.type function so that the individual methods do not show up in the package documentation. This allows us to add, remove, merge, edit etc. any of the add.type functions int he future without the package manual needing to be changed (drastically).

backwards compatibility

For full backwards compatibility, one could add the following code to the package:

image_add <- add.image
script_add <- add.script

and #' @export these individual functions. In this case we can just simply write something along the lines "deprecated, available for backward compatibility. see add function for suggested use" as description for these functions.

Suggestion 4: removing redundant code using dispatch

Taking a closer look at the type_add functions, there is a common pattern. Essentially, all of these functions perform the following four tasks of which some are optional.

  1. check the api key
  2. perform checks on the metadata fields and make changes if needed
  3. pass the inputs onto the create function
  4. perform some checks on the api response

First, item 1 can be turned into a check_api function rather than a scripted check. This avoids the possibility we make a typo mistake in one of the type_add functions. Also, if at some point we want to make changes to the api check we just edit the single function rather than going back to all the individual type_add functions and editing all the different pieces individually.

Item 2, using the distpatch technique explained above, we can create one single check_scheme function, wich dispatches check_scheme.type depending on the type of metadata scheme at hand. In the current case, this would perform the actions of lines 47-53 in scripts.R to a metadata scheme of type "scripts", or do nothing at all for metadata schems of type "image". So for image, the function would just be check_scheme.image(x){return{x}) which maybe we want to change later.

Item 3, all functions do this part (same goal) but this is also where each function differs in how they do it (type field). Using the dispatch methods above, we could actually have one create function with different create.type methods, which would be the safer version of the current type= field approach.

Item 4, like item 2 this could be a single check_response with different check_response.type.

Overall, we could then replace all the different type_add functions, with the following single function:

add <- function(metadata_scheme){
    check_api()
    checked_cheme <- check_scheme(metadata_scheme)
    result <- create(checked_scheme)
    checked_result <- checke_response(results)
    return(results)
}

In which case the package development focuses on building out the check_scheme.type library and the create.type library.

@thijsbenschop
Copy link
Collaborator

Thanks @BPJandree for all the suggestions for the package.

On naming of functions, I’ve posted a suggestion in #3.

A few observations for discussion:

  • There is currently hardly any difference between the different add_* functions, as they all check the API and then call the create function. In the create function, the type argument is only used for creating the correct URL, which wouldn’t even require an if/else statement, as we could directly use the argument. Only in script_add, document_add and table_add there is one additional step, which is the same for all three, namely ‘change file_uri value to file basename’. Therefore, we won’t need a lengthy set of if/else statements for each single type. Perhaps new metadata schemas would require other actions, but I think in general the actions stay the same. Only the location in the metadata scheme where the file uri’s are stored may differ. This would mean that the wrapper function isn’t really a wrapper function, as it doesn’t call the individual add_* functions, but rather a generalized function. This of course only applies if we don’t want to add more type specific functionality to the add functions, such as checking there metadata schema.
  • Having a single function for each entry type allows to use the documentation of that function to provide examples for that specific type. Currently, the documentation of a function contains the full metadata schema. These schemas should move to the documentation. Yet, it may be a good place to provide some good examples. Having only a single add wrapper function would require to list the examples for all types in a long list. Perhaps we could keep both the single functions and a wrapper function? This depends on what we would like to put in the documentation: just the technical stuff or also the examples with the different metadata schemas.
  • Regarding the use of classes to dispatch the appropriate function: I don’t think that manually assigning a class or specifying a type argument in the function call is much different in terms of being idiot-proof. In fact, specifying it right at the function call may be more explicit and less error-prone than assigning a class to the metadata schema at some point and not able to readily see that class in the function call. One can still apply method a to metadata schema b when assigning classes. I like the idea of guessing the entry type from the metadata schema. However, this could be risky when adding new schemas, e.,g., a new metadata schema could also have a report number field and be wrongly classified as report. Then again, not much can go wrong when the wrong method is dispatched (at least at the time being), as the functions don’t differ much (see first point). In fact, only the files could be not found, which would result in an error. Adding the type field to the different metadata schema's may be an option. One could even drop in the create function.
  • Completely agree with creating functions for some common tasks, also to make the functions more readable and quicker to understand. I will go ahead with that.
  • Regardless of the implementation, we should ensure backward compatibility for the current function names.

@thijsbenschop
Copy link
Collaborator

See 7e21297 for an example how a generic function could look like. It works for all types except for scripts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants