dRisk.R and issue in interval measure definition #49

leodecarlo · 2024-08-06T06:15:29Z

Dear Authors,

I am copying here the message from the issue I opened in the development repository: sdcTools/sdcMicro#351 (comment).
Bernhard-da agrees that something should change. Quoting: "hi, thanks for your question. I agree that there is some kind of ambiguity. I want to note that the sdcguide is not written by the maintainers of sdcMicro so I would suggest to create an issue for the authors of the guide https://github.com/ihsn/SDCPractice/issues"

A colleague and I think there is a problem with the dRisk.R method in the sdcMicro library:

dRisk_link,

The guide on the interval measure:

interval_measure

says "intervals are created around each perturbed value and then a determination is made as to whether the original value of that perturbed observation is contained in this interval."
We agree that this is what lines 84 to 87 do in
dRisk_link.
These lines count 1 when x is inside the interval created around x_m and 0 when x is outside.

But we find that the next lines in
interval_measure
seem to say something different from what the method does:

"Values that are within the interval around the initial value after perturbation are considered too close to the initial value and hence unsafe and need more perturbation. Values that are outside of the intervals are considered safe."

and

"The result 1 indicates that all (100 percent) the observations are outside the interval of 0.1 times the standard deviation around the original values."

That is, the guide refers to intervals created around the original values, whereas the function creates the intervals around the perturbed values x_m. The guide also says that a value is counted as 0 when inside and 1 when outside, whereas, as we understand it, the function does the opposite around x_m.
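To make the two readings concrete, here is a minimal base R sketch. The helper names, the simulated data and the default k = 0.1 (taken from the "0.1 times the standard deviation" wording in the guide) are our own assumptions for illustration. This is not the sdcMicro source and its numbers need not match what dRisk() returns.

```r
# Illustrative sketch only: both helpers are hypothetical, NOT sdcMicro code.

# Reading A (intervals around the perturbed values x_m):
# share of records whose original value x lies inside
# [x_m - k * sd(x_m), x_m + k * sd(x_m)].
share_orig_inside_perturbed <- function(x, xm, k = 0.1) {
  half_width <- k * sd(xm)
  mean(abs(x - xm) <= half_width)
}

# Reading B (intervals around the original values x):
# share of records whose perturbed value x_m lies outside
# [x - k * sd(x), x + k * sd(x)].
share_perturbed_outside_orig <- function(x, xm, k = 0.1) {
  half_width <- k * sd(x)
  mean(abs(xm - x) > half_width)
}

set.seed(100)
x       <- rnorm(1000, mean = 50, sd = 10)  # made-up "original" variable
xm_low  <- x + rnorm(1000, sd = 0.001)      # almost no noise
xm_high <- x + rnorm(1000, sd = 500)        # very strong noise

# Reading A: a larger share means more originals are recoverable (riskier)
c(low  = share_orig_inside_perturbed(x, xm_low),
  high = share_orig_inside_perturbed(x, xm_high))

# Reading B: a larger share means more values moved away (safer)
c(low  = share_perturbed_outside_orig(x, xm_low),
  high = share_perturbed_outside_orig(x, xm_high))
```

The two readings differ both in where the interval is centred (and hence which standard deviation sets its width) and in whether "inside" or "outside" is counted, so the same data can give values that point in opposite directions.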

The following R script demonstrates the strange behavior of the function: as the noise in the perturbed values increases, dRisk() returns 1 for very high noise and very small values for very low noise. Here is the script:

library(sdcMicro)

# Key (categorical) variables and the numeric variable to perturb
keys <- c('sex', 'age')
num_var <- c('expend')

sdc1 <- createSdcObj(dat = testdata2, keyVars = keys, numVars = num_var)

# Very high noise
set.seed(100)
out <- addNoise(sdc1, noise = 500)
high_noise <- out@risk$numeric

# Very low noise
set.seed(100)
out <- addNoise(sdc1, noise = 0.001)
low_noise <- out@risk$numeric

sprintf("Level of anonymity with insignificant noise %f. Level of anonymity with high noise %f", low_noise, high_noise)

So we think that part of the guide should be changed. Whether the dRisk.R method should also change depends on the actual intention: it can stay as it is if the guide is changed to match it.

thijsbenschop · 2024-08-06T07:57:53Z

@leodecarlo, thank you very much for noting this and raising this issue. We will adjust the text in the guide to reflect the calculations in the dRisk function of the sdcMicro package.

leodecarlo · 2024-08-06T08:00:44Z

Thank you for the reply.
