Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

searchsorted function #108

Closed
aplavin opened this issue May 26, 2022 · 5 comments
Closed

searchsorted function #108

aplavin opened this issue May 26, 2022 · 5 comments

Comments

@aplavin
Copy link
Contributor

aplavin commented May 26, 2022

I'm looking for a way to find elements in a sorted array that fall into a given interval. There's nothing ready-made in IntervalSets, so here's my solution attempt:

searchsorted(arr, int::Interval{:closed, :closed}) = searchsortedfirst(arr, minimum(int)):searchsortedlast(arr, maximum(int))
searchsorted(arr, int::Interval{:closed,   :open}) = searchsortedfirst(arr, minimum(int)):(searchsortedfirst(arr, supremum(int)) - 1)
searchsorted(arr, int::Interval{  :open, :closed}) = (searchsortedlast(arr, infimum(int)) + 1):searchsortedlast(arr, maximum(int))
searchsorted(arr, int::Interval{  :open,   :open}) = (searchsortedlast(arr, infimum(int)) + 1):(searchsortedfirst(arr, supremum(int)) - 1)

These functions seem correct from a bit of manual testing, but I'm not sure if I missed some cases somewhere.
Would it be useful to add these methods into IntervalSets itself?

@hyrodium
Copy link
Collaborator

It seems the second argument of searchsorted should be an element but not a set. (i.e. it should not be interval.)
So, I think we should avoid adding the method to searchsorted.

help?> searchsorted
search: searchsorted searchsortedlast searchsortedfirst

  searchsorted(a, x; by=<transform>, lt=<comparison>, rev=false)

  Return the range of indices of a which compare as equal to x (using binary search) according to the order specified by the by, lt and rev keywords,
  assuming that a is already sorted in that order. Return an empty range located at the insertion point if a does not contain values equal to x.

  See also: insorted, searchsortedfirst, sort, findall.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> searchsorted([1, 2, 4, 5, 5, 7], 4) # single match
  3:3
  
  julia> searchsorted([1, 2, 4, 5, 5, 7], 5) # multiple matches
  4:5
  
  julia> searchsorted([1, 2, 4, 5, 5, 7], 3) # no match, insert in the middle
  3:2
  
  julia> searchsorted([1, 2, 4, 5, 5, 7], 9) # no match, insert at end
  7:6
  
  julia> searchsorted([1, 2, 4, 5, 5, 7], 0) # no match, insert at start
  1:0

@aplavin
Copy link
Contributor Author

aplavin commented May 27, 2022

Sure, naming and interface can be different. For example:

searchsorted(A, ∈(int))
searchsorted_interval(A, int)

My main point is that this is not a trivial function to implement correctly for users. So, wouldn't it make sense to add to the package?

@hyrodium
Copy link
Collaborator

The name searchsorted_interval looks better for me. However, since other more primitive features such as #106 are missing, I think that implementation of that one is a higher priority...

Just curious, but under what circumstances would the function searchsorted_interval be needed?

@aplavin
Copy link
Contributor Author

aplavin commented May 30, 2022

I don't really see how this searchsorted function is related to setdiff you linked. Seems like they are completely independent and either can be added first.

I needed such a searchsorted function recently in FlexiJoins package: https://gitlab.com/aplavin/FlexiJoins.jl/-/blob/010000089640de9206cc71628014b1340fc5b529/src/bypredicate.jl#L74-80. There, it's used to find matches when joining two datasets by a condition like left.x in right.x1..right.x2. That implementation (same as in the 1st post here) looks correct, but I'm still not 100% sure that it covers everything.

@hyrodium
Copy link
Collaborator

I don't really see how this searchsorted function is related to setdiff you linked. Seems like they are completely independent and either can be added first.

Sorry for my confusing comment. I was totally misunderstanding. ([1,2,3] ∩ 1..5 is also missing just like setdiff. I was thinking that the set operations are more primitive than searchsorted, but I now agree with your comment.)

I think adding the function searchsorted_interval would be great. Could you open a PR? I will review it:+1:

That implementation (same as in the 1st post here) looks correct, but I'm still not 100% sure that it covers everything.

I think the following error should be avoided with an empty range, just like searchsorted([1,2],3).

julia> searchsorted_interval([1,2,3,4,4],1..3)
1:3

julia> searchsorted_interval([1,2,3,4,4],3..1)
ERROR: ArgumentError: Infimum not defined for empty intervals
Stacktrace:
 [1] infimum
   @ ~/.julia/dev/IntervalSets/src/IntervalSets.jl:83 [inlined]
 [2] minimum
   @ ~/.julia/dev/IntervalSets/src/IntervalSets.jl:172 [inlined]
 [3] searchsorted_interval(arr::Vector{Int64}, int::ClosedInterval{Int64})
   @ Main ./REPL[3]:1
 [4] top-level scope
   @ REPL[18]:1

This can be fixed by replacing minimum/maximum with leftendpoint/rightendpoint.

julia> searchsorted_interval(arr, int::Interval{:closed, :closed}) = searchsortedfirst(arr, leftendpoint(int)):searchsortedlast(arr, rightendpoint(int))
searchsorted_interval (generic function with 1 method)

julia> searchsorted_interval([1,2,3,4,4],3..1)
3:2

julia> searchsorted_interval([1,2,3,4,4],1..3)
1:3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants