-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Triangular distribution support #56
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work @CraigP17 !
That is some good code, I've marked a few sections though that I think require changes, once that's done this is almost ready to get merged :)
probdists/Triangulardistribution.py
Outdated
else: | ||
# Multiple modes | ||
# Approximates mode using mean | ||
self.mode = sum(self.data) / len(self.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a reference for this? Even if multiple modes exist, isn't the mode still the same? I might be wrong but I'd appreciate if you could link a source confirming that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, mode is the numbers with the highest frequency appearing within data, and could have 0, 1, 1+ modes. For example, with [1, 2, 2, 3, 3, 4], the mode would be both 2 and 3. For triangular distribution, I believe you can only have 1 mode, otherwise it's not a triangle, so rather than raising an error, I use an approximated mode.
If you have other suggestions for what to do in multi-modal cases, let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a better idea to let the user know that there are multiple modes which is a violation for triangular distribution rather than returning an arbitrary number(in this case the mean) as the mode. This is because the user needs to know there's a problem, returning the average doesn't indicate that.
We could also send the list of modes in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, after the data file is read and I calculate the minimum, maximum, and mode, during that mode calculation, if it returns an array of elements I can send an error message. Do you want me to create a custom Exception for it as well? That way it can throw the Exception when it occurs which will be relayed to the user and prevent them from continuing with the Triangular distribution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be good. Please go ahead and make the required changes.
value = 0 # default value for when x < min or x > max | ||
if self.a <= x <= self.mode: | ||
value = (2 * (x - self.a)) / ( | ||
(self.b - self.a) * (self.mode - self.a)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we plan to handle division by zero here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best way would be to check for if a == b during instantiation, and raise a ValueError because that would suggest the data is the same number since a is minimum and b is maximum. Then in any calculations, it should be safe to assume that a != b != mode
This applies to the other comments about ZeroDivision cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds like a good idea!
(self.b - self.a) * (self.mode - self.a)) | ||
elif self.mode < x <= self.b: | ||
value = (2 * (self.b - x)) / ( | ||
(self.b - self.a) * (self.b - self.mode)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we plan to handle division by zero here?
value = 0 | ||
elif self.a <= x <= self.mode: | ||
num = ((x - self.a) ** 2) | ||
den = (self.b - self.a) * (self.mode - self.a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we plan to handle division by zero here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calculating the CDF when min = mode becomes difficult and would cause zero division. Working on a workaround...any ideas?
elif self.mode < x <= self.b: | ||
num = ((self.b - x) ** 2) | ||
den = (self.b - self.a) * (self.b - self.mode) | ||
value = 1 - (num / den) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we plan to handle division by zero here?
""" | ||
x = [self.a, self.mode, self.b] | ||
|
||
peak = 2 / (self.b - self.a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we plan to handle division by zero here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialization input values must be a < b so cannot happen in this case
self.assertEqual(self.triangle.calculate_cdf(7), 0.5) | ||
self.assertEqual(self.triangle.calculate_cdf(9), 0.82) | ||
self.assertEqual(self.triangle.calculate_cdf(12), 1) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also confirm that you've run these new tests and that they all pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've created the tests, manually calculated results to confirm, and have ran the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, I'll ask you one final time right before I merge :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to keep you waiting, I've been drowned with some chores irl, I will get back to this the coming Sunday. Thank you for your patience :)
No worries. |
Sure, I didn't quite get the 'control the input values' part. Could you tell me a bit more about that? |
I've left updated comments, most of it LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think code quality is good. Just add functionality to handle zerodivision error. Other than that ill approve this PR.
@hot9cups Regarding the 'control the input values' part that you would like to know more about... Let me know if this value prevention works for you and about making a custom Exception for file imported data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CraigP17 Yes, that sounds fine! Value prevention during initialisation and the custom exception for imported data.
I suppose we could use the same custom exception in both places? Maybe, we'll have a look at that in the next commit.
So everything sounds good to me, please go ahead and make the proposed changes :)
@hot9cups Perfect, I'll get it done early next week, when I have some more free time to work on it. |
Committed new changes that should prevent ZeroDivision. All tests are working, had to update one. Let me know if I'm missing anything else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @mhdzumair
This is one of the neatest written class we have in the codebase!
Very nicely done!
Thank you @CraigP17 for contributing to probdists! I'll be sure to include you as a contributor when the next version gets released(very soon)!
This was some really really good code!
Oh also, if you find the library helpful and/or know someone who might, do share it!
Consider giving probdists a star if you like it :)
Thanks again, for the time and effort you put into this !!
Exceptional work! We had to go back and forth multiple times but I'm glad and grateful you stuck till the very end! Thank you <3
Thank you! Happy to help and hopefully I can contribute in the future too. |
Oh totally! Then there's also other things that I think could be inculcated here, for example Inverting the probability distribution. It's a very interesting feature in my opinion, definitely looking forward to include that in the future. Apart from that, I'm open to other 'probability things' this library could include, you're also welcome to create issues and suggest potential ideas :) I look forward to your future contributions. Cheers :) |
Contains new Triangular class, and updates to usage and test files. Fixes #48