Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Triangular distribution support #56

Merged
merged 8 commits into from
Nov 28, 2020
Merged

Conversation

CraigP17
Copy link
Contributor

@CraigP17 CraigP17 commented Nov 5, 2020

Contains new Triangular class, and updates to usage and test files. Fixes #48

Copy link
Owner

@hot9cups hot9cups left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work @CraigP17 !

That is some good code, I've marked a few sections though that I think require changes, once that's done this is almost ready to get merged :)

probdists/Triangulardistribution.py Outdated Show resolved Hide resolved
probdists/Triangulardistribution.py Outdated Show resolved Hide resolved
else:
# Multiple modes
# Approximates mode using mean
self.mode = sum(self.data) / len(self.data)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a reference for this? Even if multiple modes exist, isn't the mode still the same? I might be wrong but I'd appreciate if you could link a source confirming that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, mode is the numbers with the highest frequency appearing within data, and could have 0, 1, 1+ modes. For example, with [1, 2, 2, 3, 3, 4], the mode would be both 2 and 3. For triangular distribution, I believe you can only have 1 mode, otherwise it's not a triangle, so rather than raising an error, I use an approximated mode.
If you have other suggestions for what to do in multi-modal cases, let me know

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a better idea to let the user know that there are multiple modes which is a violation for triangular distribution rather than returning an arbitrary number(in this case the mean) as the mode. This is because the user needs to know there's a problem, returning the average doesn't indicate that.

We could also send the list of modes in the error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, after the data file is read and I calculate the minimum, maximum, and mode, during that mode calculation, if it returns an array of elements I can send an error message. Do you want me to create a custom Exception for it as well? That way it can throw the Exception when it occurs which will be relayed to the user and prevent them from continuing with the Triangular distribution

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be good. Please go ahead and make the required changes.

probdists/Triangulardistribution.py Outdated Show resolved Hide resolved
value = 0 # default value for when x < min or x > max
if self.a <= x <= self.mode:
value = (2 * (x - self.a)) / (
(self.b - self.a) * (self.mode - self.a))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we plan to handle division by zero here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best way would be to check for if a == b during instantiation, and raise a ValueError because that would suggest the data is the same number since a is minimum and b is maximum. Then in any calculations, it should be safe to assume that a != b != mode
This applies to the other comments about ZeroDivision cases

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds like a good idea!

(self.b - self.a) * (self.mode - self.a))
elif self.mode < x <= self.b:
value = (2 * (self.b - x)) / (
(self.b - self.a) * (self.b - self.mode))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we plan to handle division by zero here?

value = 0
elif self.a <= x <= self.mode:
num = ((x - self.a) ** 2)
den = (self.b - self.a) * (self.mode - self.a)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we plan to handle division by zero here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calculating the CDF when min = mode becomes difficult and would cause zero division. Working on a workaround...any ideas?

elif self.mode < x <= self.b:
num = ((self.b - x) ** 2)
den = (self.b - self.a) * (self.b - self.mode)
value = 1 - (num / den)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we plan to handle division by zero here?

"""
x = [self.a, self.mode, self.b]

peak = 2 / (self.b - self.a)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we plan to handle division by zero here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialization input values must be a < b so cannot happen in this case

self.assertEqual(self.triangle.calculate_cdf(7), 0.5)
self.assertEqual(self.triangle.calculate_cdf(9), 0.82)
self.assertEqual(self.triangle.calculate_cdf(12), 1)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also confirm that you've run these new tests and that they all pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've created the tests, manually calculated results to confirm, and have ran the tests

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, I'll ask you one final time right before I merge :)

Copy link
Owner

@hot9cups hot9cups left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep you waiting, I've been drowned with some chores irl, I will get back to this the coming Sunday. Thank you for your patience :)

@CraigP17
Copy link
Contributor Author

No worries.
I'm thinking to handle any ZeroDivision errors that can occur while calculating PDF and CDF from (maximum - minimum), we can control the input values when initializing the class so it throws a ValueError, and same when importing data from a file. This should prevent the error since b-a will never be 0.

@hot9cups
Copy link
Owner

No worries.
I'm thinking to handle any ZeroDivision errors that can occur while calculating PDF and CDF from (maximum - minimum), we can control the input values when initializing the class so it throws a ValueError, and same when importing data from a file. This should prevent the error since b-a will never be 0.

Sure, I didn't quite get the 'control the input values' part. Could you tell me a bit more about that?

@hot9cups hot9cups requested a review from mhdzumair November 14, 2020 12:21
@hot9cups
Copy link
Owner

I've left updated comments, most of it LGTM.
request @mhdzumair to review as well!

Copy link
Collaborator

@mhdzumair mhdzumair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think code quality is good. Just add functionality to handle zerodivision error. Other than that ill approve this PR.

@CraigP17
Copy link
Contributor Author

@hot9cups Regarding the 'control the input values' part that you would like to know more about...
When users initialize the Triangular distribution, they need to set the parameter values for minimum, maximum, and mode. With those values, the PDF and CDF are calculated. However ZeroDivision error occurs when (minimum == maximum) or (minimum == mode) or (maximum == mode). I can prevent a ZeroDivison error from occurring by preventing users from initializing the TriangularDistribution by throwing a ValueError when any of those values are equal.

Let me know if this value prevention works for you and about making a custom Exception for file imported data

Copy link
Owner

@hot9cups hot9cups left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CraigP17 Yes, that sounds fine! Value prevention during initialisation and the custom exception for imported data.
I suppose we could use the same custom exception in both places? Maybe, we'll have a look at that in the next commit.

So everything sounds good to me, please go ahead and make the proposed changes :)

@CraigP17
Copy link
Contributor Author

@hot9cups Perfect, I'll get it done early next week, when I have some more free time to work on it.

@CraigP17
Copy link
Contributor Author

Committed new changes that should prevent ZeroDivision. All tests are working, had to update one. Let me know if I'm missing anything else.

Copy link
Collaborator

@mhdzumair mhdzumair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code blocks are very clean. good job @CraigP17

You create an exception class that was a great idea. @hot9cups with his idea we can create our main exception class and create subclasses for distribution separately like value error etc.

Copy link
Owner

@hot9cups hot9cups left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mhdzumair
This is one of the neatest written class we have in the codebase!
Very nicely done!

Thank you @CraigP17 for contributing to probdists! I'll be sure to include you as a contributor when the next version gets released(very soon)!
This was some really really good code!

Oh also, if you find the library helpful and/or know someone who might, do share it!
Consider giving probdists a star if you like it :)

Thanks again, for the time and effort you put into this !!
Exceptional work! We had to go back and forth multiple times but I'm glad and grateful you stuck till the very end! Thank you <3

@hot9cups hot9cups merged commit ddc9d90 into hot9cups:main Nov 28, 2020
@CraigP17
Copy link
Contributor Author

Thank you! Happy to help and hopefully I can contribute in the future too.

@hot9cups
Copy link
Owner

Oh totally!
I think the work that's left to be done for this library in the immediate future includes improved documentation, addition of a few more probability distribution classes(We have 4 issues of that sort open at the moment), and I suppose switching from matplotlib to seaborn.

Then there's also other things that I think could be inculcated here, for example Inverting the probability distribution.
Ie, solving questions like "Given I have a Gaussian Distribution of x mean and y std, for what value of t is P(X < T) = z%".
That requires the calculation of cdf and pdf and smart manipulation, I talked about it here.

It's a very interesting feature in my opinion, definitely looking forward to include that in the future. Apart from that, I'm open to other 'probability things' this library could include, you're also welcome to create issues and suggest potential ideas :)

I look forward to your future contributions. Cheers :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Triangular distribution
3 participants