Language Auto-detection #18

Open
spratt opened this issue Jul 4, 2012 · 2 comments
Member

spratt commented Jul 4, 2012

As the user enters code in the code submission window, we should make a reasonable guess at which language they are using, but still let the user pick their language, and stop trying to auto-detect once they've chosen.
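A minimal sketch of that behaviour, assuming a hypothetical `detect` callable that returns a language name or `None` when it has no confident guess. The class and method names are illustrative only and are not taken from the project.

```python
class LanguageSelector:
    """Tracks the selected language; auto-detects until the user picks one."""

    def __init__(self, detect):
        # `detect` is an assumed callable: code string -> language name or None.
        self.detect = detect
        self.language = None
        self.user_chose = False

    def on_code_changed(self, code):
        # Only guess while the user has not made an explicit choice.
        if not self.user_chose:
            guess = self.detect(code)
            if guess is not None:
                self.language = guess
        return self.language

    def on_user_pick(self, language):
        # An explicit choice always wins and turns auto-detection off.
        self.language = language
        self.user_chose = True
```

Keeping the previous guess when `detect` returns `None` is a design choice of this sketch; it avoids the selection flickering while the user is mid-edit.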

Contributor

Gankra commented Jul 14, 2012

This doesn't seem tractable, as indicated by the ubiquity of "C-Like". Also, if you have a string with another language in it, how would you possibly resolve that? Is this some HTML with JavaScript, or some JavaScript with HTML?

Member Author

spratt commented Jul 15, 2012

Our first idea is a Bayesian classifier. Basically, build a score for each language, maybe based on the number of keywords matched, and calculate the probability of each language from those scores. When one language's probability passes a certain threshold, make that guess.
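As a rough illustration of the idea, here is a naive Bayes sketch over keyword matches. Everything in it is an assumption made for the example: the tiny keyword lists, the `HIT`/`MISS` weights, the uniform prior, and the 0.9 threshold. A real detector would need much larger vocabularies and weights estimated from sample code.

```python
import math
import re

# Hypothetical keyword lists, far too small for real use.
KEYWORDS = {
    "python": {"def", "import", "elif", "lambda", "None", "self"},
    "javascript": {"function", "var", "const", "let", "typeof", "undefined"},
    "c": {"int", "char", "void", "struct", "include", "printf"},
}
VOCAB = set().union(*KEYWORDS.values())

# Assumed relative weights: a language's own keywords are 10x more likely
# to appear in it than keywords belonging to other languages.
HIT, MISS = 10.0, 1.0


def posteriors(code):
    """Return P(language | keyword tokens) under a uniform prior."""
    tokens = [t for t in re.findall(r"[A-Za-z_]\w*", code) if t in VOCAB]
    log_scores = {}
    for lang, kws in KEYWORDS.items():
        # Normalise so the per-token likelihoods sum to 1 over VOCAB.
        z = HIT * len(kws) + MISS * (len(VOCAB) - len(kws))
        log_scores[lang] = sum(
            math.log((HIT if t in kws else MISS) / z) for t in tokens
        )
    # Subtract the max before exponentiating to avoid underflow.
    top = max(log_scores.values())
    weights = {lang: math.exp(s - top) for lang, s in log_scores.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


def guess(code, threshold=0.9):
    """Return the most probable language, or None if below the threshold."""
    probs = posteriors(code)
    lang = max(probs, key=probs.get)
    return lang if probs[lang] >= threshold else None
```

With these weights a single matched keyword is not enough to pass the threshold, so the guess only appears once a second keyword from the same language shows up. Note that this sketch does nothing about the mixed-language case raised above (HTML inside JavaScript or the reverse).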
