Add specification of input file format #3

Open
KasperFyhn opened this issue Oct 10, 2023 · 1 comment
@KasperFyhn

Both issues #1 and #2 stem from problems with the format of the input file for supervised_classification.py. The script makes some assumptions that are not clear until you run it and dig into the code, namely that the positive label of a "label column" must be the same as the name of that label column. Additionally, for the script to be runnable at all, the input file must contain the columns raw_text, exemplar and political.

You partly address some of these things already in the README, but it wasn't fully clear to me, as is evident from the issues I have opened. 😄

So, specifying the format of an input file and/or giving an example file would definitely help.
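
To make the assumptions above concrete, here is a rough sketch of what an example file and a sanity check could look like. It is not taken from the repository: the CSV format, the treatment of exemplar and political as the label columns, the negative label values and the `check_input` helper are all assumptions made for illustration.

```python
import csv
import io

# Columns named in this issue as required by supervised_classification.py.
REQUIRED_COLUMNS = ("raw_text", "exemplar", "political")
# Assumption: these two are the "label columns"; the issue does not say so explicitly.
LABEL_COLUMNS = ("exemplar", "political")

# Hypothetical example file. CSV is assumed, and the negative labels are made up.
EXAMPLE_CSV = """raw_text,exemplar,political
"Some text about an election.",exemplar,political
"A recipe for bread.",not_exemplar,not_political
"""


def check_input(handle):
    """Return a list of problems found in a CSV input file (empty if none)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    rows = list(reader)
    for column in LABEL_COLUMNS:
        # The positive label is expected to be spelled exactly like the column name.
        values = {row[column] for row in rows}
        if column not in values:
            problems.append(
                f"column {column!r} never contains the positive label {column!r}"
            )
    return problems


print(check_input(io.StringIO(EXAMPLE_CSV)))
```

A file whose exemplar column holds values like 1 and 0 would be flagged by this check, which is roughly the situation behind #1 and #2.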

This issue supersedes #1 and #2.

@KasperFyhn KasperFyhn added the documentation Improvements or additions to documentation label Oct 10, 2023
@x-tabdeveloping
Member

I think the solution would ultimately be to let users pass the name of the positive label to the script. Assuming that the column name and the positive label are the same is pretty wild, especially if the task isn't binary classification (it happens to be binary here, but nowhere in the documentation do we say that it has to be).
This issue mostly concerns evaluation code, which was written by @miscodisco. Should we fix that up, or just document our way out of it?
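
For reference, passing the positive label could look something like the sketch below. The `--label-column` and `--positive-label` flags and both helper functions are hypothetical; none of them exist in supervised_classification.py today. The fallback keeps the current behaviour (positive label equals column name) when the flag is omitted.

```python
import argparse


def build_parser():
    """Build a hypothetical CLI that takes an explicit positive label."""
    parser = argparse.ArgumentParser(
        description="Sketch of a CLI with an explicit positive label."
    )
    parser.add_argument("input_file", help="Path to the input file.")
    parser.add_argument(
        "--label-column",
        default="exemplar",
        help="Name of the column holding the labels (hypothetical flag).",
    )
    parser.add_argument(
        "--positive-label",
        default=None,
        help="Value treated as the positive class (hypothetical flag). "
        "Defaults to the name of the label column.",
    )
    return parser


def resolve_positive_label(args):
    """Use the explicit positive label, else fall back to the column name."""
    if args.positive_label is not None:
        return args.positive_label
    return args.label_column


args = build_parser().parse_args(
    ["data.csv", "--label-column", "political", "--positive-label", "yes"]
)
print(resolve_positive_label(args))
```

With a fallback like this, existing input files would keep working, and the evaluation code would only need to read the resolved label instead of the column name.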

3 participants