IRSI: Preliminary information about the project
- added extended information about the project, spectrum used, and
  preliminary results from the predictive experiment
- added the feature importance and y_true vs. y_pred plots within the
  README
- added gitignore

Signed-off-by: akhilpandey95 <[email protected]>
akhilpandey95 committed Jul 1, 2024
1 parent c3e7466 commit 51fea6b
Showing 5 changed files with 189 additions and 0 deletions.
156 changes: 156 additions & 0 deletions .gitignore
@@ -0,0 +1,156 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# add the specific text dataset to ignore
data/acmpapers_fulltext_labels_04_24.csv
33 changes: 33 additions & 0 deletions README.md
@@ -1,2 +1,35 @@
# IRSI

> **I**nfluence of **R**eproducibility on **S**cientific **I**mpact

Reproducibility is an important feature of science: experiments are retested, and analyses are repeated. We examine a myriad of features in scholarly articles published in computer science conferences and journals and model how they influence scientific impact.

### Reproducibility Spectrum

The author-centric framework captures the availability, accessibility, and quality of the artifacts within a scientific document, signaling whether the prerequisites for reproducing a paper are satisfied. The author-centric states within the spectrum are $A_i \in \{A_{PWA}, A_{PUNX}, A_{PAX}\}$: papers without artifacts, papers with artifacts that aren't permanently archived, and papers with artifacts that are permanently archived.

The external-agent framework presents the reproducibility evaluation status of a paper: $E_i \in \{E_{NR}, E_{AR}, E_{R^{Re}}, E_{R}\}$, i.e., papers that cannot be reproduced, papers awaiting reproducibility, reproduced papers, and reproducible papers.

<img src="media/AC_EA_DARK.png"/>

### Influence on Scientific Impact

The concept of reproducibility echoes diverse sentiments, and from a taxonomy standpoint, its formal definition has evolved as both a term and a concept. We align with the National Academy of Sciences in defining *reproducibility* as the process of obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis. This definition provides an ideal, generalizable standard applicable to large sections of scientific research within the sub-domains of computer science. Consensus on this definition can make it easier to recognize procedures and protocols for validating and verifying scientific claims. The relevance and importance of reproducibility are heightened more than ever, given the current outgrowth of Artificial Intelligence (AI) models into diverse public domains. The modus operandi of scientific workflows in AI has shifted from offering posterior fixes to building a priori reproducible AI pipelines. Regardless of the complexity, we can observe the push for making models, datasets, and algorithms available in containers, open-source repositories, and webpages. The significance of reproducibility is multifaceted. First, it upholds a standard for sustaining quality in the results and analysis of scholarly works, ensuring that scientific findings are robust, reliable, and unbiased. Second, it enables researchers to innovate and expand on proven findings quickly. Third, in the context of AI, reproducibility addresses essential safety and trust considerations by ensuring accountability within the systems implementing algorithms that make decisions affecting human lives.
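In computational terms, the definition above reduces to: the same input data, code, and conditions of analysis (including random seeds) must yield identical results. A minimal toy sketch of this invariant, using a seeded bootstrap as a stand-in for an arbitrary analysis:

```python
import random

def analysis(data, seed=95):
    """A toy 'computational analysis': the mean of a seeded bootstrap resample."""
    rng = random.Random(seed)  # fixing the seed fixes the conditions of analysis
    resample = [rng.choice(data) for _ in data]
    return sum(resample) / len(resample)

data = [3, 1, 4, 1, 5, 9, 2, 6]
# Re-running with identical inputs and seed reproduces the result exactly.
assert analysis(data) == analysis(data)
```

Swapping the seed (or omitting it) breaks the equality, which is precisely the distinction between a reproducible and an unreproducible pipeline.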

<img src="media/IRSSI_test_val_mdi_feat_imp.png"/>

> Fig. 1.a Important features observed while predicting the scholarly impact of reproducible papers in computer science

Preliminary evidence from utilizing diverse feature groups central to computational sciences in predicting scholarly impact highlights the importance of transparency, reproducibility, clear communication, and practical contributions in enhancing the scholarly impact of academic papers. These results reflect broader trends in the scientific community towards open science and reproducibility, which are key to our interest in addressing the **Reproducibility crisis in AI**.
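The figure's filename suggests MDI (mean decrease in impurity) importances. The project's actual features and pipeline are not shown here; the following is only a sketch of how MDI importances fall out of a tree ensemble, using synthetic stand-in features with illustrative names:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for paper-level features; names are illustrative only.
feature_names = ["has_artifact", "artifact_archived", "readability", "num_refs"]
X = rng.random((500, 4))
# Toy target: 'citations' driven mostly by the first two features.
y = 5 * X[:, 0] + 3 * X[:, 1] + 0.5 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# MDI importances: per-feature impurity reductions, normalized to sum to 1.
for name, imp in sorted(zip(feature_names, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:18s} {imp:.3f}")
```

On this toy data the two informative features dominate the ranking, which is the kind of signal a plot like Fig. 1.a visualizes. Note that MDI importances are known to be biased toward high-cardinality features, so permutation importance is a common cross-check.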

<img src="media/IRSSI_test_val_ytrue_ypred.png"/>

> Fig. 1.b Plotting the true citation counts against the predicted values
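A parity plot like Fig. 1.b is typically read alongside summary metrics. A minimal NumPy-only sketch of scoring predicted against true citation counts (the arrays below are placeholders, not the project's results):

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for predicted vs. true citation counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mae = np.abs(resid).mean()
    rmse = np.sqrt((resid ** 2).mean())
    ss_res = (resid ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}

# Placeholder citation counts for illustration only.
y_true = [0, 2, 5, 11, 40, 120]
y_pred = [1, 3, 4, 15, 35, 100]
print(regression_report(y_true, y_pred))
```

Points hugging the diagonal in the parity plot correspond to low MAE/RMSE and an R² near 1; systematic departure from the diagonal (e.g., under-predicting highly cited papers) shows up as structure in the residuals.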

### Authors
[Akhil Pandey](https://github.com/akhilpandey95), [Hamed Alhoori](https://github.com/alhoori)

### Acknowledgement
This work is supported in part by NSF Grant No. [2022443](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2022443&HistoricalAwards=false).
Binary file added media/AC_EA_DARK.png
Binary file added media/IRSSI_test_val_mdi_feat_imp.png
Binary file added media/IRSSI_test_val_ytrue_ypred.png
