Merge pull request #11 from scverse/update-docs
Improve documentation
grst authored Dec 15, 2024
2 parents 76007d5 + ad40ebf commit ceebb5c
Showing 5 changed files with 185 additions and 37 deletions.
7 changes: 6 additions & 1 deletion CHANGELOG.md
Expand Up @@ -8,9 +8,14 @@ and this project adheres to [Semantic Versioning][].
[keep a changelog]: https://keepachangelog.com/en/1.0.0/
[semantic versioning]: https://semver.org/spec/v2.0.0.html

## v1.0.0

- Update tutorials and docstrings of `.cond()` and `.contrast()` ([#11](https://github.com/scverse/formulaic-contrasts/pull/11))
- No other changes, but the API is considered stable now.

## v0.2.0

- Rename `FormulaicContrasts.design` to `FormulaicContrasts.design_matrix`

## v0.1.0

Expand Down
154 changes: 139 additions & 15 deletions docs/contrasts.ipynb
Expand Up @@ -251,13 +251,135 @@
"For instance, we could \n",
"investigate differences between responders and non-responders, independent of treatment by fitting the model \n",
"`~ response + treatment` and then comparing the category `\"responder\"` in the column `response` with the category `\"non_responder\"`.\n",
"This can be achieved using the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` method. "
"\n",
"Given the data frame from above and the model `~ response + treatment`, the design matrix contains the following distinct\n",
"entries, encoding the different combinations of response and drug. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Intercept</th>\n",
" <th>response[T.responder]</th>\n",
" <th>treatment[T.drugB]</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>70</th>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Intercept response[T.responder] treatment[T.drugB]\n",
"0 1.0 0 0\n",
"10 1.0 1 0\n",
"40 1.0 0 1\n",
"70 1.0 1 1"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from formulaic import model_matrix\n",
"\n",
"model_matrix(\"~ response + treatment\", df).drop_duplicates()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `response[T.responder]` column encodes `\"responder\"` as 1 and `\"non_responder\"` as 0. The \n",
"intercept is always 1 and the other column is irrelevant for our desired comparison. The entries of a contrast vector \n",
"always correspond to the columns of the design matrix. We therefore need a contrast vector\n",
"that compares `(1, 1, 0)` vs. `(1, 0, 0)`:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 0])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"contrast = np.array((1, 1, 0)) - np.array((1, 0, 0))\n",
"contrast"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using formulaic-contrasts' {func}`~formulaic_contrasts.FormulaicContrasts.cond` function, we can build the same\n",
"contrast vector by specifying the categories of interest:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
Expand All @@ -268,7 +390,7 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 3,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -278,24 +400,21 @@
"\n",
"mod = FormulaicContrasts(df, \"~ response + treatment\")\n",
"\n",
"contrast = mod.contrast(\n",
" column=\"response\",\n",
" baseline=\"non_responder\",\n",
" group_to_compare=\"responder\",\n",
")\n",
"contrast = mod.cond(response=\"responder\") - mod.cond(response=\"non_responder\")\n",
"contrast"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is equivalent to the following {func}`~formulaic_contrasts.FormulaicContrasts.cond` call:"
"For this very common case of comparing two categories of the same variable, {func}`~formulaic_contrasts.FormulaicContrasts.contrast` \n",
"provides a convenient shortcut for building the same contrast:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [
{
Expand All @@ -307,13 +426,18 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mod.cond(response=\"responder\") - mod.cond(response=\"non_responder\")"
"contrast = mod.contrast(\n",
" column=\"response\",\n",
" baseline=\"non_responder\",\n",
" group_to_compare=\"responder\",\n",
")\n",
"contrast"
]
},
{
Expand All @@ -328,7 +452,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"outputs": [
{
Expand All @@ -341,7 +465,7 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -368,7 +492,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"outputs": [
{
Expand All @@ -381,7 +505,7 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
Expand Down
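The notebook text in this file explains that a contrast vector has one entry per design-matrix column and is obtained by subtracting two design-matrix rows. As a minimal sketch of why that works, the snippet below uses plain NumPy and made-up coefficient values (the `beta` array is an assumption for illustration, not output from the tutorial) to show that the dot product of the contrast with the coefficients equals the difference between the two predicted group means.

```python
import numpy as np

# Column order of the design matrix for "~ response + treatment":
# (Intercept, response[T.responder], treatment[T.drugB])
responder_row = np.array([1, 1, 0])      # responder on drugA
non_responder_row = np.array([1, 0, 0])  # non_responder on drugA

# The contrast is the difference of the two rows
contrast = responder_row - non_responder_row

# Hypothetical fitted coefficients, invented for this illustration only
beta = np.array([10.0, 2.5, -1.0])

# Applying the contrast to the coefficients ...
effect = contrast @ beta

# ... gives the same number as the difference of the two predictions
predicted_difference = responder_row @ beta - non_responder_row @ beta

print(contrast, effect, predicted_difference)
```

Because the intercept and treatment entries cancel in the subtraction, only the `response[T.responder]` coefficient contributes to the result.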
18 changes: 9 additions & 9 deletions docs/contributing.md
Expand Up @@ -155,11 +155,11 @@ This will automatically create a git tag and trigger a Github workflow that crea
Please write documentation for new or changed features and use-cases.
This project uses [sphinx][] with the following features:

- The [myst][] extension allows writing documentation in markdown/Markedly Structured Text
- [Numpy-style docstrings][numpydoc] (through the [napoleon][numpydoc-napoleon] extension).
- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks))
- [sphinx-autodoc-typehints][], to automatically reference annotated input and output types
- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/)

See scanpy’s {doc}`scanpy:dev/documentation` for more information on how to write your own.

Expand All @@ -183,10 +183,10 @@ please check out [this feature request][issue-render-notebooks] in the `cookiecu

#### Hints

- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`.
Only if you do so can sphinx automatically create a link to the external documentation.
- If building the documentation fails because of a missing link that is outside your control,
you can add an entry to the `nitpick_ignore` list in `docs/conf.py`
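
The two hints above both refer to settings in `docs/conf.py`. A minimal sketch of what such entries might look like is shown below; `intersphinx_mapping` and `nitpick_ignore` are standard Sphinx configuration values, but the specific packages and the ignored target are illustrative assumptions and are not taken from this repository's actual configuration.

```python
# docs/conf.py (excerpt, illustrative only)

# Map a short name to the base URL of another project's documentation.
# The second tuple element is None, so Sphinx fetches objects.inv from that URL.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "pandas": ("https://pandas.pydata.org/docs/", None),
}

# Cross-references that cannot be resolved and should not fail a nitpicky build.
# Each entry is a (role, target) pair; the target below is a hypothetical name.
nitpick_ignore = [
    ("py:class", "some_package.SomeUnresolvableType"),
]
```

With an entry such as `"pandas"` in place, a reference like {class}`pandas.DataFrame` can be linked to the external documentation.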

(docs-building)=

Expand Down
24 changes: 14 additions & 10 deletions docs/model_usage.ipynb
Expand Up @@ -15,7 +15,7 @@
"for use with `formulaic-contrasts`. The aim is to build a model that takes a pandas DataFrame and a formulaic formula as input,\n",
"fits the model to a continuous variable from the dataframe, and performs a statistical test for a given contrast. \n",
"\n",
"This can be achived with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n",
"This can be achieved with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n",
"base class:"
]
},
Expand All @@ -28,9 +28,13 @@
"import formulaic_contrasts\n",
"import numpy as np\n",
"import statsmodels.api as sm\n",
"import pandas as pd\n",
"\n",
"\n",
"class StatsmodelsOLS(formulaic_contrasts.FormulaicContrasts):\n",
" def __init__(self, data: pd.DataFrame, design: str):\n",
" super().__init__(data, design)\n",
"\n",
" def fit(self, variable: str):\n",
" self.mod = sm.OLS(self.data[variable], self.design_matrix)\n",
" self.mod = self.mod.fit()\n",
Expand Down Expand Up @@ -198,7 +202,7 @@
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"c0 -1.6492 0.935 -1.764 0.082 -3.512 0.213\n",
"c0 1.9563 0.775 2.525 0.014 0.413 3.499\n",
"=============================================================================="
]
},
Expand All @@ -208,7 +212,7 @@
}
],
"source": [
"model = StatsmodelsOLS(df, \"~ treatment * response\")\n",
"model = StatsmodelsOLS(df, \"~ treatment + response\")\n",
"model.fit(\"biomarker\")\n",
"model.t_test(\n",
" model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n",
Expand Down Expand Up @@ -273,7 +277,7 @@
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"c0 -1.6492 0.935 -1.764 0.082 -3.512 0.213\n",
"c0 1.9563 0.775 2.525 0.014 0.413 3.499\n",
"=============================================================================="
]
},
Expand All @@ -283,7 +287,7 @@
}
],
"source": [
"model = StatsmodelsOLS(df, \"~ treatment * response\")\n",
"model = StatsmodelsOLS(df, \"~ treatment + response\")\n",
"model.fit(\"biomarker\")\n",
"model.t_test(\n",
" model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n",
Expand Down Expand Up @@ -338,7 +342,7 @@
"outputs": [],
"source": [
"design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix(\n",
" \"~ treatment * response\"\n",
" \"~ treatment + response\"\n",
")"
]
},
Expand Down Expand Up @@ -371,15 +375,15 @@
" drop_field='non_responder',\n",
" column_names=('non_responder',\n",
" 'responder'),\n",
" colname_format='{name}[T.{field}]')],\n",
" colname_format='{name}[{field}]')],\n",
" 'treatment': [FactorMetadata(name='treatment',\n",
" reduced_rank=True,\n",
" custom_encoder=False,\n",
" categories=('drugA', 'drugB'),\n",
" kind=<Kind.CATEGORICAL: 'categorical'>,\n",
" drop_field='drugA',\n",
" column_names=('drugA', 'drugB'),\n",
" colname_format='{name}[T.{field}]')]})\n"
" colname_format='{name}[{field}]')]})\n"
]
}
],
Expand Down Expand Up @@ -571,10 +575,10 @@
"defaultdict(set,\n",
" {'np.log': {'np.log(biomarker)'},\n",
" 'biomarker': {'np.log(biomarker)'},\n",
" 'C': {'C(response)',\n",
" \"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'contr.treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'C': {'C(response)',\n",
" \"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'response': {'C(response)'}})"
]
},
Expand Down
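The notebook in this file fits an OLS model and runs a t-test on a contrast through statsmodels. As a rough sketch of the arithmetic behind such a test, the snippet below does the same computation with NumPy only, on simulated data. The data, the true effect sizes, and all variable names are assumptions made for illustration; it does not reproduce the tutorial's numbers or the `StatsmodelsOLS` class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated design matrix with columns
# (Intercept, response[T.responder], treatment[T.drugB])
n = 40
responder = np.repeat([0, 1], n // 2)  # first half non-responders, second half responders
drug_b = np.tile([0, 1], n // 2)       # alternating drugA / drugB
X = np.column_stack([np.ones(n), responder, drug_b]).astype(float)

# Simulated outcome with an assumed true responder effect of 2.0
true_beta = np.array([1.0, 2.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Ordinary least squares estimate of the coefficients
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and covariance matrix of the coefficient estimates
residuals = y - X @ beta_hat
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
cov_beta = sigma2 * np.linalg.inv(X.T @ X)

# Contrast "responder vs. non_responder": selects the response coefficient
contrast = np.array([0.0, 1.0, 0.0])

estimate = contrast @ beta_hat                      # estimated effect
std_err = np.sqrt(contrast @ cov_beta @ contrast)   # its standard error
t_stat = estimate / std_err                         # t statistic with `dof` degrees of freedom

print(estimate, std_err, t_stat)
```

In an additive model such as `~ treatment + response`, this contrast reduces to the single `response[T.responder]` coefficient, which is why the estimate equals `beta_hat[1]` here.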