diff --git a/CHANGELOG.md b/CHANGELOG.md index 6338823..b5f7c83 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,9 +8,14 @@ and this project adheres to [Semantic Versioning][]. [keep a changelog]: https://keepachangelog.com/en/1.0.0/ [semantic versioning]: https://semver.org/spec/v2.0.0.html +## v1.0.0 + +- Update tutorials and docstrings of `.cond()` and `.contrast()` ([#11](https://github.com/scverse/formulaic-contrasts/pull/11)) +- No other changes, but the API is considered stable now. + ## v0.2.0 -- Rename `FormulaicContrasts.design` to `FormulaicContrasts.design_matrix` +- Rename `FormulaicContrasts.design` to `FormulaicContrasts.design_matrix` ## v0.1.0 diff --git a/docs/contrasts.ipynb b/docs/contrasts.ipynb index d4ec4e4..f80801a 100644 --- a/docs/contrasts.ipynb +++ b/docs/contrasts.ipynb @@ -251,13 +251,135 @@ "For instance, we could \n", "investigate differences between responders and non-responders, independent of treatment by fitting the model \n", "`~ response + treatment` and then comparing the category `\"responder\"` in the column `response` with the category `\"non_responder\"`.\n", - "This can be achieved using the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` method. " + "\n", + "Given the data frame from above and the model `~ response + treatment`, the design matrix contains the following distinct\n", + "entries, encoding the different combinations of response and drug. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Intercept response[T.responder] treatment[T.drugB]
0 1.0 0 0
10 1.0 1 0
40 1.0 0 1
70 1.0 1 1
\n", + "
" + ], + "text/plain": [ + " Intercept response[T.responder] treatment[T.drugB]\n", + "0 1.0 0 0\n", + "10 1.0 1 0\n", + "40 1.0 0 1\n", + "70 1.0 1 1" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from formulaic import model_matrix\n", + "\n", + "model_matrix(\"~ response + treatment\", df).drop_duplicates()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `response[T.responder]` column encodes `\"responder\"` as 1 and `\"non_responder\"` as 0. The \n", + "intercept is always 1 and the other column is irrelevant for our desired comparison. The entries of a contrast vector \n", + "always correspond to the columns of the design matrix. We therefore need a contrast vector\n", + "that compares `(1, 1, 0)` vs. `(1, 0, 0)`:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0, 1, 0])" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "\n", + "contrast = np.array((1, 1, 0)) - np.array((1, 0, 0))\n", + "contrast" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using formulaic-contrasts' {func}`~formulaic_contrasts.FormulaicContrasts.cond` function, we can build the same\n", + "contrast vector by specifying the categories of interest:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, "outputs": [ { "data": { @@ -268,7 +390,7 @@ "Name: 0, dtype: float64" ] }, - "execution_count": 3, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -278,11 +400,7 @@ "\n", "mod = FormulaicContrasts(df, \"~ response + treatment\")\n", "\n", - "contrast = mod.contrast(\n", - " column=\"response\",\n", - " baseline=\"non_responder\",\n", - " group_to_compare=\"responder\",\n", - ")\n", + "contrast = mod.cond(response=\"responder\") - 
mod.cond(response=\"non_responder\")\n", "contrast" ] }, @@ -290,12 +408,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This is equivalent to the following {func}`~formulaic_contrasts.FormulaicContrasts.cond` call:" + "For this very common case of comparing two categories of the same variable, {func}`~formulaic_contrasts.FormulaicContrasts.contrast` \n", + "provides a convenient shortcut for building the same contrast:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -307,13 +426,18 @@ "Name: 0, dtype: float64" ] }, - "execution_count": 4, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "mod.cond(response=\"responder\") - mod.cond(response=\"non_responder\")" + "contrast = mod.contrast(\n", + " column=\"response\",\n", + " baseline=\"non_responder\",\n", + " group_to_compare=\"responder\",\n", + ")\n", + "contrast" ] }, { @@ -328,7 +452,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -341,7 +465,7 @@ "Name: 0, dtype: float64" ] }, - "execution_count": 5, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -368,7 +492,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -381,7 +505,7 @@ "Name: 0, dtype: float64" ] }, - "execution_count": 6, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } diff --git a/docs/contributing.md b/docs/contributing.md index d54236f..da4a605 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -155,11 +155,11 @@ This will automatically create a git tag and trigger a Github workflow that crea Please write documentation for new or changed features and use-cases. 
This project uses [sphinx][] with the following features: -- The [myst][] extension allows to write documentation in markdown/Markedly Structured Text -- [Numpy-style docstrings][numpydoc] (through the [napoloen][numpydoc-napoleon] extension). -- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks)) -- [sphinx-autodoc-typehints][], to automatically reference annotated input and output types -- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/) +- The [myst][] extension allows to write documentation in markdown/Markedly Structured Text +- [Numpy-style docstrings][numpydoc] (through the [napoloen][numpydoc-napoleon] extension). +- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks)) +- [sphinx-autodoc-typehints][], to automatically reference annotated input and output types +- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/) See scanpy’s {doc}`scanpy:dev/documentation` for more information on how to write your own. @@ -183,10 +183,10 @@ please check out [this feature request][issue-render-notebooks] in the `cookiecu #### Hints -- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`. - Only if you do so can sphinx automatically create a link to the external documentation. -- If building the documentation fails because of a missing link that is outside your control, - you can add an entry to the `nitpick_ignore` list in `docs/conf.py` +- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`. + Only if you do so can sphinx automatically create a link to the external documentation. 
+- If building the documentation fails because of a missing link that is outside your control, + you can add an entry to the `nitpick_ignore` list in `docs/conf.py` (docs-building)= diff --git a/docs/model_usage.ipynb b/docs/model_usage.ipynb index 4340d4d..bf5bdf9 100644 --- a/docs/model_usage.ipynb +++ b/docs/model_usage.ipynb @@ -15,7 +15,7 @@ "for the use with `formulaic-contrasts`. The aim is to build a model that takes a pandas DataFrame and a formulaic formula as input\n", "allows to fit the model to a continuous variable from the dataframe and perform a statistical test for a given contrast. \n", "\n", - "This can be achived with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n", + "This can be achieved with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n", "base class:" ] }, @@ -28,9 +28,13 @@ "import formulaic_contrasts\n", "import numpy as np\n", "import statsmodels.api as sm\n", + "import pandas as pd\n", "\n", "\n", "class StatsmodelsOLS(formulaic_contrasts.FormulaicContrasts):\n", + " def __init__(self, data: pd.DataFrame, design: str):\n", + " super().__init__(data, design)\n", + "\n", " def fit(self, variable: str):\n", " self.mod = sm.OLS(self.data[variable], self.design_matrix)\n", " self.mod = self.mod.fit()\n", @@ -198,7 +202,7 @@ "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", - "c0 -1.6492 0.935 -1.764 0.082 -3.512 0.213\n", + "c0 1.9563 0.775 2.525 0.014 0.413 3.499\n", 
"==============================================================================" ] }, @@ -208,7 +212,7 @@ } ], "source": [ - "model = StatsmodelsOLS(df, \"~ treatment * response\")\n", + "model = StatsmodelsOLS(df, \"~ treatment + response\")\n", "model.fit(\"biomarker\")\n", "model.t_test(\n", " model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n", @@ -273,7 +277,7 @@ "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", - "c0 -1.6492 0.935 -1.764 0.082 -3.512 0.213\n", + "c0 1.9563 0.775 2.525 0.014 0.413 3.499\n", "==============================================================================" ] }, @@ -283,7 +287,7 @@ } ], "source": [ - "model = StatsmodelsOLS(df, \"~ treatment * response\")\n", + "model = StatsmodelsOLS(df, \"~ treatment + response\")\n", "model.fit(\"biomarker\")\n", "model.t_test(\n", " model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n", @@ -338,7 +342,7 @@ "outputs": [], "source": [ "design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix(\n", - " \"~ treatment * response\"\n", + " \"~ treatment + response\"\n", ")" ] }, @@ -371,7 +375,7 @@ " drop_field='non_responder',\n", " column_names=('non_responder',\n", " 'responder'),\n", - " colname_format='{name}[T.{field}]')],\n", + " colname_format='{name}[{field}]')],\n", " 'treatment': [FactorMetadata(name='treatment',\n", " reduced_rank=True,\n", " custom_encoder=False,\n", @@ -379,7 +383,7 @@ " kind=,\n", " drop_field='drugA',\n", " column_names=('drugA', 'drugB'),\n", - " colname_format='{name}[T.{field}]')]})\n" + " colname_format='{name}[{field}]')]})\n" ] } ], @@ -571,10 +575,10 @@ "defaultdict(set,\n", " {'np.log': {'np.log(biomarker)'},\n", " 'biomarker': {'np.log(biomarker)'},\n", - " 'C': {'C(response)',\n", - " \"C(treatment, 
contr.treatment(base='drugB'))\"},\n", " 'treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n", " 'contr.treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n", + " 'C': {'C(response)',\n", + " \"C(treatment, contr.treatment(base='drugB'))\"},\n", " 'response': {'C(response)'}})" ] }, diff --git a/src/formulaic_contrasts/_contrasts.py b/src/formulaic_contrasts/_contrasts.py index 2924fc1..62a7db0 100644 --- a/src/formulaic_contrasts/_contrasts.py +++ b/src/formulaic_contrasts/_contrasts.py @@ -33,6 +33,17 @@ def cond(self, **kwargs): """ Get a contrast vector representing a specific condition. + The `kwargs` are key/value pairs where the key refers to a variable used in the + design and the value represents a category of that variable. Variables not specified + will be filled with their default/baseline value. + + The vectors generated by `.cond` can be combined using standard arithmetic operations + to obtain the desired contrast, e.g. + + >>> contrast = model.cond(treatment="drugA") - model.cond(treatment="placebo") + + For more information on how to build contrasts, see :doc:`/contrasts`. + Parameters ---------- **kwargs @@ -40,7 +51,8 @@ def cond(self, **kwargs): Returns ------- - A contrast vector that aligns to the columns of the design matrix. + A vector with one element per column in the design matrix, + where the keyword arguments are coded as in the design matrix. """ cond_dict = kwargs if not set(cond_dict.keys()).issubset(self.variables): @@ -58,7 +70,10 @@ def cond(self, **kwargs): def contrast(self, column, baseline, group_to_compare): """ - Build a simple contrast for pairwise comparisons. + Build a simple contrast for pairwise comparisons of a single variable. + + For more complex contrasts, please construct a contrast vector using + :func:`~formulaic_contrasts.FormulaicContrasts.cond`. Parameters ----------