Merge pull request #11 from scverse/update-docs

Improve documentation
scverse · Dec 15, 2024 · ceebb5c
2 parents 76007d5 + ad40ebf

Showing 5 changed files with 185 additions and 37 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,9 +8,14 @@ and this project adheres to [Semantic Versioning][].
 [keep a changelog]: https://keepachangelog.com/en/1.0.0/
 [semantic versioning]: https://semver.org/spec/v2.0.0.html
 
+## v1.0.0
+
+- Update tutorials and docstrings of `.cond()` and `.contrast()` ([#11](https://github.com/scverse/formulaic-contrasts/pull/11))
+- No other changes, but the API is considered stable now.
+
 ## v0.2.0
 
--   Rename `FormulaicContrasts.design` to `FormulaicContrasts.design_matrix`
+- Rename `FormulaicContrasts.design` to `FormulaicContrasts.design_matrix`
 
 ## v0.1.0
 

diff --git a/docs/contrasts.ipynb b/docs/contrasts.ipynb
@@ -251,13 +251,135 @@
     "For instance, we could \n",
     "investigate differences between responders and non-responders, independent of treatment by fitting the model \n",
     "`~ response + treatment` and then comparing the category `\"responder\"` in the column `response` with the category `\"non_responder\"`.\n",
-    "This can be achieved using the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` method. "
+    "\n",
+    "Given the data frame from above and the model `~ response + treatment`, the design matrix contains the following distinct\n",
+    "entries, encoding the different combinations of response and drug. "
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 3,
    "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Intercept</th>\n",
+       "      <th>response[T.responder]</th>\n",
+       "      <th>treatment[T.drugB]</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>0</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>40</th>\n",
+       "      <td>1.0</td>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>70</th>\n",
+       "      <td>1.0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "    Intercept  response[T.responder]  treatment[T.drugB]\n",
+       "0         1.0                      0                   0\n",
+       "10        1.0                      1                   0\n",
+       "40        1.0                      0                   1\n",
+       "70        1.0                      1                   1"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from formulaic import model_matrix\n",
+    "\n",
+    "model_matrix(\"~ response + treatment\", df).drop_duplicates()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `response[T.responder]` column encodes `\"responder\"` as 1 and `\"non_responder\"` as 0. The \n",
+    "intercept is always 1 and the other column is irrelevant for our desired comparison. The entries of a contrast vector \n",
+    "always correspond to the columns of the design matrix. We therefore need a contrast vector\n",
+    "that compares `(1, 1, 0)` vs. `(1, 0, 0)`:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([0, 1, 0])"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "contrast = np.array((1, 1, 0)) - np.array((1, 0, 0))\n",
+    "contrast"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Using formulaic-contrasts' {func}`~formulaic_contrasts.FormulaicContrasts.cond` function, we can build the same\n",
+    "contrast vector by specifying the categories of interest:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -268,7 +390,7 @@
        "Name: 0, dtype: float64"
       ]
      },
-     "execution_count": 3,
+     "execution_count": 5,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -278,24 +400,21 @@
     "\n",
     "mod = FormulaicContrasts(df, \"~ response + treatment\")\n",
     "\n",
-    "contrast = mod.contrast(\n",
-    "    column=\"response\",\n",
-    "    baseline=\"non_responder\",\n",
-    "    group_to_compare=\"responder\",\n",
-    ")\n",
+    "contrast = mod.cond(response=\"responder\") - mod.cond(response=\"non_responder\")\n",
     "contrast"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This is equivalent to the following {func}`~formulaic_contrasts.FormulaicContrasts.cond` call:"
+    "For this very common case of comparing two categories of the same variable, {func}`~formulaic_contrasts.FormulaicContrasts.contrast` \n",
+    "provides a convenient shortcut for building the same contrast:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [
     {
@@ -307,13 +426,18 @@
        "Name: 0, dtype: float64"
       ]
      },
-     "execution_count": 4,
+     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "mod.cond(response=\"responder\") - mod.cond(response=\"non_responder\")"
+    "contrast = mod.contrast(\n",
+    "    column=\"response\",\n",
+    "    baseline=\"non_responder\",\n",
+    "    group_to_compare=\"responder\",\n",
+    ")\n",
+    "contrast"
    ]
   },
   {
@@ -328,7 +452,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
    "metadata": {},
    "outputs": [
     {
@@ -341,7 +465,7 @@
        "Name: 0, dtype: float64"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 7,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -368,7 +492,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [
     {
@@ -381,7 +505,7 @@
        "Name: 0, dtype: float64"
       ]
      },
-     "execution_count": 6,
+     "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
     }
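
The notebook text added above builds the contrast as the difference of two design-matrix rows, `(1, 1, 0)` vs. `(1, 0, 0)`. The following is a minimal, dependency-free sketch of that idea. The helpers `design_row` and `contrast` are hypothetical names introduced only for illustration and are not part of `formulaic-contrasts`; the column order is assumed to match the design matrix printed in the notebook.

```python
# Assumed column order, taken from the design matrix shown in the notebook:
# (Intercept, response[T.responder], treatment[T.drugB])
COLUMNS = ("Intercept", "response[T.responder]", "treatment[T.drugB]")


def design_row(response: str, treatment: str) -> list:
    """Hypothetical helper: treatment-coded row for `~ response + treatment`."""
    return [
        1,  # the intercept is always 1
        1 if response == "responder" else 0,  # reference level: non_responder
        1 if treatment == "drugB" else 0,  # reference level: drugA
    ]


def contrast(row_a: list, row_b: list) -> list:
    """Contrast vector as the element-wise difference of two design rows."""
    return [a - b for a, b in zip(row_a, row_b)]


# Hold treatment at its reference level and vary only the response.
c = contrast(
    design_row("responder", "drugA"),
    design_row("non_responder", "drugA"),
)
print(dict(zip(COLUMNS, c)))

# In this additive model the result is the same when drugB is held fixed,
# because there is no interaction column between response and treatment.
c_drug_b = contrast(
    design_row("responder", "drugB"),
    design_row("non_responder", "drugB"),
)
```

Only the `response[T.responder]` entry is non-zero, so a test of this contrast addresses that single coefficient.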

diff --git a/docs/contributing.md b/docs/contributing.md
@@ -155,11 +155,11 @@ This will automatically create a git tag and trigger a Github workflow that crea
 Please write documentation for new or changed features and use-cases.
 This project uses [sphinx][] with the following features:
 
--   The [myst][] extension allows to write documentation in markdown/Markedly Structured Text
--   [Numpy-style docstrings][numpydoc] (through the [napoloen][numpydoc-napoleon] extension).
--   Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks))
--   [sphinx-autodoc-typehints][], to automatically reference annotated input and output types
--   Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/)
+- The [myst][] extension allows to write documentation in markdown/Markedly Structured Text
+- [Numpy-style docstrings][numpydoc] (through the [napoloen][numpydoc-napoleon] extension).
+- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks))
+- [sphinx-autodoc-typehints][], to automatically reference annotated input and output types
+- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/)
 
 See scanpy’s {doc}`scanpy:dev/documentation` for more information on how to write your own.
 
@@ -183,10 +183,10 @@ please check out [this feature request][issue-render-notebooks] in the `cookiecu
 
 #### Hints
 
--   If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`.
-    Only if you do so can sphinx automatically create a link to the external documentation.
--   If building the documentation fails because of a missing link that is outside your control,
-    you can add an entry to the `nitpick_ignore` list in `docs/conf.py`
+- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`.
+  Only if you do so can sphinx automatically create a link to the external documentation.
+- If building the documentation fails because of a missing link that is outside your control,
+  you can add an entry to the `nitpick_ignore` list in `docs/conf.py`
 
 (docs-building)=
 

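The `docs/model_usage.ipynb` diff that follows replaces the formula `~ treatment * response` with `~ treatment + response` in its examples. As a rough illustration of how the two designs differ for a responder vs. non-responder comparison, the sketch below writes out treatment-coded rows for the interaction model by hand. The column names and their order are assumptions made for this example, and the helper functions are hypothetical; the sketch does not reproduce how `formulaic-contrasts` resolves unspecified variables.

```python
# Assumed column order for `~ treatment * response` under treatment coding.
INTERACTION_COLUMNS = (
    "Intercept",
    "treatment[T.drugB]",
    "response[T.responder]",
    "treatment[T.drugB]:response[T.responder]",
)


def interaction_row(treatment: str, response: str) -> list:
    """Hypothetical helper: one design row including the interaction term."""
    t = 1 if treatment == "drugB" else 0
    r = 1 if response == "responder" else 0
    return [1, t, r, t * r]  # the interaction column is the product of both dummies


def response_contrast(treatment: str) -> list:
    """Responder minus non-responder, with treatment held at the given level."""
    a = interaction_row(treatment, "responder")
    b = interaction_row(treatment, "non_responder")
    return [x - y for x, y in zip(a, b)]


within_drug_a = response_contrast("drugA")
within_drug_b = response_contrast("drugB")
print(dict(zip(INTERACTION_COLUMNS, within_drug_a)))
print(dict(zip(INTERACTION_COLUMNS, within_drug_b)))
```

With an interaction term, the response contrast depends on the treatment level that is held fixed, whereas in the additive model it is the same vector for every treatment level.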
diff --git a/docs/model_usage.ipynb b/docs/model_usage.ipynb
@@ -15,7 +15,7 @@
     "for the use with `formulaic-contrasts`. The aim is to build a model that takes a pandas DataFrame and a formulaic formula as input\n",
     "allows to fit the model to a continuous variable from the dataframe and perform a statistical test for a given contrast. \n",
     "\n",
-    "This can be achived with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n",
+    "This can be achieved with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n",
     "base class:"
    ]
   },
@@ -28,9 +28,13 @@
     "import formulaic_contrasts\n",
     "import numpy as np\n",
     "import statsmodels.api as sm\n",
+    "import pandas as pd\n",
     "\n",
     "\n",
     "class StatsmodelsOLS(formulaic_contrasts.FormulaicContrasts):\n",
+    "    def __init__(self, data: pd.DataFrame, design: str):\n",
+    "        super().__init__(data, design)\n",
+    "\n",
     "    def fit(self, variable: str):\n",
     "        self.mod = sm.OLS(self.data[variable], self.design_matrix)\n",
     "        self.mod = self.mod.fit()\n",
@@ -198,7 +202,7 @@
        "==============================================================================\n",
        "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
        "------------------------------------------------------------------------------\n",
-       "c0            -1.6492      0.935     -1.764      0.082      -3.512       0.213\n",
+       "c0             1.9563      0.775      2.525      0.014       0.413       3.499\n",
        "=============================================================================="
       ]
      },
@@ -208,7 +212,7 @@
     }
    ],
    "source": [
-    "model = StatsmodelsOLS(df, \"~ treatment * response\")\n",
+    "model = StatsmodelsOLS(df, \"~ treatment + response\")\n",
     "model.fit(\"biomarker\")\n",
     "model.t_test(\n",
     "    model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n",
@@ -273,7 +277,7 @@
        "==============================================================================\n",
        "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
        "------------------------------------------------------------------------------\n",
-       "c0            -1.6492      0.935     -1.764      0.082      -3.512       0.213\n",
+       "c0             1.9563      0.775      2.525      0.014       0.413       3.499\n",
        "=============================================================================="
       ]
      },
@@ -283,7 +287,7 @@
     }
    ],
    "source": [
-    "model = StatsmodelsOLS(df, \"~ treatment * response\")\n",
+    "model = StatsmodelsOLS(df, \"~ treatment + response\")\n",
     "model.fit(\"biomarker\")\n",
     "model.t_test(\n",
     "    model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n",
@@ -338,7 +342,7 @@
    "outputs": [],
    "source": [
     "design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix(\n",
-    "    \"~ treatment * response\"\n",
+    "    \"~ treatment + response\"\n",
     ")"
    ]
   },
@@ -371,15 +375,15 @@
       "                                         drop_field='non_responder',\n",
       "                                         column_names=('non_responder',\n",
       "                                                       'responder'),\n",
-      "                                         colname_format='{name}[T.{field}]')],\n",
+      "                                         colname_format='{name}[{field}]')],\n",
       "             'treatment': [FactorMetadata(name='treatment',\n",
       "                                          reduced_rank=True,\n",
       "                                          custom_encoder=False,\n",
       "                                          categories=('drugA', 'drugB'),\n",
       "                                          kind=<Kind.CATEGORICAL: 'categorical'>,\n",
       "                                          drop_field='drugA',\n",
       "                                          column_names=('drugA', 'drugB'),\n",
-      "                                          colname_format='{name}[T.{field}]')]})\n"
+      "                                          colname_format='{name}[{field}]')]})\n"
      ]
     }
    ],
@@ -571,10 +575,10 @@
        "defaultdict(set,\n",
        "            {'np.log': {'np.log(biomarker)'},\n",
        "             'biomarker': {'np.log(biomarker)'},\n",
-       "             'C': {'C(response)',\n",
-       "              \"C(treatment, contr.treatment(base='drugB'))\"},\n",
        "             'treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n",
        "             'contr.treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n",
+       "             'C': {'C(response)',\n",
+       "              \"C(treatment, contr.treatment(base='drugB'))\"},\n",
        "             'response': {'C(response)'}})"
       ]
      },