From f1fc24d733954efc21db37049a0373a27317730c Mon Sep 17 00:00:00 2001
From: Chris Cummings <ccummings@nvidia.com>
Date: Mon, 13 Jan 2025 15:55:38 +0000
Subject: [PATCH] Better writing

---
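Note: the index relations this patch documents for ``map`` can be cross-checked
against plain NumPy. The sketch below is only a reference for the expected
results. It makes no SlangPy calls, and all variable names are illustrative
assumptions rather than anything taken from the SlangPy API.

```python
import numpy as np

# Plain-NumPy reference for the documented index relations (no SlangPy involved).
rng = np.random.default_rng(0)

# map(a=(0, 1), b=(1,)) is documented as: result[i, j] = a[i, j] + b[j]
a = rng.random((8, 8), dtype=np.float32)
b = rng.random(8, dtype=np.float32)
padded_add = a + b[np.newaxis, :]

# map(a=(0,), b=(1,)) is documented as: result[i, j] = u[i] * v[j]
u = rng.random(10, dtype=np.float32)
v = rng.random(20, dtype=np.float32)
outer = u[:, np.newaxis] * v[np.newaxis, :]

# map(val=(1, 0)) is documented as: result[i, j] = m[j, i]
m = rng.random((10, 20), dtype=np.float32)
transposed = m.T
```

Comparing SlangPy output against arrays like these (for example with
``np.allclose``) is one way to confirm a mapping does what the docs describe.
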
 docs/mapping.rst | 144 +++++++++++++++++++++++------------------------
 1 file changed, 70 insertions(+), 74 deletions(-)

diff --git a/docs/mapping.rst b/docs/mapping.rst
index 5c05ebd..575e480 100644
--- a/docs/mapping.rst
+++ b/docs/mapping.rst
@@ -1,67 +1,69 @@
 Mapping
 =======
 
-In the previous broadcasting section, we saw how SlangPy applies broadcasting rules to automatically vectorize a function. Mapping gives more control over this process by allowing the user to explicitly specify the relationship between argument and kernel dimensions.
+Mapping provides a way to explicitly control the relationship between argument dimensions and kernel dimensions in SlangPy. This extends the broadcasting rules discussed earlier, giving you more precise control over vectorization.
 
-Consider the following simple call to an `add` function that adds 2 floats:
+A Simple Example
+----------------
 
-.. code-block:: python 
+Consider this call to an ``add`` function that adds two floats:
 
-    a = np.random.rand(10,3,4)
-    b = np.random.rand(10,3,4)
-    result = mymodule.add(a,b, _result='numpy')
+.. code-block:: python
+
+    a = np.random.rand(10, 3, 4)
+    b = np.random.rand(10, 3, 4)
+    result = mymodule.add(a, b, _result='numpy')
 
-In this example:
+In this case:
 
-- ``a`` and ``b`` are `arguments` to the ``add`` kernel, each with shape ``(10,3,4)``. 
-- The kernel is dispatched with overall shape ``(10,3,4)``.
-- Any given thread, ``[i,j,k]``, reads ``a[i,j,k]`` and ``b[i,j,k]`` and writes ``result[i,j,k]``. 
+- ``a`` and ``b`` are arguments to the ``add`` kernel, each with the shape ``(10, 3, 4)``.
+- The kernel is dispatched with an overall shape of ``(10, 3, 4)``.
+- Each thread indexed by ``[i, j, k]`` processes ``a[i, j, k]`` and ``b[i, j, k]``, writing the result to ``result[i, j, k]``.
 
-There is a simple 1-to-1 mapping of argument dimensions to kernel dimensions.
+This represents a straightforward 1-to-1 mapping between argument dimensions and kernel dimensions.
 
-Re-mapping dimensions
----------------------
+Re-Mapping Dimensions
+----------------------
 
-``map`` can be used to change how argument dimensions correspond to kernel dimensions. In the above example, we could have written:
+The ``map`` function allows you to modify how argument dimensions correspond to kernel dimensions. For example, the earlier code could be rewritten as:
 
-.. code-block:: python 
+.. code-block:: python
 
-    a = np.random.rand(10,3,4)
-    b = np.random.rand(10,3,4)
-    result = mymodule.add.map((0,1,2), (0,1,2))(a,b, _result='numpy')
+    a = np.random.rand(10, 3, 4)
+    b = np.random.rand(10, 3, 4)
+    result = mymodule.add.map((0, 1, 2), (0, 1, 2))(a, b, _result='numpy')
 
-The tuples passed to map specify how to map dimensions for each argument. In this case we're mapping dimension 0 to dimension 0, dimension 1 to dimension 1 and dimension 2 to dimension 2 for both a and b. This 1-to-1 mapping is the default behaviour. 
+Here, the tuples passed to ``map`` explicitly define the mapping: dimension 0 maps to 0, dimension 1 to 1, and dimension 2 to 2 for both ``a`` and ``b``. This is the default behavior in SlangPy.
 
-Mapping works with named parameters too, which can be a little clearer:
+Alternatively, you can use named parameters for clarity:
 
-.. code-block:: python 
+.. code-block:: python
 
-    # Assume the slang add function has signature add(float3 a, float3 b)
-    a = np.random.rand(10,3,4)
-    b = np.random.rand(10,3,4)
-    result = mymodule.add.map(a=(0,1,2), b=(0,1,2))(a=a,b=b, _result='numpy')
+    # Assuming the add function has the signature add(float a, float b)
+    a = np.random.rand(10, 3, 4)
+    b = np.random.rand(10, 3, 4)
+    result = mymodule.add.map(a=(0, 1, 2), b=(0, 1, 2))(a=a, b=b, _result='numpy')
 
 ----
 
 **Mapping arguments with different dimensionalities**
 
-As we've already seen, unlike Numpy, SlangPy by design doesn't auto-pad dimensions. When this behaviour is desirable, explicit mapping can be used to tell SlangPy exactly how to map the smaller inputs to those of the overall kernel:
+Unlike NumPy, SlangPy by design does not auto-pad dimensions. When that behavior is wanted, ``map`` lets you state explicitly how the dimensions of a smaller input line up with those of the kernel:
 
-.. code-block:: python 
+.. code-block:: python
 
-    a = np.random.rand(8,8).astype(np.float32)
+    a = np.random.rand(8, 8).astype(np.float32)
     b = np.random.rand(8).astype(np.float32)
 
-    # Fails in SlangPy, as b is not auto-extended
-    result = mymodule.add(a,b, _result='numpy')
+    # This will fail in SlangPy, as `b` is not automatically extended:
+    result = mymodule.add(a, b, _result='numpy')
 
-    # Works, as we are explicilty mapping 
-    # This is equivalent to padding b with empty dimensions, as numpy would
-    # result[i,j] = a[i,j] + b[j]
-    result = mymodule.add.map(a=(0,1), b=(1,))(a=a,b=b, _result='numpy')
+    # Use explicit mapping instead:
+    # Equivalent to padding `b` with a leading dimension, as NumPy would: result[i, j] = a[i, j] + b[j]
+    result = mymodule.add.map(a=(0, 1), b=(1,))(a=a, b=b, _result='numpy')
 
-    # The same thing (didn't need to specify a as 1-to-1 mapping is default)
-    result = mymodule.add.map(b=(1,))(a=a,b=b, _result='numpy')
+    # Alternatively, omit the mapping for `a`, as it defaults to 1-to-1:
+    result = mymodule.add.map(b=(1,))(a=a, b=b, _result='numpy')
 
 ----
 
@@ -69,55 +71,49 @@ As we've already seen, unlike Numpy, SlangPy by design doesn't auto-pad dimensio
 
 Another use case is performing some operation in which you wish to broadcast all the elements of one argument across the other. The simplest is the mathematical outer-product:
 
-.. code-block:: python 
+.. code-block:: python
 
-    # Assume the slang multiply function has signature multiply(float a, float b)
-    # a is mapped to dimension 0, giving kernel dimension [0] size 10
-    # b is mapped to dimension 1, giving kernel dimension [1] size 20
-    # overall kernel (and thus result) shape is (10,20)
-    # result[i,j] = a[i] * b[j]
+    # Assuming the multiply function has the signature multiply(float a, float b)
     a = np.random.rand(10).astype(np.float32)
     b = np.random.rand(20).astype(np.float32)
-    result = mymodule.multiply.map(a=(0,), b=(1,))(a=a,b=b, _result='numpy')
+
+    # Map dimensions:
+    # - a maps to dimension 0 (size 10)
+    # - b maps to dimension 1 (size 20)
+    # Resulting kernel and output shape: (10, 20), where result[i, j] = a[i] * b[j]
+    result = mymodule.multiply.map(a=(0,), b=(1,))(a=a, b=b, _result='numpy')
 
 ----
 
 **Mapping to re-order dimensions**
 
-Similarly, dimension indices can be adjusted to re-order the dimensions of an argument. A trivial example to transpose a matrix (replace rows with columns) would be:
+Re-ordering argument dimensions is straightforward with ``map``. For example, to transpose a matrix:
 
-.. code-block:: python 
+.. code-block:: python
+
+    # Assuming the copy function has the signature float copy(float val) and returns val unchanged
+    a = np.random.rand(10, 20).astype(np.float32)
 
-    # Assume the slang copy function has signature float copy(float val)
-    # and just returns the value you pass it.
-    # result[i,j] = a[j,i]
-    a = np.random.rand(10,20).astype(np.float32)
-    result = mymodule.copy.map(val=(1,0))(val=a, _result='numpy')
+    # Swap rows and columns: result[i, j] = a[j, i]
+    result = mymodule.copy.map(val=(1, 0))(val=a, _result='numpy')
 
 ----
 
 **Mapping to resolve ambiguities**
 
-Mapping can also be used to resolve ambiguities that would prevent SlangPy vectorizing normally. For example, consider the following generic function (from the `nested` section):
+``map`` can resolve ambiguities that would otherwise prevent SlangPy from vectorizing. For example, consider the following generic function (from the "nested" section), which can be called by mapping both of its arguments explicitly:
 
 .. code-block::
 
-    void copy_generic<T>(T src, out T dest)
-    {
+    void copy_generic<T>(T src, out T dest) {
         dest = src;
     }
 
-One way to resolve the ambiguities is to map dimensions as follows:
-
 .. code-block:: python
 
-    # Map argument types explicitly
     src = np.random.rand(100).astype(np.float32)
     dest = np.zeros_like(src)
-    module.copy_generic.map(src=(0,), dest=(0,))(
-        src=src,
-        dest=dest
-    )
+    module.copy_generic.map(src=(0,), dest=(0,))(src=src, dest=dest)
 
 Slangpy now knows:
 
@@ -126,22 +122,18 @@ Slangpy now knows:
 
 Thus it can infer that you want to pass ``float`` into ``copy_generic`` and generates the correct kernel.
 
-Mapping types
+Mapping Types
 -------------
 
-Mapping can also be used to specify the type of the argument. Whilst this approach cannot be used 
-to re-order dimensions, it can be a more readable way to resolve simple ambiguities. For example, we
-could write the ``copy_generic`` call from above as follows:
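+``map`` can also specify argument types directly. This approach cannot re-order dimensions, but it can be a more readable way to resolve simple ambiguities. For example, the ``copy_generic`` call above could be written as: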
 
 .. code-block:: python
 
-    # Map argument types explicitly
     src = np.random.rand(100)
     dest = np.zeros_like(src)
-    module.copy_generic.map(src='float', dest='float')(
-        src=src,
-        dest=dest
-    )
+
+    # Map argument types explicitly:
+    module.copy_generic.map(src='float', dest='float')(src=src, dest=dest)
 
 Where in the previous example SlangPy inferred type from dimensionality, it now knows:
 
@@ -153,7 +145,11 @@ Thus it can infer that you want a 1D kernel.
 Summary
 -------
 
-This section has shown how to use the ``map`` function to fully control how arguments are mapped to kernel dimensions in SlangPy. This powerful functionality allows the vectorization of algorithms
-that are more than simply running the same function on many elements in an array.
+The ``map`` function in SlangPy provides powerful tools for customizing how arguments align with kernel dimensions. This capability allows you to:
 
+- Precisely control dimension mappings for arguments, enabling efficient vectorization of complex operations.
+- Handle cases where arguments have different dimensionalities by explicitly aligning dimensions, avoiding the need for auto-padding.
+- Perform operations like broadcasting (e.g., outer products) and reordering dimensions (e.g., matrix transposition) with ease.
+- Resolve ambiguities in generic functions, ensuring correct kernel generation and execution.
 
+Together, these features allow the vectorization of algorithms that do more than simply run the same function on every element of an array.