diff --git a/docs/broadcasting.rst b/docs/broadcasting.rst
index d32cf10..65480f8 100644
--- a/docs/broadcasting.rst
+++ b/docs/broadcasting.rst
@@ -12,13 +12,13 @@ are passed either equally sized buffers, or single values. For example:
 
 .. code-block:: python
 
-    # Adding 2 buffers of equal size
+    # Adding corresponding elements of 2 buffers of equal size
-    mymodule.add(np.array([1, 2, 3]), np.array([4, 5, 6])
+    mymodule.add(np.array([1, 2, 3]), np.array([4, 5, 6]))
 
     # Adding a single value to every element of a buffer 
-    mymodule.add(5, np.array([4, 5, 6])    
+    mymodule.add(5, np.array([4, 5, 6]))
 
-    # Adding 2 2D buffers of the same shape (2x2)
+    # Adding corresponding elements of 2 2D buffers of the same shape (2x2)
     mymodule.add(np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]]))
 
 The process of taking the arguments and inferring how to vectorize the function is known as 
@@ -30,6 +30,8 @@ Before diving in, some terminology:
   a 2D buffer has a `dimensionality` of 2, a volume texture has a `dimensionality` of 3 etc.
 - `Shape`: The size of each dimension of a value. For example, a 1D buffer of size 3 has a `shape` of (3,), a 32x32x32 volume texture has a shape of (32,32,32).
 
+In effect, `dimensionality` is equal to the length of the `shape` tuple.
+
 Note: For those new to broadcasting, a common point of confusion is that a `3D vector` does **not** have
 a `dimensionality` of 3! Instead, it has a `dimensionality` of 1, and its `shape` is (3,).
 
@@ -69,16 +71,20 @@ and generates an output of a given **shape**:
     B       (5,3,4)
     Out     Error
 
-SlangPy will also support broadcasting a single value to all dimensions of the output. Conceptually,
-this is similar to adding dimensions of size 1 to the value until it matches the output's dimensionality:
+SlangPy also supports broadcasting a single value to all dimensions of the output. Formally, a
+single value can be thought of as a value that isn't indexed at all: its dimensionality is 0, and
+its shape is ().
+
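+NumPy represents a single value the same way. As a plain NumPy illustration (not a SlangPy call):
+
+.. code-block:: python
+
+    import numpy as np
+
+    single = np.asarray(5.0)          # a 0-dimensional array holding one value
+    print(single.ndim, single.shape)  # prints: 0 ()
+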
+Conceptually, broadcasting the same value to all dimensions is similar to adding dimensions of size 
+1 to the value until it matches the output's dimensionality:
 
 .. code-block:: python
 
     # For a function Out[x,y,z] = A[x,y,z] + B[x,y,z]
 
     # A single value is broadcast to all dimensions of the output
-    # Out[x,y,z] = A[0] + B[x,y,z]
-    A       (1)
+    # Out[x,y,z] = A + B[x,y,z]
+    A       ()
     B       (10,3,4)
     Out     (10,3,4)
 
diff --git a/docs/mapping.rst b/docs/mapping.rst
new file mode 100644
index 0000000..3b60517
--- /dev/null
+++ b/docs/mapping.rst
@@ -0,0 +1,125 @@
+Mapping
+=======
+
+In the previous broadcasting section, we saw how SlangPy applies broadcasting rules to automatically vectorize a function. Mapping gives more control over this process by allowing the user to explicitly specify the relationship between argument and kernel dimensions.
+
+.. code-block:: python 
+
+    a = np.random.rand(10,3,4)
+    b = np.random.rand(10,3,4)
+    result = mymodule.add(a, b, _result='numpy')
+
+In this example:
+
+- ``a`` and ``b`` are `arguments` to the ``add`` kernel, each with shape ``(10,3,4)``. 
+- The kernel is dispatched with overall shape ``(10,3,4)``.
+- Any given thread, ``[i,j,k]``, reads ``a[i,j,k]`` and ``b[i,j,k]`` and writes ``result[i,j,k]``. 
+
+There is a simple 1-to-1 mapping of argument dimensions to kernel dimensions.
+
+Re-mapping dimensions
+---------------------
+
+``map`` can be used to change how argument dimensions correspond to kernel dimensions. In the above example, we could have written:
+
+.. code-block:: python 
+
+    a = np.random.rand(10,3,4)
+    b = np.random.rand(10,3,4)
+    result = mymodule.add.map((0,1,2), (0,1,2))(a, b, _result='numpy')
+
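+The tuples passed to ``map`` specify, for each argument, which kernel dimension each of its dimensions maps to: entry ``i`` of a tuple gives the kernel dimension for argument dimension ``i``. Here, each argument maps dimension 0 to kernel dimension 0, dimension 1 to dimension 1 and dimension 2 to dimension 2. This 1-to-1 mapping is the default behaviour.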
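+
+To make the rule concrete, the sketch below computes the kernel shape implied by a set of explicit mappings. ``infer_kernel_shape`` is a hypothetical helper written for illustration only. It is not part of SlangPy and ignores SlangPy's other broadcasting rules. It assumes every argument is given an explicit mapping and that kernel dimensions are numbered from 0 without gaps:
+
+.. code-block:: python
+
+    def infer_kernel_shape(args):
+        """Hypothetical helper (not part of SlangPy).
+
+        ``args`` maps each argument name to a ``(shape, mapping)`` pair, where
+        argument dimension ``i`` is mapped to kernel dimension ``mapping[i]``.
+        """
+        sizes = {}  # kernel dimension -> size
+        for name, (shape, mapping) in args.items():
+            if len(mapping) != len(shape):
+                raise ValueError(f"{name}: mapping {mapping} does not match shape {shape}")
+            for arg_dim, kernel_dim in enumerate(mapping):
+                size = shape[arg_dim]
+                # The first argument mapped to a kernel dimension fixes its size;
+                # every other argument mapped to it must agree.
+                if sizes.setdefault(kernel_dim, size) != size:
+                    raise ValueError(
+                        f"{name}: kernel dimension {kernel_dim} has size "
+                        f"{sizes[kernel_dim]}, but argument dimension {arg_dim} has size {size}"
+                    )
+        # Assumes kernel dimensions are numbered 0..N-1 with no gaps
+        return tuple(sizes[d] for d in range(len(sizes)))
+
+    # The 1-to-1 mapping used for add above
+    print(infer_kernel_shape({"a": ((10, 3, 4), (0, 1, 2)), "b": ((10, 3, 4), (0, 1, 2))}))
+    # prints: (10, 3, 4)
+
+    # Swapping the two dimensions of a single 2D argument
+    print(infer_kernel_shape({"val": ((10, 20), (1, 0))}))
+    # prints: (20, 10)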
+
+Mapping works with named parameters too, which can be a little clearer:
+
+.. code-block:: python 
+
+    # Assume the slang add function has signature float add(float a, float b)
+    a = np.random.rand(10,3,4)
+    b = np.random.rand(10,3,4)
+    result = mymodule.add.map(a=(0,1,2), b=(0,1,2))(a=a, b=b, _result='numpy')
+
+----
+
+**Mapping arguments with different dimensionalities**
+
+As we've already seen, unlike NumPy, SlangPy by design doesn't automatically pad dimensions. When padding is desirable, explicit mapping can be used to tell SlangPy exactly how the dimensions of a lower-dimensional argument correspond to those of the kernel:
+
+.. code-block:: python 
+
+    a = np.random.rand(8,8)
+    b = np.random.rand(8)
+
+    # Fails in SlangPy, as b is not automatically padded to 2 dimensions
+    result = mymodule.add(a, b, _result='numpy')
+
+    # Works, as b's only dimension is explicitly mapped to kernel dimension 1.
+    # This matches NumPy, which pads b with a leading dimension of size 1:
+    # result[i,j] = a[i,j] + b[j]
+    result = mymodule.add.map(a=(0,1), b=(1,))(a=a, b=b, _result='numpy')
+
+    # Equivalent: a can be omitted, as the 1-to-1 mapping is the default
+    result = mymodule.add.map(b=(1,))(a=a, b=b, _result='numpy')
+
+    # Also works, mapping b to kernel dimension 0 instead:
+    # result[i,j] = a[i,j] + b[i]
+    result = mymodule.add.map(b=(0,))(a=a, b=b, _result='numpy')
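+
+For comparison, the mappings above produce the same values as the following NumPy expressions, where ``None`` inserts a dimension of size 1 so that NumPy broadcasting lines ``b`` up with the chosen dimension of ``a``. This is a plain NumPy reference sketch, not SlangPy code:
+
+.. code-block:: python
+
+    import numpy as np
+
+    a = np.random.rand(8, 8)
+    b = np.random.rand(8)
+
+    # map(b=(1,)): result[i,j] = a[i,j] + b[j]
+    expected_cols = a + b[None, :]
+    # map(b=(0,)): result[i,j] = a[i,j] + b[i]
+    expected_rows = a + b[:, None]
+
+    # Check one element of each against the index formulas
+    assert np.isclose(expected_cols[2, 5], a[2, 5] + b[5])
+    assert np.isclose(expected_rows[2, 5], a[2, 5] + b[2])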
+
+----
+
+**Mapping arguments to different dimensions**
+
+Another use case is combining every element of one argument with every element of another. The simplest example is the mathematical outer product:
+
+.. code-block:: python 
+
+    # Assume the slang multiply function has signature multiply(float a, float b)
+    # a is mapped to dimension 0, giving kernel dimension [0] size 10
+    # b is mapped to dimension 1, giving kernel dimension [1] size 20
+    # overall kernel (and thus result) shape is (10,20)
+    # result[i,j] = a[i] * b[j]
+    a = np.random.rand(10)
+    b = np.random.rand(20)
+    result = mymodule.multiply.map(a=(0,), b=(1,))(a=a, b=b, _result='numpy')
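+
+As a CPU cross-check (plain NumPy, not SlangPy), this mapping computes the same values as NumPy's outer product:
+
+.. code-block:: python
+
+    import numpy as np
+
+    a = np.random.rand(10)
+    b = np.random.rand(20)
+
+    # result[i,j] = a[i] * b[j], with shape (10, 20)
+    expected = np.outer(a, b)
+    assert expected.shape == (10, 20)
+    assert np.allclose(expected, a[:, None] * b[None, :])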
+
+----
+
+**Mapping to re-order dimensions**
+
+Mapping can also re-order the dimensions of an argument. For example, a trivial way to transpose a matrix (swap its rows and columns) is:
+
+.. code-block:: python 
+
+    # Assume the slang copy function has signature float copy(float val)
+    # and just returns the value you pass it.
+    # result[i,j] = a[j,i], so result has shape (20,10)
+    a = np.random.rand(10,20)
+    result = mymodule.copy.map(val=(1,0))(val=a, _result='numpy')
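+
+The equivalent NumPy operation is a plain transpose, which also confirms the resulting shape (a NumPy reference sketch, not SlangPy code):
+
+.. code-block:: python
+
+    import numpy as np
+
+    a = np.random.rand(10, 20)
+    expected = a.T                  # expected[i,j] = a[j,i]
+    assert expected.shape == (20, 10)
+    assert expected[3, 7] == a[7, 3]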
+
+----
+
+**Mapping to resolve ambiguities**
+
+In addition to enabling more complex broadcasting, mapping can be used to resolve ambiguities that would otherwise prevent SlangPy from vectorizing a call. For example, consider the following generic function (from the `nested` section):
+
+.. code-block::
+
+    void copy_generic<T>(T src, out T dest)
+    {
+        dest = src;
+    }
+
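+Because ``T`` is generic, SlangPy cannot tell from a NumPy array alone how many of its dimensions belong to ``T`` itself and how many should be vectorized over. One way to resolve this ambiguity is to map dimensions explicitly: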
+
+.. code-block:: python
+
+    # Map dimensions explicitly so that SlangPy can infer T
+    src = np.random.rand(100)
+    dest = np.zeros_like(src)
+    mymodule.copy_generic.map(src=(0,), dest=(0,))(
+        src=src,
+        dest=dest
+    )
+
+By telling SlangPy that ``src`` and ``dest`` each map exactly 1 dimension, and given that both are 1D arrays of floats, SlangPy can infer that ``T`` is ``float`` and generate the correct kernel.
+
diff --git a/examples/broadcasting/broken_sampler.py b/examples/broadcasting/broken_sampler.py
index 97c2dd3..9e774bc 100644
--- a/examples/broadcasting/broken_sampler.py
+++ b/examples/broadcasting/broken_sampler.py
@@ -15,7 +15,6 @@
 
 # Load module
 module = spy.Module.load_from_file(device, "example.slang")
-"""
 
 # Add 2 identically shaped 2d float buffers
 a = np.random.rand(10, 5).astype(np.float32)
@@ -63,7 +62,7 @@
 print("")
 
 # Add a float3 and an array of 3 floats!
-a = sgl.float3(1,2,3)
+a = sgl.float3(1, 2, 3)
 b = np.random.rand(3).astype(np.float32)
 res = module.add_floats(a, b, _result='numpy')
 print(f"A Shape:   {a.shape}")
@@ -74,40 +73,38 @@
 # Should get a shape mismatch error, as slangpy won't 'pad' dimensions
 try:
     a = np.random.rand(3).astype(np.float32)
-    b = np.random.rand(5,3).astype(np.float32)
+    b = np.random.rand(5, 3).astype(np.float32)
     res = module.add_floats(a, b, _result='numpy')
 except ValueError as e:
-    #print(e)
+    # print(e)
     pass
 
-# Now using add_vectors(float3, float3), no shape mismatch error 
+# Now using add_vectors(float3, float3), no shape mismatch error
 # as a is treated as a single float3, and b is an array of 5 float3s,
 # and SlangPy will auto-pad single values.
 a = np.random.rand(3).astype(np.float32)
-b = np.random.rand(5,3).astype(np.float32)
+b = np.random.rand(5, 3).astype(np.float32)
 res = module.add_vectors(a, b, _result='numpy')
 print(f"A Shape:   {a.shape}")
 print(f"B Shape:   {b.shape}")
 print(f"Res Shape: {res.shape}")
 print("")
 
-"""
-
 # Create a sampler and texture
 sampler = device.create_sampler()
 tex = device.create_texture(width=32, height=32, format=sgl.Format.rgb32_float,
                             usage=sgl.ResourceUsage.shader_resource)
 tex.from_numpy(np.random.rand(32, 32, 3).astype(np.float32))
-"""
+
 
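-# Sample the texture at a single UV coordinate. Results in 1 thread,
-# as the uv coordinate input is a single float 2.
+# Sample the texture at a single UV coordinate. Results in 1 thread,
+# as the uv coordinate input is a single float2.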
-a = sgl.float2(0.5,0.5)
+a = sgl.float2(0.5, 0.5)
 res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy')
 print(f"A Shape: {a.shape}")
 print(f"Res Shape: {res.shape}")
 print(res)
-"""
+
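-# Sample the texture at a single UV coordinate. Results in 1 thread,
-# as the uv coordinate input is a single float 2.
+# Sample the texture at 20 UV coordinates. Results in 20 threads,
+# as the uv coordinate input is an array of 20 float2s.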
 ad = np.random.rand(20, 2).astype(np.float32)
@@ -117,4 +114,3 @@
 res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy')
 print(f"A Shape: {a.shape}")
 print(f"Res Shape: {res.shape}")
-print(res)
diff --git a/examples/broadcasting/example.slang b/examples/broadcasting/example.slang
index c5714fb..1bf0997 100644
--- a/examples/broadcasting/example.slang
+++ b/examples/broadcasting/example.slang
@@ -13,5 +13,5 @@ float3 add_vectors(float3 a, float3 b)
 
 float4 sample_texture_at_uv(float2 uv, SamplerState sampler, Texture2D<float4> texture)
 {
-    return texture.Sample(sampler, uv);
+    return texture.SampleLevel(sampler, uv, 0);
 }
diff --git a/examples/broadcasting/main.py b/examples/broadcasting/main.py
index 97c2dd3..024fa61 100644
--- a/examples/broadcasting/main.py
+++ b/examples/broadcasting/main.py
@@ -15,7 +15,6 @@
 
 # Load module
 module = spy.Module.load_from_file(device, "example.slang")
-"""
 
 # Add 2 identically shaped 2d float buffers
 a = np.random.rand(10, 5).astype(np.float32)
@@ -63,7 +62,7 @@
 print("")
 
 # Add a float3 and an array of 3 floats!
-a = sgl.float3(1,2,3)
+a = sgl.float3(1, 2, 3)
 b = np.random.rand(3).astype(np.float32)
 res = module.add_floats(a, b, _result='numpy')
 print(f"A Shape:   {a.shape}")
@@ -74,47 +73,42 @@
 # Should get a shape mismatch error, as slangpy won't 'pad' dimensions
 try:
     a = np.random.rand(3).astype(np.float32)
-    b = np.random.rand(5,3).astype(np.float32)
+    b = np.random.rand(5, 3).astype(np.float32)
     res = module.add_floats(a, b, _result='numpy')
 except ValueError as e:
-    #print(e)
+    # print(e)
     pass
 
-# Now using add_vectors(float3, float3), no shape mismatch error 
+# Now using add_vectors(float3, float3), no shape mismatch error
 # as a is treated as a single float3, and b is an array of 5 float3s,
 # and SlangPy will auto-pad single values.
 a = np.random.rand(3).astype(np.float32)
-b = np.random.rand(5,3).astype(np.float32)
+b = np.random.rand(5, 3).astype(np.float32)
 res = module.add_vectors(a, b, _result='numpy')
 print(f"A Shape:   {a.shape}")
 print(f"B Shape:   {b.shape}")
 print(f"Res Shape: {res.shape}")
 print("")
 
-"""
-
 # Create a sampler and texture
 sampler = device.create_sampler()
 tex = device.create_texture(width=32, height=32, format=sgl.Format.rgb32_float,
                             usage=sgl.ResourceUsage.shader_resource)
 tex.from_numpy(np.random.rand(32, 32, 3).astype(np.float32))
-"""
 
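-# Sample the texture at a single UV coordinate. Results in 1 thread,
-# as the uv coordinate input is a single float 2.
+# Sample the texture at a single UV coordinate. Results in 1 thread,
+# as the uv coordinate input is a single float2.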
-a = sgl.float2(0.5,0.5)
+a = sgl.float2(0.5, 0.5)
 res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy')
 print(f"A Shape: {a.shape}")
 print(f"Res Shape: {res.shape}")
-print(res)
-"""
-# Sample the texture at a single UV coordinate. Results in 1 thread,
-# as the uv coordinate input is a single float 2.
-ad = np.random.rand(20, 2).astype(np.float32)
-a = spy.NDBuffer(device, element_type=sgl.float2, shape=(20,))
-a.from_numpy(ad)
-print(a)
+
+# Sample the texture at 20 UV coordinates. Results in 20 threads.
+# Although the texture data has shape (32,32,3) (32x32 pixels of 3 floats),
+# it acts as a single value here, because sample_texture_at_uv takes the
+# whole texture as one Texture2D parameter. As a result, the texture is
+# effectively *broadcast* to all threads.
+a = np.random.rand(20, 2).astype(np.float32)
 res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy')
 print(f"A Shape: {a.shape}")
 print(f"Res Shape: {res.shape}")
-print(res)