diff --git a/docs/broadcasting.rst b/docs/broadcasting.rst index d32cf10..65480f8 100644 --- a/docs/broadcasting.rst +++ b/docs/broadcasting.rst @@ -12,13 +12,13 @@ are passed either equally sized buffers, or single values. For example: .. code-block:: python - # Adding 2 buffers of equal size + # Adding all elements in 2 buffers of equal size mymodule.add(np.array([1, 2, 3]), np.array([4, 5, 6]) # Adding a single value to every element of a buffer mymodule.add(5, np.array([4, 5, 6]) - # Adding 2 2D buffers of the same shape (2x2) + # Adding all elements in 2 2D buffers of the same shape (2x2) mymodule.add(np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]])) The process of taking the arguments and inferring how to vectorize the function is known as @@ -30,6 +30,8 @@ Before diving in, some terminology: a 2D buffer has a `dimensionality` of 2, a volume texture has a `dimensionality` of 3 etc. - `Shape`: The size of each dimension of a value. For example, a 1D buffer of size 3 has a `shape` of (3,), a 32x32x32 volume texture has a shape of (32,32,32). +In effect, `dimensionality` is equal to the length of the `shape` tuple. + Note: For those new to broadcasting, a common point of confusion is that a `3D vector` does **not** have a `dimensionality` of 3! Instead, it has a `dimensionality` of 1, and its `shape` is (3,). @@ -69,16 +71,20 @@ and generates an output of a given **shape**: B (5,3,4) Out Error -SlangPy will also support broadcasting a single value to all dimensions of the output. Conceptually, -this is similar to adding dimensions of size 1 to the value until it matches the output's dimensionality: +SlangPy will also support broadcasting a single value to all dimensions of the output. Programmatically, +a single value can be thought of as a value that isn't indexed - its dimensionality is 0, and its shape +is (). 
+ +Conceptually, broadcasting the same value to all dimensions is similar to adding dimensions of size +1 to the value until it matches the output's dimensionality: .. code-block:: python # For a function Out[x,y,z] = A[x,y,z] + B[x,y,z] # A single value is broadcast to all dimensions of the output - # Out[x,y,z] = A[0] + B[x,y,z] - A (1) + # Out[x,y,z] = A + B[x,y,z] + A () B (10,3,4) Out (10,3,4) diff --git a/docs/mapping.rst b/docs/mapping.rst new file mode 100644 index 0000000..3b60517 --- /dev/null +++ b/docs/mapping.rst @@ -0,0 +1,125 @@ +Mapping +======= + +In the previous broadcasting section, we saw how SlangPy applies broadcasting rules to automatically vectorize a function. Mapping gives more control over this process by allowing the user to explicitly specify the relationship between argument and kernel dimensions. + +.. code-block:: python + + a = np.random.rand(10,3,4) + b = np.random.rand(10,3,4) + result = mymodule.add(a,b, _result='numpy') + +In this example: + +- ``a`` and ``b`` are `arguments` to the ``add`` kernel, each with shape ``(10,3,4)``. +- The kernel is dispatched with overall shape ``(10,3,4)``. +- Any given thread, ``[i,j,k]``, reads ``a[i,j,k]`` and ``b[i,j,k]`` and writes ``result[i,j,k]``. + +There is a simple 1-to-1 mapping of argument dimensions to kernel dimensions. + +Re-mapping dimensions +--------------------- + +``map`` can be used to change how argument dimensions correspond to kernel dimensions. In the above example, we could have written: + +.. code-block:: python + + a = np.random.rand(10,3,4) + b = np.random.rand(10,3,4) + result = mymodule.add.map((0,1,2), (0,1,2))(a,b, _result='numpy') + +The tuples passed to ``map`` specify, for each argument, which kernel dimension each of its dimensions maps to. In this case we're mapping dimension 0 to dimension 0, dimension 1 to dimension 1 and dimension 2 to dimension 2. This 1-to-1 mapping is the default behaviour. + +Mapping works with named parameters too, which can be a little clearer: + +.. 
code-block:: python + + # Assume the slang add function has signature add(float a, float b) + a = np.random.rand(10,3,4) + b = np.random.rand(10,3,4) + result = mymodule.add.map(a=(0,1,2), b=(0,1,2))(a=a,b=b, _result='numpy') + +---- + +**Mapping arguments with different dimensionalities** + +As we've already seen, unlike NumPy, SlangPy by design doesn't auto-pad dimensions. When this behaviour is desirable, explicit mapping can be used to tell SlangPy exactly how to map the dimensions of the smaller input to those of the overall kernel: + +.. code-block:: python + + a = np.random.rand(8,8) + b = np.random.rand(8) + + # Fails in SlangPy, as b is not auto-extended + result = mymodule.add(a,b, _result='numpy') + + # Works, as we are explicitly mapping + # This is equivalent to padding b with leading dimensions of size 1, as NumPy would + # result[i,j] = a[i,j] + b[j] + result = mymodule.add.map(a=(0,1), b=(1,))(a=a,b=b, _result='numpy') + + # The same thing (didn't need to specify a as 1-to-1 mapping is default) + result = mymodule.add.map(b=(1,))(a=a,b=b, _result='numpy') + + # Also works, as we are explicitly mapping + # result[i,j] = a[i,j] + b[i] + result = mymodule.add.map(b=(0,))(a=a,b=b, _result='numpy') + +---- + +**Mapping arguments to different dimensions** + +Another use case is performing some operation in which you wish to broadcast all the elements of one argument across the other. The simplest is the mathematical outer-product: + +.. 
code-block:: python + + # Assume the slang multiply function has signature multiply(float a, float b) + # a is mapped to dimension 0, giving kernel dimension [0] size 10 + # b is mapped to dimension 1, giving kernel dimension [1] size 20 + # overall kernel (and thus result) shape is (10,20) + # result[i,j] = a[i] * b[j] + a = np.random.rand(10) + b = np.random.rand(20) + result = mymodule.multiply.map(a=(0,), b=(1,))(a=a,b=b, _result='numpy') + +---- + +**Mapping to re-order dimensions** + +Similarly, dimension indices can be adjusted to re-order the dimensions of an argument. A trivial example to transpose a matrix (replace rows with columns) would be: + +.. code-block:: python + + # Assume the slang copy function has signature float copy(float val) + # and just returns the value you pass it. + # result[i,j] = a[j,i] + a = np.random.rand(10,20) + result = mymodule.copy.map(val=(1,0))(val=a, _result='numpy') + +---- + +**Mapping to resolve ambiguities** + +In addition to performing more complex broadcasting, mapping can also be used to resolve ambiguities that would prevent SlangPy vectorizing normally. For example, consider the following generic function (from the `nested` section): + +.. code-block:: + + void copy_generic<T>(T src, out T dest) + { + dest = src; + } + +One way to resolve the ambiguities is to map dimensions as follows: + +.. code-block:: python + + # Map argument dimensions explicitly + src = np.random.rand(100) + dest = np.zeros_like(src) + mymodule.copy_generic.map(src=(0,), dest=(0,))( + src=src, + dest=dest + ) + +By telling SlangPy that both `src` and `dest` should map 1 dimension, and because they are both 1D arrays of floats, SlangPy can infer that you want to pass `float` into `copy_generic` and generate the correct kernel. 
+ diff --git a/examples/broadcasting/broken_sampler.py b/examples/broadcasting/broken_sampler.py index 97c2dd3..9e774bc 100644 --- a/examples/broadcasting/broken_sampler.py +++ b/examples/broadcasting/broken_sampler.py @@ -15,7 +15,6 @@ # Load module module = spy.Module.load_from_file(device, "example.slang") -""" # Add 2 identically shaped 2d float buffers a = np.random.rand(10, 5).astype(np.float32) @@ -63,7 +62,7 @@ print("") # Add a float3 and an array of 3 floats! -a = sgl.float3(1,2,3) +a = sgl.float3(1, 2, 3) b = np.random.rand(3).astype(np.float32) res = module.add_floats(a, b, _result='numpy') print(f"A Shape: {a.shape}") @@ -74,40 +73,38 @@ # Should get a shape mismatch error, as slangpy won't 'pad' dimensions try: a = np.random.rand(3).astype(np.float32) - b = np.random.rand(5,3).astype(np.float32) + b = np.random.rand(5, 3).astype(np.float32) res = module.add_floats(a, b, _result='numpy') except ValueError as e: - #print(e) + # print(e) pass -# Now using add_vectors(float3, float3), no shape mismatch error +# Now using add_vectors(float3, float3), no shape mismatch error # as a is treated as a single float3, and b is an array of 5 float3s, # and SlangPy will auto-pad single values. a = np.random.rand(3).astype(np.float32) -b = np.random.rand(5,3).astype(np.float32) +b = np.random.rand(5, 3).astype(np.float32) res = module.add_vectors(a, b, _result='numpy') print(f"A Shape: {a.shape}") print(f"B Shape: {b.shape}") print(f"Res Shape: {res.shape}") print("") -""" - # Create a sampler and texture sampler = device.create_sampler() tex = device.create_texture(width=32, height=32, format=sgl.Format.rgb32_float, usage=sgl.ResourceUsage.shader_resource) tex.from_numpy(np.random.rand(32, 32, 3).astype(np.float32)) -""" + # Sample the texture at a single UV coordinate. Results in 1 thread, # as the uv coordinate input is a single float 2. 
-a = sgl.float2(0.5,0.5) +a = sgl.float2(0.5, 0.5) res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy') print(f"A Shape: {a.shape}") print(f"Res Shape: {res.shape}") print(res) -""" + # Sample the texture at a single UV coordinate. Results in 1 thread, # as the uv coordinate input is a single float 2. ad = np.random.rand(20, 2).astype(np.float32) @@ -117,4 +114,3 @@ res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy') print(f"A Shape: {a.shape}") print(f"Res Shape: {res.shape}") -print(res) diff --git a/examples/broadcasting/example.slang b/examples/broadcasting/example.slang index c5714fb..1bf0997 100644 --- a/examples/broadcasting/example.slang +++ b/examples/broadcasting/example.slang @@ -13,5 +13,5 @@ float3 add_vectors(float3 a, float3 b) float4 sample_texture_at_uv(float2 uv, SamplerState sampler, Texture2D texture) { - return texture.Sample(sampler, uv); + return texture.SampleLevel(sampler, uv, 0); } diff --git a/examples/broadcasting/main.py b/examples/broadcasting/main.py index 97c2dd3..024fa61 100644 --- a/examples/broadcasting/main.py +++ b/examples/broadcasting/main.py @@ -15,7 +15,6 @@ # Load module module = spy.Module.load_from_file(device, "example.slang") -""" # Add 2 identically shaped 2d float buffers a = np.random.rand(10, 5).astype(np.float32) @@ -63,7 +62,7 @@ print("") # Add a float3 and an array of 3 floats! 
-a = sgl.float3(1,2,3) +a = sgl.float3(1, 2, 3) b = np.random.rand(3).astype(np.float32) res = module.add_floats(a, b, _result='numpy') print(f"A Shape: {a.shape}") @@ -74,47 +73,42 @@ # Should get a shape mismatch error, as slangpy won't 'pad' dimensions try: a = np.random.rand(3).astype(np.float32) - b = np.random.rand(5,3).astype(np.float32) + b = np.random.rand(5, 3).astype(np.float32) res = module.add_floats(a, b, _result='numpy') except ValueError as e: - #print(e) + # print(e) pass -# Now using add_vectors(float3, float3), no shape mismatch error +# Now using add_vectors(float3, float3), no shape mismatch error # as a is treated as a single float3, and b is an array of 5 float3s, # and SlangPy will auto-pad single values. a = np.random.rand(3).astype(np.float32) -b = np.random.rand(5,3).astype(np.float32) +b = np.random.rand(5, 3).astype(np.float32) res = module.add_vectors(a, b, _result='numpy') print(f"A Shape: {a.shape}") print(f"B Shape: {b.shape}") print(f"Res Shape: {res.shape}") print("") -""" - # Create a sampler and texture sampler = device.create_sampler() tex = device.create_texture(width=32, height=32, format=sgl.Format.rgb32_float, usage=sgl.ResourceUsage.shader_resource) tex.from_numpy(np.random.rand(32, 32, 3).astype(np.float32)) -""" # Sample the texture at a single UV coordinate. Results in 1 thread, # as the uv coordinate input is a single float 2. -a = sgl.float2(0.5,0.5) +a = sgl.float2(0.5, 0.5) res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy') print(f"A Shape: {a.shape}") print(f"Res Shape: {res.shape}") -print(res) -""" -# Sample the texture at a single UV coordinate. Results in 1 thread, -# as the uv coordinate input is a single float 2. -ad = np.random.rand(20, 2).astype(np.float32) -a = spy.NDBuffer(device, element_type=sgl.float2, shape=(20,)) -a.from_numpy(ad) -print(a) + +# Sample the texture at 20 UV coordinates. Results in 20 threads. 
+# Although the texture has shape [32,32,3] (32x32 pixels of float3s), +# in this case it acts as a single value, as it is being passed to +# a function that takes an [n,m,3] structure (a float3 texture). As a +# result, the texture is effectively *broadcast* to all threads. +a = np.random.rand(20, 2).astype(np.float32) res = module.sample_texture_at_uv(a, sampler, tex, _result='numpy') print(f"A Shape: {a.shape}") print(f"Res Shape: {res.shape}") -print(res)
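Reviewer note: the index relationships documented in docs/mapping.rst can be sanity-checked against plain NumPy without a GPU device. The sketch below is only an illustration of the intended semantics (``result[i,j] = ...``) and does not call SlangPy at all. The variable names are hypothetical and the NumPy expressions are assumed equivalents of the ``map`` calls named in the comments, not output taken from the library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed equivalent of multiply.map(a=(0,), b=(1,)):
# result[i,j] = a[i] * b[j], kernel shape (10, 20)
a = rng.random(10)
b = rng.random(20)
outer = a[:, None] * b[None, :]

# Assumed equivalent of add.map(b=(1,)) with a 2D first argument:
# result[i,j] = m[i,j] + v[j]
m = rng.random((8, 8))
v = rng.random(8)
add_along_dim1 = m + v[None, :]

# Assumed equivalent of add.map(b=(0,)):
# result[i,j] = m[i,j] + v[i]
add_along_dim0 = m + v[:, None]

# Assumed equivalent of copy.map(val=(1,0)):
# result[i,j] = t[j,i], so a (10, 20) input gives a (20, 10) result
t = rng.random((10, 20))
transposed = t.T
```

If the documented semantics hold, comparing these arrays with the values SlangPy returns for the corresponding ``map`` calls would be a cheap regression test for the docs examples.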