diff --git a/ipa-multipoint/docs/1-vcs-high-level.md b/ipa-multipoint/docs/1-vcs-high-level.md new file mode 100644 index 0000000..37cfe71 --- /dev/null +++ b/ipa-multipoint/docs/1-vcs-high-level.md @@ -0,0 +1,156 @@ +# Vector Commitment Scheme - High Level + +_Familiarity with binary merkle trees is assumed._ + +## Commitment Scheme + +Commitment schemes in general are at the heart of every scenario where you want to prove something to another person. Let's list two examples from our daily lives. + +**Lottery** + +Before you are able to see the winning results of a lottery, you must first commit to your choice of numbers. This commitment will allow you to prove that you did indeed choose these numbers _before_ seeing the results. This commitment is often referred to as a lottery ticket. + +> We cannot trust people to be honest about their results, or more generously, we cannot trust people to attest to the truth; they could have bad memory. + +If you trust everyone to tell the truth, or if it is not advantageous for a rational actor to lie, then you _might_ be able to omit the commitment scheme. This is not usually the case, especially in a scenario where it may be impossible to find out the truth. + +> Sometimes we cannot even assume that actors will behave rationally! + +> There are certain features that a lottery ticket must have, like not being editable after the fact. Many of these features draw a parallel with vector commitment schemes. + +**Registration and Login** + +A lot of social applications require you to prove your digital identity to use them. There are two stages: + +- **Registration**: This is where you put in your details such as your email address, name, password and phone number. You can think of this as a commitment to a particular identity. +- **Login**: This is where you use the email address and password from registration to prove that you are the same person. Ideally, only you know these login details. + +> Without the registration phase, you would not be able to later prove your digital identity. + +As you can see, commitment schemes are crucial where one needs to prove something after an event has happened. This analogy also carries over to the cryptographic settings we will consider. + +## Why do we need a commitment scheme? + +- For the lottery example, one could call it a **ticket commitment scheme**. + +- For the registration example, one could call it an **identity commitment scheme**. + +- For verkle trees and indeed merkle trees, we need a **vector commitment scheme**. + +Analogously, this means that we need to commit to a vector and later attest to values in that vector. + +> As a spoiler, with verkle/merkle trees, when one is tasked with proving that a particular value is in the tree, we can reduce this to many instances of proving that particular values are in a vector. + +## Brief overview of a vector + +Think of a vector as a list of items where the length of the vector and the position of each item is also a part of the definition. + +**Example 1** + +$v_1 = \langle a, b \rangle$ + +$v_2 = \langle b, a \rangle$ + +Here the vectors $v_1$ and $v_2$ are not equal because the first and second items in the vectors are not equal. This may seem obvious but it is not true for mathematical objects such as sets. + +**Example 2** + +$v_1 = \langle 1,2,3 \rangle$ + +$v_2 = \langle 1,2,3,3 \rangle$ + +Here the vectors are also not equal, because their lengths are not equal. Note also that as a set, they would be considered equal.
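As a quick illustration of both examples, here is a minimal Python sketch (ours, for illustration only), using tuples to model vectors and sets to show the contrast:

```python
# Tuples model vectors: position and length are part of the definition.
v1, v2 = ("a", "b"), ("b", "a")
assert v1 != v2            # Example 1: order matters for vectors...
assert set(v1) == set(v2)  # ...but not for sets

u1, u2 = (1, 2, 3), (1, 2, 3, 3)
assert u1 != u2            # Example 2: length matters for vectors...
assert set(u1) == set(u2)  # ...but duplicates collapse in a set
```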
+ +> We will later see that vector commitment schemes must encode both of these properties (position of each item and length of the vector) when committing to a vector. + +## Binary Merkle Tree as a vector commitment scheme + +![](https://i.imgur.com/bnCVsy0.png) +*Figure 1: Image of a binary merkle tree* + +First bring your attention to $H_a, H_b$ in Figure 1. One can define some function $f_c$ which takes both of these values as inputs and transforms them into a single output value $H_{ab}$. + +**Encoding the position** + +We specify that $f_c(H_a,H_b)$ should not be equal to $f_c(H_b, H_a)$. This means that the function $f_c$ implicitly encodes the positions of its input values. In this case $H_{ab}$ conveys the fact that $H_a$ is first and $H_b$ is second. + +**Encoding the length** + +Another property of $f_c$ is that $f_c(H_a, H_b, k)$ should not equal $f_c(H_a, H_b)$, meaning that $f_c$ should also encode the number of inputs, which corresponds to the length of the vector (even if $k$ has a value of $0$). + +Elaborating: if there are two items as inputs, one should not get the same answer as when there are three items, no matter what the third input is. + +**Committing to a vector** + +We now ask the reader to view $H_a$ and $H_b$ as two elements in a vector; i.e. $\langle H_a, H_b \rangle$. The function $f_c$ allows us to commit to such a vector, encoding the length of the vector and the position of each element in the vector. In the above merkle tree, one can repeatedly use $f_c$ until we arrive at the top of the tree. The final output at the top is denoted as the _root_. + +By induction, we can argue that the root is a summary of all of the items below it. Whether the summary is succinct depends on $f_c$. + +> Popular choices for $f_c$ include the following hash functions: sha256, blake2s and keccak. But one could just as easily define it to be the concatenation of the inputs. + +**Opening a value** + +Say we are given the root $H_{abcdefgh}$ in Figure 1 and we want to show that $H_b$ is indeed a part of the tree that this root represents. + +To show that $H_b$ is in the tree with root $H_{abcdefgh}$, we can do it by showing: + +- $H_{abcd}$ is the first element in the vector $\langle H_{abcd}, H_{efgh} \rangle$ and applying $f_c$ to this vector yields $H_{abcdefgh}$ +- Then we can show that $H_{ab}$ is the first element in the vector $\langle H_{ab}, H_{cd} \rangle$ and applying $f_c$ to the vector yields $H_{abcd}$ +- Finally, we can show that $H_b$ is the second element in the vector $\langle H_a, H_b \rangle$ and applying $f_c$ to the vector yields $H_{ab}$ + +We now define a new function $f_o$ to *show that an element is in a certain position in a vector and that when $f_c$ is applied to said vector, it yields an expected value* + +$f_o$ takes four arguments: + +- A commitment to a vector $C_v$. This is the output of $f_c$ on a vector. +- An index, $i$ +- An element in some vector, $e_v$ +- A proof $\pi$ attesting to the fact that $C_v$ is the commitment to $v$, and $e_v$ is the element at index $i$ of $v$. + +$f_o$ returns true if for some vector $v$: + +- $C_v$ is the commitment of $v$. i.e. $f_c(v) = C_v$ +- The i'th element in $v$ is indeed $e_v$. i.e.
$v[i] = e_v$ + +**Example** + +Let's use $f_o$ to check the following: + +> $H_{abcd}$ is the first element in the vector $\langle H_{abcd}, H_{efgh} \rangle$ and applying $f_c$ to this vector yields $H_{abcdefgh}$ + +$C_v = H_{abcdefgh}$ +$i = 0$ (zero indicates the first element) +$e_v = H_{abcd}$ + +If $f_o(H_{abcdefgh}, 0, H_{abcd}, \pi)$ returns true, then we can be sure that $H_{abcdefgh}$ commits to some vector $v$ using $f_c$ and at the first index of that vector, we have the value $H_{abcd}$. + +> We must trust that $H_{abcdefgh}$ was computed correctly, i.e. it corresponds to the tree in question. This is outside the scope of verkle/merkle trees in general and is usually handled by some higher level protocol. + +**What is $\pi$ ?** + +For a binary merkle tree, $\pi$ would be $H_{efgh}$. Now given $H_{abcd}$ and $\pi$, we can apply $f_c$ to check that $C_v = f_c(H_{abcd}, \pi)$. This also allows us to check that $H_{abcd}$ is the first element in the vector. + +**Proof Cost For a Binary Merkle Tree** + +For a binary merkle tree, our vectors have size $2$ and so $\pi$ only has to contain 1 extra element to show $C_v = f_c(a, \pi)$. If we had a hexary merkle tree, where our vector had 16 elements, $\pi$ would need to contain 15 elements. Hence the proof grows in proportion to the vector sizes that we are using for merkle trees. + +Even worse is the fact that there is not just one $\pi$. In our case there are actually 3 proofs $\pi$ needed to show that $H_b$ is in the tree. The overall proof size thus also grows with the number of levels, i.e. the depth of the tree. + +In general, we can compute the overall proof size as follows: let the number of items in the tree be the tree width $t_w$, and let the size of our vectors be the node width $n_w$. The proof size is then: $$\log_{n_w}(t_w) \cdot (n_w - 1) = \text{depth} \cdot (n_w - 1)$$ + +## Verkle Tree Improvements + +The problem with $f_c$ being a hash function like sha256 in the case of a merkle tree is that in order to attest to a single value that was hashed, we need to reveal everything that was hashed. The main reason is that these functions by design do not preserve the structure of the input. For example, $\text{sha256}(a) + \text{sha256}(b) \neq \text{sha256}(a + b)$. + +Fortunately, we only require a property known as collision resistance, and there are many other vector commitment schemes in the literature which are more efficient and do not require all values for the opening. Depending on the one you choose, there are different trade-offs to consider. + +Some trade-offs to consider are: + +- Proof creation time: how long it takes to make $\pi$ +- Proof verification time: how long it takes to verify $\pi$ + +Moreover, with some of the schemes in the wider literature, it is possible to aggregate many proofs together so one only needs to verify a single proof $\pi$. With this in mind, it may be unsurprising that with verkle trees, the node width/vector size has increased substantially, since the proof size in the chosen scheme does not grow linearly with the node width. + +## Summary + +- Merkle trees use a vector commitment scheme which is rather inefficient in terms of proof size. +- Verkle trees use a commitment scheme which has better efficiency for proof size and allows one to minimise the proof size using aggregation. +- Verkle trees also increase the node width, which decreases the depth of the tree.
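To make $f_c$ and $f_o$ concrete, here is a minimal Python sketch of the binary merkle tree in Figure 1, using sha256 as $f_c$ (the helper names and the length-prefix encoding are our own illustrative choices, not the scheme used in this repository):

```python
import hashlib

def f_c(*children: bytes) -> bytes:
    """Commit to a vector of children; the length prefix and the order of
    hashing encode the vector's length and each child's position."""
    h = hashlib.sha256()
    h.update(len(children).to_bytes(1, "big"))
    for child in children:
        h.update(child)
    return h.digest()

def f_o(commitment: bytes, index: int, element: bytes, proof: bytes) -> bool:
    """Check that `element` sits at `index` of the 2-vector committed in
    `commitment`; for a binary tree, `proof` is just the sibling."""
    pair = (element, proof) if index == 0 else (proof, element)
    return f_c(*pair) == commitment

# Build the 8-leaf tree of Figure 1, then open H_b against the root.
leaves = [hashlib.sha256(c.encode()).digest() for c in "abcdefgh"]
level1 = [f_c(leaves[i], leaves[i + 1]) for i in range(0, 8, 2)]   # H_ab .. H_gh
level2 = [f_c(level1[i], level1[i + 1]) for i in range(0, 4, 2)]   # H_abcd, H_efgh
root = f_c(level2[0], level2[1])                                   # H_abcdefgh

# The three openings (three pis) described above:
assert f_o(root, 0, level2[0], level2[1])       # H_abcd is first under the root
assert f_o(level2[0], 0, level1[0], level1[1])  # H_ab is first under H_abcd
assert f_o(level1[0], 1, leaves[1], leaves[0])  # H_b is second under H_ab
```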
diff --git a/ipa-multipoint/docs/2-vcs-multipoint-arg.md b/ipa-multipoint/docs/2-vcs-multipoint-arg.md new file mode 100644 index 0000000..1fe6d6a --- /dev/null +++ b/ipa-multipoint/docs/2-vcs-multipoint-arg.md @@ -0,0 +1,299 @@ +# Vector Commitment Scheme - Multipoint/Index + +## Vector Commitment Scheme vs Polynomial Commitment Scheme + +We may use these two terms interchangeably, however they are not the same; a vector commitment scheme is strictly more powerful than a polynomial commitment scheme. One can take the dot product between two vectors, and if one vector is of the form $\langle 1, t, t^2, t^3, \ldots, t^n \rangle$ then one can realise the dot product as the evaluation of a polynomial in monomial basis at the point $t$. + +Converting a vector to a polynomial can be done by either interpreting the elements in the vector as the coefficients for the polynomial or interpreting the elements as evaluations of the polynomial. Hence, we can state our schemes in terms of a polynomial commitment scheme and the translation would be done as mentioned above. + +Similarly, the term multipoint will be used when referring to a polynomial commitment scheme and multi-index when referring to a vector commitment scheme. They mean the same thing, just in different contexts. + +## Introduction + +A vector commitment scheme allows you to prove that an element $e$ in a vector $v$ is indeed at some specific index $i$, i.e. the fact that $v[i]=e$. + +A multi-index vector commitment scheme takes in a list of vectors $v_k$, a list of indices $i_k$ and a list of values $e_k$, and produces a proof that for all $k$ the following holds: $v_k[i_k]=e_k$. + +One could simply call a single index vector commitment scheme $k$ times and produce $k$ proofs in order to simulate a multi-index vector commitment scheme. However, we are interested in multi-index vector commitment schemes which are more efficient than this. The most common strategy is to call a function which aggregates all of the tuples $(v_k, i_k, e_k)$ into a single tuple and then calls the single index vector commitment scheme on the aggregated tuple. + +This is the strategy that our specific algorithm will also follow. + +## Assumptions + +The particular single index vector commitment scheme being used does not matter. We only require it to be homomorphic. + +This means that commitments to polynomials can be summed, and the result will be a commitment to the sum of the polynomials. + +> KZG and IPA/bulletproofs both have this property. Hash based commitment schemes do not have this property. + +# Multipoint scheme + +## Singlepoint scheme + +We describe a singlepoint polynomial scheme using the following algorithms: + +- $\text{Commit}$ +- $\text{Prove}$ +- $\text{Verify}$ + +**Commit** + +Input: A univariate polynomial, $f(X)$ +Output: A commitment to $f(X)$, denoted $[f(X)]$ or $C$ + +**Prove** + +Input: A polynomial $f(X)$, an evaluation point $z$ and a purported evaluation $y=f(z)$ +Output: A proof $\pi$ that the polynomial $f(X)$ gives a value of $y$ when evaluated at $z$ + +**Verify** + +Input: A proof $\pi$, a commitment $C$ to a polynomial, an evaluation point $z$ and a purported evaluation $y$ +Output: True if the committed polynomial in $C$ does indeed evaluate to $y$ at $z$ + +## Quotient styled commitment schemes + +A quotient styled polynomial commitment scheme is one which uses the factor theorem in order to provide opening proofs. The factor theorem is well known, so the proof is omitted for brevity.
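As a numeric sanity check of the two theorems stated next, here is a short sketch (ours, using plain integer arithmetic) showing that dividing $p(X) - r$ by $(X-k)$ via synthetic division leaves remainder zero exactly when $p(k) = r$:

```python
def divide_by_linear(p, k):
    """Synthetic division of p(X) by (X - k), with p[i] the coefficient of X^i.
    Returns (quotient, remainder); the remainder always equals p(k)."""
    q, carry = [0] * (len(p) - 1), 0
    for i in range(len(p) - 1, 0, -1):
        carry = p[i] + carry * k
        q[i - 1] = carry
    return q, p[0] + carry * k

p = [6, -5, 1]                    # p(X) = X^2 - 5X + 6 = (X - 2)(X - 3)
q, rem = divide_by_linear(p, 2)
assert rem == 0 and q == [-3, 1]  # p(2) = 0, so (X - 2) divides p (Theorem 1)

q, rem = divide_by_linear(p, 4)
assert rem == 2                   # p(4) = 2, so we must subtract r = 2 first
q, rem = divide_by_linear([p[0] - 2, p[1], p[2]], 4)
assert rem == 0                   # (p(X) - r) / (X - k) is a polynomial (Theorem 2)
```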
+ +*Theorem 1:* Given a polynomial $p(X)$, if $p(t)=0$ then $(X-t)$ factors $p(X)$. + +*Theorem 2:* Given a polynomial $p(X)$, if $p(k) = r$, then there exists a polynomial $q(X) = \frac{p(X)-r}{X-k}$. + +- If $p(k) = r$, then this implies that $p(k)-r = 0$ +- Let $p_1(X) = p(X)-r$, this means we have $p_1(k)=0$ +- Using Theorem 1, this implies that $(X-k)$ factors $p_1(X)$ +- Which implies the following equation $p_1(X) = q(X)(X-k)$ for some $q(X)$ +- Rearranging, we have $q(X)= \frac{p_1(X)}{X-k} = \frac{p(X)-r}{X-k}$ + +Observe that $q(X)$, the quotient, is only a polynomial if $p(k) = r$. If it is not, then $q(X)$ will be a rational function. We then use the fact that a polynomial commitment scheme is only able to commit to polynomials in order to provide soundness guarantees. + +--- + +In what follows, we will describe the multipoint scheme using the single point scheme as an opaque algorithm. + +--- + +## Statement + +Given $m$ commitments $C_0 = [f_0(X)], \ldots, C_{m-1} = [f_{m-1}(X)]$, we want to prove evaluations: + +$$ f_0(z_0) = y_0 \\ \vdots \\ f_{m-1}(z_{m-1}) = y_{m-1} $$ + +where $z_i \in \{0,...,d-1\}$ + +**Observations** + +- $C_0 = [f_0(X)]$ refers to a commitment to the univariate polynomial $f_0(X)$ +- The evaluation points must be taken from the domain $[0,d)$; we can apply this restriction without loss of generality, noting that $d$ will be the length of our vectors. +- It is possible to open the same polynomial at different points and different polynomials at the same points. +- It is also possible to open the same polynomial twice at the same point, though it would only waste time. + +## Proof + +We will first detail two sub-optimal proofs for explanation purposes and optimise after. For the final proof, you can click [here](#proof---final) + +--- + +We use $H(\cdot)$ to denote a hash function which can heuristically be realised as a random oracle. + +--- + +1. Let $r \leftarrow H(C_0,...C_{m-1}, z_0, ..., z_{m-1}, y_0, ..., y_{m-1})$ + +$$ g(X) = r^0 \frac{f_0(X) - y_0}{X-z_0} + r^1 \frac{f_1(X) - y_1}{X-z_1} + \ldots +r^{m-1} \frac{f_{m-1}(X) - y_{m-1}}{X-z_{m-1}} $$ + +The prover starts off by committing to $g(X)$ using the commit function from the single point commitment scheme; we denote this by $D$ or $[g(X)]$. + +The prover's job is to now convince the verifier that $D$ is a commitment to a polynomial $g(X)$. We do this by evaluating $g(X)$ at some random point $t$. If $g(X)$ is not a polynomial, then it is not possible to commit to it. + +2. Let $t \leftarrow H(r,D)$ + +We split the evaluation of $g(X)$ into two parts, $g_1(t)$ and $g_2(t)$. $g_2(t)$ can be computed by the verifier, while $g_1(t)$ cannot, because it involves evaluations of the polynomials $f_i(X)$ at the random point $t$. + +> - The verifier is able to compute $g_2(t)$. +> - The prover will compute $g_1(t)$ and send a proof of its correctness. + +$$ g_1(t) = \sum_{i=0}^{m-1}{r^i \frac{f_i(t)}{t-z_i}} $$ + +$$ g_2(t) = \sum_{i=0}^{m-1} {r^i \frac{y_i}{t-z_i}} $$ + +We note that the natural definition would be $g_1(X) = \sum_{i=0}^{m-1} r^i \frac{f_i(X)}{X-z_i}$; however, we specify it as $g_1(X) = \sum_{i=0}^{m-1} r^i \frac{f_i(X)}{t-z_i}$ because the latter is also able to prove an opening for $g_1(t)$ **and** the verifier is able to compute the commitment for it. + +Now we form two proofs using a single point polynomial commitment scheme: + +- One for $g_1(X)$ at $t$. We call this $\pi$. This is computed using $\text{Prove}(g_1(X), t, g_1(t))$ +- One for $g(X)$ at $t$. We call this $\rho$.
This is computed using $\text{Prove}(g(X), t, g(t))$ + +The proof consists of $D, (\pi, g_1(t)), \rho$ + +## Verification + +The verifier ultimately wants to verify that $D$ is the commitment to the polynomial $g(X)$. + +The verifier computes the challenges $r$ and $t$. + +The verifier also computes $g_2(t)$; we mentioned above that they can do this by themselves. + +### Computing $g(t)$ + +The verifier now needs to compute $g(t)$: + +$g(t) = g_1(t) - g_2(t)$ + +- $g_1(t)$ was supplied in the proof. +- $g_2(t)$ can be computed by the verifier. + +Hence the verifier can compute $g(t)$. + +**Note however that the verifier cannot be sure that $g_1(t)$ was correctly computed by the prover, i.e. they cannot be sure that it is indeed the evaluation of $g_1(X)$ at $t$. They need to build $[g_1(X)]$ themselves and verify the claimed $g_1(t)$ against it.** + +#### Computing $[g_1(X)]$ + +Consider $g_1(X)$: + +$$ g_1(X) = \sum_{i=0}^{m-1} r^i \frac{f_i(X)}{t-z_i} $$ + +$[g_1(X)]$ is therefore: + +$$ [g_1(X)] = \sum_{i=0}^{m-1} \frac{r^i}{t-z_i}C_i $$ + +The verifier is able to compute this commitment themselves, and so is able to verify that $g_1(t)$ was computed correctly using the $\text{Verify}$ function. + +The verifier now calls $\text{Verify}([g_1(X)], t, g_1(t), \pi)$ and aborts if the return value is false. + +#### Correctness of $g(t)$ + +Since $g_1(t)$ was verified to be correct and $g_2(t)$ was computed by the verifier, $g(t)$ is correct. + +## Verify $g(X)$ at $t$ + +The verifier now calls $\text{Verify}(D, t, g(t), \rho)$ and aborts if the return value is false. + +## Aggregated Proof + +In the above protocol, the prover needed to compute two proofs, one for $g(X)$ and another for $g_1(X)$. We now present a protocol which aggregates both proofs together. + +--- + +3. Let $q \leftarrow H(t, [g_1(X)])$ + +> The prover no longer computes an IPA proof for $g_1(X)$ and $g(X)$; instead they combine both polynomials using a new random challenge $q$. + +$g_3(X) = g_1(X) + q \cdot g(X)$ + +Now we form a single point polynomial commitment scheme proof for $g_3(X)$ at $t$. Let's call this $\sigma$. This is computed using $\text{Prove}(g_3(X), t, g_3(t))$ + +The prover still computes $g_1(t)$. + +The proof consists of $D, \sigma, g_1(t)$ + +## Aggregated Verification + +In the previous step, the verifier called $\text{Verify}([g_1(X)], t, g_1(t), \pi)$. They now delay this verification and instead compute the commitment to the aggregated polynomial and the evaluation of the aggregated polynomial at $t$: + +- $[g_3(X)] = [g_1(X)] + q \cdot [g(X)]$ +- $g_3(t) = g_1(t) + q \cdot g(t)$ + +The verifier now computes $\text{Verify}([g_3(X)], t, g_3(t), \sigma)$ + +> With overwhelming probability over $q$, this will return true iff $[g_1(X)]$ and $[g(X)]$ opened at $t$ are $g_1(t)$ and $g(t)$ respectively. + +## Opening $g_2(X)$ + +This optimisation allows us to reduce the proof size by one element, by revisiting $g(X)$ and opening $g_2(X)$ instead. The gist is that if we open $g_2(X)$, then we do not need to send any evaluations, since the verifier can compute $g_2(t)$ themselves. + +In particular, we opened the polynomial: $g_3(X) = g_1(X) + q \cdot g(X)$ + +- First note that $g(X) = g_1(X) - g_2(X)$, which implies that $g_2(X) = g_1(X) - g(X)$ +- It is argued that if the verifier can open $g_2(X)$ at $t$ using $D = [g(X)]$, then this implies that $g(X)$ can be correctly opened at $t$ using $[g(X)]$. + +We now list out the full protocol using this optimisation. + +## Proof - Final + +1.
Let $r \leftarrow H(C_0,...C_{m-1}, z_0, ..., z_{m-1}, y_0, ..., y_{m-1})$ + +$$ g(X) = r^0 \frac{f_0(X) - y_0}{X-z_0} + r^1 \frac{f_1(X) - y_1}{X-z_1} + \ldots +r^{m-1} \frac{f_{m-1}(X) - y_{m-1}}{X-z_{m-1}} $$ + +The prover starts off by committing to $g(X)$ using the commit function from the single point commitment scheme; we denote this by $D$ or $[g(X)]$. + +The prover's job is to now convince the verifier that $D$ is a commitment to a polynomial $g(X)$. We do this by indirectly evaluating $g(X)$ at some random point $t$. If $g(X)$ is a rational function, then it is not possible to commit to it as a polynomial, and consequently, it is not possible to prove that $g(t)= k$ using $D$. + +2. Let $t \leftarrow H(r,D)$ + +$\text{Define } g_1(X):$ + +$$g_1(X) = \sum_{i=0}^{m-1} r^i \frac{f_i(X)}{t-z_i}$$ + +$\text{Define } g_2(X):$ + +$$g_2(X) = \sum_{i=0}^{m-1} r^i \frac{y_i}{X-z_i}$$ + +It is clear that $g(t) = g_1(t) - g_2(t)$. + +$g_2(t)$ can be computed by the verifier, while $g_1(t)$ cannot, because it involves evaluations of the polynomials $f_i(X)$ at the random point $t$. + +> We note that the natural definition for $g_1(X)$ would be $\sum_{i=0}^{m-1} r^i \frac{f_i(X)}{X-z_i}$; however, we specify it as $\sum_{i=0}^{m-1} r^i \frac{f_i(X)}{t-z_i}$ because the latter is also able to prove an opening for $g_1(t)$ **and** the verifier is able to compute the commitment for it. + +- The prover will compute an opening proof for $g_2(X)$. Correctness of $g_2(t)$ implies correctness of $g(t)$, since $g_2(t) = g_1(t) - g(t)$. + +The prover forms an opening proof for $g_2(X)$ using a single point polynomial commitment scheme: + +- We call this $\pi$. This is computed using $\text{Prove}(g_2(X), t, g_2(t))$ + +The proof consists of $(D, \pi)$ + +## Verification - Final + +The verifier ultimately wants to verify that $D$ is the commitment to the polynomial $g(X)$. + +The verifier computes the challenges $r$ and $t$, and the evaluation $g_2(t)$. + +#### Computing $[g_1(X)]$ + +Consider $g_1(X)$: + +$$ g_1(X) = \sum_{i=0}^{m-1} r^i \frac{f_i(X)}{t-z_i} $$ + +$[g_1(X)]$ is therefore: + +$$ [g_1(X)] = \sum_{i=0}^{m-1} \frac{r^i}{t-z_i}C_i $$ + +Note that the verifier is able to compute this value themselves. + +### Verifying $g_2(t)$ + +Since: $g_2(t) = g_1(t) - g(t)$ + +The commitment to $g_2(X)$ with respect to $t$* is therefore: + +$[g_2(X)] = [g_1(X)] - D$ + +> *We again note that $[g_1(X)]$ is only valid if the point being evaluated is $t$, because $g_1(X)$ has already been partially evaluated at $t$. + +Since the verifier computed $[g_1(X)]$, if $D$ is indeed a commitment to $g(X)$, then $[g_1(X)] - D$ is a commitment to $g_2(X)$. + +Now if $[g_2(X)]$ is a commitment to $g_2(X)$, then it will pass the following verification check: $\text{Verify}([g_2(X)], t, g_2(t), \pi)$. + +## Summary + +- We described the multipoint commitment scheme which we will use for verkle trees. +- We did not describe the exact single point commitment scheme being used; however, we note that at the time of writing this document, the bulletproofs variant described in section A.1 of [BCMS20](https://eprint.iacr.org/2020/499.pdf) is what has been implemented.
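To complement the summary, here is a toy numeric sketch (ours) of the algebra behind the final protocol. The assumptions are loudly non-standard: a small prime field, fixed constants standing in for the hash-derived challenges $r$ and $t$, and an insecure mock commitment $[h(X)] = h(s)$ for a secret scalar $s$, which stands in for any homomorphic scheme:

```python
P = 10007  # toy prime modulus (assumption for this sketch)
inv = lambda x: pow(x, P - 2, P)

def poly_eval(f, x):
    """Evaluate f at x (mod P); f[i] is the coefficient of X^i."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % P
    return acc

def poly_add(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]

def poly_scale(f, k):
    return [(k * c) % P for c in f]

def divide_out_linear(f, z):
    """Return q with f(X) = q(X)(X - z); only possible when f(z) = 0."""
    assert poly_eval(f, z) == 0
    q, carry = [0] * (len(f) - 1), 0
    for i in range(len(f) - 1, 0, -1):
        carry = (f[i] + carry * z) % P
        q[i - 1] = carry
    return q

fs = [[5, 1, 3], [7, 0, 2]]                  # f_0(X), f_1(X)
zs = [2, 3]                                  # opening points in the domain
ys = [poly_eval(f, z) for f, z in zip(fs, zs)]
r, t, s = 11, 4321, 987                      # fixed stand-ins for H(...) and the secret

# Prover: g(X) = sum_i r^i (f_i(X) - y_i) / (X - z_i)
g = [0]
for i, (f, z, y) in enumerate(zip(fs, zs, ys)):
    quotient = divide_out_linear(poly_add(f, [(-y) % P]), z)
    g = poly_add(g, poly_scale(quotient, pow(r, i, P)))

# g_1(X) = sum_i r^i f_i(X) / (t - z_i), partially evaluated at t
g1 = [0]
for i, (f, z) in enumerate(zip(fs, zs)):
    g1 = poly_add(g1, poly_scale(f, pow(r, i, P) * inv(t - z) % P))

# g_2(t) = sum_i r^i y_i / (t - z_i): computable from public data alone
g2_t = sum(pow(r, i, P) * y * inv(t - z) for i, (z, y) in enumerate(zip(zs, ys))) % P

# The identity the verifier relies on: g_2(t) = g_1(t) - g(t)
assert (poly_eval(g1, t) - poly_eval(g, t)) % P == g2_t

# Mock homomorphic commitments: [h(X)] := h(s)
C = [poly_eval(f, s) for f in fs]            # commitments to the f_i
D = poly_eval(g, s)                          # commitment to g
g1_commit = sum(pow(r, i, P) * inv(t - z) % P * c
                for i, (z, c) in enumerate(zip(zs, C))) % P
assert g1_commit == poly_eval(g1, s)         # [g_1(X)] from the C_i alone

# [g_1(X)] - D commits to g_1(X) - g(X), which opens to g_2(t) at t
h = poly_add(g1, poly_scale(g, P - 1))
assert poly_eval(h, t) == g2_t
assert (g1_commit - D) % P == poly_eval(h, s)
```

The final two asserts mirror the verifier's checks: $[g_1(X)]$ is computable from the $C_i$ alone, and $[g_1(X)] - D$ opens to $g_2(t)$ at $t$.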
diff --git a/ipa-multipoint/docs/3-vcs-divide-lagrange-basis b/ipa-multipoint/docs/3-vcs-divide-lagrange-basis new file mode 100644 index 0000000..dfe2d4f --- /dev/null +++ b/ipa-multipoint/docs/3-vcs-divide-lagrange-basis @@ -0,0 +1,235 @@ +# Dividing In Lagrange basis when one of the points is zero - Generalised + +## Reference + +The formulas were derived by reading the following academic article [here](https://people.maths.ox.ac.uk/trefethen/barycentric.pdf) + +## Problem + +In the multipoint protocol, we had a polynomial of the form: + +$$ g(X) = r^0 \frac{f_0(X) - y_0}{X-z_0} + r^1 \frac{f_1(X) - y_1}{X-z_1} + \ldots +r^{m-1} \frac{f_{m-1}(X) - y_{m-1}}{X-z_{m-1}} $$ + +In our context, $z_i$ is an element in the domain, so naively we cannot compute this division in lagrange form. We also do not want to use monomial form, as we would need to interpolate our polynomials, which is expensive. + +Simplifying the problem: + +We have $\frac{f(X)}{g(X)} = \frac{f(X)}{X - x_m} = \sum_{i=0}^{d-1} {f_i\frac{\mathcal{L_i(X)}}{X - x_m}}$ + +In what follows, we re-derive all of the necessary formulas that will allow us to divide by a linear polynomial that vanishes on the domain in lagrange basis, where the domain can be arbitrary. + +## Lagrange polynomial + +We briefly restate the formula for a lagrange polynomial: + +$$ \mathcal{L_i}(X) = \prod_{j = 0, j \neq i}^{d-1}\frac{X - x_j}{x_i - x_j} $$ + +> The i'th lagrange polynomial evaluated at $x_i$ is 1, and 0 everywhere else **on the domain** + +## First form of the barycentric interpolation formula + +We introduce the polynomial $A(X) = (X - x_0)(X - x_1)...(X-x_{d-1})$. + +We also introduce its derivative $A'(X) = \sum_{j=0}^{d-1}\prod_{i \neq j}(X - x_i)$. + +> You can derive this yourself by generalising the product rule: https://en.wikipedia.org/wiki/Product_rule#Product_of_more_than_two_factors + +In general this derivative does not have a succinct/sparse form. We do however have a succinct form if the domain is the roots of unity! + +Now note that $A'(x_j) = \prod_{i=0, i \neq j}^{d-1}(x_j - x_i)$ + +> If we plug $x_k$ into $A'(X)$, all the terms containing $X - x_k$ will vanish; this is why the sum collapses into a single product. + +We can use $A$ and $A'$ to re-define our lagrange polynomial as: + +$$ \mathcal{L_i}(X) = \frac{A(X)}{A'(x_i) (X - x_i)} $$ + +>Looking at the original lagrange formula, $A'(x_i)$ is the denominator and $\frac{A(X)}{X - x_i}$ is the numerator. + +The first barycentric form for a polynomial $f(X)$ can now be defined as: + +$$ f(X) = \sum_{i=0}^{d-1}{\frac{A(X)}{A'(x_i) (X - x_i)} f_i} $$ + +#### Remarks + +- $A(X)$ is not dependent on the values of $f_i$ and so can be brought out of the summation. +- $A'(X)$ is only dependent on the domain, so it can be precomputed, along with $A(X)$ + +## Re-defining the quotient + +Note that our original problem was that the polynomial: + +$$\sum_{i=0}^{d-1} {f_i\frac{\mathcal{L_i(X)}}{X - x_m}}$$ + +had an $X - x_m$ term in the denominator. We will use the first barycentric form as a way to get rid of this.
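Before carrying on, here is a quick numeric sanity check (ours) that the first barycentric form above agrees with the direct lagrange formula, using an arbitrary toy domain and modulus:

```python
P = 10007                      # toy prime modulus (assumption for this sketch)
domain = [0, 1, 2, 3, 4]       # a small arbitrary domain, d = 5
inv = lambda x: pow(x, P - 2, P)

def A(x):
    """A(x) = prod_i (x - x_i)."""
    out = 1
    for xi in domain:
        out = out * (x - xi) % P
    return out

def A_prime(xj):
    """A'(x_j) = prod_{i != j} (x_j - x_i)."""
    out = 1
    for xi in domain:
        if xi != xj:
            out = out * (xj - xi) % P
    return out

def lagrange(i, x):
    """L_i(x) by the direct product formula."""
    out = 1
    for j, xj in enumerate(domain):
        if j != i:
            out = out * (x - xj) % P * inv(domain[i] - xj) % P
    return out

f_evals = [3, 1, 4, 1, 5]      # arbitrary evaluations of some f on the domain
z = 123                        # a point outside the domain

direct = sum(fi * lagrange(i, z) for i, fi in enumerate(f_evals)) % P
barycentric = A(z) * sum(fi * inv(A_prime(xi) * (z - xi) % P)
                         for fi, xi in zip(f_evals, domain)) % P
assert direct == barycentric
```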
+ +First we rewrite $\frac{\mathcal{L_i(X)}}{X - x_m}$ using the first form: + +$$ \frac{\mathcal{L_i}(X)}{X - x_m} = \frac{A(X)}{A'(x_i) (X - x_i)(X-x_m)} $$ + +We then note that: + +$$ A(X) = \mathcal{L_m}(X) \cdot A'(x_m) \cdot (X - x_m) $$ + +> This is just the first-form expression for $\mathcal{L_m}(X)$, re-arranged to isolate $A(X)$. + +We can hence plug this into our previous equation: + +$$ \frac{\mathcal{L_i}(X)}{X - x_m} = \frac{\mathcal{L_m}(X) \cdot A'(x_m) \cdot (X - x_m)}{A'(x_i) (X - x_i)(X-x_m)} $$ + +Simplifying, since we have an $X - x_m$ in both the numerator and denominator: + +$$ \frac{\mathcal{L_i}(X)}{X - x_m} = \frac{A'(x_m) \cdot \mathcal{L_m}(X) }{A'(x_i)\cdot (X - x_i)} $$ + +> Note that when the elements in the domain are roots of unity: $A'(x_k) = d(x^k)^{d-1} = dx^{-k}$ +> +> The nice simplification here is due to two reasons: roots of unity form a cyclic group, and we can succinctly represent the d'th roots of unity by the sparse equation $X^d -1$, which is nice to differentiate. + +We have now re-defined $q(X)$ to not include $X-x_m$! + +We now summarise and state that: + +$$ q(X) = \sum_{i=0}^{d-1} f_i \frac{\mathcal{L_i}(X)}{X - x_m} = \sum_{i=0}^{d-1} f_i \frac{A'(x_m) \cdot \mathcal{L_m}(X) }{A'(x_i)\cdot (X - x_i)} $$ + +## Explicit formulas for each case + +### Computing $q_m$ + +When evaluating at the point $x_m$ itself, where the naive formula would divide by zero, the above formula becomes: + +> Note: $\mathcal{L_m}(x_m) = 1$ + +$$ q_m = q(x_m) = \sum_{i=0, i \neq m}^{d-1}\frac{A'(x_m)}{A'(x_i)} \frac{f_i}{x_m - x_i} $$ + +### Computing $q_j$ + +For the points where the denominator does not vanish, we can use the original formula. + +For all $j \neq m$: + +$$ q_j = q(x_j) = \sum_{i=0}^{d-1} f_i \frac{\mathcal{L_i}(x_j)}{x_j - x_m} $$ + +We note that the terms of the sum are zero, except for when $i=j$, from the definition of the lagrange polynomial; hence we can simplify this to be: + +$$ q_j = \frac{f_j}{x_j - x_m} $$ + +## Optimisations + +If we use the formulas as shown above, computing $q_m$ will take $d$ steps due to the sum, and computing all of the $q_j$ will take $d-1$ steps. We describe a way to reduce this complexity in the code. + +### 1. Rewrite $q_m$ in terms of $q_j$ + +Note that if we multiply $q_m$ by $\frac{-1}{-1}$ we get: + +$$ q_m = q(x_m) = -\sum_{i=0, i \neq m}^{d-1}\frac{A'(x_m)}{A'(x_i)} \frac{f_i}{x_i - x_m} $$ + +We can now substitute in $q_i$: + +$$ q_m = q(x_m) = -\sum_{i=0, i \neq m}^{d-1}\frac{A'(x_m)}{A'(x_i)} q_i $$ + +### 2. Removing field inversions in $q_j$ + +Note that $q_j$ involves a division, which is many times more expensive than a field multiplication. We now show a way to precompute in such a way that we do not need to invert elements. + +> With the roots of unity, we were able to use the fact that they formed a group. + +Again note that: + +$$ q_j = \frac{f_j}{x_j - x_m} $$ + +The expensive division occurs here: $\frac{1}{x_j-x_m}$. In our particular case, the domain is the discrete interval $[0,255]$, which means we only need to precompute $\frac{1}{x_i}$ for non-zero $x_i \in [-255, 255]$. This is 510 values, so we would store $510 \times 32 = 16320$ bytes $\approx 16\text{KB}$. If this is too much space, one could halve the storage by not storing the negated points. + +**How would I lookup and store these values in practice?** + +First we imagine that we have stored the values in an array as such: + +$[\frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}... \frac{1}{255},\frac{1}{-1},\frac{1}{-2},...\frac{1}{-255}]$ + +We first note that we can easily get from $\frac{1}{k}$ to $\frac{1}{-k}$ in the array by jumping forward 255 indices.
Our strategy will be to find $\frac{1}{k}$, then jump to $\frac{1}{-k}$ if we need to. + +**Example** + +We want to compute $\frac{1}{0 - 255}$. + +- Compute $i = \text{abs}(0-255) = 255$ + +> In practice, we can use an if statement to check whether 255 or 0 is larger, and subtract accordingly. + +- Note that $\frac{1}{i}$ is at index $i-1$ +- Since our original computation was $0 - 255$, which is negative, we need to get the element at index $(i - 1) + 255$, where $i=255$. + +### 3. Precompute $\frac{A'(x_m)}{A'(x_i)}$ + +> With the roots of unity, we did not need this optimisation, as $\frac{A'(x_m)}{A'(x_i)}$ equaled $\frac{\omega^i}{\omega^m}$, which was trivial to fetch from the domain due to the roots of unity forming a group. + +For our case, we will need to store precomputed values if we want to efficiently compute $q_m$ in $O(d)$ steps and also avoid inversions. + +The strategy is to precompute $A'(x_i)$ and $\frac{1}{A'(x_i)}$. Given that we have 256 points in the domain, this will cost us $256 \times 2 \times 32 = 16384$ bytes $= 16\text{KB}$. + +**How would I lookup and store these values in practice?** + +Similar to the previous optimisation, we store $A'(x_i)$ in an array as such: + +$[A'(0), A'(1), A'(2), A'(3)... A'(255),\frac{1}{A'(0)},\frac{1}{A'(1)},\frac{1}{A'(2)},...\frac{1}{A'(255)}]$ + +**Example** + +We want to compute $\frac{A'(0)}{A'(5)}$ + +- We can fetch $A'(0)$ by looking up the element at index $0$ in the array. +- We can fetch $\frac{1}{A'(5)}$ by looking up the element at index 5, then jumping forward 256 positions. + +In general: + +- To fetch $A'(x_i)$ we need to fetch the element at index $i$ +- To fetch $\frac{1}{A'(x_i)}$ we need to fetch the element at index $i + 256$ + +> Gotcha: You may produce an off-by-one error by not realising that the second optimisation skips ahead 255 points for negative values, while the third optimisation skips ahead 256. This is because the second optimisation omits the value $\frac{1}{0}$. + +## Evaluate polynomial in evaluation form on a point outside of the domain + +Suppose $z$ is a point outside of the domain. + +$$ f(z) = \sum_{i=0}^{d-1}f_i\mathcal{L_i}(z) = \sum_{i=0}^{d-1}{\frac{A(z)}{A'(x_i) (z - x_i)} f_i} = A(z)\sum_{i=0}^{d-1}\frac{f_i}{A'(x_i)(z-x_i)} $$ + +**Optimising:** +- We already store precomputations for $\frac{1}{A'(x_i)}$ +- We should compute $z-x_i$ separately, then batch invert using the Montgomery trick, so that we only pay for one inversion. \ No newline at end of file
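A minimal sketch (ours, with a toy domain and modulus in place of the 256-point domain) of this evaluation, using precomputed $\frac{1}{A'(x_i)}$ values and the Montgomery trick to batch the $z - x_i$ inversions:

```python
P = 10007                     # toy prime modulus (assumption for this sketch)
domain = list(range(8))       # toy stand-in for the discrete interval [0, 255]
inv = lambda x: pow(x, P - 2, P)

# Precompute 1/A'(x_i) once per domain
A_prime_inv = []
for xj in domain:
    prod = 1
    for xi in domain:
        if xi != xj:
            prod = prod * (xj - xi) % P
    A_prime_inv.append(inv(prod))

def batch_inverse(xs):
    """Montgomery trick: invert n elements with a single field inversion."""
    prefix = [1]
    for x in xs:
        prefix.append(prefix[-1] * x % P)
    acc = inv(prefix[-1])                 # the one real inversion
    out = [0] * len(xs)
    for i in range(len(xs) - 1, -1, -1):
        out[i] = prefix[i] * acc % P      # 1/x_i = (x_0..x_{i-1}) / (x_0..x_i)
        acc = acc * xs[i] % P
    return out

def eval_outside_domain(f_evals, z):
    """f(z) = A(z) * sum_i f_i / (A'(x_i) (z - x_i)), for z not in the domain."""
    diffs_inv = batch_inverse([(z - xi) % P for xi in domain])
    A_z = 1
    for xi in domain:
        A_z = A_z * (z - xi) % P
    return A_z * sum(fi * ai % P * di % P
                     for fi, ai, di in zip(f_evals, A_prime_inv, diffs_inv)) % P

f_evals = [xi * xi % P for xi in domain]       # f(X) = X^2 in evaluation form
assert eval_outside_domain(f_evals, 100) == 100 * 100 % P
```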