Polycirc: Differentiable arithmetic circuits for ZKML

I’ve just finished the first release of polycirc: a library for building differentiable arithmetic circuits for Zero-Knowledge ML. The basic idea is that you can build an arithmetic circuit c, then polycirc will transform it into another circuit d computing a single step of gradient descent. You can then use d with whatever zero-knowledge backend you like!

This post is a quick rundown of what polycirc can do; I’ll include pointers to more in-depth docs in each section.

Arithmetic Circuits

Arithmetic circuits are built from basic operations like addition, multiplication, and constants. Here’s a simple example, representing the expression x₀ * (-x₁):

    x₀ ───────────┐   ┌───┐
                  └───┤   │
                      │ * ├──── x₀ * (-x₁)
          ┌───┐   ┌───┤   │
    x₁ ───┤ - ├───┘   └───┘
          └───┘

Think of each wire as carrying a value in some semiring (e.g., \(\mathbb{N}\)). Circuits are “differentiable”, which means that polycirc can transform the circuit above into an “optic”, which is a circuit of this general shape:

Note that some wires are “bent backwards”: we use this to model the backwards information flow of the gradients of the forward circuit. However, this means that optics aren’t functions - polycirc’s learner module lets us turn these optics into a function we can use for gradient descent.

Building a circuit

The main thing that polycirc does is represent circuits. The docs describe how to build circuits, but in short, we can build our original example like this:

from polycirc import ir
circuit = (ir.identity(1) @ ir.negate(1)) >> ir.mul(1)
# Builds this circuit:
#
#    x₀ ───────────┐   ┌───┐
#                  └───┤   │
#                      │ * ├──── x₀ * (-x₁)
#          ┌───┐   ┌───┤   │
#    x₁ ───┤ - ├───┘   └───┘
#          └───┘

Circuits are built by combining smaller circuits with sequential >> and parallel @ composition. Sequential composition f >> g means “plug the outputs of f into the inputs of g”:

   ┌───┐   ┌───┐
───┤ f ├───┤ g ├───
   └───┘   └───┘

While parallel composition f @ g runs f and g in parallel:

   ┌───┐
───┤ f ├──
   └───┘
   ┌───┐
───┤ g ├───
   └───┘

Now we’ve built our circuit, we can compile it to python source code and see that it computes what we expect:

from polycirc.ast import diagram_to_ast
print(diagram_to_ast(circuit, 'multiply_negate'))

Which prints this python code:

def multiply_negate(x0, x1):
    x2 = - (x1)
    x3 = x0 * x2
    return [x3]

Zero-Knowledge ML

Now that we can build and compile circuits, we’d also like to differentiate them to train machine learning models. For an end-to-end example of training a linear model, see this example; I’ll sketch the details here.

Let’s start by defining our model circuit, then turn it into an “optic” using rdiff¹:

# mat_mul is like a single (linear) layer neural network
model_circuit = ir.mat_mul(OUTPUT_SIZE, INPUT_SIZE)
r_model = rdiff(model_circuit)

Remember that r_model is an optic whose backwards information flow is the gradients of the model, so we need to do two things:

Use the gradients to update our model parameters
Get a function which does a single step of gradient descent

The make_learner function does both of these steps for us:

u = fixed_gd(ONE >> 7)(NPARAM) # learning rate of 1/2^7
d = cliploss(OUTPUT_SIZE)  # mean squared error loss
step_circuit = learner.make_learner(f, u, d, NPARAM, INPUT_SIZE)

Here, the u and d maps are the “update” and “displacement” maps, which serve the same purpose as the “optimiser” and “loss” function. We can then compile step_circuit and use it for gradient descent:

step = diagram_to_function(step_circuit)
for i in range(0, NUM_ITER):
    # parameter update step
    p[i+1] = step(*p[i], *x[i], *y[i])[OUTPUT_SIZE:OUTPUT_SIZE+NPARAM]

A quick final note: step actually returns more than just new parameters, which is why we have to take a slice of its outputs. Its type is actually step : p + x + y → y + p + x: the first y outputs are actually the model’s prediction using the old parameters.

Summary

Hopefully that gave a quick sense of what polycirc is for. If you’re interested in using it, feel free to drop by the discord or ping me on twitter!

See these papers for the theoretical background.↩︎

—