Compile and run MNIST with Lattigo #2614
Conversation
|
So the existing go pipeline won't compile mnist, because when it tries to generate primes to configure lattigo (we use openfhe to generate primes) it hits `terminate called after throwing an instance of 'lbcrypto::OpenFHEException'`. So now I'm working on rolling the central loops. I will need to also add lattigo bootstrap ops and configure them properly in the pipeline. |
|
Quick update here: I am able to compile mnist to lattigo (fully unrolled) and it runs (about 1 minute per inference) but the outputs are wrong, even after rebasing over @asraa's change that disables the known buggy in-place transform. So now I'm comparing the compiled code to openfhe as well as the data inputs, since the go code has a manually-written loader for the mnist weights and inputs |
|
Confirmed that the input vectors and weights loaded from the test data are the same (this was not entirely clear because I had to roll my own data loader for loading torch weights and inputs from go). |
|
More debugging notes: overriding the default lattigo config (from HEIR), and encrypting the input with a different scale, does actually cause the resulting program to produce a different inference. So the working theory is that our configuration for lattigo is insufficient, not that the circuit is wrong. |
|
I modified the openfhe reference and the lattigo implementation to output the raw logits as well. Comparing Openfhe against Lattigo: obviously, not only are the logits wrong, but they're the same for all three samples. |
are the image pixels one for one compared with openfhe and actually different per i? edit: oh okay, yes, you said weights AND inputs.
ok - then i would think there totally might be something wrong with in-place, but it works w/o in place too?!?!?! what!
|
So we checked that compiling a matvec to lattigo works, next I will truncate the mnist example down to its first layer and compare that with openfhe. I will also try running a ReLU in isolation and compare between lattigo and openfhe. |
|
Marking this as pull_ready so I can get it into google3 and do some more testing. |
|
Some notes:
The parameters chosen by the lattigo path differ from the openfhe path. In particular, lattigo chooses a smaller logN, 7 vs. 15 standard primes (corresponding to the initial choice of 14 levels in openfhe), one fewer special prime, and the extended encryption technique.
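For reference, the level/prime correspondence above: an RNS CKKS modulus chain supporting multiplicative depth L carries L+1 standard (Q) primes, one per level plus one for the final level-0 modulus, which is how 14 levels lines up with 15 standard primes. A tiny sketch with a hypothetical helper:

```go
package main

import "fmt"

// qPrimes returns the number of standard (Q) primes in an RNS CKKS
// modulus chain of the given multiplicative depth: one prime consumed
// per rescale, plus one remaining at level 0.
func qPrimes(depth int) int { return depth + 1 }

func main() {
	// openfhe path: initial choice of 14 levels => 15 standard primes.
	fmt.Println(qPrimes(14)) // 15
	// lattigo path picked only 7 standard primes => depth 6.
	fmt.Println(qPrimes(6)) // 7
}
```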
|
|
Another thing is that I will often see this warning in the lattigo path, but not in the openfhe path: `warning: Range Analysis indicate that the first modulus must be larger than the scaling modulus by at least 127 bits.` |
|
Making progress here: even just the first matvec differs in output between openfhe and lattigo. |
|
With help from @ZenithalHourlyRate the bug has been identified: the input vector is not cyclically rotated properly. The pass pipeline requests 1024 slots, but parameter selection requires at least 4096. In OpenFHE we have special codegen that does additional cyclic repetition to ensure the slots are filled and rotations are semantically correct. In Lattigo we removed that code a while back, and no test was depending on it, even though it breaks Halevi-Shoup matvec. I'm going to put the fix into a different PR, and try to fix it in a way that applies to all backends, moving this logic out of the codegen step and into the proper pipeline. |
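A plaintext sketch of the bug and the fix (hypothetical helper names, not HEIR's actual codegen): when the logical vector is shorter than the slot count, a slot rotation only agrees with a cyclic rotation of the logical vector if the vector is cyclically repeated to fill all slots. Zero padding breaks the wrap-around that Halevi-Shoup matvec relies on.

```go
package main

import "fmt"

// replicate cyclically repeats v to fill numSlots slots, so that a
// rotation over the full slot vector agrees with a cyclic rotation of
// the original length-len(v) vector.
func replicate(v []float64, numSlots int) []float64 {
	out := make([]float64, numSlots)
	for i := range out {
		out[i] = v[i%len(v)]
	}
	return out
}

// rotate is the plaintext analogue of a CKKS slot rotation by k.
func rotate(v []float64, k int) []float64 {
	out := make([]float64, len(v))
	for i := range out {
		out[i] = v[(i+k)%len(v)]
	}
	return out
}

func main() {
	v := []float64{1, 2, 3, 4} // logical vector, 4 entries
	numSlots := 8              // parameter selection forced more slots

	// Zero-padded: rotating by 1 pulls in padding instead of v[0].
	padded := append(append([]float64{}, v...), make([]float64, 4)...)
	fmt.Println(rotate(padded, 1)[:4]) // [2 3 4 0] -- wrong wrap

	// Cyclically repeated: the rotation wraps correctly.
	rep := replicate(v, numSlots)
	fmt.Println(rotate(rep, 1)[:4]) // [2 3 4 1] -- correct
}
```

In Halevi-Shoup matvec every diagonal product uses a different rotation amount, so with zero padding the error shows up in nearly every output slot, which matches the badly wrong logits seen earlier.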
Overall we're finding the performance of OpenFHE bootstrap to be unmanageable. So we're looking to do the HEIR paper benchmarks entirely against lattigo, which means we need to compile models and then load the torch stuff in go directly.