Use Melissa E. O'Neill's pcg for rng and Lemire's unbiased bounded rand #2385
|
Build size and comparison to main:
|
b801061 to
09e2f75
Compare
|
https://www.youtube.com/watch?v=ia8Q51ouA_s&t=78 Seriously though, this looks great :) Really good spot that the Mersenne Twister is way too big for this application (and that's probably the cause of a crash I've been seeing recently). I'm not too experienced with random number generation techniques, so I need to read over them properly to review, but at a quick glance this looks very promising. |
|
Yeah, I'm personally not a fan of using MT ever, given that there are prng functions like this floating around which are plenty good enough for purposes like these and tend to be faster as well. I'm surprised the size diff isn't larger. @mark9064 the alternative rng I would use is romu, but that is slightly larger and from memory it technically has a potential bad 0 state*. I've liked using pcg ever since I learned about it years ago; it's comparably fast to MT but doesn't have the huge state cost and setup time, so it's nice to be able to create one on the fly for rng purposes. There's also a potentially similar issue with the Two game,
3f0f369 to
bc8f107
Compare
|
Currently rewriting, after some thought I realized that I could create a "controller" type object inside Since the cost of a reference is equivalent to a pointer and the overall prng size is not much larger, even if the global prng is not well seeded we can generate a better seeded prng object for only the cost of an additional 64 bits per screen / application. The benefit being no single app needs to write their own rng object, and the initialization cost is extremely low. |
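A rough sketch of what such a controller could look like, assuming a single root generator that is seeded once and then hands out independently seeded child generators. The names `Pcg32`, `MakePcg32`, `RandomController` and `Spawn` are hypothetical and not taken from this PR; the step function and seeding sequence follow the minimal PCG32 reference code.

```cpp
#include <cstdint>

// Minimal PCG32 state plus step function (XSH-RR output).
struct Pcg32 {
  std::uint64_t state;
  std::uint64_t inc; // must be odd

  std::uint32_t Next() {
    std::uint64_t old = state;
    state = old * 6364136223846793005ULL + inc;
    auto xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
    auto rot = static_cast<std::uint32_t>(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((0u - rot) & 31u));
  }
};

// Same seeding sequence as pcg32_srandom_r in the minimal C reference code.
inline Pcg32 MakePcg32(std::uint64_t seed, std::uint64_t stream) {
  Pcg32 rng {0u, (stream << 1u) | 1u};
  rng.Next();
  rng.state += seed;
  rng.Next();
  return rng;
}

// Hypothetical controller: owns one root generator and derives child generators from it.
class RandomController {
public:
  explicit RandomController(std::uint64_t rootSeed) : root(MakePcg32(rootSeed, 0u)) {
  }

  // Each call consumes three root outputs: two for the child's seed, one for its stream.
  Pcg32 Spawn() {
    std::uint64_t high = root.Next();
    std::uint64_t low = root.Next();
    std::uint64_t stream = root.Next();
    return MakePcg32((high << 32u) | low, stream);
  }

private:
  Pcg32 root;
};
```

An app would hold a reference to the controller and call `Spawn()` once when it starts, so it never has to gather seed material itself.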
|
The nRF52 has RNG hardware, why not use that? |
|
@Avamander that's the bluetooth device right? I did a quick search and spotted that there's an RNG interrupt, so I assume there's some form of secure rng available. I would use it, if I knew how. But, if that wouldn't work for whatever reason, so long as there's some noisy data available we should be able to pack the rng with more than enough state that no one would ever notice. |
|
OK I've looked into it a bit. Firstly for cryptographically secure RNG we have Regarding general PRNG: However I'm not against including PCG, but I think if we choose to implement it we should implement the proper interface https://en.cppreference.com/w/cpp/named_req/RandomNumberEngine.html Regarding having a central RNG object that's shared, I think that could make sense. This would mean that apps avoid having to seed their own randomness and potentially minor memory savings if we ever reach a point where we have many RNGs. |
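For reference, a minimal sketch of the smallest interface the standard distributions need, assuming PCG32 as the generator. The class name `Pcg32Engine` and the helper `RollDie` are hypothetical. Note that this only covers the UniformRandomBitGenerator part (`result_type`, `min()`, `max()`, `operator()`); a full RandomNumberEngine additionally needs `seed()`, `discard()`, comparison and stream operators, which the upstream C++ library provides.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical wrapper; the step function is the minimal PCG32 reference code.
class Pcg32Engine {
public:
  using result_type = std::uint32_t;

  explicit Pcg32Engine(std::uint64_t seed, std::uint64_t stream = 1u) {
    // Seeding sequence from pcg32_srandom_r in the minimal C reference code.
    state = 0u;
    inc = (stream << 1u) | 1u;
    (*this)();
    state += seed;
    (*this)();
  }

  static constexpr result_type min() {
    return 0u;
  }

  static constexpr result_type max() {
    return std::numeric_limits<result_type>::max();
  }

  result_type operator()() {
    std::uint64_t old = state;
    // Advance the underlying LCG.
    state = old * 6364136223846793005ULL + inc;
    // XSH-RR output permutation on the old state.
    auto xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
    auto rot = static_cast<std::uint32_t>(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((0u - rot) & 31u));
  }

private:
  std::uint64_t state;
  std::uint64_t inc;
};

// Because the wrapper exposes min(), max() and operator(), it plugs into <random> directly.
inline int RollDie(Pcg32Engine& rng) {
  std::uniform_int_distribution<int> dist(1, 6);
  return dist(rng);
}
```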
|
@mark9064 Melissa has an implementation that follows the standard pattern. I personally just don't see the need to bring in all the template mess since prng functions are fundamentally state machines where part of their state is hidden and the rest transformed. However, that said, I'm pretty sure that the c++ library for pcg has smaller versions which would probably work just fine. That might be worthwhile since I don't imagine that the hardware is 64 bit, so that might be a plus. Well actually this will be an easy switch, since the constants for each size are in the c++ header here: So I'll take a minute to make that edit. |
|
@mark9064 I believe looking at It looks like in a way we could create a similar approach to On second look, I absolutely would not bother using Update: struggling to get access to |
The relative slowness does not matter in these scenarios, especially for a dice app and it avoids the significant memory and storage cost of shipping an entire PRNG. |
|
@Avamander the cost of the pcg should only be 128 bits of memory to hold the state at most (depends on the size choice of the implementation) for any instance being used (or double that in my rough conceptual model where there were two at work, one for the screen, one to generate new seeds) and only a few bytes for the instructions for the two functions. |
|
@mark9064 I'm not sure why, but I thought when you brought up LCGs that |
|
Poking around the bluetooth specification I tried to find out what |
2c13cd8 to
7b4b528
Compare
|
Thoroughly confused myself with undefined reference problems. I'm leaving as is, though feel free to change the state size, it should adjust as desired, though I'd recommend sticking with the 128 bit form. |
7b4b528 to
efe9569
Compare
|
OK so there's a few problems here
1: We should have a randomness controller that hands out a well seeded RNG. This should use the hardware RNG for seeding, no other RNG source is needed. I'd guess that |
efe9569 to
7dab95f
Compare
|
@mark9064 well I think just conceptually since a PCG is taking an LCG and adding only a few instructions to massively improve the "randomness". I suppose you could take any LCG from anywhere, including the standard library and then perform the xor shift with random rotation step and get a fair improvement. So far what I've done*, provided this thing checks out and provides an artifact to check, is use the bluetooth "secure" rng, which presumably is some form of hardware noise to initialize the seed...unfortunately I couldn't get this to play nice and let me do that from within or around the I'd agree on 1. Unfortunately all the neat tricks I've learned to get some randomness don't really work on hardware like this, address randomization isn't great, using time as a seed would otherwise be ok*, but I've noticed the watch by default boots without restoring the last known time...which means it resets to a fixed point (not great for rng purposes).
I think we could have a fix for a decent seed strictly* based on time, if we made another PR which specifically saves and loads the last known time (I honestly would like this because I've had my watch die a couple of times while I wasn't paying too much attention, with the higher battery usage and all). Then we could use time at least as part of the seed. Another consideration for seeding (also worth a PR?) is lifetime statistics, e.g. you and I probably haven't waved the watch around as much nor had it record the same number of heartbeats, etc. Basically, turn the watch into an old arcade machine where the "RNG" is actually deterministic. The issue with the zero seed thing is more a critique of LCGs: you sort of need the addition term to escape the 0 state, and then with some poor choice of multipliers you can get really bad, not-so-random-looking behavior. I don't know why Also...have not tested on real hardware, so I've no idea if the bluetooth rng thing works*. I'll test when the thing appears to have compiled here and I can grab the artifact. |
|
Put another way: without a convincing demo of why LCGs (in the form of |
Mersenne Twister's state size is huge (kilobytes of state), so instead we'll swap in a pcg.
The pcg's state is n bits and has an equidistributed 2^(n/2) period (plenty for a quick rng).
To replicate the standard library's uniform integer distribution we use Lemire's unbiased bounded random method.
Adjust the state size as desired between 32 and 128 bits.
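For readers unfamiliar with it, here is a minimal sketch of Lemire's bounded method for a 32-bit generator. The function name `BoundedRand` and the `Rng` template parameter are hypothetical, not the names used in this PR; `rng` is assumed to be any callable returning uniformly distributed 32-bit values, and `range` must be nonzero.

```cpp
#include <cstdint>

// Returns a uniformly distributed value in [0, range) without modulo bias.
template <typename Rng>
std::uint32_t BoundedRand(Rng& rng, std::uint32_t range) {
  // Multiply a 32-bit draw by the range; the high 32 bits are the candidate result.
  std::uint64_t product = static_cast<std::uint64_t>(rng()) * range;
  auto low = static_cast<std::uint32_t>(product);
  if (low < range) {
    // Only in this rare case do we pay for a division: threshold = 2^32 mod range.
    auto threshold = static_cast<std::uint32_t>(0u - range) % range;
    // Reject draws whose low half falls in the biased region and try again.
    while (low < threshold) {
      product = static_cast<std::uint64_t>(rng()) * range;
      low = static_cast<std::uint32_t>(product);
    }
  }
  return static_cast<std::uint32_t>(product >> 32u);
}
```

A six-sided die would then be `1 + BoundedRand(rng, 6u)`. Most calls need a single multiplication and no division, which is the main attraction on a small core.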
7dab95f to
e42049d
Compare
|
Unfortunately I'd say I'm a bit biased, because the best explanation I've seen of why LCGs have an issue is from Melissa's work. Here's a point from her Stanford video that visualizes some prngs, which I think helps: https://youtu.be/45Oet5qjlms?t=1313. The other option I initially suggested also has a paper with this sort of diagram as an example: https://arxiv.org/pdf/2002.11331 I guess the answer is that LCGs have this sort of "stripy" repeating pattern, and maybe at a human level reading the output it wouldn't be so obvious, but it does have statistical impact. So maybe for the pong game and twos it doesn't matter, because there's human input involved and the games should expand in state size, but if you wanted something that was a good dice roller, the pcg's definitely a plus. At that point, if the pcg is included, then the function's available to use for any application, and you could use it everywhere a prng is needed. I guess the way I'd argue it is: if you want a statistically good, cheap prng, these are simple enough in implementation that I wouldn't worry about including them however you want. |
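One concrete form of that pattern is easy to check: in an LCG with a power-of-two modulus, an odd multiplier and an odd increment, the lowest bit of the raw state just alternates 0, 1, 0, 1. The sketch below demonstrates this; the function names are hypothetical, the multiplier is the one from the minimal PCG32 code, and the increment is an arbitrary odd constant chosen for illustration.

```cpp
#include <cstdint>

// One step of a plain 64-bit LCG (modulus 2^64 via unsigned wraparound).
inline std::uint64_t LcgNext(std::uint64_t state) {
  return state * 6364136223846793005ULL + 1442695040888963407ULL;
}

// Returns true if the lowest state bit flips on every one of `steps` steps.
inline bool LowBitAlternates(std::uint64_t seed, int steps) {
  std::uint64_t current = seed;
  for (int i = 0; i < steps; ++i) {
    std::uint64_t next = LcgNext(current);
    if ((next & 1u) == (current & 1u)) {
      return false;
    }
    current = next;
  }
  return true;
}
```

`LowBitAlternates` returns true for any seed, because odd times s plus odd always has the opposite parity to s. This is why the PCG output step builds its result from the high bits of the state instead of returning the state directly.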
|
From what I understand the performance is in the same ballpark as LCG. It's the maintenance burden of carrying a submodule that's best avoided. More code, more problems :) |
|
@mark9064 I'm not sure what you mean by submodule; you don't have to pull in a git submodule for this. Melissa's pcg is literally this:
// *Really* minimal PCG32 code / (c) 2014 M.E. O'Neill / pcg-random.org
// Licensed under Apache License 2.0 (NO WARRANTY, etc. see website)
typedef struct { uint64_t state; uint64_t inc; } pcg32_random_t;
uint32_t pcg32_random_r(pcg32_random_t* rng)
{
uint64_t oldstate = rng->state;
// Advance internal state
rng->state = oldstate * 6364136223846793005ULL + (rng->inc|1);
// Calculate output function (XSH RR), uses old state for max ILP
uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
uint32_t rot = oldstate >> 59u;
return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
}
which is dead simple; forgive my complicated mess trying to make sense of how to generalize what was already in her c++ headers. And then romu is any one of these:
// Romu Pseudorandom Number Generators
//
// Copyright 2020 Mark A. Overton
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// ------------------------------------------------------------------------------------------------
//
// Website: romu-random.org
// Paper: http://arxiv.org/abs/2002.11331
//
// Copy and paste the generator you want from those below.
// To compile, you will need to #include <stdint.h> and use the ROTL definition below.
#define ROTL(d,lrot) ((d<<(lrot)) | (d>>(8*sizeof(d)-(lrot))))
//===== RomuQuad ==================================================================================
//
// More robust than anyone could need, but uses more registers than RomuTrio.
// Est. capacity >= 2^90 bytes. Register pressure = 8 (high). State size = 256 bits.
uint64_t wState, xState, yState, zState; // set to nonzero seed
uint64_t romuQuad_random () {
uint64_t wp = wState, xp = xState, yp = yState, zp = zState;
wState = 15241094284759029579u * zp; // a-mult
xState = zp + ROTL(wp,52); // b-rotl, c-add
yState = yp - xp; // d-sub
zState = yp + wp; // e-add
zState = ROTL(zState,19); // f-rotl
return xp;
}
//===== RomuTrio ==================================================================================
//
// Great for general purpose work, including huge jobs.
// Est. capacity = 2^75 bytes. Register pressure = 6. State size = 192 bits.
uint64_t xState, yState, zState; // set to nonzero seed
uint64_t romuTrio_random () {
uint64_t xp = xState, yp = yState, zp = zState;
xState = 15241094284759029579u * zp;
yState = yp - xp; yState = ROTL(yState,12);
zState = zp - yp; zState = ROTL(zState,44);
return xp;
}
//===== RomuDuo ==================================================================================
//
// Might be faster than RomuTrio due to using fewer registers, but might struggle with massive jobs.
// Est. capacity = 2^61 bytes. Register pressure = 5. State size = 128 bits.
uint64_t xState, yState; // set to nonzero seed
uint64_t romuDuo_random () {
uint64_t xp = xState;
xState = 15241094284759029579u * yState;
yState = ROTL(yState,36) + ROTL(yState,15) - xp;
return xp;
}
//===== RomuDuoJr ================================================================================
//
// The fastest generator using 64-bit arith., but not suited for huge jobs.
// Est. capacity = 2^51 bytes. Register pressure = 4. State size = 128 bits.
uint64_t xState, yState; // set to nonzero seed
uint64_t romuDuoJr_random () {
uint64_t xp = xState;
xState = 15241094284759029579u * yState;
yState = yState - xp; yState = ROTL(yState,27);
return xp;
}
//===== RomuQuad32 ================================================================================
//
// 32-bit arithmetic: Good for general purpose use.
// Est. capacity >= 2^62 bytes. Register pressure = 7. State size = 128 bits.
uint32_t wState, xState, yState, zState; // set to nonzero seed
uint32_t romuQuad32 () {
uint32_t wp = wState, xp = xState, yp = yState, zp = zState;
wState = 3323815723u * zp; // a-mult
xState = zp + ROTL(wp,26); // b-rotl, c-add
yState = yp - xp; // d-sub
zState = yp + wp; // e-add
zState = ROTL(zState,9); // f-rotl
return xp;
}
//===== RomuTrio32 ===============================================================================
//
// 32-bit arithmetic: Good for general purpose use, except for huge jobs.
// Est. capacity >= 2^53 bytes. Register pressure = 5. State size = 96 bits.
uint32_t xState, yState, zState; // set to nonzero seed
uint32_t romuTrio32_random () {
uint32_t xp = xState, yp = yState, zp = zState;
xState = 3323815723u * zp;
yState = yp - xp; yState = ROTL(yState,6);
zState = zp - yp; zState = ROTL(zState,22);
return xp;
}
//===== RomuMono32 ===============================================================================
//
// 32-bit arithmetic: Suitable only up to 2^26 output-values. Outputs 16-bit numbers.
// Fixed period of (2^32)-47. Must be seeded using the romuMono32_init function.
// Capacity = 2^27 bytes. Register pressure = 2. State size = 32 bits.
uint32_t state;
void romuMono32_init (uint32_t seed) {
state = (seed & 0x1fffffffu) + 1156979152u; // Accepts 29 seed-bits.
}
uint16_t romuMono32_random () {
uint16_t result = state >> 16;
state *= 3611795771u; state = ROTL(state,12);
return result;
}
They're all incredibly small |
|
As explained before, I think that maintaining the C++ random interface is important. The most sensible way to do that would be to use the upstream implementation, as the full C++ interface has already been implemented there. |
|
I can make another edit using the git repo. I just want to note that I don't think it'll save quite as much as the ~800 bytes in program size. |
Mersenne Twister's state size is huge (kilobytes of state), so instead we'll swap in a pcg. The pcg's state is 128 bits and has an equidistributed 2^64 period (plenty for a quick rng). To replicate the standard library's uniform integer distribution we use Lemire's unbiased bounded random method.
Additionally we seed using the address of the rng struct. In theory this is randomly allocated somewhere, but since this is a tiny device with not a lot of address space, this likely only flips the lower bits. It still uses the motion accelerometer's 3-axis data as before, but uses the three uint32_t results to create one uint64_t. The rng shuffles itself a bit further by replacing its state using the initial seeding.
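The commit message does not spell out how the three values and the address are combined, so the following is only an illustrative sketch of one way to do it, not the PR's actual code. It swaps in the SplitMix64 finalizer as a mixing step; `Mix64` and `MakeSeed` are hypothetical names.

```cpp
#include <cstdint>

// SplitMix64 finalizer: a bijective 64-bit mixing function (assumed here, not from the PR).
inline std::uint64_t Mix64(std::uint64_t z) {
  z = (z ^ (z >> 30u)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27u)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31u);
}

// Folds three 32-bit accelerometer readings and an object address into one 64-bit seed.
inline std::uint64_t MakeSeed(std::uint32_t x, std::uint32_t y, std::uint32_t z, const void* address) {
  // Pack two axes into one word, then mix in the third axis and the address in turn.
  std::uint64_t seed = (static_cast<std::uint64_t>(x) << 32u) | y;
  seed = Mix64(seed);
  seed = Mix64(seed ^ z);
  seed = Mix64(seed ^ reinterpret_cast<std::uintptr_t>(address));
  return seed;
}
```

Mixing between inputs means that even if the address only varies in its lower bits, those differences spread across the whole seed.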