In this article I want to talk about one small and cool thing – a proof of not knowing. More precisely, we will generate SHA-256 hashes such that we can prove that their preimages are overwhelmingly likely not known.
Of course, we can just take a random sequence of bytes and this digest is already one without a known preimage. However, this doesn’t solve our task, because we also need to prove that the byte sequence is random and not specifically chosen.
The phrase “overwhelmingly likely” appears several times in this article because, strictly speaking, the probability of finding a SHA-256 preimage in reasonable time is non-zero, although it is negligibly small.
We can solve this as follows:
import hashlib, random
s = random.getrandbits(256)
s_hash = hashlib.sha256(s.to_bytes(32, "big")).digest()
s_hash_int = int.from_bytes(s_hash, "big")
h = s_hash_int ^ 1
print(f"h={h:064x}")
print(f"s={s}")
The program produces a value h such that, with overwhelming probability, there is no known number x with SHA256(x) = h. The value s can later be used to prove that the preimage is overwhelmingly likely not known to anyone.
Now, let’s dive into how it works. We take some random number s, which is just a seed for our future computations. If we want to generate a single hash, it can even be a constant: it doesn’t need to be truly random.
The value s is then hashed to get an initial byte sequence. After that we modify the initial SHA256(s) bytes in a way that doesn’t contain any hidden secrets. In this approach we just invert the last hash bit by XOR-ing the hash with 1. Alternatively, we could have used any other nothing-up-my-sleeve number, such as the first digits of pi, e, sqrt(2), etc., and/or other “nothing-up-my-sleeve” operation. For example, XOR-ing with a number such as
k = 34127829386043893497825798180264754903547262366162207969901803965141107382164
is rigged, because k = SHA256(0x2) ^ SHA256(0x3): if s = 0x2, we get
h = SHA256(s) ^ k = SHA256(0x2) ^ SHA256(0x2) ^ SHA256(0x3) = SHA256(0x3)
with the known preimage 0x3.
We cannot just take any arbitrary byte sequence and modify it, because it might be rigged. Using SHA256(s) here makes rigging as hard as finding a preimage for a hash: to arrange that h = SHA256(s) ^ 1 is equal to SHA256(x) for some chosen x, one would need to find an s such that SHA256(s) = SHA256(x) ^ 1, which is a preimage search.
To prove that the preimage is not known, the prover takes the number s, recomputes its hash, and modifies it the same way as during generation, so that the construction (and apparent randomness) of the byte sequence is verified.
I don’t know of concrete use cases for this thing, but it might be used to generate, for example, Bitcoin addresses that are provably without an owner and won’t ever have an owner.
Instead of SHA-256 we could use other cryptographic hash functions.