Updated at: 2024-11-26
Let's break down this question step by step:
First of all, on today's TCP/IP-based internet, both Huluwa and Snake must each have an IP address to get online. But being on the network means that a message from Huluwa or Snake may pass through many network nodes before reaching the other party. These intermediate nodes could be internet service providers or IM service providers. See the illustration below:
Obviously, the intermediate nodes can see the message, which is certainly not what Huluwa and Snake want. So they would consider encrypting the message before sending it. That leads us to the second question.
Encryption is the process of using a Secret to turn a message into ciphertext. The ciphertext can pass through any node safely, because a node that does not have the Secret cannot decrypt it to recover the message. The process is shown in the illustration below:
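To make the idea concrete, here is a minimal sketch of encrypting with a shared Secret. It is a toy construction built only from the Python standard library (a SHA-256 keystream XORed with the message), chosen so it runs anywhere; it is not what Signal or WhatsApp use, and it provides no integrity protection. The function names are made up for illustration.

```python
import hashlib
import secrets


def keystream(secret: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the Secret and a nonce (toy design)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(secret + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(secret: bytes, message: bytes) -> bytes:
    """Return nonce || ciphertext; a fresh random nonce is drawn per message."""
    nonce = secrets.token_bytes(16)
    stream = keystream(secret, nonce, len(message))
    return nonce + bytes(m ^ k for m, k in zip(message, stream))


def decrypt(secret: bytes, blob: bytes) -> bytes:
    """Split off the nonce, rebuild the same keystream, and XOR it away."""
    nonce, ciphertext = blob[:16], blob[16:]
    stream = keystream(secret, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


secret = secrets.token_bytes(32)            # the Secret both parties hold
blob = encrypt(secret, b"hello, Snake")     # what the intermediate nodes see
assert decrypt(secret, blob) == b"hello, Snake"
```

A node that only sees `blob` cannot rebuild the keystream without `secret`, which is the property the paragraph above relies on.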
And now, the third question arises.
That is, to encrypt messages before transmission, both parties need to agree on a Secret. One side uses the Secret to encrypt the message and sends the ciphertext; the other side receives the ciphertext and uses the same Secret to decrypt it and recover the message.
The issue is, how can both parties agree on this Secret?
Currently, Signal and WhatsApp use the Diffie–Hellman key exchange algorithm to let Huluwa and Snake negotiate this Secret through Signal's and WhatsApp's servers. Note that the DH algorithm relies on asymmetric key pairs, but that is not the main point here. The general process is illustrated below:
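The negotiation can be sketched with classic modular Diffie–Hellman. This is only an illustration of the principle: the prime below is far too small for real use, the names are invented for this example, and Signal's actual protocol uses elliptic-curve keys rather than this textbook form.

```python
import secrets

# Toy parameters (assumption for illustration): 2**127 - 1 is a Mersenne prime.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(my_private: int, their_public: int) -> int:
    """Combine my private key with the other party's public key."""
    return pow(their_public, my_private, P)


huluwa_priv, huluwa_pub = keypair()
snake_priv, snake_pub = keypair()

# Only the public keys travel through the server.
huluwa_secret = shared_secret(huluwa_priv, snake_pub)
snake_secret = shared_secret(snake_priv, huluwa_pub)
assert huluwa_secret == snake_secret
```

Both sides arrive at the same Secret without ever sending it, but only if the public key each side receives really belongs to the other.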
So, what's the problem with this? I remember that long ago, when I was using mutt (a text-based email client for the terminal) and people were into encryption, anyone who wanted to send email securely had to exchange public keys in a secure way, such as offline. Any third-party method of sharing public keys was considered unreliable, including third-party public key servers. Back when Unix hacker culture was prevalent, people even held offline events to exchange public keys face to face.
Do Signal and WhatsApp have some new magic that solves this problem? After studying their papers and blogs, it appears they don't. And if this step is unreliable, then everything Signal and WhatsApp build on top of it, such as X3DH, the Double Ratchet Algorithm, forward secrecy, and other more complex concepts, becomes unreliable as well. The key exchange goes through a third party, the Signal and WhatsApp servers, and this third party can deceive both Huluwa and Snake. As shown in the illustration below:
That is to say, the Signal and WhatsApp servers can generate two public/private key pairs of their own and negotiate a separate Secret with Huluwa and with Snake. Huluwa believes the Secret was negotiated with Snake, and Snake believes the Secret was negotiated with Huluwa. From then on, when Huluwa and Snake send encrypted messages through the Signal and WhatsApp servers, the servers are able to decrypt them.
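The substitution described above can be sketched with the same toy modular Diffie–Hellman (tiny parameters, hypothetical names, not the real Signal or WhatsApp code). The server hands each side one of its own public keys and ends up holding both Secrets.

```python
import secrets

# Toy parameters (assumption for illustration only).
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared_secret(my_private: int, their_public: int) -> int:
    """Combine my private key with the public key I was given."""
    return pow(their_public, my_private, P)


huluwa_priv, huluwa_pub = keypair()
snake_priv, snake_pub = keypair()

# The server generates two key pairs of its own.
fake_snake_priv, fake_snake_pub = keypair()    # presented to Huluwa as Snake's key
fake_huluwa_priv, fake_huluwa_pub = keypair()  # presented to Snake as Huluwa's key

# Each user unknowingly negotiates with the server.
huluwa_secret = shared_secret(huluwa_priv, fake_snake_pub)
snake_secret = shared_secret(snake_priv, fake_huluwa_pub)

# The server can compute both Secrets and re-encrypt traffic in the middle.
server_secret_with_huluwa = shared_secret(fake_snake_priv, huluwa_pub)
server_secret_with_snake = shared_secret(fake_huluwa_priv, snake_pub)

assert huluwa_secret == server_secret_with_huluwa
assert snake_secret == server_secret_with_snake
```

Nothing in the exchange itself tells Huluwa or Snake that the public key they received came from the server rather than from each other.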
So, how do Signal and WhatsApp solve this problem? The solutions are here:
This means you must confirm through a secondary channel outside of Signal and WhatsApp, such as offline, that the public key you are using really is the other party's public key. In the end it comes back to this naive but essential question. As for Telegram, by default it stores messages in plain text on the server, and even if you turn on encryption, it cannot escape this same issue.
Auditability is about whether users can easily confirm that the software operates as its developers claim.
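As a rough sketch of what such out-of-band confirmation amounts to: each side derives a short fingerprint from the public key it received and compares it with the fingerprint the other party reads out or shows in person. The format below (first 128 bits of SHA-256, in groups of four hex digits) is an assumption for illustration and is not the format Signal or WhatsApp display.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    """Short, human-comparable digest of a public key (hypothetical format)."""
    digest = hashlib.sha256(public_key).hexdigest()
    return " ".join(digest[i:i + 4] for i in range(0, 32, 4))


def keys_match(received_key: bytes, fingerprint_from_other_channel: str) -> bool:
    """Compare the key received via the server with a fingerprint obtained offline."""
    return hmac.compare_digest(fingerprint(received_key), fingerprint_from_other_channel)


snake_real_key = b"snake-public-key-bytes"        # placeholder key material
server_fake_key = b"server-substituted-key-bytes"  # what a dishonest server might send

read_out_in_person = fingerprint(snake_real_key)

assert keys_match(snake_real_key, read_out_in_person)
assert not keys_match(server_fake_key, read_out_in_person)
```

The check only helps if the fingerprint itself travels over a channel the server does not control.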
WhatsApp is not open source; Telegram’s client is open source, but its server is not; Signal claims that both the client and server are open source.
Evidently, whether the software is open source is not the main issue. So what should we pay attention to regarding auditability? We should focus on two points:
Regarding the first point, we can observe all the data the software uploads by using network packet capture tools (like mitmproxy or Wireshark).
Of course, being easy to observe also means being easy to crack, so from a certain perspective, it's understandable.
Thus, Zhi was born, aimed at addressing the issues mentioned above, with solutions as follows:
As the founder of Telegram has said, the reason Telegram does not enable end-to-end encryption by default is ease of use, and ease of use makes it more popular. So what is inconvenient about end-to-end encryption? It makes seamless synchronization across multiple devices impossible, because the server only holds encrypted data and the key exists only on the local device. To synchronize to another device, you must first get the key onto that device. So how do WhatsApp and others do it? They use Apple's iCloud and Google Drive for synchronization, which is also something the founder of Telegram has criticized them for. In short, everyone only needs to remember one thing: as long as the key passes through a third party, its credibility is questionable. Zhi uses pre-shared keys, and you need to put the key on your other devices yourself, usually by scanning it face to face. Zhi also does not read the address book to obtain contact relationships, which means Zhi is bound to stay more niche.
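The pre-shared key idea can be sketched as follows. This is not Zhi's actual format or code, which this article does not describe; the function names and the base64 payload are assumptions chosen only to show that the key is generated locally and handed over directly, for example as the text behind a QR code, without passing through a server.

```python
import base64
import secrets

KEY_LENGTH = 32  # bytes; an assumed size for illustration


def new_preshared_key() -> bytes:
    """Generate a random key on the local device."""
    return secrets.token_bytes(KEY_LENGTH)


def to_qr_payload(key: bytes) -> str:
    """Encode the key as text that a QR code could carry (hypothetical format)."""
    return base64.urlsafe_b64encode(key).decode("ascii")


def from_qr_payload(payload: str) -> bytes:
    """Decode a scanned payload and sanity-check its length."""
    key = base64.urlsafe_b64decode(payload.encode("ascii"))
    if len(key) != KEY_LENGTH:
        raise ValueError("unexpected key length")
    return key


key_on_first_device = new_preshared_key()
payload = to_qr_payload(key_on_first_device)      # shown on screen, scanned in person
key_on_second_device = from_qr_payload(payload)
assert key_on_second_device == key_on_first_device
```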
For more details on Zhi, please see other articles on the blog.