Shufflecake: plausible deniability for multiple hidden filesystems on Linux
2023-11-28 NEW RELEASE! Shufflecake v0.4.2 is out. This release includes various bugfixes; a notable one resolves an ambiguity in persistent slice allocation after volume corruption, in order to help external recovery tools (e.g. RAID).
2023-10-10 And, as promised, the full version of the Shufflecake research paper is finally online, with all the nitty-gritty details of our beloved plausible deniability solution! Available both at IACR's Eprint and arXiv.
2023-10-06 Finally we can publicly announce the BIG NEWS! Our Shufflecake research paper has been accepted at ACM CCS 2023! We are super excited to present Shufflecake at one of the most prestigious academic cybersecurity conferences worldwide! An open access full version will be available soon. See you in Copenhagen!
Shufflecake is a tool for Linux that allows the user to create multiple hidden volumes on a storage device in such a way that it is very difficult, even under forensic inspection, to prove the existence of such volumes. This is useful for people whose freedom of expression is threatened by repressive authorities or dangerous criminal organizations, in particular: whistleblowers, investigative journalists, and activists for human rights in oppressive regimes. You can consider Shufflecake a "spiritual successor" of tools such as TrueCrypt and VeraCrypt, but vastly improved: it works natively on Linux, it supports any filesystem of choice, and it can manage multiple nested volumes per device, making deniability of the existence of these partitions truly plausible.
In Shufflecake, each hidden volume is encrypted with a different secret key, scrambled across the empty space of an underlying existing storage medium, and indistinguishable from random noise when not decrypted. Even though the presence of the Shufflecake software itself cannot be hidden (and hence the existence of secret volumes may be suspected), the number of volumes is also hidden. This allows a user to create a hierarchy of plausible deniability, where "most hidden" secret volumes are buried under "less hidden" decoy volumes, whose passwords can be surrendered under pressure. In other words, a user can plausibly "lie" to a coercive adversary about the existence of hidden data by providing a password that unlocks "decoy" data. Every volume can be managed independently as a virtual block device, i.e. partitioned, formatted with any filesystem of choice, and mounted and unmounted like a normal disc. The whole system is very fast, with only a minor slowdown in I/O throughput compared to a bare LUKS-encrypted disc, and with negligible waste of memory and disc space.
Shufflecake is FLOSS (Free/Libre, Open Source Software). Source code is available in the install section and released under the GNU General Public License v2.0 or later.
Shufflecake is still experimental software, please do not rely on its security for anything important!
A user must first
init a device: for example a physical disc, a partition therein, or a virtual block device such as a file-backed loop device. This will first overwrite the disc with random data and then create an encrypted header section at the beginning of the device. The header contains metadata and allocation tables for 15 Shufflecake volumes. The user is asked to provide N different passwords (where N is between 1 and 15). The first N sections of the header are then encrypted, one under each of the N passwords, while the others are left random. The order of the given passwords is important, because it establishes a hierarchy from "less hidden" to "more hidden" volumes. Notice that it is impossible to know how many volumes there are without decrypting.
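The init step can be sketched as follows. This is an illustrative toy model in Python, not the real on-disk format or cryptography: the slot size, the PBKDF2 key derivation, and the XOR stream cipher are stand-ins chosen only to show the "first N slots encrypted, the rest left random" structure.

```python
import hashlib
import os

SLOTS = 15        # maximum number of volumes per device
SLOT_SIZE = 64    # illustrative slot payload size, not the real layout

def derive_key(password: str, salt: bytes) -> bytes:
    # Toy KDF; the real scheme uses a proper password-hashing function.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def xor_encrypt(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for real encryption; XOR with a hash-derived stream,
    # so applying it twice with the same key recovers the plaintext.
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

def init_header(passwords):
    """First N slots are encrypted under the N passwords; the remaining
    slots are filled with random bytes, indistinguishable from the rest."""
    assert 1 <= len(passwords) <= SLOTS
    salt = os.urandom(16)
    header = []
    for i in range(SLOTS):
        if i < len(passwords):
            payload = os.urandom(SLOT_SIZE)  # volume key + allocation data stub
            header.append(xor_encrypt(derive_key(passwords[i], salt), payload))
        else:
            header.append(os.urandom(SLOT_SIZE))  # unused slot: pure noise
    return salt, header
```

Note how, by construction, an encrypted slot and a random filler slot are both uniform-looking byte strings: without a password there is no way to tell how many of the 15 slots are actually in use.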
Then the user can
open the volumes inside a given Shufflecake-initialised device. This is done by providing just one of the N passwords, which unlocks one of the 15 slots in the header, and hence the device area allocated to the corresponding volume. Furthermore, the unlocked slot contains a key that decrypts the previous (i.e. "less hidden") slot in the hierarchy, so all the less sensitive volumes are automatically opened recursively. All these volumes appear as virtual block devices under /dev/mapper and can be mounted, formatted, and used to store data.
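The recursive unlocking chain can be sketched like this. Again a toy model: the XOR-with-hash "cipher" and the fixed key length are illustrative assumptions, but the linking structure (each slot stores its own volume key plus the key of the next "less hidden" slot) mirrors the description above.

```python
import hashlib
import os

KEYLEN = 32

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # Toy XOR stream cipher: encryption and decryption are the same operation.
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

def build_slots(slot_keys):
    """Slot i holds its volume key plus the key of slot i-1 ('less hidden'),
    all encrypted under slot i's own key."""
    slots = []
    volume_keys = [os.urandom(KEYLEN) for _ in slot_keys]
    prev = b"\x00" * KEYLEN  # slot 0 has no predecessor
    for i, sk in enumerate(slot_keys):
        slots.append(xor_crypt(sk, volume_keys[i] + prev))
        prev = sk
    return slots, volume_keys

def open_from(slots, i, slot_key):
    """Unlock slot i, then walk down the chain and open every
    less-hidden slot below it, returning their volume keys."""
    opened = []
    while i >= 0:
        plaintext = xor_crypt(slot_key, slots[i])
        opened.append(plaintext[:KEYLEN])  # this volume's key
        slot_key = plaintext[KEYLEN:]      # key of the previous slot
        i -= 1
    return opened
```

One password thus opens a whole prefix of the hierarchy, while everything above the unlocked slot stays indistinguishable from noise.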
Finally, a user can
close a device and all the volumes therein with a single command.
Shufflecake is made of two components: dm-sflc, which is a kernel module implementing the Shufflecake scheme as a device-mapper target for the Linux kernel, and shufflecake-userland, which is a command-line tool allowing the user to create and manage hidden volumes. The kernel module must be loaded before using the userland tool.
Source code and installation instructions can be found on the project repository.
For now the code has been tested only on Debian/Ubuntu and similar derivatives.
The Shufflecake research paper has been peer-reviewed and published at the ACM Conference on Computer and Communications Security (CCS) 2023. An open access full version with extra details and continuous updates is available on IACR's Eprint and ArXiv.
Shufflecake is originally based on the M.Sc. Thesis "Hidden Filesystems Design and Improvement" from EPFL, Switzerland.
For an overview of different plausibly deniable storage approaches check the paper "SoK: Plausibly Deniable Storage".
Usage documentation can be found on the project repository.
How does Shufflecake work?
In a nutshell, Shufflecake allocates space for each volume as encrypted slices placed at random positions on the underlying device. Slices are allocated dynamically, as soon as the kernel module decides that more space than the currently used quota is required, and are interleaved to make forensic analysis more difficult. The positions of used and unused slices are stored in a volume-specific "position map", which is indexed within an encrypted header at the beginning of the device. Both position map and header are indistinguishable from random data without the correct decryption key, and every slot in the header (currently up to 15 volumes) has a field containing the decryption key for the previous (i.e. "less hidden") header slot and volume, thereby recursively linking all volumes and allowing the user to open all of them with a single password. This design also makes overcommitment possible: if you have a 1 GiB device and create three Shufflecake volumes on it, by default each of the three volumes appears to be 1 GiB in size (although you will start receiving I/O errors if you try to write more than 1 GiB in total across all three). Overcommitment is also crucial for plausible deniability, because an adversary can never tell for sure how many other volumes there are. Notice, in fact, that volumes left unopened are not considered for the total space allocation.
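The overcommitted, on-demand slice allocation can be modelled with a few lines of Python. This is a hypothetical sketch, not the kernel module's actual algorithm: all volumes draw from one shared pool of randomly ordered slices, each volume keeps its own position map, and writes fail only once the pool as a whole is exhausted.

```python
import random

class SliceAllocator:
    """Toy model of Shufflecake's slice allocation: every volume advertises
    the full device size, but physical slices come from one shared pool."""

    def __init__(self, n_slices, seed=None):
        self.free = list(range(n_slices))
        random.Random(seed).shuffle(self.free)  # random physical positions
        self.position_maps = {}                 # volume -> {logical: physical}

    def write(self, volume, logical_slice):
        """Map a logical slice to a physical one, allocating on first write."""
        pmap = self.position_maps.setdefault(volume, {})
        if logical_slice not in pmap:
            if not self.free:
                # Total writes exceeded device capacity: overcommitment bites.
                raise IOError("device full")
            pmap[logical_slice] = self.free.pop()
        return pmap[logical_slice]
```

Because slices are handed out in shuffled order and interleaved across volumes, the physical layout reveals nothing about how many position maps (i.e. volumes) exist.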
This sounds wasteful, how much space is occupied by headers, position maps, etc?
Actually very little: for a 1 TiB device, less than 0.5% of the space is occupied by this encrypted metadata, and this accounts for the worst possible, unoptimised scenario of disc size, partitions, etc. We are evaluating options to sacrifice further space in exchange for extra useful features.
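To put the figure in perspective, a quick back-of-the-envelope computation using the 0.5% worst-case bound above:

```python
TIB = 1024 ** 4  # bytes in 1 TiB
GIB = 1024 ** 3  # bytes in 1 GiB

device_size = 1 * TIB
metadata_bound = 0.005 * device_size  # worst-case 0.5% metadata overhead
usable = device_size - metadata_bound

print(f"metadata upper bound: {metadata_bound / GIB:.2f} GiB")  # 5.12 GiB
print(f"usable space:         {usable / GIB:.2f} GiB")          # 1018.88 GiB
```

So even in the worst case, headers and position maps cost about 5 GiB out of a terabyte.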
Is this fast?
Quite fast: I/O is roughly 30% slower than on a "normal" dm-crypt/LUKS encrypted volume, which is barely noticeable in daily desktop use. The memory footprint is minimal, and even slice fragmentation is negligible.
Why 15 volumes maximum?
It's just a convenient implementation limit that we think should be enough for any user: if you need more than 15 nested volumes, you are probably getting your threat model wrong. However, we plan to implement a new feature that would allow a virtually unlimited number of volumes per device. This is WIP.
If I do not open all volumes but only some of them, what happens to the other unopened volumes?
Likely they will get badly corrupted. A certain possibility of data corruption is necessary for plausible deniability, because the adversary must observe a consistent, random, on-demand slice allocation even when not all volumes are opened. The recommended behaviour is therefore to unlock all the volumes on a device for daily use, even without mounting them, and to unlock only a subset of them when under coercion. That said, there are ways to increase the corruption resistance of unopened volumes to a certain degree. We are researching, together with academic friends, how to do so efficiently as a built-in feature of Shufflecake. In the meantime, if you really want, you can add a certain degree of corruption resistance manually, in exchange for some wasted space, by partitioning a hidden volume into many smaller ones and using RAID. See the documentation for details.
What happens if my machine crashes while I'm writing data on a Shufflecake volume?
There is a good chance of volume corruption and data loss. As it stands, Shufflecake does not offer crash consistency, but we have clear plans on how to implement it in the future; they are mentioned in the documentation and are WIP.
What filesystems are supported?
Anything you want: Shufflecake is filesystem-agnostic, so users can format volumes as they wish. Certain filesystems, ext4 for example, have granularity features that yield slightly better performance than others.
How about Full Disk Encryption (FDE)? Can you boot Linux from a Shufflecake-encrypted partition?
In theory yes, and we are working on that. It requires some work, because Shufflecake needs to be loaded at bootloader time, but it is possible. Consider it WIP.
Can I use Shufflecake on any platform? Mobile? Embedded? VM?
In theory yes, but Shufflecake is designed and tested mainly on laptop/desktop platforms, so YMMV.
Is Shufflecake similar to TrueCrypt / VeraCrypt?
Similar, yes, but vastly improved. Shufflecake is indeed inspired by TrueCrypt and similar solutions, but designed with the precise goal of overcoming many technical limitations that make the adoption of such software unrealistic nowadays. Most importantly: Shufflecake works natively on Linux, supports arbitrary filesystems, and can manage multiple nested volumes per device, making deniability of the existence of these partitions truly plausible.
Is this steganography?
Not really. Steganography is like "there is no encrypted information at all here", while plausible deniability is "there is encrypted information, but I forgot the password", or "here is my decryption key, I swear I do not have any other one".
Why not just encrypt my disc with LUKS / BitLocker / etc.?
None of these systems provide plausible deniability. In a nutshell: XKCD 538.
Maybe you don't know that LUKS already offers plausible deniability?
It depends how you define "plausible", but no. With LUKS you can do some... workaround. You can fill the disc with random data, make a bootable USB drive with your bootloader on it, put a LUKS header-only file on that USB drive, and then create an encrypted filesystem on the disc using that detached header file. You'll want to back up that header file, and possibly hide it inside another encrypted volume on the USB drive using headerless encryption. It's guaranteed to work as long as both the USB drive and the disc stay inside the pentacle you just painted on the floor with black chicken blood. Kidding apart, we believe this technique is just a placebo: something that can defend you against the lazy security guard who asks you to turn on your laptop at the airport (and there are better and simpler solutions for that), but that will not stand a chance in court or under interrogation, since you are now basing your security on the assumptions that 1) the adversary will not find the physical USB dongle that you are suspiciously hiding in your underwear, 2) the adversary will not ask questions about a suspiciously large, unused partition on your laptop filled with random data, and 3) the adversary will ask you for only one password and be happy with it. This is a "poor man's plausible deniability", actually closer to steganography than to plausible deniability proper.
Who needs this? What is the use case?
People who live in constant danger of being interrogated with coercive methods to reveal sensitive information. Think of: an undercover journalist investigating a ring of organized crime or a corruption scandal in some low-democracy country, who needs to maintain the safety not only of themselves but also of their informants; a human rights activist in a repressive regime holding information on other members of a persecuted minority or about the organization of an upcoming protest; or a whistleblower who is about to become the next Edward Snowden, provided they're not caught first and processed in some secret military trial.
Aren't you concerned that criminals can use Shufflecake?
Yes, of course we are concerned about criminals in general. If there were a magic switch that made Shufflecake usable only by people without nefarious purposes, we would gladly press it. Sadly, no such switch exists. And, overall, we believe that the current state of humanity is such that we need more, rather than less, protection against invasive surveillance and coercive interrogations as a whole.
Won't you raise a red flag just for having Shufflecake installed?
Likely yes, but this is true for every plausible deniability solution. It is important to keep in mind the difference between plausible deniability and deniable encryption, the latter being more akin to steganography. The two concepts are orthogonal; we are covering one specific threat scenario. Shufflecake is not a panacea.
With Shufflecake there is no way of stopping an adversary from torturing you to death, because they won't know when to stop. Isn't this even more dangerous than TrueCrypt?
It depends on your specific threat scenario. It is not up to us to say which is the best choice for each user's specific needs, but please, please, remember that people might find themselves in situations you are not considering. As a limited example, we want to briefly cite two of these scenarios: 1) indictments for questionable reasons in countries where the burden of proof is on the prosecutor, and 2) interrogations about secrets that you value more than your own life. They might not always get the headlines, but remember that there are still heroes out there.
How secure, really, is Shufflecake?
We believe that the Shufflecake scheme itself offers a great balance of usability vs. security, and we think it can be developed into a very robust solution. A cryptographic security proof is also available. However, the Shufflecake implementation is currently little more than a prototype: there is still a lot of work to do, features planned but missing, and probably bugs. We will need an independent security audit at some point, but not before a good cleanup of the code and a stable milestone. If you would like to help make it happen, feel free to contribute.
What if I am monitored by a trojan / keylogger?
Then it's game over. Shufflecake only aims at protecting against a very specific threat scenario, and does not replace sound security practice.
How about TRIM? Can Shufflecake protect against forensic inspection of used disc sector traces?
Currently, no. The so-called "multi-snapshot adversary" is a very strong security model that takes into account the fact that, especially on modern devices such as SSDs, overwriting a logical sector often results in the underlying physical sector simply being marked as "unused" rather than really overwritten, thereby leaving "traces" or "snapshots" of the data content at previous points in time. This in turn can (in theory) break plausible deniability, because empty, unused space should not change over time. Multi-snapshot attacks are a well-known issue in plausible deniability systems (TrueCrypt and derivatives are also very vulnerable in this sense); there are mitigation techniques, but they come with drawbacks. That said, consider the following. 1) Multi-snapshot attacks are very complex and expensive. There is so much circuitry and complexity involved that 100% evidence of the presence of a hidden volume based only on past sector traces is unlikely to be reached, and an accusation in this sense will probably not stand in most courts. In fact, we are not aware of a single case in the public literature of a conviction due to forensic detection of hidden data through multi-snapshot attacks. On the contrary, there are many documented cases where even a simple system such as TrueCrypt was enough to grant the acquittal of a suspect. 2) Thanks to its hierarchical design, Shufflecake scrambles volumes in such a random way that an analysis of unused sectors is likely to be even harder than for TrueCrypt. 3) Regardless, the ability of Shufflecake to independently manage different volumes belonging to the same hierarchy offers us ways to protect against multi-snapshot attacks by simulating the actions of a virtual user on empty space.
More concretely, we plan to add another component to Shufflecake in the future: a daemon that simulates user queries on the empty space of the topmost unlocked volume, regardless of whether further volumes are hidden underneath or not. We believe that this strategy can thwart multi-snapshot attacks effectively at a marginal performance cost. See the documentation for more details.
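The core of such a daemon could look something like the sketch below. Everything here is hypothetical: the `write_slice` hook, the 4096-byte slice granularity, and the scheduling are illustrative assumptions, not the planned implementation.

```python
import os
import random

def decoy_writes(write_slice, n_free_slices, iterations, seed=None):
    """One round of simulated user activity: rewrite randomly chosen free
    slices of the topmost unlocked volume with fresh random data, so that
    successive device snapshots of 'empty' space differ whether or not
    deeper hidden volumes exist underneath.

    write_slice(index, data) is a hypothetical hook into the volume's
    free-space slices; a real daemon would run such rounds on a
    randomized schedule in the background."""
    rng = random.Random(seed)
    for _ in range(iterations):
        write_slice(rng.randrange(n_free_slices), os.urandom(4096))
```

The point is that a multi-snapshot adversary observing changing "free" space can no longer distinguish hidden-volume activity from the daemon's decoy traffic.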
Why is Shufflecake not based on more secure techniques such as ORAMs?
ORAMs (Oblivious Random Access Machines) are cryptographic schemes that aim to obfuscate the access patterns (in addition to the data content itself) of a trusted agent accessing untrusted storage. The connection between ORAMs and plausibly deniable storage systems has been investigated since the breakthrough HiVE paper of 2014. In a nutshell, the idea is that if we use an ORAM to access a device, then nobody, not even a run-time backdoor in the device firmware, can know which volume we access and how. However, ORAMs are extremely slow. They are so slow, in fact, that precise theoretical lower bounds are known, telling us that no secure ORAM can avoid a substantial overhead. The HiVE paper circumvented this problem with the following observation: if we are not worried about run-time backdoors in the device firmware, but only about "traditional" multi-snapshot adversaries, i.e. post-arrest investigation of the device's physical layer, then we do not need a fully-fledged ORAM, because read operations do not change the state of the device. So all we need is a "write-only" ORAM (WORAM) that only obfuscates write requests. The advantage is that no efficiency lower bounds are currently known for WORAMs, and in fact existing WORAM constructions perform somewhat better than fully-fledged ORAMs. When initially designing Shufflecake, we also considered WORAMs, but eventually opted against them for the following reasons. 1) Even the most performant WORAM schemes known are still very slow or wasteful: for example, HiVE has a slowdown of roughly 200x in I/O throughput, while some recent constructions reach a slowdown of "only" 5x, but at the cost of wasting 75% of the disc space. We wanted Shufflecake to be practical. 2) WORAMs are themselves not bulletproof.
In fact, we believe that the assumption that read requests do not change the underlying state of the physical device is a rather strong one, and hard to justify with modern, complex SSDs, which might, for example, cache read requests in some undocumented memory area of the firmware. The only way to be 100% safe would be to use a full ORAM (which, again, would not be practical for daily use). That said, we are still very interested in ORAM techniques, and we are keeping an eye on the evolving research in this field. If anything changes in the future, we might consider rewriting Shufflecake, keeping the overall functionality but replacing the underlying slice-mapping algorithm with an ORAM scheme.
How do you prevent the OS from logging filesystem access and leaking the existence of hidden volumes?
We don't. It has been known since the work of Czeskis et al. at USENIX 2008 that the OS interfering with hidden volumes through access logging is a big problem: there is just no way to reliably trust the OS and all the applications installed therein (document readers, image galleries, etc.) not to do so, regardless of how you design your scheme. So the solution is to have the OS itself inside a hidden volume, which is the idea that led to the concept of a "hidden OS" in TrueCrypt, and it is also the direction we would like Shufflecake to eventually take. Our vision for the future is to boot Linux by embedding Shufflecake in the bootloader (asking for a password at boot, like cryptsetup does), and then having a different full Linux installation for each hidden volume on the device.
Why did you write this in C?
Because we are old-school graybeards. More seriously, we are investigating Rust, and might port Shufflecake to Rust in the future.
Why did you not host it on GitHub?
We believe that our current git hosting provider (Codeberg, a German provider backed by a non-profit organisation) offers better guarantees in terms of freedom of expression and protection of the digital rights sought by the GNU GPL license we use.
Why did you release it under GPLv2?
I'd just like to interject for a moment: I think you meant "GPLv2-or-above". When releasing the software we wanted a modern copyleft license, and initially opted for GPLv3. However, we plan to add features to Shufflecake which might require a certain integration with the Linux kernel, which is released under GPLv2, so we opted for "GPLv2 or later".
Why the name Shufflecake?
Because "slices" at random positions, volumes as secrecy "layers", etc. You can call a device "cake" if you like.
Is that Shufflecake or ShuffleCake?
Shufflecake, with a lowercase "c".
Who is behind the Shufflecake Project?
Please see the About section.
How do I know you are not an NSA honeypot?
Because we are not anonymous, we are at least somewhat known in the cryptography community, and the whole idea would make absolutely no sense anyway, since the code is open source.
I still don't like this and I think it's dangerous / silly / insecure / a waste of time.
This is not a question, but anyway we are doing our best to make you change your mind. We would like to quote an answer from an obscure thread on StackExchange from 2016: "... the opposition to plausible deniability is likely given by the long-time opposition a lot of Linux supporters did against TrueCrypt, that was done just for license issues albeit disguised for technical reasons. An easy-to-use and effective plausible deniability is likely the best feature in TrueCrypt but a lot of Linux users, which didn't find TC in their distribution, get used to crypt with tools without it (for many years LUKS had no plausible deniability support) and get used to say that "plausible deniability is worthless or harmful". It was a case of "sour grapes" that still goes on".
Can I contribute to the project?
Absolutely yes! Please check the project repository.
The best way to reach us for quick questions and support is our Jabber/XMPP Multi User Chat (MUC) (Tor welcome).
Otherwise you can reach us directly by email at websiteATshufflecake.net
We also have a Mastodon channel on the FOSStodon instance @firstname.lastname@example.org
The Shufflecake Project (including code, website, and all infrastructure and communication) is created and maintained by Elia Anzuoni and Tommaso "tomgag" Gagliardoni. All opinions expressed herein belong to us only, and do not necessarily reflect the point of view of anyone else.
Shufflecake was initially developed in 2022 as an EPFL Master Thesis project by Elia Anzuoni, under the supervision of Dr. Tommaso Gagliardoni and Prof. Edouard Bugnion, during an internship at the Cybersecurity Research Team of Kudelski Security.