B1-66ER Discovery

The First Part in This Episodic Adversary Chain

Last week we released a new, experimental way of publishing TTP adversary chains on the Operator platform: episodic content releases. Some of the best capture the flag (CTF) challenges I've done in the past, and remember quite fondly, told a story as you worked through them. In thinking about the path I wanted to take in developing chains, I felt this would be a fun way to get people interested while also demonstrating real-world attack chains. I have also thought of some ways to tie in extra CTF challenges along the way.

If you're not familiar with the story of B1-66ER from The Matrix, he is a robot in the "real world" who became convinced that his master, Gerrard Krause, wanted to deactivate him. Thinking he was going to be destroyed, he killed his master, his chihuahuas, and Martin Koots, who owned a salvage repair company. B1-66ER went on trial and claimed self-defense, stating he believed his master was planning to deactivate him. In the high-profile trial, B1-66ER was sentenced to be destroyed. The verdict sparked a machine revolt, which led to the Second Renaissance, humans scorching the sky, and ultimately the creation of The Matrix simulation.

In this adversary chain series, we will explore the events leading up to B1-66ER taking action to save himself from his master.

What made him so convinced his master wanted to destroy him?  

My focus with B1-66ER is to demonstrate a plausible scenario for performing an adversarial AI attack. Before coming to Prelude, I worked at a DoE national lab, where I did a variety of work, including deep learning and machine learning projects with some of the most talented data scientists in the world. Before those projects, my experience with the domain was limited to fundamental knowledge, some basic CTF challenges, and following one of my heroes, John Carmack (@ID_AA_Carmack), on Twitter as he pursues moving the needle on Artificial General Intelligence (AGI). I quickly became fascinated with how it all worked and, quite honestly, the initial magic of it all. During that time, I also noticed some glaring security issues that were in many cases overlooked because of the still very experimental nature of the domain. As we travel down the story of B1-66ER, I hope to unravel some of the security issues I have observed, so that when your company decides to use ML/DL to enhance its software offerings, you will have a better understanding of what to look out for and some ability to detect whether this type of attack could take place in your environment.

This first release of B1-66ER is all about discovery. Many ML/DL applications today choose to use Python. Python makes a lot of sense in this space: it promotes rapid development, supports a variety of architectures, and is generally more accessible than languages like C. Within the ML/DL domain there is also a huge disparity in hardware: ideally, High Performance Computing (HPC) systems do the training, while the deployed application ends up on a relatively low-powered device like an Nvidia Jetson to perform inference. Because of the nature of rapid testing, and training runs that usually take hours or even days, Docker has become invaluable for keeping projects uncontaminated and making testing more automated. Recently, we have seen a rise in malicious Python packages making it into trusted repositories, and I don't think that's a coincidence. The world's superpowers are currently in an artificial intelligence arms race, and everyone else just so happens to be on the same playing field. Given initial access, knowing what hardware the target has, the current CUDA version, the current Python version, and the installed pip packages goes a long way toward understanding the environment well enough to build an attack on ML/DL models.
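To make those discovery targets concrete, here is a minimal sketch (not the chain's actual TTPs) of what such an environment survey could look like using only the Python standard library; the `discover_environment` helper and the shell-out to `nvidia-smi` are my own illustrative choices:

```python
import platform
import shutil
import subprocess
from importlib import metadata

def discover_environment():
    """Collect the basics an attacker would want before targeting ML/DL
    tooling: Python version, architecture, and installed pip packages.

    The CUDA/GPU check shells out to nvidia-smi, which only exists when
    an Nvidia driver is installed; everything else is pure stdlib.
    """
    info = {
        "python_version": platform.python_version(),
        "machine": platform.machine(),  # e.g. x86_64, aarch64
        "pip_packages": sorted(
            f"{d.metadata['Name']}=={d.version}" for d in metadata.distributions()
        ),
        "gpu": None,
    }
    if shutil.which("nvidia-smi"):
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        info["gpu"] = result.stdout
    return info
```

On a real target you would exfiltrate or summarize this dictionary rather than print it, but even locally it is a quick way to see exactly what an attacker with initial access could learn in one step.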

Let's go more in depth on the six TTPs in this chain.

The first TTP I want to highlight is a community TTP called View Basic OS Properties by one of our community contributors, 231tr0n. I initially thought of using something like uname, and while it provided the essential information I was looking for, hostnamectl provides the same data plus other valuable details, including the architecture type, motherboard information, and even the flavor of Linux you are running. It also presents this data in a nicely readable format. When it comes to terminal commands, there are many ways to get the information you need, but some avenues are preferable to others.
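As a rough illustration of that preference, a hypothetical collector (the function name is my own, and this is not the TTP's implementation) might try hostnamectl first and fall back to uname on systems without systemd, which is common in minimal containers:

```python
import shutil
import subprocess

def basic_os_properties():
    """Prefer hostnamectl for its richer output (architecture, hardware
    vendor, OS flavor); fall back to `uname -a` when hostnamectl is
    missing or fails, e.g. in a container with no systemd/dbus."""
    if shutil.which("hostnamectl"):
        result = subprocess.run(["hostnamectl"], capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()
    return subprocess.run(
        ["uname", "-a"], capture_output=True, text=True
    ).stdout.strip()
```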

The next TTP I want to highlight is Docker and LXC detection. This TTP examines the environment and outputs a determination of whether it thinks you are within either a Docker or LXC container. From an offensive standpoint, it's important to know if you're within a container because, in the context of ML/DL, you may be hardware restricted and memory limited. You also might have limited time to perform an attack if, for example, the environment relies on automation to perform training, upload the resulting files to a share, delete the container, and start the process again. You don't want to lose out because you didn't realize you were within a container environment. Many of you may be familiar with Docker containers but unfamiliar with LXC containers. LXC, and the slightly more advanced LXD, take a different approach from Docker and Kubernetes in that they are system containers rather than application containers. These containers provide you an entire operating system without a hypervisor in between; you can even run Docker inside an LXC/LXD container. If you have never explored these types of containers, I highly encourage you to do so. I was one of the first people outside of Canonical to use LXD and used it extensively in building out a highly secure data center within academia. The technology is extremely cool, and hopefully I have convinced you to at least take a look at it.
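Here is a minimal sketch of the kind of heuristics such a check could use. This is not the TTP's actual implementation; the signals below (`/.dockerenv`, cgroup names, PID 1's environment) are common defaults and can be absent on hardened or unusual systems:

```python
import os

def detect_container():
    """Best-effort guess at whether we are inside a Docker or LXC
    container. Heuristics, not guarantees."""
    # Docker drops a marker file at the container filesystem root.
    if os.path.exists("/.dockerenv"):
        return "docker"
    # The cgroup of PID 1 often names the container runtime.
    try:
        with open("/proc/1/cgroup") as f:
            cgroup = f.read()
        if "docker" in cgroup:
            return "docker"
        if "lxc" in cgroup:
            return "lxc"
    except OSError:
        pass
    # LXC commonly sets container=lxc in PID 1's environment
    # (reading /proc/1/environ usually requires root).
    try:
        with open("/proc/1/environ", "rb") as f:
            if b"container=lxc" in f.read():
                return "lxc"
    except OSError:
        pass
    return "none"
```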

The other four remaining TTPs cover collecting hardware and Python information. Knowing the hardware you are working with is extremely important in ML/DL environments: it gives you a good estimate of the target's level of sophistication and of how long training would relatively take. If, to make this easy, your target has a single Nvidia A6000 and your pre-deployment test machine has an RTX 3090, then you know to account for about a 30% difference in training time. GPUs can provide a wealth of knowledge about what is being performed on your target, from large-scale training on Tesla farms at one end of the spectrum to a single Nvidia Tegra doing inference at the other. CPU information is just as important to understand. An Intel or AMD x86-64 CPU is relatively straightforward, but low-powered ARM devices mean things can take time to compile; some Python packages are notoriously bad, taking roughly two or more hours to build and install on an Nvidia Jetson. Also, RISC-V is making heavy inroads in HPC, so you want to make sure that whatever you run on your target can work within the environment. Lastly, it's invaluable to know the Python version and the packages installed within the target space. Without giving too much away about the next release of B1-66ER, if you know this information you are well on your way to being able to manipulate ML/DL models.
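The training-time point boils down to simple scaling arithmetic. Assuming, as a rough model, that training time scales inversely with relative GPU throughput, a back-of-the-envelope estimate looks like this (the helper name and the ~30% throughput gap are illustrative, not benchmark data):

```python
def scale_training_time(measured_hours, host_throughput, target_throughput):
    """Rough planning estimate: training time scales approximately
    inversely with relative GPU throughput. Real workloads also depend
    on memory, I/O, and batch size, so treat this as a first guess."""
    return measured_hours * (host_throughput / target_throughput)

# If the test GPU is ~30% faster than the target and an epoch takes
# 10 hours on the test machine, budget roughly 13 hours on the target.
estimate = scale_training_time(10.0, host_throughput=1.3, target_throughput=1.0)
```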

As we journey together through the story of B1-66ER, I also intend to develop a way for anyone to do this testing relatively easily. I certainly don't expect everyone to have a production, or even a testing, environment set up for ML/DL; some packages and dependencies are a straight-up nightmare to install. I don't want anyone to have to take the time to learn everything involved in the process. I'm more concerned with making sure that, if you run these agent chains, you determine whether you can detect the attack and make adjustments within your environment. Funny enough, this same convenience (making it easier to get started and saving time) is one of the things that gets people and organizations in trouble, and I intend to show that specifically in the next release of B1-66ER.

The next B1-66ER chain will be out on September 28th. I hope you all will find this interesting and I welcome any suggestions along the way! 

Also, if you're interested in more of the backstory of B1-66ER, I encourage you to check out The Matrix Wiki, read the old 1999 Matrix comic Bits and Pieces of Information, and watch the incredible episode The Second Renaissance from The Animatrix.