A Little Bit of Rust Goes a Long Way with Android's Jeff Vander Stoep


You may not be rewriting the world in Rust, but if you follow the findings of the Android team and our guest Jeff Vander Stoep, you’ll drive down your memory-unsafety vulnerabilities more than 2X below the industry average over time. 🎉

Links:


This rough transcript has not been edited and may have errors.

Deirdre: Hello. Welcome to Security Cryptography Whatever. I’m Deirdre.

Thomas:I’m not David Adrian.

Deirdre: Thanks. That’s Thomas. And we have a special guest today. We have Jeff, who is from the Android team. Hi, Jeff.

Jeff: Hey, everyone.

Deirdre: Hi.

Thomas: He sounds really excited to be here.

Deirdre: He is doing us a big favor by joining us late at night his time. We invited Jeff on because he helped co-write a recent post on the Google security blog entitled Eliminating Memory Safety Vulnerabilities at the Source. And, TL;DR, it had some very interesting results from deploying and adopting Rust in practice in the Android project, and how basically walling off your old memory unsafe code behind a nice abstraction boundary and only writing new code in memory safe languages like Rust and others significantly reduces your vulnerabilities over time, in a sort of non-obvious way which does not involve rewriting the world in Rust or rewriting the world in Kotlin. And so we just wanted to ask him a whole bunch of questions about this.

Thomas: Jeff, real quick, you guys are still writing some C code or whatever? Like, if you look at the graphs and stuff, that’s one of the counterintuitive things about the results here: despite the increase in memory unsafe code, you’re seeing a marked decrease, like a sharp decrease, in memory safety vulnerabilities. But is that a reasonable summary of what you’re up to?

Jeff: Yeah, I think so. As far as what code we’re still writing: yes, we are still writing and touching some C and C++ code. And we expect that to continue to happen over time, but also to decrease over time.

Deirdre: Okay, that was not obvious to me when I first went through this. What memory unsafe parts of your code base are you adding to, and how are you restricting that over time to align with this new perspective on how you approach adding to your code base, to mitigate and minimize vulnerabilities?

Jeff: Yeah. So we have recommendations and best practices for things that teams should be doing, but we’re also not trying to be overly pedantic or put too many restrictions in place, in order for teams to be productive. Instead, what I would say is that we are encouraging and incentivizing teams to use memory safe languages. And of course, as more and more teams switch, the bar and the impedance to switching goes down over time. Right. So instead of putting a bunch of rules and restrictions in place, teams are kind of switching naturally. And as more code is in Rust or in Java or Kotlin or whatever, then that encourages the same.

Thomas: So can I get a better sense of the threat model here? I have an intuition for what the browser threat model is, and also a sort of sense of what code is getting written there, what the security hotspot code is there, and all that. I feel like I don’t have as good a sense of where new code is written in Android, or what kinds of places in the Android code base tend to be implicated in vulnerabilities. So, as a starting point, what does that footprint look like? What was your starting point for this?

Jeff: Yeah. So we actually talk about this a little bit in the blog post, but one of the challenges that you have in running an operating system is that the threat model is really, really complex. So we have things like network interfaces, which, similar to a browser, are an obvious entry point, but we also have to be able to run untrusted code. Users can install a third party app, for example. We have everything from image parsers to network stacks, even network stacks in firmware, for example. But then on top of that, we have other APIs that are reachable to third party apps. So one escalation path that we could see is: you exploit a messaging app with a malicious image file, and then from the messaging app, you exploit the GPU kernel driver. And once you’re in the kernel, then you basically have access to anything that you want on the device, right. And so what’s kind of interesting from our standpoint is that trying to decide which of those things you spend effort on is itself a pretty large cost.

Deirdre: Yeah.

Jeff: And so part of what we want to do is actually spend less time doing that and just be able to say, now everyone can do things that are safe, and we can just eliminate a threat and not have to worry about it. And that’s part of what we’re seeing. We now have different pieces of code where we just don’t have this problem anymore. And it’s really nice. We just don’t worry about it in those areas anymore.

Thomas: So if I’m following that, and if I’m trying to make a mental model, I hear you saying that one of the benefits you’re hoping to get from ramping up the use of Rust code on the Android platform is to not have to have intricate models of where the sensitive code is, where you could tolerate memory corruption versus where you can’t tolerate it, that kind of thing. You just take it off the table is the idea, right? But as I’m listening to you, it’s occurring to me: if I think of what Android is, my first thought is, okay, it’s the kernel, which is really hard to get Rust into. And then I would have thought system applications and things like that, that you’re shipping. But it’s occurring to me this is also a lot of framework libraries, which isn’t the operating system itself; it’s code that’s getting pulled into every application that gets shipped on Android, right? Am I right that some of the code we’re talking about here is, for instance, an image parser library? I don’t intuitively see that as part of the operating system unless there’s a system app that uses it. But am I right that Android applications that other people write are using that code as well? So we’re also thinking about what the libc is and what the image parser is, your libpng or whatever that is, right? That stuff.

Jeff: Yes, yeah, exactly. And to give you broader context, for example, we have maybe 100 different HALs that are running on a device. We have all of the system APIs. Anything that requires a permission check has to be done across a security boundary, so that’s another process that that API is running in. Gosh, I wish I had ADB access on my device right now. I could do a ps and you could see that there’s a thousand different processes running right now on my device, only a small subset of which are actually applications.

Thomas: Yeah. One of the things that’s occurring to me is I’m wondering if there is a knock-on effect of the operating system work that you’re doing, of the Android work that you’re doing. It’s interesting to think that if you took this to its logical conclusion, you could be swapping out a fair bit of application code for your developers without them having to think about it being memory safe. You keep the same interfaces, but they’re still calling into it, and now parts of their application are written in Rust as well, without them having to think about it.

Jeff: Yeah. And Java and Kotlin and. Yeah. What I would say is that when you install an application on your device, probably in most applications, most of the code that is running actually comes from the operating system and not actually what was shipped by the developer. Of course, there are large exceptions to this, like browsers being an obvious example.

Deirdre: So when you’re talking about the security boundary: in an OS, or even in a browser, you already have notions of a security boundary that are kind of baked in, where you’re like, oh, all of this stuff over here is trusted, this is in the kernel or far away, and this other area over here, this is definitely attacker-controlled, supplied data, this is untrusted, and you have to do something to go from one to the other, or vice versa. How does that change, if at all, when you are starting to rewrite parts of these things in a language like Rust that just mitigates a whole class of vulnerabilities that used to kind of be encapsulated as, ah, that stuff is behind that security boundary? Like, do you have to evolve the existing notions of a security boundary in this sort of world?

Jeff: Yeah. So memory unsafety is kind of a unique example for vulnerabilities, because they lend themselves to being chained together. And so when we have high risk code, what we often do is isolate that high risk code by itself. And so as the risk of code changes, we will be making, and already are making, different decisions based on what level of isolation we provide to different things. So if I can replace a memory unsafe parser with a memory safe parser, then I’m probably not going to sandbox it, regardless of whether or not it processes untrusted content.

Deirdre: Okay. And do you have any other sort of, I want to say, rubric or way of scoring, or evaluating, whatever you want to call it? We kind of have an instinctual feeling that if I have a parser implemented in Rust with no unsafe keywords or any weird exceptions under the hood, I have a pretty good feeling that this is a much safer parser, much less likely to have a high-severity or critical vuln, versus one that’s written in an unsafe language. Therefore, I feel pretty good about not putting that in a sandbox. But do you have any other kind of ways of evaluating, okay, the security boundary that I need for this component is different than this other component, other than, yeah, seems all right?

Jeff: Yeah. I mean, memory safety doesn’t replace good security architecture. Right. So we still split things up into logical groups, and this isn’t just a security thing. Right. This is basic system stability and reliability and other things that we want.

Deirdre: Yeah.

Jeff: So, yeah, we’re still going to split things up into logical components. We’re still going to do things like principle of least privilege for various sandboxes. Yeah. Does that answer your question?

Deirdre: Yeah, to a degree. Which is basically, you kind of have to see each component, or each module as it were, for what it is. Because if it’s a memory safe parser, but the worst thing that that component can do after you’ve parsed some attacker controlled blob is read, that’s different than if you have a parser that’s written in Rust, and it parses the thing, and the thing that it can do is read and write and do an inter-process who’s-he-what’s-it. The capability makes a difference there, too.

Jeff: Yeah. And what I would say for parsers is that, in general, the vulnerabilities that we have in parsers are memory safety vulnerabilities. We’re not having permission check vulnerabilities there. Right. Whereas you could have that in, like, a network stack, if it’s screwing up a permission check or encryption; there’s lots of stuff that can go wrong in a network stack. For image parsers or any other type of format parser, the vulnerabilities are memory safety vulnerabilities.

Deirdre: Cool.

Thomas: I’m looking at the blog post. We’re going to, as usual, take a shotgun left in this whole thing. There’s an interesting graph early on in the post. You have a simulated model of ramping up memory safe code versus memory unsafe code, and then you have the actual empirical data from lines of code in the Android Open Source Project. So I’m looking at a graph starting in 2019 where you’re about a third memory safe versus memory unsafe code. And then in 2024, it’s roughly half and half, maybe a little bit more, maybe closer to 60% memory safe. There’s no numbers in the graph, but that’s roughly what I’m looking at, right? So I have a couple of questions. First of all, 2019 feels early when we talk about memory safety. With people in the zeitgeist about memory safety, that’s Rust, right? That’s code for Rust. I know it’s not in your case, but in the zeitgeist, when people hear memory safety, they’re thinking, oh, we’re talking about rewriting things in Rust.

But in 2019, that couldn’t have been the case. You could not have been a third Rust code, right? So what does memory safe code there mean in 2019? Is that Java?

Jeff: Java. Just Java.

Thomas: Okay. And then it looks like you’ve roughly doubled the amount of memory safe code from 2019 to 2024. The ratio has changed, but also, if you look at the chart, it looks like the amount of memory safe code in the Android Open Source Project is about double. How much of that is the initiative to start doing stuff in memory safe languages that wouldn’t have been memory safe before? And how much of that is, this is Java and Kotlin code, and we just wrote more of it because that was happening anyway?

Jeff: So I think there’s some of both. Right. Like, we actually do have things in place to encourage teams to shift. One of the things that we talked about in a previous blog post was that, even in the past, we would use the Rule of Two in order to encourage teams. God, how do I describe the Rule of Two? It’s essentially that if you’re going to process untrusted content in a memory unsafe language, then you have to sandbox it.

If you use a memory safe language, you don’t have to sandbox it. And so even a few years ago, we had things in place that would actually encourage teams to use memory safe languages. And so, yeah, just imagine things like that and how that encourages what teams do today. We have examples where, like, no, I actually need native code to process something. In the past, what they would have done is they would have used C, for example, and then we would have forced them to sandbox it. If a team has that decision now, they’re going to use Rust because the sandbox penalty is a penalty they would like to avoid.

Thomas: So the Rule of Two, it sounds like something that’s broader than sandboxing.

Deirdre: Yeah, they also have it in Chromium.

Jeff: But, yeah, it actually came from Chromium.
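To make the rule concrete, here is a rough sketch of the decision it encodes, paraphrasing Chromium's published Rule of Two; the function and its names are illustrative only, not actual Android or Chromium tooling.

```rust
/// Rule of Two, roughly: of (untrustworthy input, memory-unsafe language,
/// high privilege / no sandbox), a component should have at most two.
fn violates_rule_of_two(
    untrustworthy_input: bool,
    memory_unsafe_language: bool,
    high_privilege_unsandboxed: bool,
) -> bool {
    [untrustworthy_input, memory_unsafe_language, high_privilege_unsandboxed]
        .iter()
        .filter(|&&risky| risky)
        .count()
        > 2
}

fn main() {
    // The case Jeff describes: untrusted content in a memory-unsafe language
    // must not also run unsandboxed at high privilege.
    assert!(violates_rule_of_two(true, true, true));
    assert!(!violates_rule_of_two(true, true, false)); // sandboxed: acceptable
    assert!(!violates_rule_of_two(true, false, true)); // memory-safe: no sandbox required
}
```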

Thomas: Okay, so I guess one question I have: you also have a graph, and we’ll come back to it, of the sharply declining number of memory safety vulnerabilities over time. Right. If I’m thinking about where you guys were at in 2019, just to try and get a sense of what this whole trend looks like, were you at, like, 100%?

Thomas: 100% is a weird way to put it. Right. But, broadly, categorically: was all the memory unsafe code sandboxed in 2019? Or, as you ticked forward from 2019 to 2020, did you also increase the number of sandboxes and stuff as you caught more? Or were you at that point already caught up on sandboxing, so you were already doing that?

Jeff: Yeah. So one of the things we’ve talked about, and we have written about this, and Chrome has written about this as well, is that sandboxing has limits, and it has system resource costs. And so quite often we couldn’t necessarily make the security decisions that we would have wanted to, because the penalty for them was too high. And so that’s another thing that’s nice now, right: we don’t have to make security versus performance trade-offs.

We can just do both.

Thomas: That makes perfect sense. I’m trying to develop an intuition for what it looked like when you guys were sandboxing something. Is the sandboxing overhead there IPC overhead, we’re running multiple processes? Or was it, we’re instrumenting code as it runs? What is the archetypical case: somebody wrote a big memory unsafe blob of stuff and we have to sandbox it. What do you think the default is there?

Jeff: Yeah. So Android’s media frameworks are actually the perfect example of this. Before the Stagefright vulnerabilities, in, whenever that was, 2015, we had one massive media server process, and it had all of the camera stuff, all of the codecs, all of the image processing. It had access to all of the kernel drivers that it needed for hardware.

It had all of the HALs running in-process.

Deirdre: Oh, boy.

Jeff: And I think that single media server process is now something like 14 separate processes in Android, where we’ve not just split out the risky parser code into very deep, unprivileged sandboxes, but we’ve also just logically split things up. The audio processing now all takes place in two processes, something called audio server and then an audio HAL. Similarly, the camera is the camera server and the camera HAL. So on the one hand, everything is split out better, but also, we can’t keep doing that forever, right? We can’t take every single process and split it out into 14 processes. We just don’t have the budget for that.

Thomas: So when I’m thinking about sandboxing here, the right way to think about it is roughly the same way that Chrome is divided up at this point, into, like, microservices and all the attendant overhead. That makes sense.

Deirdre: Okay, I’m going to switch over a little bit, because this is sort of, we can take code that happens to be written in a memory unsafe language and restructure it wisely. And once upon a time, it used to be: if it’s in a memory unsafe language, you must sandbox it; if it’s in a memory safe language, you don’t have to sandbox it. Like, you have to pick one. And one of the interesting things that came out of your post that seemed counterintuitive was that the old memory unsafe code, importantly, matures and gets safer with time. Not just that it will stay steady state: it will get safer with time, with an exponential decay. And then, if you’re only writing new code in a memory safe language, the trade-off becomes very clear. Like, the eventual pivot point becomes very clear in these graphs.

And I think that was one of the things that was unintuitive to someone just thinking about this from first principles. And you write that the returns on investments like rewrites of the unsafe code into a memory safe language like Rust diminish over time as the code gets older. And we can go back to what we just talked about, rewriting things like parsers or some of these higher risk things, where it’s the thing you’re doing with the code, not just that it’s a pile of C or C++. Those things may be riskier, and you still want to rewrite them in Rust so that you don’t have to worry about sandboxing them or something like that. But for just a pile of unsafe code, you basically write that, based on average vulnerability lifetimes, five-year-old code has a 3.4x to 7.4x lower vulnerability density than new code. You sort of gestured in the post at why this is true, not just that you measured it and were able to observe it, but the mechanisms of why. Please tell me why you think this is true.

Jeff: So I think that the way I would look at it is that just the frequency that vulnerabilities are found is directly proportional to their density.

Deirdre: Okay.

Jeff: And so if I’m doing an Easter egg hunt, the number of Easter eggs that I’m going to find is proportional to how many are out there, their density. Right. And I think what you see in code is that people fix bugs when they find them, whether those bugs are vulnerabilities or not. If I go back to my Easter egg analogy, anyone who’s ever hidden Easter eggs for an Easter egg hunt is finding them months later. The same thing happens in code. We did an analysis of kernel vulnerabilities, and we found that almost half of kernel vulnerabilities are found and fixed long before they’re ever discovered to actually be vulnerabilities. Right.

And what’s actually happening there isn’t, I’m assuming, people trying to hide the fact that they’re vulnerabilities. They just don’t know: they found a bug, and they went and fixed that bug. Right?

Deirdre: Yep.

Jeff: And so I think that’s actually the same thing that’s happening here: as you touch code less, mostly what’s happening is you’re finding and fixing the bugs over time, and that’s just directly proportional to the density of bugs in that code. So the density is going down.
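Stated as a worked equation, the reasoning sketches out like this (our formalization of the argument, not a formula quoted from the blog post): if bugs are found and fixed at a rate proportional to the density of bugs still present, the density decays exponentially.

```latex
\frac{dD}{dt} = -\lambda D(t)
\;\;\Longrightarrow\;\;
D(t) = D_0 e^{-\lambda t},
\qquad t_{1/2} = \frac{\ln 2}{\lambda}
% Example: with the roughly two-year half-life Jeff mentions later, five-year-old
% code would retain about 2^{-5/2}, roughly 18%, of its original vulnerability
% density, i.e. a 5-6x reduction, consistent with the 3.4x to 7.4x range cited
% from the blog post.
```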

Deirdre: Okay.

Thomas: I have, like, I have so many thoughts. They’re not good thoughts. These will all be bad thoughts. Right. I want to say up front, like, you know your code. I don’t know your code base. I’m an iPhone user. I have no idea how your code base works.

Right, but I read the Alexopoulos paper, the open source defect density, the decay, the half-life of vulnerabilities in code over time thing, right? And it’s a profound statement, right?

Jeff: Yeah.

Thomas: Like, it was a profound observation. Especially, you know, I know that memory safety inside of Android is much more complicated than, can we use Rust, because you also have the decision to write whole components in higher level languages anyway. But that battleground between C and Rust is all anybody is thinking about right now; it’s easily the most important, or most talked about, issue in software security right now, that frontier. And that observation about the half-life of vulnerabilities, if it’s true, says something pretty profound about what the work looks like to shift over to the memory safe future. I guess the first question I have is: when you’re thinking about that half-life, when you’re thinking about how many vulnerabilities you expect, how resilient you expect old memory unsafe code to be, are you thinking about that mostly in terms of prioritizing what things to rewrite or build in Rust first? Or are you thinking about it in terms of what memory unsafe code will we still have in the Android Open Source Project 20 years from now?

Jeff: Yeah, I think there’s a couple of angles here. First of all, I would actually look at the existing memory unsafe code more in terms of what we’re going to apply our existing toolkit to: if we’re going to spend time and effort fuzzing, where are we going to spend that effort? And I think the other side of this, what’s exciting about this result, is that it tells people that doing the inexpensive thing is actually very effective.

Deirdre: Yeah.

Jeff: And that’s what’s actually really exciting about it.

Deirdre: Yeah.

Thomas: Right. Cause you have a really good system of incentives right now. My top level message from that blog post is that the incentives for AOSP are great right now. You don’t have a mandate to rewrite everything. You have a sense of how to prioritize it. And also you have things like, we can get rid of sandboxes and make things easier for you, we can simplify your design. It’s the things you would expect to get from what languages like Rust promised there.

I am fixated on the computer science part of this. I buy 100% the direction you guys are going, and that you’re in a good place right now. The early result on the experiment of doing more Rust in AOSP sounds really positive. So there’s a result that was published at USENIX about the Linux kernel and OpenSSL and the half-life of vulnerabilities. And I have some just kind of poorly informed CS, doubt is the wrong word, but questions. For instance, that is a CVE-driven result, right? So they are looking underneath the lamppost there, right?

Jeff: Yep.

Thomas: And obviously all of us have the superficial thought of, well, we’ve all heard about vulnerabilities that were found only 20 years after the code was committed or whatever, and we know that that’s not going away, right? But you have insight about your code base. You’re not relying just on that USENIX survey of all those open source projects; you’re looking at your own projects. It seems like you’re pretty confident in those half-life numbers in your code base, and you’re pretty authoritative on that. So what gives you that confidence?

Jeff: So we did our own study of this on our codebase, and any data source that we look at seems to give us this result, which is pretty compelling. There was another study that we looked at that looked at fuzzing, and what it says is that the cost of fuzzing goes up exponentially if you want to scale it with the number of vulnerabilities that it finds. It’s basically showing the same result, but in a different way. And so if we continue to see the same result any way that we look, when we look at a problem, whether it’s defect rate or vulnerabilities or the cost of finding the next bug, then it’s probably telling us something about a property of software development.

Deirdre: Yeah.

Jeff: And I think, can we say that for sure? I don’t know yet, but I think like part of the reason why we wanted to publish this blog post was because like this is really what the data is pointing us towards and we keep seeing it in different ways. And when we try to actually rely on this property, it doesn’t let us down, which is like another great signal.

Deirdre: Yeah. So it seems like no matter which specific code base it is, it could be OpenSSL, it could be a browser, it could be an OS, and whether you’re measuring it with CVEs, which are reported and measured in one way, or with Android’s own measurement of vulns and bugs and a different reporting mechanism, because you have a bug bounty and things like that, even via all these different ways, they correlate with the behavior and the signal that you’re seeing in terms of how they decay over time and how mitigations like adopting memory safety help. And, I haven’t read the other USENIX study in a long time, but also how the old unsafe code, quote, cures or matures over time, as it were, in terms of vulns and bugs.

Jeff: Yeah, yeah.

Thomas: I don’t know. It’s interesting because it’s a result that I simultaneously very much want to be true and kind of don’t want to be true. I don’t want to be comfortable with large amounts of memory unsafe code that we keep around just because it’s proven itself. It’s interesting because there are other efforts, obviously, to replace memory unsafe code. The ISRG Prossimo project is one of those, right? I think they did an NTP rewrite, but that’s an interesting kind...

Deirdre: Of different case, because that is a full-on binary, and it’s a very well modularized and abstracted-out thing. You can just do a full rewrite, and the, quote, interface boundary is literally the NTP protocol. You’re not swapping out something that’s a core system component of Android or Chromium or OpenSSL.

Thomas: I don’t think it’s applicable to the problem that you’re working on, Jeff. But I think it’s kind of a rhyming thing. And the thing that sticks out to me is, when Prossimo announced the NTP thing, I have friends that are more plugged into exploit development and vulnerability research than I am right now, and they were a lot less bullish on it than I was. I wrote a couple of Hacker News comments about memory safety, because there’s still a live debate about whether memory safety is good on Hacker News. I wrote some rah-rah comments about how doing an NTP rewrite in Rust makes sense to me, and I got pushback from people, because it’s like, this is not where the vulnerabilities are.

One thing that these people think is happening is that this idea that we’re doing blanket rewrites of memory unsafe code in Rust or whatever is a huge waste of effort, because they have a much better sense of where they’re actually going to find vulnerabilities, and this isn’t where they are. That lines up with the strategic thing that you’re doing: relying on the half-life of the code, not freaking out about older, proven memory unsafe code. You still have other countermeasures for it and all that. At the same time, there have to be vulnerabilities in there somewhere.

Jeff: Yeah. So the way we look at it is, we’re not actually trying to get to zero vulnerabilities. I know that is kind of a weird thing to say, but maybe a better way to look at it is that if we, as system architects, are designing systems such that a single vulnerability is the end of the world, then we’re really bad at doing design. Right. And so, no matter what, we have to design systems that are robust against the existence of vulnerabilities in those designs. The problem that we currently have isn’t that we can’t do good system architecture and security architecture. It’s that the vulnerability density is so high, and memory safety vulnerabilities in particular are so flexible, that what we see is chaining of vulnerabilities together in order to bypass good, robust system architecture.

And so I think, kind of my thought here is, yeah, is the occasional vulnerability going to exist? Yes. And that’s where we need to actually be applying defense in depth and good security architecture. And that’s actually the solution there, because we’re going to have those vulnerabilities for forever. And even in Rust code, we’re going to have the occasional unsafe code issue in Rust, or just, we have to be robust. Yeah.

Thomas: One of the really striking things in the blog post to me was, there’s a graph here of new memory unsafe code and memory safety vulnerabilities, and you’ve got a little red baseline at 70% for the industry norm for memory safety bugs. Things have changed, it sounds like, pretty radically from 2019 to 2024. But back in 2019, you guys were still playing to win. You were one of the best secure software teams in the world. If you just think about the amount of effort that was going into securing that platform and paying attention to software security, you’re one of probably the four most important targets in all of software security. It’s not like you guys were just phoning it in in 2019. And back in 2019, with all of the memory unsafety that you had, with all of the countermeasures you had, with whatever your rules were about library safety idioms and how you’re allowed to write C code on this platform, and sandboxing, you were still at the norm.

Thomas: Right. If it stayed like that, you would get the message that there’s simply no way to make memory unsafe code secure.

Jeff: Yeah. So I joined in 2014, and shortly after I joined, we had the Stagefright vulnerabilities, which was kind of a huge moment. Ben Hawkes, the former head of Project Zero, wrote a really nice article about this recently, where he refers to the Android team’s response to the Stagefright bugs as throwing the kitchen sink at the issue, which I think is exactly what you’re describing. We took an all-of-the-above approach. And I think what we really saw was that we were able to make progress by some measures. I’ll give you an example, which is that we saw external exploit prices start to go up quite a bit for Android. Maybe that had nothing to do with what we were doing, but more likely it is a reasonable validation that our approach was working. At the same time, the number of our memory safety issues continued to go up, not down, which could also be a result of us getting better at looking for them, or incentivizing people to find them better, who knows the reasons. But it kept going up.

Jeff: And so part of what that caused us to do was to take a step back and say, this approach, while it’s not useless, do we need to look at this a different way? And so we started looking at memory safe languages as an option, because it’s kind of an obvious solution. But then of course everyone, including ourselves, thought, okay, well, this is a solution that, if we start now, maybe will have an impact in decades, because we sit on this massive pile of legacy C and C++, and no one thinks we’re going to rewrite it all.

Thomas: But it’s not. There’s a synergistic effect here, right? Your big result here is that all of the work that you’ve done post-Stagefright didn’t really seem to be moving the trend line much. It was making a difference, but if you look at the chart, it didn’t seem to change much. And it’s like you have what looks like a pretty powerful synergistic effect here: you add not that much memory safe code, and all of the things that you’re doing to keep the memory unsafe code safe work better.

Deirdre: And to put some numbers on that from your post, the percent of vulns caused by memory safety issues went from 76% of Android vulns in 2019 to 24% in the year of our lord 2024, well below the 70% industry norm. It’s still around 70% of vulnerabilities for the industry being memory safety vulnerabilities, and Android is down to 24%. And that’s from 2019 to 2024. And it sounds like all the work you did to respond to Stagefright between 2015 and 2019 still had you at 76% of vulns in Android being memory safety vulns.

Jeff: Right. Because, and like, this is where the insight really comes in, which is that the problem is that we’re introducing them at high rates.

Deirdre: And this is one thing I wanted to really drill into. We talked a little bit about why, for the old code, there’s the density of memory safety vulnerabilities and also the findability: the density implies more findability, but also, if you’re in there touching the code and working on it, or working around the edges of it, you’re more likely to find more vulns. One of the things that I seem to take away from your post is that with the old code, the old unsafe code, if we stop touching it, over time it kind of shakes out the vulns that we’re going to find. And if we stop modifying it, we stop exposing things that were once safe enough; you change how you access those lines of code that do memory unsafe things, and all of a sudden you expose a new memory safety vulnerability that you wouldn’t have exposed if you hadn’t touched that code in the first place. That seems to indicate: one, literally stop touching that old code; and then you have to have an interface between the code that you’re not touching but are still using and the new code. And if that actually holds true, there seem to be these interacting dynamics: you aren’t writing more new unsafe code and introducing new vulns while you’re in there finding more vulns, but then you should also stop modifying the old code so that you aren’t exposing new vuln paths in code that was okay until you fiddled with it, or something like that. Am I making sense?

Jeff: Yeah. I mean, one thing that I would say is that this trend only happens if, as bugs are found, you fix them. So you can’t just not touch the code. It’s that you are increasingly putting code into essentially maintenance mode, and if you need to add a new feature, you don’t add it into your old C code. It’s, okay, now I’m going to add that feature over here in Rust or Java or whatever. And that’s kind of how you shift teams away from it.

Deirdre: Yes.

Jeff: And yeah, I mean, that’s what I...

Deirdre: Mean, but I didn’t say it very well.

Thomas: Is there something here, if you’re looking at that synergistic effect, where we’re keeping a lot of memory unsafe code around but the count of vulnerabilities is dropping quickly? Is there something here about the idea of vulnerabilities scaling not just with the amount of memory unsafe code you have, but with the number of interfaces between memory unsafe things: the links between modules, library calls, inter-process communication, whatever it is, the number of different individual blobs of code that are linked together somehow? And if you break that graph up, if you put more memory safe things in between the memory unsafe code, then the code is more straightforward to reason about, or it’s easier to confine or sandbox, or it’s easier to test. Is there a graph-theoretic way of looking at this?

Jeff: So I don’t know. I think that’s a good theory. One thing that we’re already noticing when we look at the kernel is that certain C APIs are just not safe to be called from Rust, ever. And so what you see in the kernel is that they’re actually needing to make adjustments on both sides of the language boundary, because what they want to do is what you’re supposed to do with an abstraction between C and Rust, which is that it should be impossible for the Rust code to cause something unsafe to happen in the C code. So to your point, they’re having to make the C APIs safer in order to do that. Are we seeing some of that elsewhere? I don’t have an answer for you, but probably.

Thomas: Okay, so the implementation versus the interface: the implementation stays largely the same, and largely memory unsafe, but the interface has to get better just to make it work with Rust.

Jeff: Exactly.
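As a minimal sketch of the kind of abstraction being described, assuming a hypothetical C function (frob_read is made up for illustration, not a real kernel or Android API): the C implementation stays as it is, but the Rust-facing interface is wrapped so that safe Rust callers cannot violate its pointer and length assumptions.

```rust
// Hypothetical C function: fills `buf` with up to `len` bytes and returns the
// number written, or a negative error code. The C side assumes `buf` points to
// at least `len` writable bytes.
extern "C" {
    fn frob_read(buf: *mut u8, len: usize) -> i32;
}

/// Safe wrapper: a `&mut [u8]` carries the pointer and length together, so safe
/// Rust callers cannot hand the C side a dangling pointer or a mismatched length.
pub fn read_frob(buf: &mut [u8]) -> Result<usize, i32> {
    // SAFETY: `buf` is a valid, exclusive buffer of exactly `buf.len()` bytes
    // for the duration of the call, which is what `frob_read` requires.
    let ret = unsafe { frob_read(buf.as_mut_ptr(), buf.len()) };
    if ret < 0 {
        Err(ret)
    } else {
        Ok(ret as usize)
    }
}
```

The unsafety doesn't disappear; it is concentrated in one reviewed spot, and sometimes, as Jeff describes, the C signature itself has to change before a wrapper like this can be sound.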

Thomas: That’s super interesting. That’s not a thing I would have thought of. A dumb question about the vulnerability counts that we’re working with here: do these include internal findings? What is the level of qualification for a memory safety vulnerability counted here? I assume it’s not all the way to CVE; I assume there are internal findings in here.

Jeff: So it probably is to the level of CVE, in that these are through Android’s vulnerability rewards program, meaning these are vulnerabilities that we’ve released fixes for. And, let me give you a very unscientific estimate, which is that probably about half of our vulnerabilities are internally discovered and about half are externally reported. And of course that’s going to shift all over the place.

Thomas: And that figure would count both?

Jeff: Yes.

Thomas: If you refactored something and foreclosed a bunch of vulnerabilities, they were never discovered as vulnerabilities because you refactored them away, even though that code was vulnerable before. You don’t retrospectively go back and count code that was vulnerable before. I’m sure you do in a sense, right?

Jeff: It depends, because we have support windows for Android releases. So you can go look at the Android security bulletin, and you might see a vulnerability that applies to Android 14 but doesn’t apply to Android 15. One of the really fun things was, they switched the ultra-wideband stack to a Rust stack, and within days of releasing it, I think four memory safety vulnerabilities were reported on the old one. And so we still had to report...

Thomas: You should see Deirdre’s face right now.

Jeff: We still had to report those, but they didn’t impact the latest version of Android, so they weren’t, you know, they didn’t have to be patched in the latest version.

Thomas: Here’s the thing I’m kicking myself for not having asked earlier: what comes to mind for you as, like, the top five big-ticket Rust things that are now in Android?

Deirdre: In Android, yeah.

Jeff: It’s kind of starting to show up all over the place. But I think the virtualization framework is really interesting because if we were going to look at a big chunk of high privileged code that applications and stuff are increasingly interacting with, that’s it, right?

Thomas: Yep.

Jeff: The other thing is the ultra-wideband stack; that’s a nice one to talk about. But what I think is actually really interesting is that we’re having fewer of these big-ticket, here’s-the-thing-we-can-talk-about items, and a lot more of, oh, this team did their new feature in Rust. So there’s a Rust component tacked onto the side of a big chunk of C, and that’s just kind of becoming the norm.

Deirdre: The norm of like just any new thing is just sort of like the Rust component tacked on to existing stuff.

Jeff: It’s not the tacked-on part that’s the norm. Right. It’s that new things are moving to being in Rust instead of in C. Okay, good.

Deirdre: Which is kind of the takeaway from this sort of work: no, really, you can get really big bang for your buck by just doing that. If you have something new, just write it in Rust or another memory safe language and make it interop with the rest of your project, and you will, in fact, get really good returns on mitigating your memory safety vulnerabilities, which are the majority of your vulnerabilities, period. And you do not have to have this ten year, 20 year plan to rewrite the world in Rust to see dividends start to pay.

Jeff: Yeah. Can I tell you all, one of the reasons why we did this blog post is because over the last few years we kept getting these really great results, right, and they were great, but also they were kind of suspiciously great.

Deirdre: You’re like, this can’t be right.

Jeff: Yeah, there was a little bit of that. And what I think we kind of learned from that was, we had done this internal analysis, and this is before that USENIX paper was even published, so we didn’t know that distribution existed. But we looked at our own code and we looked at a couple of other code bases. And it was quite interesting to see that, okay, yeah, the vulnerabilities really aren’t uniformly distributed across our codebase; they really are in the code that we recently touched. So we had this idea that this is probably going to work better than people are expecting, but then it worked even better than we were expecting. And I think the big part that we were actually missing was this idea that our older code is also getting safer.

Like, if we have a half-life of two years, then in two years half the bugs have been shaken out of the code that was just written in C or C++. And so, six years later, think of the code that was written six years ago: so many bugs have been fixed and then not reintroduced, because we’re writing lower vulnerability density code. And so I think part of what happened was we had a little bit of, I don’t want to call it a crisis, but, what’s going on here? Why is this working so well? And we had to look into it. And so the co-author on this blog post, Alex, isn’t on the Android team. He wasn’t involved in the work that we’re doing on Android, but he was one of the main people who I was working with on just investigating.

Is there an explanation for what’s going on here? So, finding that paper, and also starting to find other resources that were demonstrating similar things. And then we wrote that simulation, and a big part of that simulation was: clearly there’s a mathematical model and we can simulate it, so is the simulation going to show us the same thing that we’re seeing? And imagine our relief when the simulation pretty much showed the same thing, and we’re like, oh, thank God. We now have an explanation for what we’re seeing, and it actually matches exactly what we would expect.
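As a toy illustration of that kind of simulation (our own sketch with made-up parameters, not the model or the numbers from the blog post): latent vulnerabilities decay exponentially as they are found and fixed, while newly written memory-unsafe code adds more, so the outstanding count falls quickly once new unsafe code stops arriving.

```rust
// Toy model: latent vulnerabilities decay with a fixed half-life as they are
// found and fixed, while new memory-unsafe code introduces fresh ones each year.
// All parameters are made up for illustration only.
fn simulate(years: u32, initial_latent: f64, new_vulns_per_year: f64) -> Vec<f64> {
    let half_life_years = 2.5; // assumed half-life of a latent vulnerability
    let survival = 0.5_f64.powf(1.0 / half_life_years); // fraction surviving each year

    let mut latent = initial_latent;
    let mut history = vec![latent];
    for _ in 0..years {
        latent = latent * survival + new_vulns_per_year; // decay, then fresh bugs
        history.push(latent);
    }
    history
}

fn main() {
    // Scenario A: keep adding memory-unsafe code (fresh vulnerabilities every year).
    // Scenario B: new code is memory-safe, so the existing stock just decays.
    println!("keep writing unsafe code: {:?}", simulate(6, 1000.0, 300.0));
    println!("new code is memory-safe:  {:?}", simulate(6, 1000.0, 0.0));
}
```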

Deirdre: Yeah, I want to reemphasize what we kind of already discussed, which is that patching and fixing the older unsafe code over time is the only thing you’re doing to touch it. Anything new is written in a memory safe language, and you have to figure out your API boundary improvements, or the way that the new memory safe code will interact with the existing unsafe code, which might mean improving the C boundaries on one side and doing whatever you need to do on the Rust side or the Kotlin side as well. And then I want to talk a little bit about all this other stuff, like fuzzing and linters and sanitizers. But the crux is that you aren’t doing new development in unsafe languages, and you are doing new development in safe languages, so that you keep fixing issues in the unsafe code, but in a maintenance mode state, and you do new stuff, including improvements and new features, not just whole new components, in the safe code. And those two things together make this explosion of reduction in vulns.

Jeff: So, yes, but I want to be clear: this is a journey, not a destination. And so the idea of, oh, you’re no longer writing memory unsafe code, no, that’s not where we are. We’re in a transition phase.

Deirdre: Okay.

Jeff: And that’s part of what we’re trying to show in the graphs there, right, is that it is a transition, and for us, it’s been going on for six years. We are still introducing memory unsafe code, and that’s probably the main reason why we’re still having memory safety vulnerabilities.

Thomas: But if you look at the scale of the decrease in memory safety vulnerabilities you have, that can’t be coming just from replacing code, right? Because you haven’t replaced enough code for that. It seems like the result you have right now has to be saying that introducing some titration of memory safe code into a memory unsafe code base helps, even as you implement new features in C. There are large existing C code bases that need new features each release, and that stuff is not all being done in Rust; it’s being done in C to get the job done. But despite that ongoing development, with that titration of Rust code you have there, or whatever other switches you did, you still got a sharp drop in those vulnerabilities.

Jeff: Yeah. And like, to be clear, like the drop matches exactly the number that we would expect based on the amount of memory unsafe code that’s being added or modified. It’s really an interesting result.

Thomas: How much of that effect do you think comes from the introduction of memory safe code making it easier to focus countermeasures, like sandboxing or sanitization? How much of it comes from making those things more effective, and how much of it do you think comes from just simply offsetting the amount of memory unsafe code there is?

Jeff: We think that most of it comes from offsetting, and we think that because we are also looking at other projects that aren’t doing this, or are doing less of it, and they aren’t seeing the same result.

Deirdre: Yeah, but the fact that projects like Android and Chrome and a lot of these big projects have invested so much into fuzzing clusters, into sanitizers, into a whole bunch of big, beefy stuff, and then you just, quote, start implementing most of the new stuff in memory safe languages like Java, Kotlin and Rust, and that makes such a profound difference compared to the investment done for these other techniques, is kind of mind blowing. Which is part of why you’re like, we don’t even believe what we’re seeing in front of our eyes, we have to replicate this. It’s amazing. And there’s some upfront investment in getting these new languages integrated into the project, into the toolchain, training up developers; there are upfront costs to get that going. But it seems like Android especially has gotten the flywheel going, because once you’re rolling, you’ve fronted the cost, and it’s not an ongoing cost of the cluster and compute and all this other stuff that these other techniques bring with them. You’re not paying hundreds of thousands of dollars every billing period for your fuzzing cluster.

Jeff: Yeah. So like the results are really interesting, but I think the other really important part of the blog post is that it actually talks about why this scales. And like really the most important reason why this scales is because it’s cheap.

Deirdre: Yeah.

Jeff: And it’s cheaper than doing the other thing. Right. A lot of these techniques actually scale incredibly poorly. If for every line, or every new function, you add in a memory unsafe language, you then have to add a fuzzer, then now we actually need to dedicate hardware to doing that work. These things scale along with the amount of code that you’re writing, and that’s actually really, really expensive.

And, and what you find is when, you know, teams are under like deadline or shipping pressure, you know, guess what gets dropped.

Deirdre: Yeah.

Jeff: And so, yeah, if you can actually build the safety into the code development process, then you’ve not only reduced the additional costs you’re dedicating, you’re actually making the cost of development itself cheaper, because your code ends up being safer. And I think this is actually one of the most important parts of what we’re talking about here, because when security people talk about costs, they sometimes tend to talk about costs in ways that are unproductive for businesses that actually need to ship things. One of the ones that we always hear on security teams is this idea of raising attacker costs.

Deirdre: Yeah.

Jeff: Right. Like, oh, we’ve got to make the cost of exploitation more expensive. We’ve got to make it harder for them to find bugs. We’ve got to do these things. Right. And unfortunately, the way that we tend to raise attacker costs is by raising defender costs. That sounds great for job security, but it’s not good for actual security.

Deirdre: And it’s also bad for your teams, for how they experience working on the project and maintaining the code base. If you’re pushing a change and you have a million processes that are like, nope, roll this back, nope, this doesn’t pass, it’s a lot of work to both defend the code from an attacker and also just work on your change and get it in, just trying to get something done. And then one of the awesome things that you mentioned as just a bullet point, and it’s not just a bullet point, is that languages like Rust and other memory safe languages shift bug finding further left, much earlier, before you even check in the code.

So Rust changes in the Android project are rolled back at half the rate of C changes getting checked into the project. It means that when you have something working, all the tests pass, and it’s been checked off and merged in, you’re not likely to have to be like, nope, this broke, we have to back this out. And that affects your efficiency and your velocity, and also your developer happiness and productivity in general. And it all costs money, because if your developers aren’t happy, they’re going to leave.

Thomas: If you haven’t noticed, Deirdre’s trying to sell people Rust.

Deirdre: Like, look at all the great stuff that comes from it. Like, there’s a whole bunch of good stuff. And then we haven’t even talked about the efficiency of the implementation of, like, the QR code parser thingy. That’s like, a million times faster in terms of, like, it’s a million times—

Thomas: Faster because they got rid of an IPC sandbox.

Deirdre: Oh, well, okay. And this sounds fast to me. Doesn’t matter exactly what.

Jeff: Yeah, so the other thing I was going to say is that, like, we have to get approval for, like, every number that we share, right. So, getting the rollback rate was, like something that we can share, but, when we look at things. So let me preface this by saying measuring developer productivity is actually very challenging, right?

Deirdre: Oh, yeah.

Jeff: But when we look at measures, we can’t find a measure that tells us that it’s taking longer to write Rust code than it is to write C. Any metric that we can find is showing us that teams are more productive. They’re going faster, and they’re going faster with lower defect rates. And our director, Lars, the director of platform programming languages on Android, talked about this at Rust Nation: we can’t find a metric anywhere that tells us that using C is better for any type of velocity or quality. What I think is kind of fascinating is what that means for getting teams to use Rust: first of all, when we get them to switch from C to Rust, teams do not want to go back to using C.

They don’t need to be re-incentivized. They have the incentives that they want, because they’re able to actually ship the things that they want to. They’re able to accomplish their work with fewer barriers.

Thomas: I was going to ask about the general developer experience here. It sounds like the experience they’re having is, once you get people to the point where they’re shipping successfully in Rust, they tend to stick there.

Jeff: Yeah, we don’t need to incentivize them to stay there. I would say that you do still need catalysts to get people to make the switch, and so creating incentives or disincentives or whatever is still really quite useful. And I talked about one, just because we’ve published about it, the Rule of Two. It incentivizes teams to not add the kind of complexity and overhead of additional sandboxing. But there’s lots of ways to both incentivize the use of memory safe languages and disincentivize the use of memory unsafe ones.

Thomas: Was there anything surprising about getting Rust rolled out across the project, like, in terms of adoption and people’s experiences with it?

Jeff: I don’t think so. So I’ve been on the Android team for ten years, and it’s the only team I’ve been on at Google. So I don’t feel like I have, like, I don’t have broad experience, but, like, I feel like Android people are just like, very practical. And so if they’re told like, oh, hey, we have a better tool for you to do your job, then people are like, okay, I’m willing to try that. And then more often than not, they’re like, yeah, this is a better tool. Let’s keep doing this.

Thomas: And it’s generally somewhat straightforward to do, like, Rust and C in the same process, like, in the same code bases just calling across.

Jeff: Yes. So Android is built on top of a mountain of C, right? There are no pure Rust processes in Android; everything is a combination. Even things that are mostly Rust still rely on basic system libraries, libc, libbinder. We did a blog post actually about interop. I talked about the Android team being a very practical team, and I think our blog post about this and our approach actually reflect this, which is that what we wanted when we looked at interop was something that was mostly convenient.

And also we just kind of admitted to ourselves that occasionally teams are going to experience some inconvenience and have to hand-write a binding or something like that, and that that was okay. And so that’s kind of what our blog post reflected: we thought that in about 90% of the cases, some fairly standard and not-that-complicated tools were going to be sufficient for reasonably convenient interop between C and Rust. And we just kind of admitted, we’re like, well, we’re going to keep hacking away at that remaining 10%. But also, we’re not going to block on that remaining 10%, because the alternative is that someone who’s paid to write code may be paid to write some more code, right. And that’s an okay thing to happen.
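To make the hand-written-binding case concrete, here is a minimal sketch of the usual pattern: declare the foreign function, then expose a small safe Rust wrapper so the rest of the code never touches raw pointers. This is not from the Android code base; the C function `checksum` and its signature are invented for illustration.

```rust
extern "C" {
    // Mirrors an assumed (hypothetical) C prototype:
    //   uint32_t checksum(const uint8_t *buf, size_t len);
    fn checksum(buf: *const u8, len: usize) -> u32;
}

/// Safe wrapper: callers never handle raw pointers, and the slice guarantees
/// the invariant the C side needs (a valid buffer of `len` readable bytes).
pub fn checksum_of(data: &[u8]) -> u32 {
    // SAFETY: `data.as_ptr()` is valid for `data.len()` bytes for the
    // duration of the call, and we assume the C function does not retain
    // the pointer after it returns.
    unsafe { checksum(data.as_ptr(), data.len()) }
}
```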

Deirdre: Yeah. And that kind of 10% is, like, working on interop tooling like Crubit and autocxx? Are those some of those? Yeah. And that’s kind of talking to both sides of the binding layer: you need to do some work to make the C-side interfaces a little bit better so that you can make sure that the Rust code stays safe and with correctly defined behavior when it’s calling the C code, sort of deal.

Jeff: Yeah. What we want is that convenience, but also the safety of that convenience, right. So, like, bindgen is a good example of something that will just happily toss up an interface.

Deirdre: Lots of unsafe and understanding—

Jeff: Yeah, yeah. It’s more likely than not going to not have its safety properties.

Deirdre: Yeah.

Jeff: And so, um, you know, we use bindgen all over the place, and then what we have is people who are available to review those things for other teams. But, yeah, like, that’s not the ideal state to be in. But again, I think our approach has been that we’re not going to block on the ideal things, and our results are essentially that doing the okay thing is actually very effective. And so what I would hope is that people aren’t blocking on perfect solutions, that they’re moving forward with non-perfect solutions, because the non-perfect solutions, it turns out, work pretty well.
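A rough illustration of that review pattern, with all names invented (this is not real bindgen output from any Android library): the generated declaration stays unsafe, and a small hand-reviewed wrapper is what the rest of the Rust code actually calls.

```rust
// Shape of what bindgen might generate (hypothetical names):
#[repr(C)]
#[derive(Default)]
pub struct FooHeader {
    pub version: u32,
    pub payload_len: u32,
}

extern "C" {
    // Assumed contract: returns 0 on success, negative on error; fills `out`.
    pub fn foo_parse_header(buf: *const u8, len: usize, out: *mut FooHeader) -> i32;
}

/// The hand-reviewed safe layer used by the rest of the Rust code.
pub fn parse_header(buf: &[u8]) -> Result<FooHeader, i32> {
    let mut out = FooHeader::default();
    // SAFETY: `buf` points to `buf.len()` initialized, readable bytes and
    // `out` is a valid, writable FooHeader for the duration of the call; we
    // assume the C side retains neither pointer.
    let rc = unsafe { foo_parse_header(buf.as_ptr(), buf.len(), &mut out) };
    if rc == 0 {
        Ok(out)
    } else {
        Err(rc)
    }
}
```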

Deirdre: Yeah. Especially when you have results like these that literally say: not only did we not rewrite the world before we found results, not only did we just add some Rust or some other memory-safe languages and not block on perfect solutions like binding generators that are perfectly memory safe and have defined behavior, whatever, but we kept writing C and C++ in some areas of our project and we still had a major reduction in memory-unsafety vulnerabilities over five years. So, having extremely material results like these: don’t let the perfect be the enemy of the good. Just get started. If you just get started, you will have, like, crazy ROI on your investment. And it’s—

And the ROI scales. It’s not even, like, you know, going to be linear indefinitely, but compared to the thing you did before, it’s like you put in a little bit of investment and, even though the investment kind of plateaus, the return just keeps on growing.

Jeff: Yeah, it’s funny, right. Because, you know, I think this isn’t just security people, but software people in general really like to be, you know, kind of pedantic. And you all work in cryptography, right? And in cryptography, like, you have to be pedantic.

Deirdre: Not exactly. All software has to be pedantic.

Jeff: Yeah.

Thomas: The word you’re looking for is rigorous.

Jeff: Yes. Thank you. Much better word. But yeah. What I would say is that when we show these results to teams that are wondering what they should do, they really do kind of change. Like, a couple of examples are teams who are like, well, it doesn’t matter if we use Rust for this thing, because this other thing’s in C and therefore all bets are off, toss everything out the window, might as well write C for eternity.

And then the other one is, it’s like you’re almost giving teams permission to be like: no, the stuff that you like, your work, your previous work, your team’s investments, those are still good. You get to keep using those. We’re not telling you to throw out everything that you’ve done. And so I think there’s also kind of a sense of ownership that people want to retain in the work that they’ve done and the code that they’ve written, and it just ends up being very convenient that the result says keeping that code is probably something that you should do. And, yeah, one of the things that we had originally put in the blog post was kind of a stronger statement about, like, hey, even in memory-unsafe code bases, about a third of your bugs are non-memory-safety bugs. So if you have code where, say, you could calculate that two thirds of the bugs, or more than two thirds, have already been fixed, and you go and rewrite that, odds are you’re going to reintroduce all of the non-memory-safety bugs, or, you know, you have a high potential of doing that, right. And so maybe we shouldn’t be doing that at all.

Deirdre: And, and like, basically this is evidence that you really don’t have to.

Jeff: Yeah.

Deirdre: Like, you can, you get extreme bang for your buck and it makes all those things a lot easier and a lot easier to get started.

Jeff: Yeah.

Thomas: I mean, look, the blog post undersells it. Right? I think it’s a pretty huge case study that you’ve got here. Right. Like I said, this is one of the more important security-target code bases in the world. And, yeah, I mean, just in terms of the ROI you guys have gotten on what looks like a pretty practicable amount of, you know, introducing Rust code, it looks like a pretty big win so far. So, yeah, I’m psyched to see how this plays out moving forward. But, yeah, this is a lot more interesting than it looks like on the tin.

Like, yeah, we rewrote some stuff in Rust. That’s not the story here. So, yeah, super interesting stuff. Thank you.

Deirdre: Thank you! Jeff, do you have anything else that we didn’t touch on that you want to send us off with?

Thomas: I feel like you don’t have to.

Jeff: I feel like we actually discussed most of what I wanted to talk about. I’m glad we got to talk about the shift away from an attacker-based mindset. I think that’s another novel thing to talk about with security folks, especially on this topic, because it’s so ingrained in everyone that we look at what attackers do, and our primary goal is to frustrate attackers.

Thomas: It’s interesting, right? It’s like, if you talk to lawyers, right? Like, you have a mental model of a lawyer: somebody goes into court. But most lawyers aren’t in court ever. Right? There are litigators, and then there are people who do contracts and review and strategy and all that stuff. Here you’ve got, like, I don’t know where I’m going with this, right? Like, here you’ve got this thing where you have a software engineering answer to this problem. Like, you’re looking at it as a software engineering problem, not as an adversarial process against attackers, right? Like, you know, there is that process happening at the same time. But, like, you also want the nuts and bolts of things. You want to be situated well so that if you ever do come into contact with, you know, attackers. Right.

Like, you’re in a much better position. So, yeah, I mean, I’m also a sucker for software engineering stories. Yeah, this is pure software engineering. It’s awesome.

Jeff: Yeah. Yeah. So, you know, I talked about raising the cost for attackers and how we do that with, like, exploit mitigations. But there’s even, like, fuzzing, right, where it’s not even that this is a good tool for defense. It’s almost more like we’re going to do the attacker thing first, because that raises their costs, and not because it’s actually a good thing for defenders to use. Right. Fuzzing actually is a little bit problematic for defenders, because obviously attackers only need to find a couple of bugs and defenders need to find more of the bugs than the attackers.
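For readers who haven’t written one, a fuzz target is tiny. Here is a minimal cargo-fuzz/libfuzzer-sys sketch, where `my_crate::parse_header` is a placeholder for whatever code sits at the untrusted-input boundary (both the crate and the function name are assumptions for illustration).

```rust
// fuzz/fuzz_targets/parse_header.rs, run with `cargo fuzz run parse_header`
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Throw arbitrary bytes at the parser; panics or sanitizer-detected
    // memory errors become reported findings.
    let _ = my_crate::parse_header(data);
});
```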

But yeah, to your point about it being a software engineering answer: that’s what I think is actually really exciting about it, and why it actually kind of works in defenders’ favor, because we’re just ignoring the attacker. Right. We’re saying this actually drives down our costs, and it drives down our costs by just improving the software quality. And then of course it almost ends up being a side effect that it actually drives up attacker costs. So by trying to make our own lives easier, we’ve made attacker lives more difficult. And yeah, it’s just kind of an interesting result: instead of focusing on attacker strengths and trying to undermine attacker strengths, we actually said no, defenders have their own strengths, and their strengths involve things like owning the developer environment.

And so instead we’re going to focus on our strengths. And yeah, it’s at least an interesting area where that works. And I will say it doesn’t just work for memory safety; it actually works in other areas. We mentioned cross-site scripting, or there’s SQL injection or things like that, where we’ve applied the same technique and seen similar results. And I think what’s kind of exciting about memory safety is this was the one where it was like, oh sure, that works in those little areas, but will it actually work on kind of the industry’s Achilles’ heel? And yeah, I think so far it looks promising.

Deirdre: Yeah. And this is sort of about the memory safety part of these languages, but the same can be said about some of the other nice features of languages like Rust, like strong, cost-free typing and a whole bunch of other stuff that just makes writing correct code a lot easier, and the tooling is there for a developer to just do, quote, the right thing, or do a good thing, very easily. It’s not like dragging yourself over glass to do the right thing; the language and the tooling make it very easy and helpful to just do the right thing. And it’s fast and it’s efficient and you just get to ship good code much more easily, and it mitigates other classes of vulnerabilities besides memory safety. And it’s the same sort of effect: it just makes the software better, which happens to make it more secure, and it drives attacker costs up while lowering costs for the business that’s actually doing the project, or anything like that. So you can sell your CISOs or whoever on this being a good idea, because it’ll save the business money and raise costs for the attacker too.
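One illustrative sketch of that “the easy thing is the safe thing” idea applied beyond memory safety (a generic pattern, not any specific Android or Google API): a newtype that can only be constructed by escaping means unescaped strings can’t reach HTML output by accident, which is the same safe-coding approach mentioned above for cross-site scripting.

```rust
/// Text that has already been HTML-escaped. The only way to construct it is
/// via `escape`, so raw, untrusted strings can't be rendered by mistake.
pub struct SafeHtml(String);

pub fn escape(untrusted: &str) -> SafeHtml {
    let mut out = String::with_capacity(untrusted.len());
    for c in untrusted.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    SafeHtml(out)
}

/// Rendering only accepts the escaped type; passing a raw &str is a
/// compile-time error rather than something code review has to catch.
pub fn render(fragment: &SafeHtml) -> &str {
    &fragment.0
}
```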

Jeff: Yeah.

Deirdre: Jeff, thank you so much. This is awesome. Thank you so much for letting us pick your brain with lots of annoying questions.

Jeff: Yeah, thanks, thanks. Thanks for having me.

Deirdre: Totally!

Security Cryptography Whatever is a side project from Deirdre Connolly, Thomas Ptacek, and David Adrian. Our editor is Nettie Smith. You can find the podcast online @scwpod and the hosts online @durumcrustulum, @tqbf, and @davidcadrian. You can buy merch online at merch.securitycryptographywhatever.com. If you like the pod, you can give us a five-star review wherever you get your podcasts. Thank you for listening.