Ofir Weisse
Systems engineer specializing in kernel, virtualization, and low-level security. Currently building cloud infrastructure at Google.
About Me
I'm a systems engineer at Google Cloud on the kernel team, where I focus on cloud infrastructure performance and security. My work sits at the intersection of operating systems, virtualization, and hardware security.
I completed my PhD at the University of Michigan, advised by Thomas Wenisch and Baris Kasikci. My research has uncovered critical hardware vulnerabilities (including the Foreshadow/L1TF attack) and developed novel techniques for secure and efficient systems.
Where I've Worked
Working on the kernel team, focusing on cloud infrastructure performance and security.
Conducted Windows/Linux boot experiments. Presented research at osfc.io.
Optimized TLB shootdown algorithms in the Linux kernel for improved virtualization performance.
Developed a security analysis framework for miTLS as part of the Everest project.
Focused on SGX and BiosGuard security research.
Conducted research and served as a teaching assistant during graduate studies.
Research & Papers
Snap & Replay: A New Way to Analyze Uarch-Scale Performance Bottlenecks for ML Accelerators
Fine-grained methodology for analyzing ML models at the machine-code level using hardware simulators. Optimizes TPU workloads, reducing token generation latency by up to 4.1%.
Our core insight is to use a hardware-level simulator, an artifact of the hardware design process that we can repurpose for performance analysis. Traditionally, these simulators are confidential tools used to improve hardware designs given representative software workloads. However, as a hyperscaler practicing hardware/software co-design, we are uniquely positioned to use them in the opposite direction: to optimize software given fixed hardware architectures. SnR captures traces from production deployments running on accelerators and replays them in a modified microarchitecture simulator to gain low-level insights into the model's performance. We implement SnR for our in-house accelerator (TPU) and use it to analyze the performance of several of our production LLMs, revealing several previously unknown microarchitecture inefficiencies. Leveraging these insights, we optimize a common communication collective by up to 15% and reduce token generation latency by up to 4.1%.
ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling
Infrastructure for delegating kernel scheduling decisions to userspace. Enables policy optimization without host reboots, deployed on Google Search and Snap.
ghOSt provides general-purpose delegation of scheduling policies to userspace processes in a Linux environment. ghOSt provides state encapsulation, communication, and action mechanisms that allow complex expression of scheduling policies within a userspace agent, while assisting in synchronization. Programmers can use any language to develop and optimize policies, which can be modified without a host reboot. ghOSt supports a wide range of scheduling models, from per-CPU to centralized, run-to-completion to preemptive, and incurs low overheads for scheduling actions. We demonstrate ghOSt's performance on both academic and real-world workloads, including Google Snap and Google Search. We show that by using ghOSt instead of the kernel scheduler, we can quickly achieve comparable throughput and latency while enabling policy optimization, non-disruptive upgrades, and fault isolation for our data center workloads.
NDA: Preventing Speculative Execution Attacks at Their Source
Hardware defense against Meltdown/Spectre attacks by restricting speculative data propagation. Closes 68-96% of the performance gap between in-order and unconstrained out-of-order execution.
Our key observation is that these attacks require a chain of dependent wrong-path instructions to access and transmit secret data. We propose NDA, a technique to restrict speculative data propagation. NDA breaks the wrong-path dependence chains required by all known attacks, while still allowing speculation and dynamic scheduling. We describe a design space of NDA variants that differ in the constraints they place on the dynamic scheduling and the classes of speculative execution attacks they prevent. NDA preserves much of the performance advantage of out-of-order execution: on SPEC CPU 2017, NDA variants close 68-96% of the performance gap between in-order and unconstrained (insecure) out-of-order execution.
Foreshadow-NG (known as L1TF): Breaking the Virtual Memory Abstraction with Transient Out-of-Order Execution
Extends Foreshadow beyond SGX to attack OS kernels, hypervisors, SMM, and VMs in cloud environments. CVE-2018-3620, CVE-2018-3646.
Perhaps most devastating, Foreshadow-NG might also be used to read information stored in other virtual machines running on the same third-party cloud, presenting a significant risk to cloud infrastructure. Finally, in some cases, Foreshadow-NG might bypass previous mitigations against speculative execution attacks, including countermeasures for Meltdown and Spectre.
Foreshadow: Extracting the Keys to the Intel SGX Kingdom
Critical speculative execution vulnerability affecting Intel SGX. Extracts cryptographic keys and forges attestation responses. Affects millions of devices.
We present Foreshadow, a practical software-only microarchitectural attack that decisively dismantles the security objectives of current SGX implementations. Crucially, unlike previous SGX attacks, we do not make any assumptions on the victim enclave's code and do not necessarily require kernel-level access. At its core, Foreshadow abuses a speculative execution bug in modern Intel processors, on top of which we develop a novel exploitation methodology to reliably leak plaintext enclave secrets from the CPU cache. We demonstrate our attacks by extracting full cryptographic keys from Intel's vetted architectural enclaves, and validate their correctness by launching rogue production enclaves and forging arbitrary local and remote attestation responses. The extracted remote attestation keys affect millions of devices.
HotCalls: Turbocharging Enclave Transitions
Fast SGX interface framework providing 13-27x speedup over default enclave transitions. Boosts throughput by 2.6-3.7x for memcached, openVPN, and lighttpd.
We show that straightforward use of SGX library primitives for calling functions adds between 8,200 and 17,000 cycles of overhead, compared to 150 cycles for a typical system call. We quantify the performance impact of these library calls and show that in applications with a high system-call frequency, such as memcached, openVPN, and lighttpd, which all have high-bandwidth network requirements, the performance degradation may be as high as 79%. We investigate the sources of this performance degradation by leveraging a new set of micro-benchmarks for SGX-specific operations such as entry-calls, out-calls, and encrypted memory I/O accesses. We leverage the insights we gain from these analyses to design a new SGX interface framework, HotCalls. HotCalls provides a 13-27x speedup over the default interface. It can easily be integrated into existing code, making it a practical solution. Compared to a baseline SGX implementation of memcached, openVPN, and lighttpd, we show that using the new interface boosts throughput by 2.6-3.7x and reduces application response time by 62-74%.
WALNUT: Waging Doubt on the Integrity of MEMS Accelerometers
Acoustic injection attacks on MEMS accelerometers. 75% of tested sensors vulnerable to output biasing, 65% to total output control. Includes software defenses.
Spoofing such sensors with intentional acoustic interference enables an out-of-spec pathway for attackers to deliver chosen digital values to microprocessors and embedded systems that blindly trust the unvalidated integrity of sensor outputs. Our contributions include (1) modeling the physics of malicious acoustic interference on MEMS accelerometers, (2) discovering the circuit-level design flaws that cause the vulnerabilities by measuring acoustic injection attacks on MEMS accelerometers as well as systems that depend on the sensors, and (3) two software-only defenses that mitigate many of the risks to the integrity of MEMS accelerometer outputs.
We characterize two classes of acoustic injection attacks: output biasing and output control. We test these attacks against 20 models of capacitive MEMS accelerometers representing the majority of the consumer market. Our experiments find that 75% are vulnerable to output biasing, and 65% are vulnerable to total output control. To illustrate end-to-end implications, we show how to inject fake steps into a Fitbit with a $5 speaker. In our self-stimulating attack, we play a malicious music file from a smartphone's speaker to control the on-board MEMS accelerometer trusted by a local app to pilot a toy RC car.
A New Framework for Constraint-Based Probabilistic Template Side Channel Attacks
Byte-oriented constraint solver for side-channel cryptanalysis. Extracts secret keys from 1-2 power traces in under 9 seconds with 79%+ success rate.
In this paper we describe a specialized byte-oriented constraint solver for side channel cryptanalysis. The user only needs to supply code snippets for the native operations of the cipher, arranged in a flow graph that models the dependence between the side channel leaks. Our framework uses a soft decision mechanism which overcomes realistic measurement noise and decoder classification errors, through a novel method for reconciling multiple probability distributions. On the DPA v4 contest dataset our framework is able to extract the correct key from one or two power traces in under 9 seconds with a success rate of over 79%.
Practical Template-Algebraic Side Channel Attacks with Extremely Low Data Complexity
First practical template-algebraic attack on a public dataset. Recovers encryption keys with only 200 offline traces and a single online trace.
In this work we present the first application of the template-algebraic key recovery attack to a publicly available data set (IAIK WS2). We show how our attack can successfully recover the encryption key even when the attacker has extremely limited access to the device under test – only 200 traces in the offline phase and as little as a single trace in the online phase.
New Methods for Side Channel Cryptanalysis
Master's thesis combining template attacks with algebraic side-channel methods. Introduces novel constraint solver and probability reconciliation techniques.
In this work we present a practical application of the template-algebraic key recovery attack to a publicly available data set (IAIK WS2). We show how our attack can successfully recover the encryption key even when the attacker has extremely limited access to the device under test – only 200 traces in the offline phase and as little as a single trace in the online phase. However, such attacks rely on general-purpose constraint solvers, and to use such solvers the cipher must be represented at the bit level. For byte-oriented ciphers this produces very large and unwieldy instances, leading to unpredictable, and often very long, run times.
In this work we describe a specialized byte-oriented constraint solver for side channel cryptanalysis. The user only needs to supply code snippets for the native operations of the cipher, arranged in a flow graph that models the dependence between the side channel leaks. Our framework uses a soft decision mechanism which overcomes realistic measurement noise and decoder classification errors, through a novel method for reconciling multiple probability distributions. On the DPA v4 contest dataset our framework is able to extract the correct key from one or two power traces in under 9 seconds with a success rate of over 79%.
Academic Background
PhD in Computer Science
University of Michigan
Advised by Thomas Wenisch & Baris Kasikci
MSc in Computer Engineering
Tel-Aviv University
Graduated Cum Laude
BSc in Computer Engineering
Technion, Israel Institute of Technology
Graduated Cum Laude
Get In Touch
Interested in systems research, security, or just want to chat about low-level engineering? Feel free to reach out.