$ whoami

Ofir Weisse

Systems engineer specializing in kernel development, virtualization, and low-level security. Currently building cloud infrastructure at Google.

// about

About Me

bio.sh

I'm a systems engineer at Google Cloud on the kernel team, where I focus on cloud infrastructure performance and security. My work sits at the intersection of operating systems, virtualization, and hardware security.

I completed my PhD at the University of Michigan, advised by Thomas Wenisch and Baris Kasikci. My research uncovered critical hardware vulnerabilities, including the Foreshadow/L1TF attack, and introduced new techniques for building secure, efficient systems.

Linux Kernel · Virtualization · SGX · Hardware Security · Performance · Cloud Infrastructure · Systems Research
// experience

Where I've Worked

Systems Engineer @ Google Cloud Current

Working on the kernel team, focusing on cloud infrastructure performance and security.

PhD Intern @ Google Cloud PhD

Conducted Windows/Linux boot experiments and presented the research at osfc.io (the Open Source Firmware Conference).

Research Intern @ VMware Research PhD

Optimized TLB shootdown algorithms in the Linux kernel for improved virtualization performance.

Research Intern @ Microsoft Research PhD

Developed security analysis framework for miTLS as part of the Everest project.

Security Researcher @ Intel Pre-PhD

Focused on SGX and BIOS Guard security research.

Researcher & TA @ Tel-Aviv University MSc

Conducted research and served as a teaching assistant during graduate studies.

// publications

Research & Papers

SoCC 2025

Snap & Replay: A New Way to Analyze Uarch-Scale Performance Bottlenecks for ML Accelerators

Fine-grained methodology for analyzing ML models at the machine-code level using hardware simulators. Optimizes TPU workloads, reducing token generation latency by up to 4.1%.

Paper
As models become larger, ML accelerators are a scarce resource whose performance must be continually optimized to improve efficiency. Existing performance analysis tools are coarse-grained, fail to capture model performance at the machine-code level, and often do not provide specific recommendations for optimizations. In addition, existing methodologies are hard to apply in Google's production environment, as they require hardware changes or recompilation. We present SnR, a fine-grained methodology for analyzing ML models at the machine-code level that provides actionable optimization suggestions. It requires no hardware changes and no recompilation.

Our core insight is to use a hardware-level simulator, an artifact of the hardware design process that we can repurpose for performance analysis. Traditionally, these simulators are confidential tools used to improve hardware designs given representative software workloads. However, as a hyperscaler practicing hardware/software co-design, we are uniquely positioned to use them in the opposite direction: to optimize software given fixed hardware architectures. SnR captures traces from production deployments running on accelerators and replays them in a modified microarchitecture simulator to gain low-level insights into the model's performance. We implement SnR for our in-house accelerator (TPU) and use it to analyze the performance of several of our production LLMs, revealing previously unknown microarchitectural inefficiencies. Leveraging these insights, we optimize a common communication collective by up to 15% and reduce token generation latency by up to 4.1%.
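
Neither the simulator nor the trace format is public, so the sketch below is purely illustrative: the `trace_event` layout, opcode names, and cycle numbers are all invented for this example. It only shows the shape of the replay-and-attribute step: walk a captured trace, charge each operation its simulated latency, and rank the result to surface candidate bottlenecks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical trace record: one executed operation with the cycle at which
 * a simulator issued and retired it. Real SnR traces and the TPU simulator
 * are far richer; this only illustrates the replay-and-attribute idea. */
struct trace_event {
    const char *op;                 /* opcode name           */
    unsigned long long issue;       /* simulated issue cycle  */
    unsigned long long retire;      /* simulated retire cycle */
};

/* A canned "captured trace" standing in for one replayed in the simulator. */
static const struct trace_event trace[] = {
    { "matmul",       0,  900 }, { "all_reduce",  900, 2500 },
    { "matmul",    2500, 3400 }, { "all_reduce", 3400, 5100 },
    { "vector_add", 5100, 5160 },
};

struct op_stats { const char *op; unsigned long long cycles, count; };

int main(void)
{
    struct op_stats stats[16] = { 0 };
    int num_ops = 0;

    /* "Replay": walk the trace and charge each opcode its simulated latency,
     * so the most expensive operations can be ranked and inspected. */
    for (size_t i = 0; i < sizeof(trace) / sizeof(trace[0]); i++) {
        int j = 0;
        while (j < num_ops && strcmp(stats[j].op, trace[i].op) != 0)
            j++;
        if (j == num_ops)
            stats[num_ops++].op = trace[i].op;
        stats[j].cycles += trace[i].retire - trace[i].issue;
        stats[j].count++;
    }

    for (int j = 0; j < num_ops; j++)
        printf("%-12s %6llu cycles over %llu ops\n",
               stats[j].op, stats[j].cycles, stats[j].count);
    return 0;
}
```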
SOSP 2021

ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling

Infrastructure for delegating kernel scheduling decisions to userspace. Enables policy optimization without host reboots, deployed on Google Search and Snap.

Paper
We present ghOSt, our infrastructure for delegating kernel scheduling decisions to userspace code. ghOSt is designed to support the rapidly evolving needs of our data center workloads and platforms. Improving scheduling decisions can drastically improve the throughput, tail latency, scalability, and security of important workloads. However, kernel schedulers are difficult to implement, test, and deploy efficiently across a large fleet. Recent research suggests bespoke scheduling policies, within custom data plane operating systems, can provide compelling performance results in a data center setting. However, these gains have proved difficult to realize as it is impractical to deploy custom OS images at an application granularity, particularly in a multi-tenant environment, limiting the practical applications of these new techniques.

ghOSt provides general-purpose delegation of scheduling policies to userspace processes in a Linux environment. ghOSt provides state encapsulation, communication, and action mechanisms that allow complex expression of scheduling policies within a userspace agent, while assisting in synchronization. Programmers use any language to develop and optimize policies, which are modified without a host reboot. ghOSt supports a wide range of scheduling models, from per-CPU to centralized, run-to-completion to preemptive, and incurs low overheads for scheduling actions. We demonstrate ghOSt's performance on both academic and real-world workloads, including Google Snap and Google Search. We show that by using ghOSt instead of the kernel scheduler, we can quickly achieve comparable throughput and latency while enabling policy optimization, non-disruptive upgrades, and fault isolation for our data center workloads.
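
ghOSt itself is open source, but the sketch below deliberately does not use its real API. It is a minimal, hypothetical rendering of the delegation pattern the abstract describes: the kernel publishes scheduling events, a userspace agent consumes them, applies a policy written in ordinary userspace code, and commits its decision back. All types and names here are invented for the illustration.

```c
#include <stdio.h>

/* Hypothetical message and commit types -- NOT the real ghOSt ABI.
 * They only illustrate the delegation pattern: the kernel emits scheduling
 * events into a shared queue, a userspace agent picks the next task, and
 * the decision is committed back to the kernel. */
enum msg_type { MSG_TASK_NEW, MSG_TASK_WAKEUP, MSG_TASK_BLOCKED };
struct msg { enum msg_type type; int tid; };

/* Stand-in for the kernel->agent message queue: a canned event stream. */
static const struct msg kernel_events[] = {
    { MSG_TASK_NEW, 101 }, { MSG_TASK_NEW, 102 },
    { MSG_TASK_BLOCKED, 101 }, { MSG_TASK_WAKEUP, 101 },
};

/* Stand-in for the agent->kernel commit ("run tid on cpu"). */
static void commit_txn(int cpu, int tid)
{
    printf("cpu %d: schedule tid %d\n", cpu, tid);
}

int main(void)
{
    int runqueue[64], len = 0, cpu = 0;

    /* Agent loop: consume kernel messages, apply a trivial FIFO policy,
     * and commit decisions. The policy lives entirely in userspace, so it
     * can be modified and redeployed without rebooting the host. */
    for (size_t i = 0; i < sizeof(kernel_events) / sizeof(kernel_events[0]); i++) {
        const struct msg *m = &kernel_events[i];
        if (m->type == MSG_TASK_NEW || m->type == MSG_TASK_WAKEUP)
            runqueue[len++] = m->tid;
        /* MSG_TASK_BLOCKED would remove the task; omitted for brevity. */

        if (len > 0) {
            int next = runqueue[0];                      /* FIFO pick */
            for (int j = 1; j < len; j++)
                runqueue[j - 1] = runqueue[j];
            len--;
            commit_txn(cpu, next);
        }
    }
    return 0;
}
```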
MICRO 2019

NDA: Preventing Speculative Execution Attacks at Their Source

Hardware defense against Meltdown/Spectre-style attacks that restricts speculative data propagation. Closes 68-96% of the performance gap between in-order and unconstrained out-of-order execution.

Paper
Speculative execution attacks like Meltdown and Spectre work by accessing secret data in wrong-path execution. Secrets are then transmitted and recovered by the attacker via a covert channel. Existing mitigations either require code modifications, address only specific exploit techniques, or block only the cache covert channel. Rather than battling exploit techniques and covert channels one by one, we seek to close off speculative execution attacks at their source.

Our key observation is that these attacks require a chain of dependent wrong-path instructions to access and transmit secret data. We propose NDA, a technique to restrict speculative data propagation. NDA breaks the wrong-path dependence chains required by all known attacks, while still allowing speculation and dynamic scheduling. We describe a design space of NDA variants that differ in the constraints they place on the dynamic scheduling and the classes of speculative execution attacks they prevent. NDA preserves much of the performance advantage of out-of-order execution: on SPEC CPU 2017, NDA variants close 68-96% of the performance gap between in-order and unconstrained (insecure) out-of-order execution.
Conference Presentation
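
The "chain of dependent wrong-path instructions" that NDA targets is easiest to see in a classic Spectre-v1-style gadget. The sketch below is a generic textbook example, not code from the paper: a mispredicted bounds check lets a wrong-path load read a secret, and a second load that depends on that secret leaves a cache footprint the attacker can later time. NDA-style policies break exactly this chain by withholding the speculative load's result from dependent instructions until the load is known to be on the correct path.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustration only: the dependent wrong-path chain that speculative
 * execution attacks rely on, and that NDA-style hardware policies break. */

#define CACHE_LINE 64

uint8_t array1[16];
uint8_t array2[256 * CACHE_LINE];   /* probe array: one line per secret value */
size_t  array1_size = 16;

void victim(size_t x)
{
    if (x < array1_size) {                       /* branch may be mispredicted   */
        uint8_t secret = array1[x];              /* wrong-path load reads secret */
        volatile uint8_t tmp =
            array2[secret * CACHE_LINE];         /* dependent load transmits it  */
        (void)tmp;                               /* value discarded; the cache   */
    }                                            /* footprint is what leaks      */
}

int main(void)
{
    for (size_t i = 0; i < array1_size; i++)
        victim(i);          /* in-bounds calls train the branch predictor   */
    victim(1000);           /* out-of-bounds call: architecturally harmless,
                               interesting only under misprediction         */
    return 0;
}
```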
Technical Report 2018

Foreshadow-NG (also known as L1TF): Breaking the Virtual Memory Abstraction with Transient Out-of-Order Execution

Extends Foreshadow beyond SGX to attack OS kernels, hypervisors, SMM, and VMs in cloud environments. CVE-2018-3620, CVE-2018-3646.

Paper
While investigating the vulnerability that causes Foreshadow, which Intel refers to as "L1 Terminal Fault" (L1TF), Intel identified two related attacks, which we call Foreshadow-NG. These attacks can potentially be used to read any information residing in the L1 cache, including information belonging to System Management Mode (SMM), the operating system's kernel, or the hypervisor.

Perhaps most devastating, Foreshadow-NG might also be used to read information stored in other virtual machines running on the same third-party cloud, presenting a significant risk to cloud infrastructure. Finally, in some cases, Foreshadow-NG might bypass previous mitigations against speculative execution attacks, including countermeasures for Meltdown and Spectre.
USENIX Security 2018

Foreshadow: Extracting the Keys to the Intel SGX Kingdom

Critical speculative execution vulnerability affecting Intel SGX. Extracts cryptographic keys and forges attestation responses. Affects millions of devices.

Paper
Trusted execution environments, and particularly the Software Guard eXtensions (SGX) included in recent Intel x86 processors, gained significant traction in recent years. A long track of research papers, and increasingly also real-world industry applications, take advantage of the strong hardware-enforced confidentiality and integrity guarantees provided by Intel SGX. Ultimately, enclaved execution holds the compelling potential of securely offloading sensitive computations to untrusted remote platforms.

We present Foreshadow, a practical software-only microarchitectural attack that decisively dismantles the security objectives of current SGX implementations. Crucially, unlike previous SGX attacks, we do not make any assumptions on the victim enclave's code and do not necessarily require kernel-level access. At its core, Foreshadow abuses a speculative execution bug in modern Intel processors, on top of which we develop a novel exploitation methodology to reliably leak plaintext enclave secrets from the CPU cache. We demonstrate our attacks by extracting full cryptographic keys from Intel's vetted architectural enclaves, and validate their correctness by launching rogue production enclaves and forging arbitrary local and remote attestation responses. The extracted remote attestation keys affect millions of devices.
USENIX Security 2018 Presentation
Talk at Duo Security in Ann Arbor
Talk at UC Berkeley
Demo at USENIX Security
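
The covert-channel half of the attack is ordinary Flush+Reload, and it is worth seeing concretely. The sketch below uses x86 GCC/Clang intrinsics and intentionally replaces the Foreshadow-specific transient dereference with a plain memory access, so the demo is harmless and self-contained; it only shows how the attacker recovers a transmitted byte by timing which probe line became cached.

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

/* Illustration only: the Flush+Reload probe used to recover a byte
 * transmitted through the cache covert channel. */

#define CACHE_LINE 64
static uint8_t probe[256 * CACHE_LINE];

/* Time a single access; a fast (cached) line means it was touched during
 * the "transient" phase, revealing the transmitted byte. */
static uint64_t time_access(volatile uint8_t *p)
{
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    (void)*p;
    return __rdtscp(&aux) - start;
}

int main(void)
{
    /* Touch each line once so the array is backed by real, distinct pages. */
    for (int v = 0; v < 256; v++)
        probe[v * CACHE_LINE] = 1;

    /* 1. Flush every probe line out of the cache. */
    for (int v = 0; v < 256; v++)
        _mm_clflush(&probe[v * CACHE_LINE]);
    _mm_mfence();

    /* 2. The transient access would happen here, touching probe[secret * 64].
     *    We simulate it with a normal access to the line for byte 0x42.      */
    volatile uint8_t sink = probe[0x42 * CACHE_LINE];
    (void)sink;

    /* 3. Reload in a permuted order (to sidestep the prefetcher) and report
     *    the fastest line. Real attacks repeat this and use a threshold.     */
    int best = 0;
    uint64_t best_t = UINT64_MAX;
    for (int i = 0; i < 256; i++) {
        int v = (i * 167 + 13) & 255;
        uint64_t t = time_access(&probe[v * CACHE_LINE]);
        if (t < best_t) { best_t = t; best = v; }
    }
    printf("recovered byte: 0x%02x (%llu cycles)\n",
           best, (unsigned long long)best_t);
    return 0;
}
```

This only builds on x86 with GCC or Clang; timings are noisy, which is why real implementations average many rounds.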
ISCA 2017

HotCalls: Turbocharging Enclave Transitions

Fast SGX interface framework providing 13-27x speedup over default enclave transitions. Boosts throughput by 2.6-3.7x for memcached, openVPN, and lighttpd.

Paper
Intel's SGX secure execution technology allows running computations on secret data using untrusted cloud servers. While recent work showed how to port applications and large-scale computations to run under SGX, the performance implications of using the technology remain an open question. We present the first comprehensive quantitative study to evaluate the performance of SGX.

We show that straightforward use of SGX library primitives for calling functions adds between 8,200 and 17,000 cycles of overhead, compared to the 150 cycles of a typical system call. We quantify the performance impact of these library calls and show that in applications with a high system-call frequency, such as memcached, openVPN, and lighttpd, which all have high-bandwidth network requirements, the performance degradation may be as high as 79%. We investigate the sources of this degradation using a new set of micro-benchmarks for SGX-specific operations such as entry calls, out calls, and encrypted memory I/O accesses. We leverage the insights gained from these analyses to design a new SGX interface framework, HotCalls, which provides a 13-27x speedup over the default interface. It can easily be integrated into existing code, making it a practical solution. Compared to a baseline SGX implementation of memcached, openVPN, and lighttpd, using the new interface boosts throughput by 2.6-3.7x and reduces application response time by 62-74%.
Linux Kernel Meetup Talk (Hebrew)
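
The mechanism behind the HotCalls speedup is easy to sketch: instead of an enclave exit on every call, the caller writes a request into a shared, untrusted memory slot that a dedicated worker thread spin-polls, so a call costs roughly a round of cache-coherence traffic. The toy below is a hedged rendering of that pattern; the struct layout and names are invented for this sketch, and the real HotCalls code adds locking, responder sleep/wake, argument buffers, and the SGX plumbing.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

/* A single untrusted-memory call slot shared between the "enclave" caller
 * and an untrusted worker thread. Invented layout for this sketch. */
struct hotcall {
    atomic_bool pending;    /* caller sets, responder clears            */
    int         func_id;    /* which untrusted function to run          */
    long        arg;        /* call argument (real code passes buffers) */
    long        ret;        /* responder's result                       */
};

static struct hotcall slot;

/* "Enclave" side: issue a call without an enclave exit, then spin for the result. */
static long hotcall_invoke(int func_id, long arg)
{
    slot.func_id = func_id;
    slot.arg = arg;
    atomic_store_explicit(&slot.pending, true, memory_order_release);
    while (atomic_load_explicit(&slot.pending, memory_order_acquire))
        ;                                   /* spin; cost is cache traffic, not EEXIT */
    return slot.ret;
}

/* Untrusted side: a worker thread spin-polls the slot and services requests. */
static void *responder(void *unused)
{
    (void)unused;
    for (;;) {
        if (atomic_load_explicit(&slot.pending, memory_order_acquire)) {
            /* Dispatch on func_id; here func 0 just doubles the argument. */
            slot.ret = slot.func_id == 0 ? slot.arg * 2 : -1;
            atomic_store_explicit(&slot.pending, false, memory_order_release);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, responder, NULL);
    printf("hotcall(0, 21) -> %ld\n", hotcall_invoke(0, 21));
    return 0;
}
```

Build with `-pthread`; the acquire/release pair is what hands the request and the result across the two threads safely.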
European Symposium on Security and Privacy 2017

WALNUT: Waging Doubt on the Integrity of MEMS Accelerometers

Acoustic injection attacks on MEMS accelerometers. 75% of tested sensors vulnerable to output biasing, 65% to total output control. Includes software defenses.

Paper
Embedded systems depend on sensors to make automated decisions. Resonant acoustic injection attacks are already known to cause malfunctions by disabling MEMS-based gyroscopes. However, an open question remains on how to move beyond denial of service attacks to achieve full adversarial control of sensor outputs. Our work investigates how analog acoustic injection attacks can damage the digital integrity of a popular type of sensor in consumer devices: the capacitive MEMS accelerometer.

Spoofing such sensors with intentional acoustic interference enables an out-of-spec pathway for attackers to deliver chosen digital values to microprocessors and embedded systems that blindly trust the unvalidated integrity of sensor outputs. Our contributions include (1) modeling the physics of malicious acoustic interference on MEMS accelerometers, (2) discovering the circuit-level design flaws that cause the vulnerabilities by measuring acoustic injection attacks on MEMS accelerometers as well as systems that depend on the sensors, and (3) two software-only defenses that mitigate many of the risks to the integrity of MEMS accelerometer outputs.

We characterize two classes of acoustic injection attacks: output biasing and output control. We test these attacks against 20 models of capacitive MEMS accelerometers representing the majority of the consumer market. Our experiments find that 75% are vulnerable to output biasing, and 65% are vulnerable to total output control. To illustrate end-to-end implications, we show how to inject fake steps into a Fitbit with a $5 speaker. In our self-stimulating attack, we play a malicious music file from a smartphone's speaker to control the on-board MEMS accelerometer trusted by a local app to pilot a toy RC car.
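
The abstract does not spell out the two software-only defenses, but one of them works along the lines of randomized sampling, and the idea fits in a hedged toy model: jittering the sample timing keeps a resonant tone from aliasing into a steady, attacker-chosen reading. Everything below (the simulated sensor, the attack tone, and the constants) is invented for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hedged sketch of a randomized-sampling style defense: instead of reading
 * the accelerometer at a fixed rate, jitter each sampling instant so a
 * resonant acoustic tone cannot stay phase-locked with the ADC. */

#define BASE_PERIOD_US 1000.0   /* nominal 1 kHz sampling             */
#define JITTER_US       400.0   /* +/-200 us of random timing jitter  */

/* Simulated sensor: true acceleration is 0 g; an attacker injects a tone
 * near the MEMS resonance. */
static double read_accel(double t_us)
{
    const double attack_freq_hz = 19000.0;
    return 0.5 * sin(2.0 * M_PI * attack_freq_hz * t_us / 1e6 + M_PI / 4.0);
}

int main(void)
{
    const int n = 100000;
    double t_us = 0.0, sum_fixed = 0.0, sum_jittered = 0.0;

    for (int i = 0; i < n; i++) {
        /* Fixed-rate sampling: the tone aliases to a steady false offset. */
        sum_fixed += read_accel(i * BASE_PERIOD_US);

        /* Randomized sampling: jitter decorrelates the tone across samples,
         * so its contribution averages toward zero. */
        t_us += BASE_PERIOD_US + JITTER_US * ((double)rand() / RAND_MAX - 0.5);
        sum_jittered += read_accel(t_us);
    }

    printf("mean reading, fixed rate : %+.4f g (false offset)\n", sum_fixed / n);
    printf("mean reading, randomized : %+.4f g\n", sum_jittered / n);
    return 0;
}
```

Build with `-lm`; the fixed-rate mean settles near +0.35 g while the jittered mean stays near zero.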
CHES 2014

A New Framework for Constraint-Based Probabilistic Template Side Channel Attacks

Byte-oriented constraint solver for side-channel cryptanalysis. Extracts secret keys from 1-2 power traces in under 9 seconds with 79%+ success rate.

Paper
The use of constraint solvers, such as SAT or pseudo-Boolean solvers, allows the extraction of the secret key from one or two side-channel traces. However, to use such a solver the cipher must be represented at the bit level. For byte-oriented ciphers this produces very large and unwieldy instances, leading to unpredictable, and often very long, run times.

In this paper we describe a specialized byte-oriented constraint solver for side channel cryptanalysis. The user only needs to supply code snippets for the native operations of the cipher, arranged in a flow graph that models the dependence between the side channel leaks. Our framework uses a soft decision mechanism which overcomes realistic measurement noise and decoder classification errors, through a novel method for reconciling multiple probability distributions. On the DPA v4 contest dataset our framework is able to extract the correct key from one or two power traces in under 9 seconds with a success rate of over 79%.
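
The "reconciling multiple probability distributions" step can be pictured with a toy example. This is a hedged sketch, not the paper's actual algorithm: it simply treats two decoders' scores for the same intermediate byte as independent evidence, multiplies them elementwise, and renormalizes; the values are invented.

```c
#include <stdio.h>

/* Toy reconciliation of two probability distributions over the same
 * intermediate byte (e.g., two side-channel decoders scoring one AES state
 * byte): multiply elementwise and renormalize. The paper's soft-decision
 * mechanism is more involved; this only conveys the idea. */
static void reconcile(const double a[256], const double b[256], double out[256])
{
    double total = 0.0;
    for (int v = 0; v < 256; v++) {
        out[v] = a[v] * b[v];
        total += out[v];
    }
    for (int v = 0; v < 256; v++)
        out[v] = total > 0.0 ? out[v] / total : 1.0 / 256.0;  /* fallback: uniform */
}

int main(void)
{
    double a[256], b[256], post[256];

    /* Two noisy "templates": each mildly favors the true value 0x3c,
     * but neither is conclusive on its own (unnormalized scores are fine,
     * since reconcile() normalizes the product). */
    for (int v = 0; v < 256; v++) { a[v] = 1.0; b[v] = 1.0; }
    a[0x3c] = 4.0;  a[0x7e] = 3.5;          /* decoder 1: ambiguous       */
    b[0x3c] = 4.0;  b[0xd1] = 3.5;          /* decoder 2: different rival */

    reconcile(a, b, post);

    int best = 0;
    for (int v = 1; v < 256; v++)
        if (post[v] > post[best]) best = v;
    printf("most likely byte: 0x%02x (p = %.3f)\n", best, post[best]);
    return 0;
}
```

The agreement between the two weak decoders is what makes 0x3c pull ahead, which is the intuition behind fusing several noisy leakage points.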
HASP 2013

Practical Template-Algebraic Side Channel Attacks with Extremely Low Data Complexity

First practical template-algebraic attack on public dataset. Recovers encryption keys with only 200 offline traces and a single online trace.

Paper
Template-based Tolerant Algebraic Side Channel Attacks (Template-TASCA) were suggested as a way of reducing the high data complexity of template attacks by coupling them with algebraic side-channel attacks. In contrast to the maximum-likelihood method used in a standard template attack, the template-algebraic attack method uses a constraint solver to find the optimal state correlated to the measured side-channel leakage.

In this work we present the first application of the template-algebraic key recovery attack to a publicly available data set (IAIK WS2). We show how our attack can successfully recover the encryption key even when the attacker has extremely limited access to the device under test – only 200 traces in the offline phase and as little as a single trace in the online phase.
MSc Thesis 2014

New Methods for Side Channel Cryptanalysis

Master's thesis combining template attacks with algebraic side-channel methods. Introduces novel constraint solver and probability reconciliation techniques.

Thesis
Template-based Tolerant Algebraic Side Channel Attacks (Template-TASCA) were suggested by Wool et al. in 2012 as a way of reducing the high data complexity of template attacks by coupling them with algebraic side-channel attacks. In contrast to the maximum-likelihood method used in a standard template attack, the template-algebraic attack method uses a constraint solver to find the optimal state correlated to the measured side-channel leakage.

In this work we present a practical application of the template-algebraic key recovery attack to a publicly available data set (IAIK WS2). We show how our attack can successfully recover the encryption key even when the attacker has extremely limited access to the device under test – only 200 traces in the offline phase and as little as a single trace in the online phase.

However, to use such solvers the cipher must be represented at the bit level, and for byte-oriented ciphers this produces very large and unwieldy instances, leading to unpredictable, and often very long, run times. We therefore also describe a specialized byte-oriented constraint solver for side channel cryptanalysis. The user only needs to supply code snippets for the native operations of the cipher, arranged in a flow graph that models the dependence between the side channel leaks. Our framework uses a soft decision mechanism which overcomes realistic measurement noise and decoder classification errors, through a novel method for reconciling multiple probability distributions. On the DPA v4 contest dataset our framework is able to extract the correct key from one or two power traces in under 9 seconds with a success rate of over 79%.
// education

Academic Background

2015-2019

PhD in Computer Science

University of Michigan

Advised by Thomas Wenisch & Baris Kasikci

2010-2014

MSc in Computer Engineering

Tel-Aviv University

Graduated Cum Laude

2003-2007

BSc in Computer Engineering

Technion, Israel Institute of Technology

Graduated Cum Laude

// contact

Get In Touch

Interested in systems research, security, or just want to chat about low-level engineering? Feel free to reach out.