I am a final-year computer science Ph.D. student at The University of Texas at Austin, co-advised by David Soloveichik and Scott Aaronson. I am broadly interested in theoretical computer science, with a focus on time-space lower bounds for classical and quantum computation as well as constructions, algorithms, and impossibility results for quantum, reversible, molecular, and other unconventional computational models. Recently I have also become interested in exploring connections between stochastic thermodynamics and computation.
After finishing my Ph.D., I am excited to start a postdoc at Sandia National Laboratories as a Truman Fellow.
My primary non-academic interests include eurogames, competitive card games, and hiking.
You can reach me at nielskornerup@utexas.edu.
As is standard in theoretical computer science, the authors of all papers are listed in alphabetical order. Publications are listed in reverse chronological order of their most recent versions.
Versions: [TOTC 2025] [ICALP 2023] [arXiv 2023]
Cumulative memory --- the sum of space used per step over the duration of a computation --- is a fine-grained measure of time-space complexity that was introduced to analyze cryptographic applications like password hashing. It is a more accurate cost measure for algorithms that have infrequent spikes in memory usage and run in environments, such as cloud computing, that allow dynamic allocation and de-allocation of resources during execution, or when many instances of an algorithm are interleaved in parallel.
We prove the first lower bounds on cumulative memory complexity for both sequential classical computation and quantum circuits. Moreover, we develop general paradigms for bounding cumulative memory complexity, inspired by the standard paradigms for proving time-space tradeoff lower bounds, which can only lower bound the maximum space used during an execution. The cumulative memory lower bounds we obtain are just as strong as the best time-space tradeoff lower bounds, which are very often known to be tight.
Although previous results for pebbling and random oracle models have yielded time-space tradeoff lower bounds larger than the cumulative memory complexity, our results show that, in general computational models, such separations cannot follow from known lower bound techniques and do not hold for many functions.
Among many possible applications of our general methods, we show that any classical sorting algorithm with success probability at least $1/\text{poly}(n)$ requires cumulative memory $\tilde \Omega(n^2)$, any classical matrix multiplication algorithm requires cumulative memory $\Omega(n^6/T)$, any quantum sorting circuit requires cumulative memory $\Omega(n^3/T)$, and any quantum circuit that finds $k$ disjoint collisions in a random function requires cumulative memory $\Omega(k^3n/T^2)$.
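To build intuition for why cumulative memory can be far smaller than time multiplied by peak space, here is a toy Python sketch comparing the two measures on a hypothetical memory-usage trace with one brief spike; the trace and all constants are made up for illustration and are not examples from the paper.

```python
# Toy comparison of peak space vs. cumulative memory on a hypothetical
# memory-usage trace: a computation that runs for T steps using a small
# baseline workspace, with one brief spike of large memory usage.
T = 1_000_000          # number of time steps (hypothetical)
baseline = 100         # words of memory used on a typical step
spike = 100_000        # words used during a short burst
spike_steps = 10       # length of the burst

trace = [baseline] * T
for t in range(T // 2, T // 2 + spike_steps):
    trace[t] = spike

peak_space = max(trace)                  # what time-space tradeoffs bound
cumulative_memory = sum(trace)           # sum of space used per step

print(f"peak space:        {peak_space}")
print(f"time * peak space: {T * peak_space}")
print(f"cumulative memory: {cumulative_memory}")
# The product T * peak_space charges the entire run at the spike's cost,
# while cumulative memory reflects that the spike is brief.
```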
Versions: [STOC 2024] [Video Recording] [arXiv 2024]
We consider the time and space required for quantum computers to solve a wide variety of problems involving matrices, many of which have only been analyzed classically in prior work. Our main results show that for a range of linear algebra problems --- including matrix-vector product, matrix inversion, matrix multiplication, and powering --- existing classical time-space tradeoffs, several of which are tight for every space bound, also apply to quantum algorithms with at most a constant-factor loss. For example, for almost all fixed matrices $A$, including the discrete Fourier transform (DFT) matrix, we prove that quantum circuits with at most $T$ input queries and $S$ qubits of memory require $T=\Omega(n^2/S)$ to compute the matrix-vector product $Ax$ for $x \in \{0,1\}^n$. We similarly prove that matrix multiplication for $n\times n$ binary matrices requires $T=\Omega(n^3 / \sqrt{S})$. Because many of our lower bounds are matched by deterministic algorithms with the same time and space complexity, our results show that quantum computers cannot provide any asymptotic advantage for these problems with any space bound.
We obtain matching lower bounds for the stronger notion of quantum cumulative memory complexity --- the sum of the space per layer of a circuit. We also consider Boolean (i.e., AND-OR) matrix multiplication and matrix-vector products, improving the previous quantum time-space tradeoff lower bound for $n\times n$ Boolean matrix multiplication from $T=\Omega(n^{2.5}/S^{1/2})$ to $T=\Omega(n^{2.5}/S^{1/4})$.
Our improved lower bound for Boolean matrix multiplication is based on a new coloring argument that extracts more from the strong direct product theorem that was the basis for prior work. To obtain our tight lower bounds for linear algebra problems, we require much stronger bounds than strong direct product theorems. We obtain these bounds by adding a new bucketing method to the quantum recording-query technique of Zhandry that lets us apply classical arguments to upper bound the success probability of quantum circuits.
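To illustrate the kind of classical algorithm that matches the $T=\Omega(n^2/S)$ matrix-vector lower bound above, here is a Python sketch of a standard blocked evaluation that keeps only $O(S)$ bits of working memory while making roughly $n^2/S$ queries to $x$; the blocking scheme and query-counting convention here are illustrative assumptions, not the paper's construction or proof.

```python
# A standard blocked evaluation of y = A x (mod 2) for a fixed matrix A,
# keeping only O(S) bits of working memory and counting queries to x.
import random

def blocked_matvec(A, x, S):
    """Compute A @ x mod 2 while holding at most S partial outputs.
    Returns (result, number_of_queries_to_x)."""
    n = len(x)
    queries = 0
    y = []
    for start in range(0, n, S):          # n/S blocks of S output bits
        block = [0] * min(S, n - start)
        for j in range(n):                # one pass over x per block
            xj = x[j]
            queries += 1
            if xj:
                for i in range(len(block)):
                    block[i] ^= A[start + i][j]
        y.extend(block)                    # emit the block and reuse the space
    return y, queries

n, S = 64, 8
A = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
x = [random.randint(0, 1) for _ in range(n)]
y, T = blocked_matvec(A, x, S)
assert y == [sum(A[i][j] & x[j] for j in range(n)) % 2 for i in range(n)]
print(f"queries T = {T}, roughly n^2/S = {n * n // S}")
```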
Versions: [Quantum 2025] [arXiv 2025]
Pebble games are popular models for analyzing time-space tradeoffs. In particular, reversible pebble game strategies are frequently applied in quantum algorithms like Grover's search to efficiently simulate classical computation on inputs in superposition, as unitary operations are fundamentally reversible. However, the reversible pebble game cannot harness the additional computational power granted by intermediate measurements, which are irreversible. The spooky pebble game, which models interleaved Hadamard-basis measurements and adaptive phase corrections, reduces the number of qubits required below what purely reversible approaches can achieve. While the spooky pebble game does not reduce the total space (bits plus qubits) complexity of the simulation, it reduces the amount of space that must be stored in qubits. We prove asymptotically tight tradeoffs for the spooky pebble game on a line with any pebble bound. This in turn gives a tight time-qubit tradeoff for simulating arbitrary classical sequential computation with the spooky pebble game. For example, for all $\epsilon \in (0,1]$, any classical computation requiring time $T$ and space $S$ can be implemented on a quantum computer using only $O(T/ \epsilon)$ gates and $O(T^{\epsilon}S^{1-\epsilon})$ qubits. This improves on the best known bound for the reversible pebble game with that number of qubits, which uses $O(2^{1/\epsilon} T)$ gates. For smaller space bounds, we show that the spooky pebble game can simulate arbitrary computation with $O(T^{1+\epsilon} S^{-\epsilon}/\epsilon)$ gates and $O(S / \epsilon)$ qubits, whereas any simulation via the reversible pebble game requires $\Omega(S \cdot (1+\log(T/S)))$ qubits.
We also consider the spooky pebble game on more general directed acyclic graphs (DAGs), capturing fine-grained data dependencies in computation. We show that, for arbitrary DAGs, even approximating the number of pebbles required in the spooky pebble game is PSPACE-hard. Despite this, we construct a time-efficient strategy for pebbling binary trees that uses the minimum number of pebbles.
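For a rough sense of how the bounds quoted above behave, the following Python snippet evaluates the stated asymptotic expressions at sample values of $T$, $S$, and $\epsilon$, treating the big-$O$ terms as exact with constants dropped; the numbers are purely illustrative and the chosen $T$ and $S$ are hypothetical.

```python
# Numerical comparison of the simulation costs quoted above, treating the
# asymptotic expressions as if they were exact (constants dropped), purely
# to illustrate how the gate/qubit tradeoffs scale with epsilon.
T, S = 10**9, 10**3    # hypothetical time and space of the classical computation

for eps in (1.0, 0.5, 0.25, 0.1):
    spooky_gates = T / eps                     # O(T / eps)
    spooky_qubits = T**eps * S**(1 - eps)      # O(T^eps * S^(1-eps))
    reversible_gates = 2**(1 / eps) * T        # O(2^(1/eps) * T) with the same qubits
    print(f"eps={eps:4}: spooky ~{spooky_gates:.2e} gates, "
          f"~{spooky_qubits:.2e} qubits; reversible ~{reversible_gates:.2e} gates")
```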
Versions: [arXiv 2024]
The key factor currently limiting the advancement of electronic computation is no longer the manufacturing density and speed of components, but rather their high energy consumption. While it has been widely argued that reversible computation can escape the fundamental Landauer limit of $k_B T\ln(2)$ Joules per irreversible computational step, there is disagreement around whether indefinitely reusable computation can be achieved without energy dissipation. Here we focus on the relatively simpler setting of sampling problems, which take no input and so avoid the need to model the energy cost of an observer perturbing the machine to change its input. Given an algorithm $A$ for generating samples from a distribution, we desire a device that can perpetually generate samples from that distribution driven entirely by Brownian motion. We show that such a device can efficiently execute algorithm $A$ in the sense that we must wait only $O(\text{time}(A)^2)$ between samples. We consider two output models: Las Vegas, which produces a sample from the exact probability distribution once every $4$ tries in expectation, and Monte Carlo, in which every try succeeds but the distribution is only approximated. We base our model on continuous-time random walks over the state-space graph of a general computational machine, with a space-bounded Turing machine as one instantiation. The problem of sampling a computationally complex probability distribution with no energy dissipation informs our understanding of the energy requirements of computation, and may lead to more energy-efficient randomized algorithms.
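One way to build intuition for the $O(\text{time}(A)^2)$ waiting time is the standard fact that an unbiased random walk on a chain of $T$ steps needs about $T^2$ expected moves to drift from the start to the end. The Python simulation below demonstrates only that textbook fact; it is not the paper's model of the Brownian device.

```python
# Intuition for the O(time(A)^2) slowdown: an unbiased random walk over the
# T+1 configurations of a T-step computation (a 1D chain) takes about T^2
# expected steps to reach the end. This is a standard random-walk fact,
# checked here by simulation; it is not the paper's construction.
import random

def steps_to_finish(T):
    """Random walk on {0, ..., T}: reflect at 0, stop at T."""
    pos, steps = 0, 0
    while pos < T:
        pos += random.choice((-1, 1)) if pos > 0 else 1
        steps += 1
    return steps

for T in (10, 20, 40):
    trials = 200
    avg = sum(steps_to_finish(T) for _ in range(trials)) / trials
    print(f"T={T:3}: average steps ~ {avg:.0f}  (T^2 = {T*T})")
```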
Versions: [TCBB 2019] [CMSB 2018] [Thesis 2020] [arXiv 2020]
Biological regulatory networks depend upon chemical interactions to process information. Engineering such molecular computing systems is a major challenge for synthetic biology and related fields. The chemical reaction network (CRN) model idealizes chemical interactions, allowing rigorous reasoning about the computational power of chemical kinetics. Here we focus on function computation with CRNs, where we think of the initial concentrations of some species as the input and the equilibrium concentration of another species as the output. Specifically, we are concerned with CRNs that are rate-independent (the computation must be correct independent of the reaction rate law) and composable ($f\circ g$ can be computed by concatenating the CRNs computing $f$ and $g$). Rate independence and composability are important engineering desiderata, permitting implementations that violate mass-action kinetics, or even ``well-mixedness'', and allowing the systematic construction of complex computation via modular design. We show that to construct composable rate-independent CRNs, it is necessary and sufficient to ensure that the output species of a module is not a reactant in any reaction within the module. We then exactly characterize the functions computable by such CRNs as superadditive, positive-continuous, and piecewise rational linear. Thus composability severely limits rate-independent computation unless more sophisticated input/output encodings are used.
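A small example in the spirit of this characterization (a standard illustration, not a construction taken from the paper): the single reaction $X_1 + X_2 \to Y$ computes $\min(x_1, x_2)$ rate-independently, its output $Y$ never appears as a reactant, and two copies compose to compute the minimum of three inputs. The Python sketch below checks this with discrete counts.

```python
# The single reaction X1 + X2 -> Y computes f(x1, x2) = min(x1, x2)
# rate-independently: it simply runs until one reactant is exhausted, so
# rates and scheduling are irrelevant. The output species Y is never a
# reactant, so the module can be composed with others. Illustrative sketch
# with discrete molecule counts; not drawn from the paper's constructions.
def run_min_crn(x1, x2):
    """Fire X1 + X2 -> Y until it can no longer fire."""
    y = 0
    while x1 > 0 and x2 > 0:   # only one reaction exists, so order does not matter
        x1 -= 1
        x2 -= 1
        y += 1
    return y                    # equilibrium count of the output species Y

# Composition: min(a, min(b, c)) via two copies of the module, feeding the
# output of the inner module in as an input of the outer one.
a, b, c = 7, 3, 5
inner = run_min_crn(b, c)
print(run_min_crn(a, inner))    # 3 == min(a, b, c)
```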
Versions: [arXiv 2019]
We examine the algorithm for calculating Betti numbers presented by \cite{LGZ} (LGZ). We present a new algorithm in the same spirit as LGZ with the intent of clarifying quantum algorithms for computing Betti numbers. Our algorithm is simpler and slightly more efficient than that of LGZ. We give a thorough analysis of our algorithm, pointing out why neither our algorithm nor that of LGZ runs in polynomial time for most inputs. However, both algorithms could run in polynomial time when approximating the Betti number to polynomial multiplicative error on classes of graphs whose Betti numbers are exponentially large, provided a good lower bound on the eigenvalues of a particular matrix can be found.
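For context, the quantity these algorithms estimate can be computed classically from ranks of boundary matrices via $\beta_k = \dim C_k - \operatorname{rank}(\partial_k) - \operatorname{rank}(\partial_{k+1})$. The Python sketch below does this for a hollow triangle; it is standard linear algebra included only as background, not the quantum algorithm discussed above.

```python
# Classical computation of Betti numbers from boundary matrices:
# beta_k = dim C_k - rank(d_k) - rank(d_{k+1}).
# Example: a hollow triangle (3 vertices, 3 edges, no filled face) has
# beta_0 = 1 (one connected component) and beta_1 = 1 (one loop).
import numpy as np

# Boundary map d_1: edges -> vertices for edges (0,1), (0,2), (1,2).
d1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]])
d2 = np.zeros((3, 0))   # no 2-simplices, so C_2 is trivial

rank_d1 = np.linalg.matrix_rank(d1)
rank_d2 = np.linalg.matrix_rank(d2) if d2.size else 0

beta_0 = 3 - 0 - rank_d1          # dim C_0 - rank(d_0) - rank(d_1)
beta_1 = 3 - rank_d1 - rank_d2    # dim C_1 - rank(d_1) - rank(d_2)
print(beta_0, beta_1)             # 1 1
```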