
Molecular Simulation’s Quantum Moment

 

How AI and HPC are Redefining Accuracy and Scale

Molecular simulation is undergoing a revolution. For decades, scientists have relied on empirical force fields – simplified mathematical models tuned to reproduce experimental data – to simulate the behavior of molecules. These classical models are fast and can handle large systems like proteins or materials, but they come with significant limitations: they generally cannot describe chemical reactions, like bonds forming or breaking, and their accuracy is capped by the quality of their empirical parameters.

Soon, researchers will be liberated from relying on force fields. But how?

Neural networks are being trained to understand quantum mechanics at a level far beyond empirical force fields, and this will push the accuracy of molecular simulations to new heights. This leap has only been made possible by collaboration across domains as varied as quantum chemistry, machine learning (ML), and high-performance computing (HPC). Together, these experts are reshaping how we model the molecular world.

 

Breaking with Tradition: Why a Foundation Model is a Paradigm Shift

Researchers have been hamstrung by traditional force fields, which rely on fixed formulas and parameters that must be carefully fine-tuned by humans. Crucially, these calculations assume a fixed bonding topology: which atoms are bonded to which can never change during a simulation.

This lack of flexibility means you can’t reliably simulate a chemical reaction, like a bond breaking or a new molecule forming. Now, researchers can move beyond these laborious approaches by using neural networks trained on a vast array of quantum-mechanical calculations. The team at Qubit Pharmaceuticals calls this the “Foundation Model”: a machine-learning force field that directly captures the fundamental physics of electrons.

The technology can model anything from simple vibrations to complex reactions in real time. Essentially, it’s a set of Lego bricks for molecular simulation, which can be reassembled to reflect various situations without the need to return to the start every time.

The system is trained on the best physics we have available. What really sets this approach apart is the accuracy of the training data. Density Functional Theory (DFT) – a workhorse quantum method – was used as a baseline to generate a broad set of molecular configurations. But DFT, while much more accurate than classical force fields, still has known errors. So the team didn’t stop there. In a “Jacob’s Ladder” strategy, they systematically climbed toward higher rungs of accuracy. With massive GPU supercomputers at their disposal, they computed selected cases using Quantum Monte Carlo (QMC) and even multi-determinant configuration interaction (CI) methods – calculations so demanding they were previously impractical for anything beyond tiny systems. Thanks to new exascale computing resources, this is the first time QMC forces and energies (augmented by CI wavefunctions for extra precision) have been computed at such a scale. In other words, they generated an unprecedented quantum-accurate dataset as the foundation for training the model.

The result is a true “foundation” model: FeNNix-Bio1, the team's first foundation model for biomolecular systems. It was trained exclusively on synthetic quantum chemistry data (no experimental fitting here) across multiple levels of theory. This gives it a kind of generality and robustness that traditional models lack – much like large language models learn from heaps of text to capture general patterns of language, FeNNix learned from heaps of quantum calculations to capture general patterns of interatomic forces. Importantly, FeNNix-Bio1 is not tied to a single molecule or protein; it’s meant to be a broad model that can be adapted to many chemical systems. In the authors’ vision, such a model could be applicable well beyond biomolecules – to pharmaceuticals, catalysts, battery materials, and even nuclear chemistry – a truly generalizable force field. For now, they focused on biomolecular chemistry as a proving ground, but the “Bio1” in the name hints that this model is just the beginning (with obvious potential to extend to other domains in the future).

 

copyright - GBCM - CNAM, Xlim - U Limoges CNRS, LCT - Sorbonne université CNRS - using VTX software

 

Climbing the Ladder: From DFT to QMC with Exascale Computing

Achieving this quantum-level accuracy on a broad dataset was no trivial task. The team’s approach layered computations of increasing accuracy, reminiscent of climbing Jacob’s ladder in quantum chemistry. They began at the DFT level, which covers a broad range of molecular structures, and then applied QMC and selected CI calculations on subsets to approach gold-standard accuracy. QMC, in simple terms, uses stochastic sampling to solve the many-electron Schrödinger equation very accurately, but it is computationally intensive. CI methods, on the other hand, expand the wavefunction over many electronic configurations (determinants) at once and can approach exact solutions for small systems. By combining the two (using CI wavefunctions to guide QMC), the researchers obtained reference energies and forces of extremely high fidelity.
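
To make the QMC idea concrete, here is a toy variational Monte Carlo calculation – a deliberately simple sketch for a single hydrogen atom in atomic units, not the team’s production QMC code. It samples electron positions from the square of a trial wavefunction with Metropolis moves and averages the “local energy”:

```python
import math
import random

def vmc_hydrogen(alpha=0.9, n_steps=200_000, step=0.5, seed=0):
    """Variational Monte Carlo for a hydrogen atom (atomic units).

    Trial wavefunction psi(r) = exp(-alpha * r).  Metropolis sampling draws
    electron positions from |psi|^2; averaging the local energy
    E_L = -alpha^2/2 + (alpha - 1)/r estimates <psi|H|psi>, a variational
    upper bound on the true ground-state energy (-0.5 Hartree).
    """
    rng = random.Random(seed)
    x, y, z, r = 1.0, 0.0, 0.0, 1.0
    energy_sum, n_kept = 0.0, 0
    for i in range(n_steps):
        # Propose a symmetric random displacement of the electron.
        xn = x + rng.uniform(-step, step)
        yn = y + rng.uniform(-step, step)
        zn = z + rng.uniform(-step, step)
        rn = math.sqrt(xn * xn + yn * yn + zn * zn)
        # Metropolis acceptance for |psi|^2 = exp(-2 * alpha * r).
        if rng.random() < math.exp(-2.0 * alpha * (rn - r)):
            x, y, z, r = xn, yn, zn, rn
        if i >= n_steps // 10:  # discard the first 10% as equilibration
            energy_sum += -0.5 * alpha ** 2 + (alpha - 1.0) / r
            n_kept += 1
    return energy_sum / n_kept

print(vmc_hydrogen(alpha=1.0))  # -0.5 exactly: the local energy is constant
print(vmc_hydrogen(alpha=0.9))  # slightly above -0.5, with statistical noise
```

With the exact trial wavefunction (alpha = 1) the local energy is constant, so the estimate has zero variance. Production QMC codes apply far more sophisticated trial wavefunctions (for example, CI expansions) and diffusion-style algorithms to many-electron systems, which is what makes them so expensive.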

This is where exascale HPC comes in. Performing QMC on even a single complex molecule can consume enormous computing power.

The team, however, managed to do those calculations at scale – something never done before. “Handling this ambitious computational pipeline would be impossible without exascale computing resources,” they note, especially for the notoriously heavy QMC force calculations. In fact, by optimizing their codes to run on modern GPU supercomputers, they turned what used to be unimaginable computations into a reality.

The team develops or co-develops much of the software stack: from the quantum codes (for example, the QMC and CI programs) to the machine learning infrastructure and even the molecular dynamics engine. This vertical integration meant they could squeeze out maximum performance at every step. It required combining the expertise of quantum chemists, software developers, and HPC engineers – a rare combination. The payoff was huge: a rich quantum-mechanical dataset that would serve as the bedrock for the foundation model.

 

Transferring Quantum Accuracy to a Neural Network

With data in hand, the next challenge was to train the FeNNix-Bio1 neural network. But how do you make a single model benefit from both the breadth of DFT data and the precision of QMC data? The answer lies in transfer learning. The idea is elegant: first train the neural network on the large dataset of DFT calculations, so it learns the general landscape of molecular interactions. Then, take the much smaller (but more accurate) set of QMC results and train the model further on the difference between QMC and DFT predictions. Learning this correction, also known as the "delta", enables the model to improve its accuracy consistently towards the QMC level. In effect, the high-fidelity QMC knowledge propagates throughout the model’s understanding of chemistry, even for configurations where QMC data wasn’t explicitly provided.
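
A minimal sketch of the delta idea, using hypothetical one-dimensional surrogates in place of real DFT and QMC data, a piecewise-linear interpolator in place of the pretrained base network, and a linear fit in place of the correction network (every name and functional form here is invented for illustration):

```python
import bisect

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stands in for the delta model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def make_interpolator(xs, ys):
    """Piecewise-linear interpolator (stands in for the pretrained base model)."""
    def f(x):
        i = min(max(bisect.bisect_left(xs, x), 1), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return f

# Hypothetical potential-energy curves for a bond stretch:
def dft_energy(r):   # cheap, but carries a smooth systematic bias
    return (r - 1.0) ** 2 - 0.10 + 0.05 * r

def qmc_energy(r):   # accurate reference, affordable only at a few points
    return (r - 1.0) ** 2 - 0.10

# Step 1: "pretrain" the base model on plentiful DFT data.
grid = [0.5 + 0.01 * i for i in range(151)]
base = make_interpolator(grid, [dft_energy(r) for r in grid])

# Step 2: learn the delta (QMC - DFT) from a handful of expensive points.
sparse = [0.6, 1.0, 1.8]
a, b = fit_line(sparse, [qmc_energy(r) - dft_energy(r) for r in sparse])

def corrected(r):
    return base(r) + a * r + b  # base prediction plus learned correction

r = 1.37  # a geometry with no QMC data of its own
print(abs(base(r) - qmc_energy(r)))       # inherits the DFT bias (~0.07)
print(abs(corrected(r) - qmc_energy(r)))  # delta correction removes it
```

The point generalizes: the base model learns the broad landscape from plentiful cheap data, and only the smooth, systematic difference between theory levels needs to be learned from the scarce expensive reference.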

This transfer learning approach is powerful. It’s like teaching a student with a decent textbook (DFT) and then giving them a set of expert commentaries to refine their knowledge. The model retains the broad coverage learned from thousands of DFT examples and gains a boost in accuracy from the QMC examples. The authors emphasized in the interview that this delta approach was key to reaching beyond-DFT accuracy without incurring the astronomical cost of doing QMC for every point in the dataset. As a result, FeNNix-Bio1 achieves accuracy on par with methods far beyond DFT, while still being fast enough for routine simulation. In the published report, they note that using transfer learning to improve the DFT-based foundation model helped “bridge the gap between highly accurate QC calculations and condensed-phase molecular dynamics”. In practice, such an effort means the model can be dropped into an MD simulation and behave with a level of fidelity previously limited to small-scale quantum calculations.

Another advantage of having a foundation model is that it’s systematically improvable. If tomorrow even better quantum data (say, from a new quantum computer or a better theory) becomes available, the team can further refine or retrain the model. In contrast, empirical force fields often need a complete redesign or manual re-parametrization to improve significantly. Here, the neural network can continuously learn—a true hallmark of a foundational model approach.



What Can It Do? Bond Breaking, Proton Hopping, and Million-Atom Simulations

All this effort would be merely academic if it didn’t enable new feats of simulation. Fortunately, it does – in a big way. The model has allowed the team to run stable molecular dynamics simulations for several nanoseconds, long enough to study chemical reactions and biological processes such as protein folding. The most striking capability is simulating bond formation and breaking – something classical force fields simply cannot do, because their fixed bonding topology fails the moment a reaction occurs.

FeNNix-Bio1 can both break and form bonds according to the underlying physics, with the team specifically demonstrating proton transfer reactions, in which a proton hops between atoms. This is exactly the kind of process one encounters in enzymes, in drug interactions, or in radiation damage to DNA – and such processes can now be modeled at scale with confidence in the accuracy of the forces involved.

One-million-atom test case: The paper’s authors simulated a scenario that was unthinkable a few years ago. They took a fully solvated tobacco mosaic virus – about a million atoms, comprising the virus itself, its RNA genetic material, water, and ions – and captured quantum effects such as proton tunneling. Across this simulation environment the model handled complex chemical events, like protonation changes and bond alterations.


Here’s what FeNNix-Bio1 has the potential to achieve:

  • General-purpose accuracy: Trained on physics, not fitted to one system, so it works for a variety of molecules/environments within the biomolecular realm (water, ions, proteins, etc.).
  • Reactive molecular dynamics: Can simulate chemical reactions (bond breaking/forming, proton transfer) during molecular dynamics— a regime that classical force fields can’t reach.
  • Large-scale systems: On modern supercomputers, they can handle simulations with 100,000+ atoms, even at the million-atom scale.
  • Long timescales: Stable for nanoseconds of physical time, which is long enough to study many biologically relevant processes.
  • Quantum nuclear effects: Incorporates path integral methods to include the quantum behavior of atoms like hydrogen, improving realism for processes like proton hopping.
  • Transferability: This model is capable of handling a wide range of scenarios, including water properties, ion solvation, protein dynamics, folding free energy landscapes, protein–ligand binding, and more, all within a single, unified framework.
  • Validated hydration free energies: Closely matches experimental values for challenging systems like ion solvation, without the need for manual tuning.
  • Seamless integration with AlphaFold-like tools: Compatible with structure-prediction models, like AlphaFold, allowing rapid quantum-accurate simulations of protein dynamics and ligand-binding interactions.
  • Outperforms benchmarks: Beats current leading models in speed benchmarks and can simulate million-atom molecular systems while maintaining quantum accuracy.
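
The quantum-nuclear-effects point can be made concrete with a toy path-integral calculation. Path-integral methods map each quantum nucleus onto a classical “ring polymer” of coupled replicas; sampling that polymer recovers quantum expectation values. The sketch below (a textbook one-dimensional harmonic-oscillator example, not the team’s implementation) shows the ring polymer capturing zero-point motion that a purely classical simulation would miss:

```python
import math
import random

def pimc_harmonic(beta=10.0, n_beads=32, n_sweeps=50_000, step=0.8, seed=1):
    """Path-integral Monte Carlo estimate of <x^2> for a 1D harmonic
    oscillator (units with hbar = m = omega = 1).

    The quantum particle maps onto a classical ring polymer of n_beads
    replicas joined by harmonic springs.  Metropolis-sampling the polymer
    and averaging x^2 over beads recovers the quantum expectation value.
    By the virial theorem E = <x^2> for this potential, so at low
    temperature the result approaches the zero-point energy 0.5, while a
    classical simulation would give only 1/beta (= 0.1 here).
    """
    rng = random.Random(seed)
    P = n_beads
    c_spring = P / (2.0 * beta)   # spring term m*P / (2*beta*hbar^2)
    c_pot = beta / (2.0 * P)      # (beta/P) * V(x) with V(x) = x^2 / 2
    x = [0.0] * P
    burn = n_sweeps // 10         # equilibration sweeps to discard
    total, kept = 0.0, 0
    for sweep in range(n_sweeps):
        for k in range(P):        # single-bead Metropolis moves
            xo = x[k]
            xn = xo + rng.uniform(-step, step)
            l, r = x[k - 1], x[(k + 1) % P]   # ring topology
            d_action = (c_spring * ((xn - l) ** 2 + (xn - r) ** 2
                                    - (xo - l) ** 2 - (xo - r) ** 2)
                        + c_pot * (xn * xn - xo * xo))
            if d_action < 0 or rng.random() < math.exp(-d_action):
                x[k] = xn
        if sweep >= burn:
            total += sum(xi * xi for xi in x) / P
            kept += 1
    return total / kept

print(pimc_harmonic())  # approaches 0.5: zero-point motion, not 1/beta
```

Real path-integral molecular dynamics applies the same ring-polymer isomorphism to every atom in the system, which is how a FeNNix-Bio1 simulation can capture effects like proton tunneling at scale.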

 

copyright LCT - Sorbonne Université, CNRS

 

The Team and the Culture Shift: Bridging Silos in Science

The project required an unusual team that spanned traditionally separate fields. The authors and collaborators include experts in quantum chemistry, machine learning, and HPC.

In essence, it brought together people who know how to produce accurate electronic structure data, people who design and train neural networks, and people who optimize code for supercomputers to handle big simulations.

They also had domain experts in biochemistry and physics to ensure the problems tackled are relevant to biomolecular science. This kind of cross-disciplinary effort is still relatively rare. As one of the team members noted, “We had people who normally might be found in completely different conferences working together – quantum chemists, protein modelers, HPC architects – speaking a common language of this project.” In a sense, the project is not only a technical breakthrough but a cultural one.

Historically, disciplines such as materials science, drug discovery, and HPC have operated independently of each other. Computational chemistry experts might develop high-accuracy methods but only apply them to small molecules or materials; meanwhile, pharmaceutical simulation experts use fast but low-accuracy models for giant biomolecules. Then HPC teams focus on benchmarks and infrastructure.

Here, these threads come together. The outcome is a demonstration of what can be achieved when these communities collaborate: a tool with the potential to revolutionise various fields. A foundation model trained on quantum physics can serve as a unifying platform – a bit like a Lego set of molecular interactions that anyone in any domain can take and build upon. A researcher who needs a better force field for an enzyme reaction can fine-tune the foundation model; a specialist who wants to simulate a new battery chemistry can extend it with additional quantum data.

This bridging of communities lowers barriers – a materials scientist could leverage the model the biochemists built, and vice versa, because the underlying physics is the same.

The team’s control over the software stack also made a difference. They weren’t just users of tools – in many cases they were the developers. For instance, some of the authors help develop QMC simulation codes; others develop the Tinker-HP molecular dynamics platform and other quantum chemistry packages, and others are involved in AI/ML frameworks. Such an arrangement meant everything could be tailored to work together seamlessly. It’s a stark contrast to the siloed scenario where one group might produce data, another separately tries to train an ML model, and a third struggles to use it in a simulation. Instead, it was a concerted effort.

As more exascale supercomputers come online, this work signals how they can be used in the future: not just to run bigger simulations, but to generate better models. It’s a wonderful synergy: HPC provides the raw power, quantum chemistry provides the accuracy, and machine learning provides the generalization and scalability.

 

Approaching a New Era of Simulation

In sum, “Pushing the Accuracy Limit of Foundation Neural Network Models with Quantum Monte Carlo Forces and Path Integrals” lives up to its title. It pushes far beyond the status quo, showing that we don’t have to accept the trade-off between accuracy and scale in molecular simulations. By investing compute time upfront to train a foundation model on the best data available, we gain a tool that can simulate phenomena once thought too complex for realistic modeling.

These developments could herald a new era wherein in silico experiments complement or even replace some in vitro experiments – researchers can test hypotheses on the computer with higher confidence in the results.

The implications stretch across science and engineering: pharma companies could use such models to predict drug binding and reactivity in the body with fewer approximations; materials scientists could explore catalytic reactions or battery electrolyte breakdown with chemical accuracy; chemical engineers could design processes knowing the simulations capture even subtle quantum effects; and in fundamental science, we might finally simulate things like enzyme catalysis or virus assembly with the same accuracy we currently reserve for tiny molecules. One might even say this work helps move molecular modeling from an art (with expert-crafted force fields and lots of intuition) to more of a science, where a unified model trained on fundamental principles can be applied broadly.

There's still much to do. Expanding the foundation model’s training to, say, inorganic chemistry or materials will require additional data and possibly new training methods. While the computational cost is mitigated, it's still very high when it comes to generating the best quantum data. However, the authors have demonstrated a blueprint of how to break the accuracy barrier by using the best HPC to teach AI quantum chemistry and then unleashing that AI to explore molecular worlds.

They’ve also shown that the model can help bridge communities. Here quantum chemists, AI experts and molecular engineers are working hand in hand, whereas before they would struggle to speak each other’s language. It’s not just a better force field; it’s a demonstration of how we can build better scientific models in the 21st century by combining strengths across domains. And for those of us eager to understand and engineer the molecular machinery of life (and beyond), it’s a very exciting development indeed.

 

Access the full paper here