AVS 71 Session AIML-ThP: AI/ML for Scientific Discovery Poster Session

Thursday, September 25, 2025 4:30 PM in Ballroom BC
Thursday Evening

AIML-ThP-1 AI Agents for Semiconductor Processing: A New Benchmark for Autonomous Materials Synthesis
Angel Yanguas-Gil (Argonne National Laboratory)

Over the past year there has been increasing interest in leveraging foundation and large language models to design AI agents that can interact with experiments to solve materials science and synthesis problems. One of the challenges of this approach is that testing the performance of these agents requires access to automated labs. In contrast to benchmarks that test abilities such as knowledge, math skills, or reasoning, there is a lack of benchmarks that can help both design and evaluate agents without access to dedicated experimental facilities.

In this work, we introduce Semibench, a benchmark to evaluate AI agents' ability to operate and solve synthesis challenges in the context of semiconductor processing. The benchmark rests on two core ideas. First, it provides virtual tools that simulate the output of real-life experiments. This allows us to test an agent's ability to solve a wide range of challenges that vary in tool configuration, in the amount and nature of the information accessible to the agent, and in process complexity. Second, it focuses on the concept of microtasks: challenges designed to have a unique solution. This allows us to define quantitative performance metrics for the agent based on how far the proposed solution is from the ground truth. For Semibench, we have focused on three techniques that are commonly used in microelectronics: atomic layer deposition (ALD), sputtering, and reactive ion etching (RIE). For each challenge in the benchmark, agents are exposed to a collection of virtual tools and asked to solve specific questions by providing a sequence of synthesis steps. These steps involve selecting the right configuration for each tool, such as the precursor channels for ALD, the targets and power for sputtering, or the etching recipe for RIE.
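The two ideas above (a virtual tool standing in for an experiment, and a microtask scored against a unique ground truth) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not Semibench's actual interface: the names `VirtualALDTool`, `Microtask`, and `naive_agent` are hypothetical, the constant growth-per-cycle model is a deliberate simplification, and the relative-error score is just one possible distance to ground truth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualALDTool:
    """Toy stand-in for a simulated ALD reactor (hypothetical, not Semibench's API)."""

    growth_per_cycle_nm: float  # assumed constant growth per cycle

    def run(self, cycles: int) -> float:
        """Return the simulated film thickness in nm after `cycles` ALD cycles."""
        if cycles < 0:
            raise ValueError("cycles must be non-negative")
        return self.growth_per_cycle_nm * cycles


@dataclass(frozen=True)
class Microtask:
    """A challenge with a single correct answer, so it can be scored numerically."""

    prompt: str
    ground_truth_cycles: int

    def score(self, proposed_cycles: int) -> float:
        """Relative distance from the ground truth: 0.0 is exact, larger is worse."""
        return abs(proposed_cycles - self.ground_truth_cycles) / self.ground_truth_cycles


def naive_agent(tool: VirtualALDTool, target_nm: float) -> int:
    """Placeholder agent: calibrate with one probe run, then scale linearly."""
    probe_cycles = 10
    measured_nm = tool.run(probe_cycles)
    return round(target_nm * probe_cycles / measured_nm)


# Illustrative values only; they are not taken from the benchmark.
tool = VirtualALDTool(growth_per_cycle_nm=0.125)
task = Microtask(prompt="Deposit a 25 nm film.", ground_truth_cycles=200)
proposal = naive_agent(tool, target_nm=25.0)
error = task.score(proposal)
```

In this sketch the agent is not told the growth per cycle and has to obtain it by querying the tool, which loosely mirrors the case of quantitative data that is not provided explicitly. An LLM-based agent would take the place of `naive_agent`, with the tool and the scoring left unchanged.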

We have applied this benchmark to agents based on leading large language models, such as OpenAI's o1 and o3 families of reasoning models. The results show that these agents can correctly identify the sequence of steps under a wide range of conditions. However, they struggle when solving a challenge requires quantitative data that is not provided explicitly. These results indicate how such agents can be designed for thin-film applications and where their current limitations lie.