Tests and evaluates autonomous AI agents for accuracy, safety, and conformance to expected behavior across complex scenarios. Designs evaluation pipelines and red-teaming strategies specific to agentic AI systems.