
Researchers Give Artificial Intelligence a Failing Grade for Use by Employees
14 Oct, 2025 Chriss Swaney

The Trained A-Eye
A recent study by Carnegie Mellon University, conducted in collaboration with Salesforce, has found that artificial intelligence agents are currently unable to reliably complete most office tasks, with failure rates approaching 70 percent.
“It does not surprise me because I have only used artificial intelligence (AI) for some office editing,” said William Barber, a film editor and consultant from Washington, Pa.
The research, which evaluated AI models from leading tech firms in a simulated company environment, suggests that enterprise AI deployments remain constrained by fundamental technical and practical shortcomings.
Roy Burkowski, a woodworker from Claysville, Pa., said he has tried to use AI tools to catalogue his many woodworking projects but found them too technical and burdensome. “I simply chart out my work now on my computer spreadsheet in Word,” Burkowski said.
So frustrated is Burkowski by AI and all the hoopla surrounding it that he said he may even go back to his old Underwood typewriter to track his projects, which include handmade tables, chairs and walking canes.
Carnegie Mellon researchers created a simulated technology company fully staffed by AI agents using models from OpenAI, Google, Anthropic and Amazon. The environment was designed to replicate a typical office, with roles such as CTO, HR manager and engineer, and with everyday tasks drawn from finance, administration and engineering. Agents were tested on their ability to work independently, using internet chat, company handbooks and websites, without human intervention.
The results revealed that no AI agent could complete more than 24 percent of its assigned tasks. Anthropic’s Claude 3.5 Sonnet was the best performer at 24 percent, while Google’s Gemini achieved 11 percent and Amazon’s Nova recorded just 1.7 percent. Even basic assignments, such as closing pop-up windows or finding the right colleague, frequently confused the agents, according to the Carnegie Mellon/Salesforce research. Tasks typically took dozens of steps, and agents sometimes made errors along the way, such as renaming colleagues to get the outcomes they wanted.
The agents not only failed standard office tasks but also revealed deeper shortcomings. According to the research, they became confused, fabricated information or made poor decisions that a human would likely avoid. Common failures included struggling to navigate basic digital interfaces, misunderstanding task instructions and lacking common sense.
Industry analysts also report that workers are less productive when dealing with AI in the office. A recent market survey of small tech companies found that unhelpful AI-generated work is significantly affecting workplace collaboration.
Sue Merck, an office manager at a small auto parts supplier in Cleveland, Ohio, also found AI-generated work less creative and less reliable.
Still, recent Gallup surveys show that AI use at work is accelerating. In the past two years, the share of U.S. workers using AI jumped from 21 percent to 40 percent. Frequent AI use (a few times a week or more) has nearly doubled, from 11 percent to 19 percent, since Gallup’s first measure in 2023. Daily use has doubled in the past 12 months alone, from 4 percent to 8 percent.
About The Author
Chriss Swaney
Chriss Swaney is a freelance reporter who has written for Antique Trader Magazine, Reuters, The New York Times, U.S. News & World Report, the Burlington Free Press, UPI, The Tribune-Review and the Daily Record.