
Who’s Watching the Agents: Why Observability is the Management Challenge of the Agentic Era

As enterprises deploy hundreds or thousands of AI agents across their organizations, a critical question emerges: does anyone know what they are actually doing?

That was a common question raised at the AWS Summit Japan, held on June 25-26 in very far away Makuhari Messe, Chiba.

The answer I heard again and again was “Observability,” a term I hadn’t given much thought to. But now that large enterprises are deploying agents across all departments and businesses in volume, they need ways of tracking them.

If you’ve watched enough medical dramas, you would know that hospitals need to track every medication dispensed, every procedure ordered, and every medical professional involved in a patient’s care.

With the proliferation of so many agents doing so many different things, enterprises need a comparable record for their digital workers: who did what, when, why, and based on what information. An observability system is like a manager who never sleeps, watching every agent execute every task at every moment.

Here are four reasons why you need an agent management system that never sleeps.

Seeing inside the black box: The agent finished the task. But you need to see how it got there, every step it took, every tool it used, every database it queried, every other agent it consulted along the way. Without being able to “observe” what’s going on, finding the root cause of errors and hallucinations becomes guesswork.

Yuki Ogawa, tech lead in the AI COE of Mitsubishi Electric, explained that they use AWS tools to investigate errors and alerts automatically, reducing the time required to investigate and resolve problems, but that challenges in response accuracy remain.

“Model evolution and growing RAG data mean we need a consistent, ongoing process for evaluating agent quality,” said Ogawa. “We use multiple methods, including user evaluations, keyword extraction, RAG retrieval checks, and agent-level evaluations. For evaluation, we have begun operating a question-and-answer evaluation set reviewed by domain experts such as veteran engineers.”

Keeping the house in order: Left unmonitored, enterprises end up with dozens of agents doing roughly the same job, built by different teams who did not know the others existed. Imagine if those similar agents are also similarly inefficient and error-prone: that’s wasted money and risk at scale.

Taichi Hirano is the senior architect of the AI Acceleration Division at Sony. He explained that creating a single agent is no longer so difficult; the hard part is keeping everything running safely across the organization and improving it continuously. After building agents and tools, he said, the most important point is cataloguing the agents: registering them in an organizational catalog where they can be discovered and reused.

Slide from Sony Group’s presentation
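To make the catalog idea concrete, here is a minimal sketch in Python. It is not Sony's system or any AWS service. The names `AgentEntry`, `AgentCatalog`, and the capability labels are all hypothetical, chosen only to show how registering agents in one place lets a team discover that someone else has already built something similar.

```python
from dataclasses import dataclass, field


@dataclass
class AgentEntry:
    """One catalog entry (hypothetical schema, for illustration only)."""
    name: str
    owner_team: str
    capabilities: frozenset


@dataclass
class AgentCatalog:
    """An in-memory stand-in for an organization-wide agent catalog."""
    entries: dict = field(default_factory=dict)

    def find_overlapping(self, capabilities):
        """Return existing agents that share at least one capability."""
        wanted = frozenset(capabilities)
        return [e for e in self.entries.values() if e.capabilities & wanted]

    def register(self, name, owner_team, capabilities):
        """Add an agent and report any agents it overlaps with."""
        overlaps = self.find_overlapping(capabilities)
        self.entries[name] = AgentEntry(name, owner_team, frozenset(capabilities))
        return overlaps


# Made-up example: two teams build agents that both summarize contracts.
catalog = AgentCatalog()
catalog.register("legal-summarizer", "Legal", ["summarize_contract"])
duplicates = catalog.register(
    "sales-doc-helper", "Sales", ["summarize_contract", "draft_quote"]
)
for entry in duplicates:
    print(f"Possible duplicate of {entry.name}, owned by {entry.owner_team}")
```

A real catalog would need richer metadata and fuzzier matching than exact capability labels, but the principle is the same: the overlap only becomes visible once every agent is registered in one shared place.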

Managing cost: Every agent involves a tradeoff. How accurate do you need it to be? How fast? And what are you willing to pay? Observability gives you the data to make those tradeoffs deliberately rather than by accident, and to track whether performance is getting better or worse over time.

Yumiko Kanasugi, an AWS AIML specialist, said in her talk that cost may not be a major concern initially, at the POC or prototype stage. But when you get to the production stage, “thousands or tens of thousands of users may use it, inputs become unpredictable, and cost, security, and compliance all matter.”

“Operational excellence means being able to run an agent reliably in production at scale,” she said. “You need visibility into what is happening, control of cost, performance under load, and organization-wide governance. Consider a common complaint: ‘The agent is slow.’ Is the cause the model, a tool call, or a data-retrieval issue? Observability is what lets you answer that question.”
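The "agent is slow" question can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the tooling described in the talk. The `Span` record, the three step kinds, and the timings are made up. The idea is simply that if every step of a run is timed and labeled, the slow part identifies itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """One timed step inside a single agent run (hypothetical schema)."""
    kind: str          # "model", "tool", or "retrieval"
    name: str
    duration_ms: float


def latency_by_kind(spans):
    """Sum the time spent on each kind of step."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.kind] += span.duration_ms
    return dict(totals)


def slowest_kind(spans):
    """Return the kind of step that accounts for the most time."""
    totals = latency_by_kind(spans)
    return max(totals, key=totals.get)


# Made-up timings for one slow request.
trace = [
    Span("model", "plan", 800.0),
    Span("retrieval", "search_manuals", 4200.0),
    Span("tool", "ticket_lookup", 350.0),
    Span("model", "answer", 900.0),
]

print(latency_by_kind(trace))
print("Biggest contributor:", slowest_kind(trace))
```

With these invented numbers the data-retrieval step dominates, so the complaint would be a retrieval problem rather than a model problem.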

Creating a paper trail: When something goes wrong, or when a regulator asks questions, you need more than a log of outputs. You need to know which version of the model was running, which instructions it was given, and which data it drew from. In an era of tightening AI regulation, documenting the trail is critical.
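As a rough sketch of what such a trail might record, the Python below captures the three items named above (model version, instructions, and data sources) alongside each output. The field names and the `make_record` helper are hypothetical, not a regulatory standard or a vendor format. Hashing the instructions is one possible design choice: it lets you prove later which prompt was in force without copying it into every log line.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """What one agent run would need to leave behind (hypothetical schema)."""
    agent_id: str
    model_version: str
    instructions_sha256: str
    data_sources: tuple
    output: str
    recorded_at: str


def make_record(agent_id, model_version, instructions, data_sources, output):
    """Build an audit record, storing a hash of the instructions."""
    digest = hashlib.sha256(instructions.encode("utf-8")).hexdigest()
    return AuditRecord(
        agent_id=agent_id,
        model_version=model_version,
        instructions_sha256=digest,
        data_sources=tuple(data_sources),
        output=output,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


def to_json_line(record):
    """Serialize a record as one JSON line, ready for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


# Made-up example values.
record = make_record(
    agent_id="invoice-checker",
    model_version="example-model-2025-06",
    instructions="Flag invoices over the approval limit.",
    data_sources=["erp.invoices", "policy/approval-limits.pdf"],
    output="Flagged 3 invoices.",
)
print(to_json_line(record))
```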

Hundreds of agents are already running inside large Japanese enterprises. Most organizations cannot tell you exactly how many there are, what they are doing, or whether they are performing well.

Thus observability is not a nice-to-have. It is a management necessity that makes agentic AI governable at scale.

And so here is the irony. The more tasks and decisions we hand to AI agents, the more demand there is for human oversight.

Observability is how we humans stay in the loop.
