Many research reports cite that 80%–85% of GenAI pilots fail, and Gartner predicts that over 40% of Agentic AI projects will be canceled by the end of 2027 [1].
Organizations have started building AI solutions — especially AI agents — to serve specific purposes, and in many cases, multiple agents are being developed. However, these solutions are often built in silos. While they may work well in proof-of-concept (PoC) stages, they struggle when pushed into production.
The biggest question that emerges at scale is trust.
Trust First: Security, Governance, and Cost Control
Before scaling AI, we must make it safe, compliant, and predictable in cost. In practice, this means:
- PII and sensitive data never leaves our controlled boundary
- Every query is auditable and attributable
- Cost per interaction is enforced by design, not monitored after the bill arrives
Why this matters:
Without trust, AI adoption stops at the first incident. Earning trust means reducing hallucination and confabulation, and putting monitoring and governance controls in place. I have been working with Databricks to monitor and govern agents, and below I will share a step-by-step process for setting up AI governance in Databricks.
AI Governance with Databricks
Once you deploy your model as a serving endpoint, you can configure the Databricks AI Gateway, which includes:
• Input and Output guardrails
• Usage monitoring
• Rate limiting
• Inference tables
This setup is illustrated in Figure 1.
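If you prefer to script this rather than click through the UI, the same gateway settings can be applied through the REST API. Below is a minimal sketch in Python, assuming the PUT /api/2.0/serving-endpoints/{name}/ai-gateway call documented for the AI Gateway; the host, token, and endpoint name (my-agent-endpoint) are placeholders for your own workspace:

    # A minimal sketch, not an official setup guide, of configuring the
    # Databricks AI Gateway over the REST API:
    #   PUT /api/2.0/serving-endpoints/{name}/ai-gateway
    # Host, token, and endpoint name are placeholders for your workspace.
    import os
    import requests

    DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
    DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token
    ENDPOINT_NAME = "my-agent-endpoint"                # hypothetical serving endpoint

    gateway_config = {
        # built-in usage tracking (see the Usage Tracking section below)
        "usage_tracking_config": {"enabled": True},
        # the guardrails, inference_table_config, and rate_limits blocks
        # shown in the following sections can be merged in before the PUT
    }

    resp = requests.put(
        f"{DATABRICKS_HOST}/api/2.0/serving-endpoints/{ENDPOINT_NAME}/ai-gateway",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json=gateway_config,
    )
    resp.raise_for_status()
    print(resp.json())

The sections below fill in the guardrails, inference table, and rate limit blocks of this configuration.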

AI Guardrails: Filter inputs and outputs to block unwanted data, such as personal data (PII) or unsafe content. Figure 2 demonstrates how guardrails protect both input and output.
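As a sketch, the guardrails block of the configuration above might look like the following; the field names (safety, pii, behavior) follow my reading of the AI Gateway API and should be verified against the current API reference:

    # Hedged sketch of the guardrails block for gateway_config above.
    gateway_config["guardrails"] = {
        "input": {
            "safety": True,                # filter unsafe or harmful prompts
            "pii": {"behavior": "BLOCK"},  # reject requests containing personal data
        },
        "output": {
            "safety": True,                # filter unsafe model responses
            "pii": {"behavior": "BLOCK"},  # reject responses that would leak PII
        },
    }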

Inference tables: These tables record the data sent to and returned from models, providing transparency and auditability over model interactions. Refer to Figure 3, which shows where you assign the inference table.
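Continuing the configuration sketch, an inference table is enabled by pointing the gateway at a Unity Catalog location; the catalog and schema names below (main, agent_logs) are hypothetical placeholders:

    # Hedged sketch of enabling an inference table on gateway_config above.
    gateway_config["inference_table_config"] = {
        "enabled": True,
        "catalog_name": "main",           # Unity Catalog catalog (placeholder)
        "schema_name": "agent_logs",      # schema that will hold the table (placeholder)
        "table_name_prefix": "my_agent",  # the logged table's name is derived from this prefix
    }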

Rate Limiting: Enforce request rate limits to manage traffic to the endpoint. As shown in Figure 4, you can limit the number of queries and the number of tokens for a particular serving endpoint: QPM is the number of queries the endpoint can process per minute, and TPM is the number of tokens it can process per minute.
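In the API sketch above, QPM limits correspond to rate_limits entries with a per-minute renewal period; the values below are illustrative, not recommendations:

    # Hedged sketch of QPM-style rate limits for gateway_config above.
    gateway_config["rate_limits"] = [
        {"calls": 100, "key": "endpoint", "renewal_period": "minute"},  # 100 QPM across all callers
        {"calls": 10, "key": "user", "renewal_period": "minute"},       # 10 QPM per individual user
    ]
    # Token-based (TPM) limits can be set from the endpoint UI shown in Figure 4.
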
Usage Tracking: Databricks provides built-in usage tracking so you can monitor:
• Who is calling the APIs or model endpoints
• How frequently they are used

Usage data is available in the system table system.serving.endpoint_usage.
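For example, from a Databricks notebook (where spark is predefined) you can aggregate call counts and token usage per caller with a query like the sketch below; the column names (requester, request_time, input_token_count, output_token_count) follow the published schema of this system table, so verify them in your workspace:

    # Hedged sketch: weekly usage per requester from the serving system table.
    usage = spark.sql("""
        SELECT
            requester,
            COUNT(*)                AS request_count,
            SUM(input_token_count)  AS input_tokens,
            SUM(output_token_count) AS output_tokens
        FROM system.serving.endpoint_usage
        WHERE request_time >= date_sub(current_date(), 7)
        GROUP BY requester
        ORDER BY request_count DESC
    """)
    usage.show()
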
In summary, Databricks offers the tools needed to build trusted and governed Agentic AI solutions, helping you land on the 20% success side of AI adoption.
Reference: MIT report: 95% of generative AI pilots at companies are failing (Fortune)