
Trustworthy AI
Knowledge Base
Reliability and Safety
AI workloads and their underlying infrastructure, models, and use of data must be reliable and safe in any scenario into which they are deployed. This principle emphasizes the importance of building AI systems that are dependable and secure, capable of functioning correctly under diverse and unforeseen circumstances. It also involves rigorous testing and continuous monitoring to prevent failures and mitigate risks.
For example, AI is increasingly used in healthcare for diagnostic purposes. To ensure reliability and safety, a hospital might implement an AI-based diagnostic tool and conduct extensive testing in controlled environments before full deployment. Continuous monitoring and updates ensure that the tool performs accurately and safely in real-world medical scenarios.
Prior to the advent of generative AI, software was nearly entirely deterministic, which is to say that its programming provided for a specific, defined set of outcomes for any input it was given. When a new lead is created, check whether a contact exists. If the contact does not exist, create the contact.
Deterministic programs can be tested for every possible outcome, because the outcomes can be quantified and defined.
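The lead-handling rule above can be sketched in a few lines of code. This is an illustrative sketch only: the `process_new_lead` function and the in-memory dictionary standing in for a CRM are assumptions for the example, not any vendor's actual API.

```python
def process_new_lead(email: str, contacts: dict) -> str:
    """When a new lead is created, check whether a contact exists;
    if it does not, create the contact.

    Exactly two outcomes are possible, so both can be tested."""
    if email in contacts:
        return "existing"
    contacts[email] = {"email": email}
    return "created"


# Because the outcomes are enumerable, every path can be exercised:
crm = {}
assert process_new_lead("a@example.com", crm) == "created"   # first time: created
assert process_new_lead("a@example.com", crm) == "existing"  # second time: found
```

Given the same input and the same starting state, this function returns the same result every time, which is what makes exhaustive testing possible.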
Modern AI is largely non-deterministic, meaning that the program chooses its own path, its own adventure if you will, each time it is run. Responses vary each time a prompt is given, even when the prompt is identical.
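A toy sketch can make the contrast concrete. This is not a real language model; it simply shows the underlying mechanism, which is that generative models sample each output token from a probability distribution rather than following one fixed path, so two runs on the same prompt can diverge. The `vocab` list and `generate` function are illustrative assumptions.

```python
import random

# A deliberately tiny "vocabulary" standing in for a model's token space.
vocab = ["a", "stormy", "calm", "lighthouse", "at", "dawn", "dusk", "cliff"]

def generate(prompt: str, length: int = 5) -> str:
    """Sample `length` tokens at random, mimicking how a generative
    model draws each token from a distribution on every call."""
    return " ".join(random.choices(vocab, k=length))

# The same prompt, given twice, will usually produce different output:
print(generate("Please paint me a picture of a lighthouse."))
print(generate("Please paint me a picture of a lighthouse."))
```

Pinning the random seed makes the toy reproducible, which is precisely the control we generally do not have over a hosted generative service.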
Let’s illustrate this non-deterministic phenomenon with an innocent example.
In writing this chapter, we provided Microsoft Copilot with the following simple prompt:
Please paint me a picture of a lighthouse.
Copilot returned a response several seconds later, generated by Microsoft Designer using DALL-E 3.
We then repeated the same prompt, only to have Copilot return a different generated image.
Figure 25: The first painting of a lighthouse that Microsoft Copilot returned.
Figure 26: Copilot then returned a different image when prompted again just a minute later.
All of which is to say that reliability and safety, and really all five RAI dimensions, are not things that can be deterministically tested for in advance and then left to run on their own. RAI requires ongoing monitoring, correction and tuning, and repeated testing to produce responses that are ever more aligned with RAI principles. It also requires tolerance for the reality that AI will make mistakes and will at times produce irresponsible responses. All the more reason to attend to the organization's collective digital literacy, so that humans are able to recognize these errors and take part in continually refining the AI workloads with which they interact.