The Ethical Architecture of AI: Embedding Human Values from the Start
Location
Wolff Auditorium, Jepson Center
Start Date
4-4-2025 10:45 AM
End Date
4-4-2025 11:30 AM
Description
Chaired by Graham Morehead, Ph.D. (Gonzaga University)
Modern AI systems are omnipresent and serve numerous use cases across diverse commercial and other sectors. Sensitive applications are not uncommon, and human values play an important role in nurturing AI systems that remain mindful of ethical trade-offs in their reasoning and eventual outcomes.
Human beings are complex creatures, and human values are intricate and multifaceted, often with tensions between two or more values. Helpfulness, harmlessness, and honesty are among the innate qualities that make a human being human. These desirable qualities often conflict, making it difficult, even for a human being, to satisfy everyone by maximizing each of them at once. For example, it might make sense not to tell a layman the steps to build an explosive device; by being less helpful, a human increases safety. However, the same information could be very useful to someone who needs this expertise and is working with law enforcement. The surrounding context, therefore, determines whether withholding the information is safe or both harmful and unhelpful.
The contextual nature of harmfulness becomes even more evident when competing entities are involved, or simply where there is a difference of opinion. Consider geopolitical scenarios, in which entities look out for themselves while working toward the joint goals of advancing humanity and preserving the environment, or factions of people debating the benefits of free speech as opposed to censorship: values such as harmlessness, helpfulness, and morality can unwittingly yet inevitably become a function of perspective.
Similarly, suppose someone wants to learn how to hide money from the IRS. An AI system can cause serious harm precisely by being honest and helpful: providing methods to hide money encourages illegal activity. Navigating such intricacies is complex enough for a human being, let alone for an AI system that has merely internalized general patterns from the open world and from the curated datasets used to train it.
AI models are, by construction, good at creating an illusion of novelty. Conversations with modern AI-based chatbots often feel very real. Upon close inspection, however, we find that these models are incapable of “new intelligence” and will fail outright when expected to perform a task far from anything they have seen before. For example, if human beings developed a new verbally expressed emotion or a unique way of speaking, we would have to move mountains, labeling large amounts of data, before we could satisfactorily adapt an AI model’s abilities along these dimensions. This limitation is not unexpected; it makes more sense once one starts thinking of an AI model as a manifestation of the past patterns, nuances, and quirks of the real world. The actual value is realized only when a human interacts with these systems and holds their hand through the creation of novelty.
With the realization that AI is a highly condensed yet operationally sound form of all the data in the world, we conclude that the reasoning and final decision or output of an AI model, desirable or not, is attributable both to the model itself and to how the model was built and used. While the extent to which AI systems are humanized today is unfortunate, it behooves us to treat them as just another inanimate tool. The values adopted in training AI models, in embedding them in sophisticated decision-making pipelines, and finally in how they are used all play a significant role in ensuring that the realized outcomes align with human values and are justified despite the natural tensions at play between those values.
People building models, such as scientists, statisticians, and engineers, are often removed from the intricacies of large datasets and are more interested in the theoretical abstractions that eventually provide reliable guarantees for applications delivering real-world impact. As a result, considerations around the ethics of outcomes become almost an afterthought. Instead, we need to understand that the ethics of AI outcomes are a key component of the impact delivered, and it therefore makes complete sense to consider them from the outset, naturalizing them as part of the process and instilling human values into the data labeling and model building exercise early on.
The beauty of the human species lies in its constant search for answers to difficult new questions and its willingness to venture outside its comfort zone onto less-trodden paths. We will examine several such difficult questions and examples: special considerations for self-driving cars; Simpson’s paradox in cholesterol versus diabetes data and the value of knowing patients’ age groups; and fairness, or the lack of it, when AI models consider sensitive attributes such as gender and race in the process of loan approvals. Furthermore, does it make more sense to unlearn or realign models to be more “safe,” or is it more desirable to control this behavior and employ that controllability as appropriate in different scenarios?
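To make the Simpson’s paradox example concrete, here is a minimal sketch in Python with entirely hypothetical patient counts: within each age group, patients with high cholesterol show a lower diabetes rate, yet pooling the two groups reverses the trend, because older patients dominate the high-cholesterol stratum and carry a higher baseline rate. Knowing the age groups is what resolves the apparent contradiction.

```python
# Minimal sketch of Simpson's paradox with entirely hypothetical counts:
# within each age group, high cholesterol is associated with a LOWER
# diabetes rate, yet the pooled data suggest the opposite.

# (patients, diabetic cases) for each (age group, cholesterol level)
data = {
    ("young", "low"):  (200, 16),   # 8% diabetic
    ("young", "high"): (50,   2),   # 4% diabetic
    ("old",   "low"):  (50,  20),   # 40% diabetic
    ("old",   "high"): (200, 60),   # 30% diabetic
}

def rate(n, cases):
    """Diabetes rate as a percentage."""
    return 100 * cases / n

# Stratified view: within each age group, high cholesterol looks protective.
for age in ("young", "old"):
    lo_n, lo_c = data[(age, "low")]
    hi_n, hi_c = data[(age, "high")]
    print(f"{age:>5}: low chol {rate(lo_n, lo_c):5.1f}%   "
          f"high chol {rate(hi_n, hi_c):5.1f}%")

# Pooled view: age is ignored, and the trend reverses.
lo_n = sum(n for (_, c), (n, _) in data.items() if c == "low")
lo_c = sum(d for (_, c), (_, d) in data.items() if c == "low")
hi_n = sum(n for (_, c), (n, _) in data.items() if c == "high")
hi_c = sum(d for (_, c), (_, d) in data.items() if c == "high")
print(f"  all: low chol {rate(lo_n, lo_c):5.1f}%   "
      f"high chol {rate(hi_n, hi_c):5.1f}%")
# Prints 8.0%/4.0% (young), 40.0%/30.0% (old), but 14.4%/24.8% pooled.
```

The same mechanism is why a fairness audit that ignores a confounding attribute can draw exactly the wrong conclusion.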
This continual collaborative effort, rich with interactions between diverse mindsets and their openness to agree to disagree, furthers the broader agenda of progress and innovation for the human race. Responsibly built AI systems will walk alongside this evolving data-generation process and, in conjunction with their users, deliver preferable and ethical experiences.
Recommended Citation
Basu, Debraj, "The Ethical Architecture of AI: Embedding Human Values from the Start" (2025). Value and Responsibility in AI Technologies. 8.
https://repository.gonzaga.edu/ai_ethics/2025/general/8