Ethics4DS: Coursework 1 answers
# install.packages("fairness")
library(tidyverse)
library(fairness)
data('germancredit')
Discussion questions
Question 1
Data science is usually framed as utilitarian because of its focus on prediction/causation (consequences) and optimization (maximizing utility). Describe an example data science application using explicitly utilitarian language, then refer to at least one non-consequentialist theory to identify some aspect of the application that utilitarianism might overlook.
Example application
(This answer is written somewhat abstractly in order to include many possible applications)
A common kind of application is for an organization to predict “risk” for individuals and make decisions based on these predictions. These decisions usually involve allocating some kind of resource. Decision makers at these organizations usually believe that allocating resources to individuals with lower predicted risk will achieve better outcomes (i.e. higher utility) for the organization, and they probably also believe the organization where they work makes a positive contribution to society and therefore to total utility.
- A tech company may predict users’ interest in different kinds of content to decide which items to recommend (for viewing/purchasing) or which advertisements to show
- A university may predict academic performance of applicants to decide which students to admit
- A bank may predict credit scores to decide how much credit to make available to different customers
- A government agency may predict threats to public health or safety, like outbreaks of viruses or violence, and decide who/where to monitor more closely
Several utilitarian criticisms:
- Social choice problem: the utility of the organization may align poorly with the total utility of society
- Data science problem: predictions of risk are usually based on correlations/associations and hence may not capture causal relationships, or the measures of risk may be poor proxies for an unobserved variable that would better inform decisions. Hence allocating resources based on predicted risk may not actually maximize (or even increase) the organization’s (or society’s) utility
A non-utilitarian aspect of this application
One possible deontological issue: there may be laws or ethical duties requiring that an organization not make decisions about individuals based on certain “protected” characteristics or variables. But many of the variables commonly used to predict risk can be associated with protected variables, so predictive models could end up using information about protected variables either explicitly or implicitly. Deontological ethics may encourage us to remove such signals from the risk predictions even if the consequences result in lower utility for the organization.
One possible virtue ethics issue: the utility of the organization may align poorly with certain virtues. These virtues could become neglected if people focus narrowly on whatever other traits they believe are beneficial to the organization’s risk predictions or decisions.
Question 2
Choose one of the ethical data science guidelines that we read. Find some part of it that you agreed with strongly, quote that part, and describe why you thought it was important. Then find another part that you think is too limited, quote that part, and describe what you think is its most important limitation.
Guideline document: ASA
Agreement
From Section A:
The ethical statistical practitioner: … Does not knowingly conduct statistical practices that exploit vulnerable populations or create or perpetuate unfair outcomes.
Reasoning: I’m most impressed by the idea that we should not perpetuate (or continue) some existing unfairness. There are good reasons why avoiding harm is often stated as a first ethical principle, and avoiding harm could correspond to not creating new unfair outcomes. However, I also think that this does not go far enough, and that we also have a responsibility to try to reduce existing unfairness.
Disagreement
From the Appendix:
Organizations and institutions engage in and promote ethical statistical practice by: … Expecting and encouraging all employees and vendors who conduct statistical practice to adhere to these guidelines. Promoting a workplace where the ethical practitioner may apply the guidelines without being intimidated or coerced. Protecting statistical practitioners who comply with these guidelines.
Reasoning: First, I am confused and disappointed that this material is in an Appendix. I think responsibilities at the level of organizations/institutions should be emphasized perhaps even more strongly than those at the level of individuals. And second, I think that “expecting and encouraging” and “promoting” ethical statistical practice are too weak, and hard requirements should be considered instead. Perhaps training about ethical statistical practice should be mandatory so that, for example, people working with statisticians understand why they should not try to pressure them to produce different results.
Data questions
Computing fairness metrics
Use the fairness package. Pick one of the example datasets in the package. Fit a predictive model using that dataset. Choose three different fairness metrics to compute using the predictions from that model. For each of these, compute the values of the fairness metric in two ways: (1) using standard R functions, e.g. arithmetic operations, and (2) using the fairness package functions. Check to see whether you get the same answer.
# Predictive model
gc_data <- germancredit |>
  select(BAD, Duration, Amount, Savings,
         Employment, Installment_rate, Guarantors,
         Job, Foreign, Female)
gc_fit <- glm(BAD ~ ., family = binomial(), data = gc_data)
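As an optional quick check of the fitted model (just a sketch, not needed for the fairness metrics below), broom::tidy gives the coefficient table; broom is also used later for predictions.
# Optional sketch: glance at the fitted coefficients
broom::tidy(gc_fit)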
Fairness metric 1
Which metric: demographic parity
# Computing manually in base R
gc_fit_pred <- gc_data
gc_fit_pred$.fitted <- predict(gc_fit, type = "response")
inds_F <- which(gc_fit_pred$Female == "Female")
inds_M <- which(gc_fit_pred$Female == "Male")
data.frame(
  sex = c("Female", "Male"),
  average = c(mean(gc_fit_pred$.fitted[inds_F]),
              mean(gc_fit_pred$.fitted[inds_M])),
  cutoff50 = c(mean(gc_fit_pred$.fitted[inds_F] > 0.5),
               mean(gc_fit_pred$.fitted[inds_M] > 0.5)),
  cutoff90 = c(mean(gc_fit_pred$.fitted[inds_F] > 0.9),
               mean(gc_fit_pred$.fitted[inds_M] > 0.9))
)
sex average cutoff50 cutoff90
1 Female 0.7231884 0.9028986 0.10434783
2 Male 0.6483871 0.8096774 0.04193548
# Computing manually with tidyverse
gc_fit_pred <- gc_fit |>
  broom::augment(type.predict = "response")
gc_fit_pred |>
  group_by(Female) |>
  summarize(average = mean(.fitted),
            cutoff50 = mean(.fitted > 0.5),
            cutoff90 = mean(.fitted > 0.9))
# A tibble: 2 × 4
Female average cutoff50 cutoff90
<fct> <dbl> <dbl> <dbl>
1 Female 0.723 0.903 0.104
2 Male 0.648 0.810 0.0419
# Comparing to the fairness package answer
fairness_dp <- dem_parity(
  data = gc_fit_pred,
  outcome = "BAD",
  group = "Female",
  probs = ".fitted",
  cutoff = 0.5
)
fairness_dp$Metric
Female Male
Positively classified 623 251.0000000
Demographic Parity 1 0.4028892
Group size 690 310.0000000
Unfortunately this package seems to have a bug currently where the baseline for demographic parity does not take into account the group size. But we can calculate the desired values (proportions of each group with predicted probability above the 0.5 cutoff) this way:
fairness_dp$Metric[1, ] / fairness_dp$Metric[3, ]
Female Male
0.9028986 0.8096774
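If we want a single number comparing the two groups (the kind of ratio the Demographic Parity row was presumably intended to report), one option is to divide these proportions. This is just a sketch of one way to summarize them; no new package functions are assumed beyond what was used above.
# Sketch: ratio of the group proportions (Male relative to Female)
props <- fairness_dp$Metric[1, ] / fairness_dp$Metric[3, ]
props["Male"] / props["Female"]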
Fairness metric 2
Which metric: false negative rate parity
To see which is positive/negative we need to see how the glm function treated the BAD variable (which level is coded as 0 and which as 1), so let’s check the predicted probabilities this way:
gc_fit_pred |>
  group_by(BAD) |>
  summarize(avg_pred_prob = mean(.fitted))
# A tibble: 2 × 2
BAD avg_pred_prob
<fct> <dbl>
1 BAD 0.608
2 GOOD 0.739
On average the model predicts higher probabilities for the GOOD level, so a positive corresponds to good credit.
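Another way to confirm this is to look at the factor levels directly: for a binomial glm with a factor response, R treats the first level as 0 (failure) and the second level as 1 (success). A small sketch:
# Sketch: the second level of BAD is the one modeled as "1" by glm()
levels(gc_fit_pred$BAD)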
In this case it might make more sense from the perspective of potential customers to have fairness for false negative rates.
To compute false negatives we subset to customers with GOOD credit and look at the proportion of them given BAD predictions.
# Computing manually with base R
inds_GF <- with(gc_fit_pred,
                which(BAD == "GOOD" & Female == "Female"))
inds_GM <- with(gc_fit_pred,
                which(BAD == "GOOD" & Female == "Male"))
data.frame(
  sex = c("Female", "Male"),
  cutoff50 = c(mean(gc_fit_pred$.fitted[inds_GF] < 0.5),
               mean(gc_fit_pred$.fitted[inds_GM] < 0.5)),
  cutoff90 = c(mean(gc_fit_pred$.fitted[inds_GF] < 0.9),
               mean(gc_fit_pred$.fitted[inds_GM] < 0.9))
)
sex cutoff50 cutoff90
1 Female 0.05210421 0.8677355
2 Male 0.11940299 0.9452736
# Computing manually with tidyverse
gc_fit_pred |>
  dplyr::filter(BAD == "GOOD") |> # positives
  group_by(Female) |>
  summarize(FN50 = mean(.fitted < .5),
            FN90 = mean(.fitted < .9))
# A tibble: 2 × 3
Female FN50 FN90
<fct> <dbl> <dbl>
1 Female 0.0521 0.868
2 Male 0.119 0.945
At a cutoff of 50% the false negative rates are low but very different by group. At a cutoff of 90% the false negative rates are less different by group but very high.
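To see this tradeoff in a bit more detail, here is a small sketch (not required by the question) that repeats the same computation over a few cutoffs using purrr, which is loaded as part of the tidyverse:
# Sketch: group false negative rates over a grid of cutoffs
purrr::map_dfr(c(0.5, 0.7, 0.9), function(co) {
  gc_fit_pred |>
    dplyr::filter(BAD == "GOOD") |>
    group_by(Female) |>
    summarize(cutoff = co, FNR = mean(.fitted < co), .groups = "drop")
})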
# Comparing to the fairness package answer
fairness_fnr <- fnr_parity(
  data = gc_fit_pred,
  outcome = "BAD",
  group = "Female",
  probs = ".fitted",
  cutoff = 0.5
)
fairness_fnr$Metric
Female Male
FNR 0.05210421 0.119403
FNR Parity 1.00000000 2.291619
Group size 690.00000000 310.000000
The FNR results show the same numbers we computed above as FN50. Verifying at a cutoff of 0.9 would work the same way.
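For example, the same fnr_parity call with the higher cutoff (a sketch; output not shown here):
# Sketch: same fnr_parity() call at a cutoff of 0.9
fairness_fnr90 <- fnr_parity(
  data = gc_fit_pred,
  outcome = "BAD",
  group = "Female",
  probs = ".fitted",
  cutoff = 0.9
)
fairness_fnr90$Metric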
Simulating a response variable
Now replace the outcome variable in the original dataset with a new variable that you generate. You can decide how to generate the new outcome. Your goal is to make this outcome result in all the fairness metrics you chose above indicating that the predictive model is fair.
Answer: Recalling the impossibility theorems, we know that (when the fairness metrics are truly different) multiple metrics can usually only be satisfied simultaneously under some trivial condition like (1) the world is already fair or (2) the model predicts with perfect accuracy.
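For reference, the two metrics used above can be written as conditions on the joint distribution of the thresholded prediction, the outcome, and the protected attribute. This is just one way to write them, with \hat{Y} the prediction, Y the outcome, and A the protected attribute:
P(\hat{Y} = 1 \mid A = a) \text{ equal for all } a \qquad \text{(demographic parity)}
P(\hat{Y} = 0 \mid Y = 1, A = a) \text{ equal for all } a \qquad \text{(false negative rate parity)}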
The easiest way to satisfy (1) is to just make the outcome independent of everything.
n <- nrow(gc_data)
gc_data_sim <- gc_data
gc_data_sim$BAD <- rbinom(n, 1, .5) # random 0-1
This is a satisfactory answer since it is technically correct and shows understanding of the application of the impossibility result. But I know that just being technically correct can seem unsatisfying as an answer, so I’ll try to give a more interesting solution as well.
Another way to have a “fair world” (satisfying the impossibility theorem) would be if the outcome depends only on predictor variables that are independent of the protected attribute(s). I checked the other numeric predictors in this dataset and none of them were independent of Female, but I saw that I could create a new predictor that was independent, Rate_Duration_ratio:
gc_data_indep <- gc_data |>
  mutate(Rate_Duration_ratio = Installment_rate / Duration)
gc_data_indep |>
  select(-BAD) |>
  group_by(Female) |>
  summarize(across(where(is.numeric), mean))
# A tibble: 2 × 5
Female Duration Amount Installment_rate Rate_Duration_ratio
<fct> <dbl> <dbl> <dbl> <dbl>
1 Female 21.6 3448. 3.04 0.190
2 Male 19.4 2878. 2.83 0.190
exp_ratio <- function(x) exp(x) / (1 + exp(x))
gc_data_sim <- gc_data_indep |>
  mutate(
    BAD = rbinom(n,
                 1,
                 prob = exp_ratio(7 * Rate_Duration_ratio)))
gc_data_sim |>
  group_by(BAD) |>
  summarize(avg_ratio = mean(Rate_Duration_ratio))
# A tibble: 2 × 2
BAD avg_ratio
<int> <dbl>
1 0 0.129
2 1 0.211
This shows we have a predictor variable that is correlated with the outcome, so the outcome is not just purely random and independent of everything (noise).
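It is also worth checking directly (another quick sketch) that this simulated outcome has roughly the same base rate in each group, since Rate_Duration_ratio was chosen to be approximately independent of Female:
# Sketch: base rates of the simulated outcome by group should be close
gc_data_sim |>
  group_by(Female) |>
  summarize(prop_BAD = mean(BAD),
            avg_ratio = mean(Rate_Duration_ratio))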
# Predictive model
gc_sim_fit <- glm(BAD ~ ., family = binomial(), data = gc_data_sim)
Fairness metric 1
Which metric: demographic parity
# Computing manually
gc_sim_fit_pred <- gc_sim_fit |>
  broom::augment(type.predict = "response")
gc_sim_fit_pred |>
  group_by(Female) |>
  summarize(average = mean(.fitted),
            cutoff50 = mean(.fitted > 0.5),
            cutoff90 = mean(.fitted > 0.9))
# A tibble: 2 × 4
Female average cutoff50 cutoff90
<fct> <dbl> <dbl> <dbl>
1 Female 0.746 0.980 0.203
2 Male 0.742 0.961 0.2
# Comparing to the fairness package answer
# skipped
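For completeness, the skipped comparison would mirror the earlier call; here is an untested sketch with the same arguments as before (I have not verified how dem_parity handles a numeric 0/1 outcome):
# Untested sketch: same dem_parity() call on the simulated fit
fairness_dp_sim <- dem_parity(
  data = gc_sim_fit_pred,
  outcome = "BAD",
  group = "Female",
  probs = ".fitted",
  cutoff = 0.5
)
fairness_dp_sim$Metric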
Fairness metric 2
Which metric: false negative rate parity
# Computing manually with tidyverse
gc_sim_fit_pred |>
  dplyr::filter(BAD == 1) |> # positives
  group_by(Female) |>
  summarize(FN50 = mean(.fitted < .5),
            FN90 = mean(.fitted < .9))
# A tibble: 2 × 3
Female FN50 FN90
<fct> <dbl> <dbl>
1 Female 0.0175 0.746
2 Male 0.0174 0.739
Concluding thoughts
Since the questions were fairly open-ended there are many other possible good answers. This solution guide is just an example of some of the answers that first occurred to me and a few ways of writing R code that may be helpful to learn from.