We begin by considering examples within two broad themes: the replication crisis in science, and fairness and inequality in algorithmic or data-driven systems.
First install R, then install RStudio (this second step is highly recommended, but not required if you prefer another IDE and are sure you know what you're doing). Finally, open RStudio and install the tidyverse set of packages by running the command:
install.packages("tidyverse")
Note: If you use a Mac or Linux-based computer, you may want to install these using a package manager instead of downloading them from the websites linked above. Personally, on a Mac I use Homebrew (the link has instructions for how to install it) to install R and RStudio.
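However you install things, it can help to confirm that R can find the tidyverse before moving on. The following is a minimal sketch of such a check, not part of the official setup steps; the variable name `tidyverse_installed` is made up for illustration.

```r
# Print the R version that is currently running
print(R.version.string)

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
# `tidyverse_installed` is a hypothetical name used only in this sketch.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  message("tidyverse is installed.")
} else {
  message("tidyverse not found; run install.packages(\"tidyverse\") first.")
}
```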
But analysts don’t know which hypotheses are null, so they could not create this plot or separate the zscore values into the null and nonnull cases. Instead, some analysts may choose to only publish the results that seem significant.
# Generate simulated published effects
proportion_phack <- .9
which_studies_phacked <- rbinom(N, 1, proportion_phack)
simulated_publications <- simulated_world |>
  mutate(phacked = which_studies_phacked) |>
  dplyr::filter(phacked == 0 |                # not p-hacked OR
                abs(zscore) > signif_level)   # large enough
nrow(simulated_publications)
[1] 8768
Published zscores when proportion 0.9 are p-hacked
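The selection step above relies on `N`, `simulated_world`, and `signif_level`, which are defined elsewhere in these notes. To experiment with the idea on its own, here is a self-contained sketch in base R (so it runs without the tidyverse). The values chosen for `N`, the share of null hypotheses, the nonnull effect size, and the cutoff are all assumptions made for illustration, not the values used to produce the output above.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-ins for objects defined elsewhere in the notes
N <- 10000                           # number of studies (assumed)
signif_level <- qnorm(0.975)         # two-sided 5% cutoff (assumed)
is_null <- rbinom(N, 1, 0.8) == 1    # assume 80% of hypotheses are null
true_mean <- ifelse(is_null, 0, 2.5) # assumed effect size for nonnull cases

simulated_world <- data.frame(
  is_null = is_null,
  zscore  = rnorm(N, mean = true_mean)
)

# Selective publication: p-hacked studies are published only if
# their zscore is large enough; the rest are always published
proportion_phack <- 0.9
simulated_world$phacked <- rbinom(N, 1, proportion_phack)
keep <- simulated_world$phacked == 0 |
  abs(simulated_world$zscore) > signif_level
simulated_publications <- simulated_world[keep, ]

# Compare the full simulated world with what gets published
nrow(simulated_publications)
mean(simulated_world$is_null)          # share of nulls among all studies
mean(simulated_publications$is_null)   # share of nulls among published studies
mean(abs(simulated_publications$zscore) > signif_level)  # share "significant"
```

Changing `proportion_phack` toward 0 or 1 shows how strongly the published set depends on the selection rule. Unlike a real analyst, the simulation can inspect `is_null` directly, which is what makes the comparison possible.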