--- title: "Flagging respondents" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Flagging respondents} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Using `flag_resp()` to create and compare flagging strategies One use-case for response quality indicators is to use them to flag responses which potentially are of low quality. `resquin` provides the function `flag_resp()` to create a data frame of booleans (`T` and `F`) according to user-defined cut-off values on response quality indicators. If a respondent receives a `T` value, they are flagged as suspicious. If they receive `F` value, they are deemed unsuspicious. The strength of `flag_resp()` lies in its ability to quickly create and compare multiple flagging strategies, as the following example illustrates: Suppose we use data on response styles to decide whether respondents are low-quality responders on the 15 item `nep` scale. We can use `resp_styles()` to calculate response style indices per respondent. ```{r setup} library(resquin) nep_resp_styles <- resp_styles( x = nep, scale_min = 1, # minimum response option scale_max = 5, # maximum response option min_valid_responses = 1) # default, excludes respondents with any missing value summary(nep_resp_styles) ``` In the first example, we will consider the acquiescence response style (ARS). ARS represents the tendency of respondents to agree to questions regardless of their content. Since the `nep` scale includes positively and negatively keyed items, we can expect that higher ARS values indeed correspond to this behavior: Respondents who are more concerned about nature should choose higher response options on the positively keyed items and more negative responses on the negatively keyed items. Just choosing all high response options presents a substantively inconsistent response behavior, potentially caused by acquiescence. A first idea could be to flag respondents which have more than 80% responses in the ARS category. ```{r} first_flagging <- flag_resp(nep_resp_styles, ARS > 0.8) summary(first_flagging) ``` We can see that 33 respondents are flagged as suspicious, as their ARS score is above 0.8. ### Using two flagging strategies In a second step, we might also be interested in flagging respondents who choose the same response option repeatedly. We can use the `resp_patterns()` to compute the longest string length indicator. This indicator shows the longest string of repeated response options. We will flag respondents which have a longest string length of 8 or more. We keep the ARS flagging strategy in place to compare it to the new one. ```{r} nep_resp_patterns <- resp_patterns(nep) nep_resp_patterns_resp_styles <- cbind(nep_resp_styles,nep_resp_patterns[,-1]) second_flagging <- flag_resp(nep_resp_patterns_resp_styles, ARS > 0.8, longest_string_length >= 8) summary(second_flagging) ``` We can see that 19 respondents have a longest string length of larger or equal to 8. The output also contains an agreement matrix between the flagging strategies. In the second row of the first column, we can see that the two flagging strategies agree on 9 flagged respondents. Together, both strategies would flag 33 + 19 - 9 = 43 respondents of 1222. It is also possible to join mutliple flagging expressions with an `&` or `|` operator. ```{r} flag_resp(nep_resp_patterns_resp_styles, ARS > 0.8, longest_string_length >= 8, ARS > 0.8 | longest_string_length >= 8) |> summary() ``` ### Comparing three flagging strategies with one comming from a different source We can use any vector of logical (i.e. `T` and `F`) values with the same number of rows as the `nep` data frame and compare them with the values provided by `resquin`. In the following example we create a random vector of boolean values and add it to the data frame from the last example. ```{r} random_vector <- sample(c(F,T),1000,replace = T) random_vector[is.na(nep_resp_styles$ARS)] <- NA # Add missing data as in the other data frames # example three contains response indicator values per respondent external_indicator_data <- cbind( nep_resp_patterns_resp_styles, new_indicator = random_vector) flag_resp(external_indicator_data, ARS > 0.8, longest_string_length >= 8, new_indicator == T) |> summary() ``` The new indicator `new_indicator` now is included in the output of the summary function and can be compared with the other indicators. ### Filtering respondents The output of `flag_resp()` can be used to filter out the flagged respondents. The output of `flag_resp()` is just a collection of logicals: ```{r} flag_df <- flag_resp( nep_resp_patterns_resp_styles, ARS > 0.8, longest_string_length >= 8, ARS > 0.8 | longest_string_length >= 8) flag_df ``` We can use these to filter respondents from the original `nep` dataset. We can exclude the flagged respondent. ```{r} # Exclude the 33 flagged respondents with ARS > 0.8 nep[!flag_df$`ARS > 0.8`,] |> na.omit() #exclude respondents with missing values ``` Alternatively we can filter out the flagged respondent. ```{r} # Extract only the 33 flagged respondents with ARS 0.8 nep[flag_df$`ARS > 0.8`,] |> na.omit() ``` Notice that you can also use the `id` column in the `flag_df` to join the `flag_df` to your original data.