library(pollster)
library(dplyr)
library(knitr)
library(ggplot2)
The default topline table comes with columns for response category, frequency count, percent, valid percent, and cumulative percent.
topline(df = illinois, variable = voter, weight = weight) %>%
kable()
Response | Frequency | Percent | Valid Percent | Cumulative Percent |
---|---|---|---|---|
Voted | 56230937 | 54.76407 | 63.6809 | 63.6809 |
Not voted | 32070164 | 31.23357 | 36.3191 | 100.0000 |
(Missing) | 14377412 | 14.00236 | NA | NA |
Because the output is a tibble, it’s simple to manipulate it in any way you want after creating it. Use `dplyr::select` to remove columns or `dplyr::filter` to remove rows. For convenience, the `topline` function also provides ways to do this within the function call. For example, the `remove` argument accepts a character vector of response values to be removed from the table after all statistics are calculated. This is especially useful for survey data with a “refused” category.
topline(df = illinois, variable = voter, weight = weight,
remove = c("(Missing)"), pct = FALSE) %>%
mutate(Frequency = prettyNum(Frequency, big.mark = ",")) %>%
kable(digits = 0)
Response | Frequency | Valid Percent | Cumulative Percent |
---|---|---|---|
Voted | 56,230,937 | 64 | 64 |
Not voted | 32,070,164 | 36 | 100 |
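For comparison, the same trimmed table can be reached after the fact with dplyr verbs. The sketch below is only illustrative: to stay self-contained it rebuilds the default topline output by hand from the values printed earlier (the object names `tl` and `trimmed` are made up for this example) rather than calling `topline` itself.

```r
library(dplyr)

# Stand-in for the default topline() output, copied by hand from the
# table printed earlier. In practice this would be the tibble that
# topline() returns.
tl <- data.frame(
  Response = factor(c("Voted", "Not voted", "(Missing)"),
                    levels = c("Voted", "Not voted", "(Missing)")),
  Frequency = c(56230937, 32070164, 14377412),
  Percent = c(54.76407, 31.23357, 14.00236),
  `Valid Percent` = c(63.6809, 36.3191, NA),
  `Cumulative Percent` = c(63.6809, 100, NA),
  check.names = FALSE  # keep the column names with spaces intact
)

trimmed <- tl %>%
  filter(Response != "(Missing)") %>%  # drop the row, like remove = c("(Missing)")
  select(-Percent)                     # drop the column, like pct = FALSE

print(trimmed)
```

Because the statistics are already computed before the row is dropped, the remaining valid and cumulative percents are unchanged, which is the same behavior described for the `remove` argument.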
Refer to the `kableExtra` package for lots of examples on how to format the appearance of these tables in either HTML or PDF (LaTeX) formats. I recommend the vignettes “Create Awesome HTML Table with knitr::kable and kableExtra” and “Create Awesome PDF Table with knitr::kable and kableExtra.”
topline(df = illinois, variable = voter, weight = weight) %>%
ggplot(aes(Response, Percent, fill = Response)) +
geom_bar(stat = "identity")
Get a topline table with the margin of error in a separate column using the `moe_topline` function. By default, a z-score of 1.96 (a 95% confidence interval) is used. Supply your own desired z-score using the `zscore` argument.
moe_topline(df = illinois, variable = educ6, weight = weight)
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> # A tibble: 6 × 6
#> Response Frequency Percent `Valid Percent` MOE `Cumulative Percent`
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 LT HS 10770999. 10.5 10.5 0.326 10.5
#> 2 HS 31409418. 30.6 30.6 0.490 41.1
#> 3 Some Col 21745113. 21.2 21.2 0.434 62.3
#> 4 AA 8249909. 8.03 8.03 0.289 70.3
#> 5 BA 19937965. 19.4 19.4 0.420 89.7
#> 6 Post-BA 10565110. 10.3 10.3 0.323 100
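If you think in confidence levels rather than z-scores, base R's `qnorm` can do the conversion. This is a small sketch, not part of pollster: the helper name `z_for_conf` is hypothetical, and the `moe_topline` call at the end runs only if the package is installed.

```r
# Two-sided z-score for a given confidence level (base R only).
# z_for_conf is a hypothetical helper, not a pollster function.
z_for_conf <- function(conf_level) {
  qnorm(1 - (1 - conf_level) / 2)
}

z90 <- z_for_conf(0.90)  # about 1.645
z95 <- z_for_conf(0.95)  # about 1.96, the default z-score

# Pass the value through the zscore argument.
# Guarded so the snippet still runs where pollster is not installed.
if (requireNamespace("pollster", quietly = TRUE)) {
  library(pollster)
  print(moe_topline(df = illinois, variable = educ6, weight = weight,
                    zscore = z90))
}
```

A lower confidence level gives a smaller z-score and therefore a narrower margin of error.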
The margin of error is calculated including the design effect of the sample weights, using the following formula:
sqrt(design effect)*zscore*sqrt((pct*(1-pct))/(n-1))*100
The design effect is calculated using the following formula:
length(weights)*sum(weights^2)/(sum(weights)^2)
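To make the two formulas concrete, here is a base R sketch that evaluates them on made-up numbers. It is not pollster's internal code: the function names `design_effect` and `moe` are hypothetical, and it assumes that `pct` is a proportion between 0 and 1 and that `n` is the number of respondents with nonzero weight.

```r
# Design effect from a vector of sample weights, following the formula above.
# Zero weights are dropped first, mirroring the message moe_topline prints.
design_effect <- function(weights) {
  weights <- weights[weights != 0]
  length(weights) * sum(weights^2) / (sum(weights)^2)
}

# Margin of error in percentage points, following the formula above.
# Assumptions: pct is a proportion in [0, 1]; n is the sample size.
moe <- function(pct, n, deff, zscore = 1.96) {
  sqrt(deff) * zscore * sqrt((pct * (1 - pct)) / (n - 1)) * 100
}

w <- c(1, 2, 3, 4)        # made-up weights for illustration
deff <- design_effect(w)  # 4 * 30 / 100 = 1.2

# With equal weights the design effect is 1, so this is the plain
# simple-random-sample margin: 1.96 * sqrt(0.25 / 100) * 100 = 9.8
moe_equal <- moe(pct = 0.5, n = 101, deff = 1)

# Unequal weights inflate the margin by sqrt(deff).
moe_weighted <- moe(pct = 0.5, n = 101, deff = deff)

print(c(deff = deff, moe_equal = moe_equal, moe_weighted = moe_weighted))
```

The design effect equals 1 when all weights are equal and grows as the weights become more variable, so heavily weighted samples report wider margins of error.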