We’ll use the dataset pop_dt. It contains a tabulation of Indonesia’s population by regency/city and gender, based on the results of the 2020 population census from BPS-Statistics Indonesia (https://sensus.bps.go.id/main/index/sp2020).
dim(pop_dt)
#> [1] 514   8
pop_dt %>% head()
#>   idkab kdprov kdkab nmprov         nmkab Laki-laki Perempuan  Total
#> 1  1101     11    01   ACEH      SIMEULUE     47630     45235  92865
#> 2  1102     11    02   ACEH  ACEH SINGKIL     63978     62536 126514
#> 3  1103     11    03   ACEH  ACEH SELATAN    116542    115872 232414
#> 4  1104     11    04   ACEH ACEH TENGGARA    110799    110061 220860
#> 5  1105     11    05   ACEH    ACEH TIMUR    212286    210115 422401
#> 6  1106     11    06   ACEH   ACEH TENGAH    109262    106314 215576
Allocation DataFrame
The allocation dataset is alokasi_dt, which holds the sample allocation for each province.
dim(alokasi_dt)
#> [1] 34  3
alokasi_dt
#> # A tibble: 34 x 3
#>   kdprov jml_kabkota n_primary
#>   <chr>        <int>     <dbl>
#> 1 11              23         4
#> 2 12              33         5
#> 3 13              19         3
#> 4 14              12         3
#> # ... with 30 more rows
Simple Random Sampling (SRS)
A simple random sample is a randomly selected subset of a population. In this sampling method, each member of the population has an exactly equal chance of being selected.
The following is the syntax for simple random sampling. Use the parameter method = 'srs'.
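As a rough illustration of what simple random sampling within each province amounts to, the base R sketch below draws units with equal probability and without replacement. It does not call samplingin. The toy population `pop`, the allocation vector `alloc`, and the column `unit` are hypothetical names introduced only for this example.

```r
# Hypothetical toy population: 3 provinces with 5, 4 and 6 units
pop <- data.frame(
  kdprov = rep(c("11", "12", "13"), times = c(5, 4, 6)),
  unit   = 1:15,
  stringsAsFactors = FALSE
)

# Hypothetical allocation: number of units to draw in each province
alloc <- c("11" = 2, "12" = 1, "13" = 3)

set.seed(123)

# Within each province, draw the allocated number of rows
# without replacement, every row having the same chance
picked <- unlist(lapply(names(alloc), function(p) {
  idx <- which(pop$kdprov == p)
  idx[sample.int(length(idx), alloc[[p]])]
}))

srs_sample <- pop[sort(picked), ]
table(srs_sample$kdprov)
```

Indexing through `sample.int(length(idx), ...)` avoids the surprise of `sample(x, ...)` treating a single number as the range `1:x`.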
Systematic random sampling selects samples at a preset interval. Using the population and allocation data provided above, we will carry out systematic random sampling with the doSampling function from the samplingin package. Use the parameter method = 'systematic'.
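The interval-based selection can be sketched in base R as follows. This is one common fractional-interval formulation and not necessarily the exact rule doSampling implements. The helper `systematic_pick` and its arguments are hypothetical. The numbers 23 and 4 are the jml_kabkota and n_primary values of province 11 in alokasi_dt.

```r
# Hypothetical helper: positions selected by systematic sampling
#   N     number of units in the ordered frame
#   n     number of units to select
#   start random start, 0 < start <= 1
systematic_pick <- function(N, n, start = runif(1)) {
  k <- N / n                        # sampling interval, may be fractional
  ceiling((start + 0:(n - 1)) * k)  # one position per interval
}

# Province 11: 23 regencies/cities, 4 to be selected, fixed start for clarity
pos <- systematic_pick(N = 23, n = 4, start = 0.5)
pos
```

With a random start the selected positions shift, but they always stay one interval apart.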
Primary Units Sampling
The following is the syntax for sampling the primary units
head(dtSampling_u$pop)
#>   idkab kdprov kdkab nmprov         nmkab Laki-laki Perempuan  Total flags
#> 1  1101     11    01   ACEH      SIMEULUE     47630     45235  92865     U
#> 2  1102     11    02   ACEH  ACEH SINGKIL     63978     62536 126514  <NA>
#> 3  1103     11    03   ACEH  ACEH SELATAN    116542    115872 232414  <NA>
#> 4  1104     11    04   ACEH ACEH TENGGARA    110799    110061 220860  <NA>
#> 5  1105     11    05   ACEH    ACEH TIMUR    212286    210115 422401  <NA>
#> 6  1106     11    06   ACEH   ACEH TENGAH    109262    106314 215576  <NA>
#>    tanggal
#> 1 28/09/24
#> 2     <NA>
#> 3     <NA>
#> 4     <NA>
#> 5     <NA>
#> 6     <NA>
Units Sampled
head(dtSampling_u$sampledf)
#>   idkab kdprov kdkab         nmprov          nmkab Laki-laki Perempuan  Total
#> 1  1101     11    01           ACEH       SIMEULUE     47630     45235  92865
#> 2  1107     11    07           ACEH     ACEH BARAT    100492     98244 198736
#> 3  1113     11    13           ACEH      GAYO LUES     50026     49506  99532
#> 4  1118     11    18           ACEH     PIDIE JAYA     78742     79655 158397
#> 5  1205     12    05 SUMATERA UTARA TAPANULI UTARA    156176    156582 312758
#> 6  1211     12    11 SUMATERA UTARA           KARO    200247    204751 404998
#>   flags  tanggal
#> 1     U 28/09/24
#> 2     U 28/09/24
#> 3     U 28/09/24
#> 4     U 28/09/24
#> 5     U 28/09/24
#> 6     U 28/09/24
dtSampling_u$sampledf %>% nrow
#> [1] 100
To sample secondary units, we use the population returned by the prior sampling, in which the selected primary units have been flagged. Add is_secondary = TRUE to the doSampling call.
Two units have still not been selected as samples. The allocation that has not yet been met can be viewed as follows:
PPS systematic sampling is a method of sampling from a finite population in which a size measure is available for each population unit before sampling, and the probability of selecting a unit is proportional to its size. Larger units have a higher chance of being selected. We will use the doSampling function with the parameters method = 'pps' and auxVar = 'Total', where Total is the auxiliary (size) variable.
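A minimal base R sketch of the idea is given below, using the cumulative-total method that is commonly taught for PPS systematic sampling. It is not the samplingin implementation. The helper `pps_systematic` is hypothetical, and the size values are the Total figures of the first six rows of pop_dt shown earlier.

```r
# Hypothetical helper: PPS systematic selection via cumulative sizes
#   size   size measure of each unit, in frame order
#   n      number of units to select
#   start  random start, 0 < start <= 1
pps_systematic <- function(size, n, start = runif(1)) {
  cum     <- cumsum(size)             # upper bound of each unit's size range
  k       <- sum(size) / n            # interval on the cumulative size scale
  targets <- (start + 0:(n - 1)) * k  # equally spaced selection points
  # a unit is selected when a target falls in (cum[i - 1], cum[i]]
  findInterval(targets, cum, left.open = TRUE) + 1
}

total <- c(92865, 126514, 232414, 220860, 422401, 215576)
sel <- pps_systematic(total, n = 2, start = 0.5)
sel
```

A unit whose size exceeds the interval `k` can be hit by more than one target, so in practice such units are usually taken with certainty and handled separately.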
head(dtSampling_pps$pop)
#>   idkab kdprov kdkab nmprov         nmkab Laki-laki Perempuan  Total flags
#> 1  1101     11    01   ACEH      SIMEULUE     47630     45235  92865  <NA>
#> 2  1102     11    02   ACEH  ACEH SINGKIL     63978     62536 126514  <NA>
#> 3  1103     11    03   ACEH  ACEH SELATAN    116542    115872 232414  <NA>
#> 4  1104     11    04   ACEH ACEH TENGGARA    110799    110061 220860  <NA>
#> 5  1105     11    05   ACEH    ACEH TIMUR    212286    210115 422401  <NA>
#> 6  1106     11    06   ACEH   ACEH TENGAH    109262    106314 215576     U
#>    tanggal
#> 1     <NA>
#> 2     <NA>
#> 3     <NA>
#> 4     <NA>
#> 5     <NA>
#> 6 28/09/24
Units Sampled
head(dtSampling_pps$sampledf)
#>   idkab kdprov kdkab         nmprov        nmkab Laki-laki Perempuan   Total
#> 1  1106     11    06           ACEH  ACEH TENGAH    109262    106314  215576
#> 2  1110     11    10           ACEH      BIREUEN    215282    221136  436418
#> 3  1114     11    14           ACEH ACEH TAMIANG    149263    145093  294356
#> 4  1175     11    75           ACEH SUBULUSSALAM     46065     44686   90751
#> 5  1208     12    08 SUMATERA UTARA       ASAHAN    389391    380569  769960
#> 6  1212     12    12 SUMATERA UTARA DELI SERDANG    971735    959706 1931441
#>   flags  tanggal
#> 1     U 28/09/24
#> 2     U 28/09/24
#> 3     U 28/09/24
#> 4     U 28/09/24
#> 5     U 28/09/24
#> 6     U 28/09/24
dtSampling_pps$sampledf %>% nrow
#> [1] 100
For stratified sampling, the doSampling function takes an additional parameter called strata. The strata variable must be present in both the population and the allocation being used. For example, a strata variable named strata_kabkot is added to the pop_dt data to distinguish districts (strata_kabkot = 1) from cities (strata_kabkot = 2).
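One way such a strata variable could be derived is sketched below in base R. It rests on the assumption that city codes in kdkab start with "7". That matches the rows printed in this document (1171 BANDA ACEH and 1172 SABANG carry strata_kabkot = 2), but the source does not state the rule, so treat it as illustrative. The small data frame `pop` is a hypothetical stand-in for pop_dt.

```r
# Hypothetical four-row stand-in for pop_dt
pop <- data.frame(
  kdprov = "11",
  kdkab  = c("01", "02", "71", "72"),
  nmkab  = c("SIMEULUE", "ACEH SINGKIL", "BANDA ACEH", "SABANG"),
  stringsAsFactors = FALSE
)

# Assumption: a kdkab code beginning with "7" denotes a city (stratum 2),
# anything else a district (stratum 1)
pop$strata_kabkot <- ifelse(substr(pop$kdkab, 1, 1) == "7", 2, 1)

pop
```

Because the strata variable must also exist in the allocation, the allocation table would then presumably need one row per combination of kdprov and strata_kabkot rather than one row per province.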
Displaying the sampling result with stratification
Population Sampled
head(dtSampling_strata$pop)
#>   idkab kdprov kdkab nmprov         nmkab Laki-laki Perempuan  Total
#> 1  1101     11    01   ACEH      SIMEULUE     47630     45235  92865
#> 2  1102     11    02   ACEH  ACEH SINGKIL     63978     62536 126514
#> 3  1103     11    03   ACEH  ACEH SELATAN    116542    115872 232414
#> 4  1104     11    04   ACEH ACEH TENGGARA    110799    110061 220860
#> 5  1105     11    05   ACEH    ACEH TIMUR    212286    210115 422401
#> 6  1106     11    06   ACEH   ACEH TENGAH    109262    106314 215576
#>   strata_kabkot flags  tanggal
#> 1             1  <NA>     <NA>
#> 2             1  <NA>     <NA>
#> 3             1     U 28/09/24
#> 4             1  <NA>     <NA>
#> 5             1  <NA>     <NA>
#> 6             1  <NA>     <NA>
Units Sampled
head(dtSampling_strata$sampledf)
#>   idkab kdprov kdkab         nmprov           nmkab Laki-laki Perempuan   Total
#> 1  1103     11    03           ACEH    ACEH SELATAN    116542    115872  232414
#> 2  1109     11    09           ACEH           PIDIE    215878    219397  435275
#> 3  1115     11    15           ACEH      NAGAN RAYA     85039     83353  168392
#> 4  1171     11    71           ACEH      BANDA ACEH    127435    125464  252899
#> 5  1204     12    04 SUMATERA UTARA TAPANULI TENGAH    183814    181363  365177
#> 6  1212     12    12 SUMATERA UTARA    DELI SERDANG    971735    959706 1931441
#>   strata_kabkot flags  tanggal
#> 1             1     U 28/09/24
#> 2             1     U 28/09/24
#> 3             1     U 28/09/24
#> 4             2     U 28/09/24
#> 5             1     U 28/09/24
#> 6             1     U 28/09/24
dtSampling_strata$sampledf %>% nrow
#> [1] 100
dtSampling_strata$sampledf %>% count(strata_kabkot)
#>   strata_kabkot  n
#> 1             1 63
#> 2             2 37
Sampling sometimes employs implicit stratification so that the selected sample is spread across the range of a chosen variable. For instance, to obtain a sample distributed according to total population, add the parameter implicitby = 'Total' when sampling.
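Implicit stratification is commonly implemented by sorting the frame on the chosen variable before a systematic draw, so that the selection points sweep from small to large values. The base R sketch below shows that idea. Whether doSampling proceeds in exactly this way is an assumption, and the fixed start of 0.5 is used only to make the example easy to follow. The values are the Total figures of the first six rows of pop_dt.

```r
total <- c(92865, 126514, 232414, 220860, 422401, 215576)

# Step 1: order the frame by the implicit stratification variable
ord <- order(total)

# Step 2: systematic selection of n positions on the sorted frame
n     <- 3
k     <- length(total) / n   # sampling interval
start <- 0.5                 # fixed here; normally a random number in (0, 1]
pos   <- ceiling((start + 0:(n - 1)) * k)

# Step 3: map the sorted positions back to the original rows
picked <- ord[pos]
total[picked]
```

Because the frame is sorted first, the selected units cannot all come from the same end of the size distribution.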
Sometimes the random numbers for sampling have been determined beforehand. The samplingin package supports this through the parameter predetermined_rn, which takes the name of the variable that stores the predetermined random numbers. For example, if the random numbers are stored in the allocation data frame under the variable name arand, we add predetermined_rn = 'arand'.
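To make the role of a predetermined random number concrete, the base R sketch below uses a stored value as the random start of a systematic draw, so the selection is fully reproducible without a seed. This only illustrates the concept and is not how samplingin is known to work internally. The arand values 0.5 and 0.25 are made up; jml_kabkota and n_primary are the first two rows of alokasi_dt.

```r
# First two provinces of the allocation, plus hypothetical random starts
alloc <- data.frame(
  kdprov      = c("11", "12"),
  jml_kabkota = c(23, 33),
  n_primary   = c(4, 5),
  arand       = c(0.5, 0.25),   # made-up predetermined random numbers
  stringsAsFactors = FALSE
)

# For each province, use arand as the start of the systematic selection
positions <- Map(
  function(N, n, r) ceiling((r + 0:(n - 1)) * N / n),
  alloc$jml_kabkota, alloc$n_primary, alloc$arand
)
names(positions) <- alloc$kdprov

positions
```

Rerunning the code always returns the same positions, which is the point of fixing the random numbers in advance.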
Displaying the sampling result with predetermined random number
Population Sampled
head(dtSampling_prn$pop)
#>   idkab kdprov kdkab nmprov         nmkab Laki-laki Perempuan  Total
#> 1  1101     11    01   ACEH      SIMEULUE     47630     45235  92865
#> 2  1102     11    02   ACEH  ACEH SINGKIL     63978     62536 126514
#> 3  1103     11    03   ACEH  ACEH SELATAN    116542    115872 232414
#> 4  1104     11    04   ACEH ACEH TENGGARA    110799    110061 220860
#> 5  1105     11    05   ACEH    ACEH TIMUR    212286    210115 422401
#> 6  1106     11    06   ACEH   ACEH TENGAH    109262    106314 215576
#>   strata_kabkot flags  tanggal
#> 1             1  <NA>     <NA>
#> 2             1  <NA>     <NA>
#> 3             1  <NA>     <NA>
#> 4             1  <NA>     <NA>
#> 5             1     U 28/09/24
#> 6             1  <NA>     <NA>
Units Sampled
head(dtSampling_prn$sampledf)
#>   idkab kdprov kdkab         nmprov           nmkab Laki-laki Perempuan   Total
#> 1  1105     11    05           ACEH      ACEH TIMUR    212286    210115  422401
#> 2  1111     11    11           ACEH      ACEH UTARA    301211    301582  602793
#> 3  1117     11    17           ACEH    BENER MERIAH     81765     79577  161342
#> 4  1172     11    72           ACEH          SABANG     20838     20359   41197
#> 5  1204     12    04 SUMATERA UTARA TAPANULI TENGAH    183814    181363  365177
#> 6  1212     12    12 SUMATERA UTARA    DELI SERDANG    971735    959706 1931441
#>   strata_kabkot flags  tanggal
#> 1             1     U 28/09/24
#> 2             1     U 28/09/24
#> 3             1     U 28/09/24
#> 4             2     U 28/09/24
#> 5             1     U 28/09/24
#> 6             1     U 28/09/24
dtSampling_prn$sampledf %>% nrow
#> [1] 100
Allocate predetermined allocations to smaller levels
One of the supporting functions in the samplingin package is get_allocation. It distributes a sample allocation to lower levels using proportional allocation based on the square root of a specified variable.
For example, an allocation available at the province level can be distributed to the districts/cities within each province in proportion to the square root of their total population (Total).
set.seed(242)
alokasi_prov = alokasi_dt %>%
  select(-jml_kabkota, -n_primary) %>%
  mutate(init_alloc = as.integer(runif(n(), 100, 200))) %>%
  as.data.frame()
alokasi_prov %>% head(10)
#>    kdprov init_alloc
#> 1      11        178
#> 2      12        100
#> 3      13        133
#> 4      14        168
#> 5      15        165
#> 6      16        176
#> 7      17        175
#> 8      18        192
#> 9      19        102
#> 10     21        164
alokasi_prov %>% summarise(sum(init_alloc))
#>   sum(init_alloc)
#> 1            5168
alokasi_kab = pop_dt %>%
  left_join(alokasi_prov) %>%
  get_allocation(n_alloc = "init_alloc", group = c("kdprov"), pop_var = "Total") %>%
  as.data.frame()
#> Joining with `by = join_by(kdprov)`
alokasi_kab %>% head(10)
#>    idkab kdprov kdkab nmprov         nmkab Laki-laki Perempuan  Total
#> 1   1101     11    01   ACEH      SIMEULUE     47630     45235  92865
#> 2   1102     11    02   ACEH  ACEH SINGKIL     63978     62536 126514
#> 3   1103     11    03   ACEH  ACEH SELATAN    116542    115872 232414
#> 4   1104     11    04   ACEH ACEH TENGGARA    110799    110061 220860
#> 5   1105     11    05   ACEH    ACEH TIMUR    212286    210115 422401
#> 6   1106     11    06   ACEH   ACEH TENGAH    109262    106314 215576
#> 7   1107     11    07   ACEH    ACEH BARAT    100492     98244 198736
#> 8   1108     11    08   ACEH    ACEH BESAR    204428    201107 405535
#> 9   1109     11    09   ACEH         PIDIE    215878    219397 435275
#> 10  1110     11    10   ACEH       BIREUEN    215282    221136 436418
#>    init_alloc n_primary
#> 1         178         5
#> 2         178         6
#> 3         178         8
#> 4         178         8
#> 5         178        11
#> 6         178         8
#> 7         178         8
#> 8         178        11
#> 9         178        11
#> 10        178        11
alokasi_kab %>% summarise(sum(n_primary))
#>   sum(n_primary)
#> 1           5168
alokasi_kab %>%
  group_by(kdprov) %>%
  summarise(sum(n_primary))
#> # A tibble: 34 x 2
#>   kdprov `sum(n_primary)`
#>   <chr>             <dbl>
#> 1 11                  178
#> 2 12                  100
#> 3 13                  133
#> 4 14                  168
#> # ... with 30 more rows
# check
all.equal(
  alokasi_prov,
  alokasi_kab %>%
    group_by(kdprov) %>%
    summarise(init_alloc = sum(n_primary)) %>%
    ungroup() %>%
    as.data.frame()
)
#> [1] TRUE
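The square-root proportional rule itself can be sketched in a few lines of base R. The helper `sqrt_allocate` is hypothetical and is not the get_allocation source. In particular, the largest-remainder rounding used here to keep the total intact is an assumption, so the individual numbers need not match the package output. The values are the Total figures of the first six rows of pop_dt and the overall allocation of 20 is made up.

```r
# Hypothetical helper: split n_total across units in proportion to
# sqrt(pop_var), rounding so that the pieces still sum to n_total
sqrt_allocate <- function(n_total, pop_var) {
  w    <- sqrt(pop_var) / sum(sqrt(pop_var))  # square-root weights
  raw  <- n_total * w                         # unrounded allocation
  base <- floor(raw)
  rem  <- n_total - sum(base)                 # units left after flooring
  if (rem > 0) {
    # assumption: hand leftovers to the largest fractional parts
    top <- order(raw - base, decreasing = TRUE)[seq_len(rem)]
    base[top] <- base[top] + 1
  }
  base
}

total     <- c(92865, 126514, 232414, 220860, 422401, 215576)
alloc_kab <- sqrt_allocate(20, total)

alloc_kab
sum(alloc_kab)  # equals the allocation that was passed in
```

Taking the square root dampens the effect of size: a unit with four times the population receives roughly twice the allocation rather than four times.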