Title: | Measuring Discursive Sophistication in Open-Ended Survey Responses |
---|---|
Description: | A simple approach to measure political sophistication based on open-ended survey responses. Discursive sophistication captures the complexity of individual attitude expression by quantifying its relative size, range, and constraint. For more information on the measurement approach see: Kraft, Patrick W. 2023. "Women Also Know Stuff: Challenging the Gender Gap in Political Sophistication." American Political Science Review (forthcoming). |
Authors: | Patrick Kraft [aut, cre, cph] |
Maintainer: | Patrick Kraft <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1.9000 |
Built: | 2024-11-25 03:53:56 UTC |
Source: | https://github.com/pwkraft/discursive |
A subset of data from the UWM Team Content of the 2018 CCES wave. See Kraft (2023) for details.
cces
A data frame with 1,000 rows and 15 columns:
age | Age (in years) |
female | Gender (1 = female) |
educ_cont | Education level (1-6) |
pid_cont | Party identification (1-7) |
educ_pid | educ_cont * pid_cont |
oe01 - oe10 | Open-ended responses |
A sample of terms that signal a higher level of constraint between different considerations (combining conjunctions and exclusive words). See Kraft (2023) for details.
dict_sample
A character vector with 4 elements:
also, and
but, without
This function takes a data frame (data) containing a set of open-ended responses (openends) to compute the three components of discursive sophistication (size, range, and constraint) and combines them in a single scale. See Kraft (2023) for details.
discursive( data, openends, meta, args_textProcessor = NULL, args_prepDocuments = NULL, args_stm = NULL, keep_stm = TRUE, dictionary, remove_duplicates = FALSE, type = c("scale", "average", "average_scale", "product"), progress = TRUE )
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
meta |
A character vector containing topic prevalence covariates included in data. |
args_textProcessor |
A named list containing additional arguments passed to stm::textProcessor(). |
args_prepDocuments |
A named list containing additional arguments passed to stm::prepDocuments(). |
args_stm |
A named list containing additional arguments passed to stm::stm(). |
keep_stm |
Logical. If TRUE function returns output of |
dictionary |
A character vector containing dictionary terms to flag conjunctions and exclusive words. May include regular expressions. |
remove_duplicates |
Logical. If TRUE duplicates in |
type |
The method of combining the three components, must be "scale", "average", "average_scale", or "product". The default is "scale", which creates an additive index that is re-scaled to mean 0 and standard deviation 1. Alternatively, "average" creates the same additive index without re-scaling; "average_scale" re-scales each individual component to mean 0 and standard deviation 1 before creating the additive index; "product" creates a multiplicative index. |
progress |
Logical. Shows progress bar if TRUE. |
A list containing the measure of discursive sophistication and the underlying components in a data frame, as well as the output of stm::textProcessor(), stm::prepDocuments(), and stm::stm().
discursive(data = cces, openends = c(paste0("oe0", 1:9), "oe10"), meta = c("age", "educ_cont", "pid_cont", "educ_pid", "female"), args_prepDocuments = list(lower.thresh = 10), args_stm = list(K = 25, seed = 12345), dictionary = dict_sample)
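The args_textProcessor, args_prepDocuments, and args_stm arguments each take a named list of extra options for the corresponding stm function. How the package forwards these lists internally is not documented here; the sketch below only illustrates the general pattern, assuming a do.call()-style hand-off. The wrapper summarise_sketch() and its argument args_quantile are hypothetical stand-ins and are not part of this package.

```r
# Hypothetical wrapper (not part of the package): forwards an optional
# named list of extra arguments to an underlying function via do.call().
summarise_sketch <- function(x, args_quantile = NULL) {
  base_args <- list(x = x)
  # c() with NULL leaves base_args unchanged, so the default is a plain call
  do.call(stats::quantile, c(base_args, args_quantile))
}

# Default behaviour: no extra arguments
default_out <- summarise_sketch(1:10)

# Extra arguments supplied as a named list, analogous to
# args_stm = list(K = 25, seed = 12345) in the example above
median_out <- summarise_sketch(1:10, args_quantile = list(probs = 0.5))
```

Under this pattern, leaving an args_* argument at NULL keeps the underlying function's defaults, and any named element in the list overrides or adds one option.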
This function combines the size, range, and constraint of open-ended responses in a single scale. See Kraft (2023) for details.
discursive_combine( size, range, constraint, type = c("scale", "average", "average_scale", "product") )
size |
A named list containing an element labeled |
range |
A numeric vector containing the range component of discursive sophistication. Usually created via discursive_range(). |
constraint |
A numeric vector containing the constraint component of discursive sophistication. Usually created via discursive_constraint(). |
type |
The method of combining the three components, must be "scale", "average", "average_scale", or "product". The default is "scale", which creates an additive index that is re-scaled to mean 0 and standard deviation 1. Alternatively, "average" creates the same additive index without re-scaling; "average_scale" re-scales each individual component to mean 0 and standard deviation 1 before creating the additive index; "product" creates a multiplicative index. |
A numeric vector with the same length as the input components.
discursive_combine(size = list(size = runif(100)), range = runif(100), constraint = runif(100))
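The four values of type can be illustrated with a minimal base R sketch. This is not the package's implementation: combine_sketch() is a hypothetical helper, it takes three plain numeric vectors (whereas discursive_combine() expects size as a named list), and it assumes that the "additive index" is a simple sum or mean of the components.

```r
# Hypothetical illustration of the four combination types.
# Assumption: the additive index is the sum (or mean) of the components.
combine_sketch <- function(size, range, constraint,
                           type = c("scale", "average", "average_scale", "product")) {
  type <- match.arg(type)
  # Standardize a vector to mean 0 and standard deviation 1
  z <- function(v) as.numeric(scale(v))
  switch(type,
    scale         = z(size + range + constraint),
    average       = (size + range + constraint) / 3,
    average_scale = (z(size) + z(range) + z(constraint)) / 3,
    product       = size * range * constraint
  )
}

set.seed(1)
s <- runif(100); r <- runif(100); k <- runif(100)
idx_scale   <- combine_sketch(s, r, k, type = "scale")
idx_product <- combine_sketch(s, r, k, type = "product")
```

Under these assumptions, "scale" and "average" rank respondents identically and differ only in units, while "product" penalizes respondents who score near zero on any single component.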
This function takes a data frame (data) containing a set of open-ended responses (openends) and a dictionary to identify terms that signal a higher level of constraint between different considerations (usually conjunctions and exclusive words). It returns a numeric vector of dictionary counts re-scaled to range from 0 to 1. See Kraft (2023) for details.
discursive_constraint(data, openends, dictionary, remove_duplicates = FALSE)
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
dictionary |
A character vector containing dictionary terms to flag conjunctions and exclusive words. May include regular expressions. |
remove_duplicates |
Logical. If TRUE duplicates in |
A numeric vector with the same length as the number of rows in data.
discursive_constraint(data = cces, openends = c(paste0("oe0", 1:9), "oe10"), dictionary = dict_sample)
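The idea of counting dictionary terms and re-scaling the counts to the unit interval can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: count_terms_sketch() and rescale01_sketch() are hypothetical helpers, terms are matched as whole words after lower-casing, and the re-scaling divides by the largest count. The toy responses are made up, and the four dictionary terms are the ones listed for dict_sample above.

```r
# Hypothetical helper: count whole-word dictionary matches per response.
count_terms_sketch <- function(text, dictionary) {
  pattern <- paste0("\\b(", paste(dictionary, collapse = "|"), ")\\b")
  hits <- gregexpr(pattern, tolower(text), perl = TRUE)
  # gregexpr() returns -1 when nothing matches, so count positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Hypothetical helper: re-scale non-negative counts to the range 0 to 1.
rescale01_sketch <- function(v) {
  if (max(v) == 0) return(rep(0, length(v)))
  v / max(v)
}

responses <- c("taxes are high but services are good",
               "I like it",
               "jobs and wages, also schools")
dictionary <- c("also", "and", "but", "without")

counts <- count_terms_sketch(responses, dictionary)
constraint <- rescale01_sketch(counts)
```

Because the dictionary terms are pasted into a regular expression, entries may themselves be patterns, which matches the note that the dictionary argument may include regular expressions.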
This function takes a data frame (data) containing a set of open-ended responses (openends) to compute the Shannon entropy in individual response lengths across items. The function returns a numeric vector of entropy values re-scaled to range from 0 to 1. See Kraft (2023) for details.
discursive_range(data, openends)
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
A numeric vector with the same length as the number of rows in data.
discursive_range(data = cces, openends = c(paste0("oe0", 1:9), "oe10"))
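For intuition, the entropy of one respondent's response lengths across items can be sketched as below. This is a minimal illustration, not the package's implementation: shannon_sketch() is a hypothetical helper, words are split on whitespace, missing or empty answers count as zero words, and the entropy is divided by the log of the number of items so that it ranges from 0 to 1 (an assumed normalization).

```r
# Hypothetical helper: normalized Shannon entropy of word counts across
# one respondent's answers (x is a character vector, one element per item).
shannon_sketch <- function(x) {
  n_words <- lengths(strsplit(trimws(x), "\\s+"))
  # Treat missing or empty answers as zero words
  n_words[is.na(x) | trimws(x) == ""] <- 0
  if (sum(n_words) == 0 || length(x) < 2) return(0)
  p <- n_words / sum(n_words)   # share of words devoted to each item
  p <- p[p > 0]                 # 0 * log(0) is treated as 0
  -sum(p * log(p)) / log(length(x))
}

even_answers   <- shannon_sketch(c("lower taxes", "more jobs"))
single_answer  <- shannon_sketch(c("lower taxes now", ""))
uneven_answers <- shannon_sketch(c("a b c", "d", NA))
```

Under these assumptions, a respondent who spreads words evenly across all items scores 1, and a respondent who answers only a single item scores 0.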
This function takes a data frame (data) containing a set of open-ended responses (openends) and additional arguments passed to stm::textProcessor() and stm::prepDocuments() to estimate a structural topic model via stm::stm(). The results of the structural topic model are used to compute the relative number of topics raised in each open-ended response. The function returns a numeric vector of topic counts re-scaled to range from 0 to 1. See Kraft (2023) for details.
discursive_size( data, openends, meta, args_textProcessor = NULL, args_prepDocuments = NULL, args_stm = NULL, keep_stm = TRUE, progress = TRUE )
data |
A data frame. |
openends |
A character vector containing variable names of open-ended responses in data. |
meta |
A character vector containing topic prevalence covariates included in data. |
args_textProcessor |
A named list containing additional arguments passed to stm::textProcessor(). |
args_prepDocuments |
A named list containing additional arguments passed to stm::prepDocuments(). |
args_stm |
A named list containing additional arguments passed to stm::stm(). |
keep_stm |
Logical. If TRUE function returns output of |
progress |
Logical. Shows progress bar if TRUE. |
A list containing the size component of discursive sophistication as well as the output of stm::textProcessor(), stm::prepDocuments(), and stm::stm().
discursive_size(data = cces, openends = c(paste0("oe0", 1:9), "oe10"), meta = c("age", "educ_cont", "pid_cont", "educ_pid", "female"), args_prepDocuments = list(lower.thresh = 10), args_stm = list(K = 25, seed = 12345))
This function takes a structural topic model output estimated via stm::stm() as well as the underlying set of documents created via stm::prepDocuments() to compute the relative number of topics raised in each open-ended response. The function returns a numeric vector of topic counts re-scaled to range from 0 to 1. See Kraft (2023) for details.
ntopics(x, docs, progress = TRUE)
x |
A structural topic model estimated via stm::stm(). |
docs |
A set of documents used for the structural topic model; created via stm::prepDocuments(). |
progress |
Logical. Shows progress bar if TRUE. |
A numeric vector with the same length as the number of documents in x and docs.
meta <- c("age", "educ_cont", "pid_cont", "educ_pid", "female")
openends <- c(paste0("oe0", 1:9), "oe10")
cces$resp <- apply(cces[, openends], 1, paste, collapse = " ")
cces <- cces[!apply(cces[, meta], 1, anyNA), ]
processed <- stm::textProcessor(cces$resp, metadata = cces[, meta])
out <- stm::prepDocuments(processed$documents, processed$vocab, processed$meta, lower.thresh = 10)
stm_fit <- stm::stm(out$documents, out$vocab, prevalence = as.matrix(out$meta), K = 25, seed = 12345)
ntopics(stm_fit, out)
Internal function to compute Shannon entropy in relative word counts across a set of elements in a character vector. Entropy is re-scaled to range from 0 to 1. Function used in discursive_range().
oe_shannon(x)
x |
Character vector containing open-ended responses. |
Numeric vector with the same length as x.