Structural Equation Models (SEM) and particular cases using rstan interface

sem(
  data,
  blocks,
  paths,
  exogenous,
  signals,
  row_names = rownames(data),
  prior_specs = list(beta = c("normal(0,1)"), sigma2 = c("inv_gamma(2.1, 1.1)"), gamma0
    = c("normal(0,1)"), gamma = c("normal(0,1)"), tau2 = c("inv_gamma(2.1, 1.1)")),
  cores = parallel::detectCores(),
  pars = c("alpha", "lambda", "sigma2"),
  iter = 2000,
  chains = 4,
  scaled = FALSE,
  verbose = FALSE,
  refresh = 100,
  ...
)

Arguments

data

a mandatory 'matrix' object where the columns are variables and the rows are observations

blocks

a mandatory named list of colnames (or integers in 1:ncol(data)) indicating the manifest variables corresponding to each block; generic names are assumed internally for the latent variables if none are defined

paths

list describing the inner model paths; a list of characters or integers describing the relationships among the scores; the first j latent variables are taken as the explained ones if names(paths) is NULL

exogenous

list describing the inner model exogenous variables; a list of characters or integers describing the relationships between exogenous and latent variables; the first l columns are taken as the explained ones if names(exogenous) is NULL

signals

list giving the signals (signs) of the initial values of the factor loadings; the following must hold: length(signals) == length(blocks) && all(lengths(signals) == lengths(blocks)); (not allowed in runShiny)

row_names

optional identifier for the observations (observation = row)

prior_specs

prior settings for the Bayesian approach; only `normal` and `cauchy` are available for gamma0, gamma and beta, and only `gamma`, `lognormal` and `inv_gamma` for sigma2 and tau2; prior specifications that the fitted model (FA or SEM) does not need are ignored

cores

number of CPU cores to use

pars

allows parameters to be omitted from the output; options are any subset of the default c("alpha", "lambda", "sigma2")

iter

number of iterations

chains

number of chains

scaled

logical; indicates whether to center and scale the data; default FALSE

verbose

logical; see rstan::sampling; default FALSE

refresh

defaults to 100; see rstan::sampling

...

further arguments passed to Stan, such as warmup, adapt_delta and others; see rstan::sampling.
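
The constraints on blocks and signals described above can be checked before calling sem(). The following is a minimal sketch in base R under stated assumptions: the toy matrix, the block names F1 and F2, and the +1/-1 coding of signals are hypothetical and chosen only for illustration; the prior strings simply repeat the defaults shown in the usage.

```r
# Hypothetical toy inputs: 10 observations of 6 manifest variables in 2 blocks.
set.seed(1)
data <- matrix(
  rnorm(60), nrow = 10, ncol = 6,
  dimnames = list(paste0("obs", 1:10), paste0("x", 1:6))
)

blocks <- list(F1 = c("x1", "x2", "x3"), F2 = c("x4", "x5", "x6"))

# One entry per manifest variable, block by block (+1/-1 coding is an assumption).
signals <- list(F1 = c(1, 1, 1), F2 = c(1, -1, 1))

# Checks mirroring the documented constraint on signals and blocks.
same_n_blocks    <- length(signals) == length(blocks)
same_block_sizes <- all(lengths(signals) == lengths(blocks))
known_columns    <- all(unlist(blocks) %in% colnames(data))

# Prior strings written in the same format as the defaults in the usage.
prior_specs <- list(
  beta   = "normal(0,1)",
  sigma2 = "inv_gamma(2.1, 1.1)",
  gamma0 = "normal(0,1)",
  gamma  = "normal(0,1)",
  tau2   = "inv_gamma(2.1, 1.1)"
)
```

Note that lengths(signals) == lengths(blocks) is a vector comparison, which is why it is wrapped in all() before being combined with the scalar check.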

Value

An object of class bsem; a list with 14 to 19 elements:

stanfit

S4 object of class stanfit

posterior

the list of posterior draws separated by chains

model

character; pointer to pre-defined stan model

mean_alpha

matrix of factor loadings posterior means

mean_lambda

matrix of factor scores posterior means

mean_sigma2

vector of error variances posterior means

mean_beta

vector of regression coefficients posterior means

mean_tau2

vector of inner paths error variances posterior means

mean_gamma

vector of inner paths regression coefficients posterior means

mean_gamma0

vector of inner paths intercept posterior means

stats

posterior descriptive statistics

blocks

list of blocks

paths

list of paths

credint

Highest posterior density intervals (HPD)

h

vector of posterior communalities

PTVE

vector of total variance proportions

R2

adjusted coefficient of determination

SQE

explained sums of squares

SQT

total sums of squares
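
To make the posterior summaries listed above more concrete, the sketch below computes a posterior mean and a highest posterior density interval from a vector of draws in base R. It is an illustration only: hpd_interval is a hypothetical helper that uses the common shortest-interval rule on sorted draws, the draws are simulated rather than taken from a fitted model, and nothing here is claimed to be how bsem computes mean_* or credint.

```r
# Hypothetical helper: shortest interval covering about `prob` of the draws.
hpd_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  # Number of steps between the two endpoints of a candidate interval.
  gap <- max(1, min(n - 1, round(n * prob)))
  starts <- seq_len(n - gap)
  widths <- sorted[starts + gap] - sorted[starts]
  i <- which.min(widths)  # the narrowest candidate is the HPD interval
  c(lower = sorted[i], upper = sorted[i + gap])
}

# Simulated, right-skewed stand-in for the posterior draws of one parameter.
set.seed(7)
draws <- rexp(4000)

post_mean <- mean(draws)
hpd <- hpd_interval(draws, prob = 0.95)
```

For a skewed posterior such as this one, the HPD interval sits closer to zero than the equal-tailed interval obtained from quantile(draws, c(0.025, 0.975)).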

Details

Fits the SEM to specific data

Consider:

- the outer model as:

-- outer blocks: $$X_{p \times n} = \alpha_{p \times k}\lambda_{k \times n} + \epsilon_{p \times n}$$ where \(X\) is the data matrix with variables in the rows and sample elements in the columns, \(\alpha_{p \times j}\) denotes the \(j\)th column of \(\alpha\), i.e. the column vector of loadings for the \(j\)th latent variable, and \(\lambda_{j \times n}\) denotes the \(j\)th row of \(\lambda\), i.e. the row vector of scores for the \(j\)th unobserved variable, \(j = 1,\dots,k\). Normality is assumed for the errors as \(\epsilon_{ij} \sim N(0, \sigma_i^2)\) for \(i = 1,\dots,p\).

- the inner model as:

-- inner paths: $$\lambda_{j \times n} = \beta \lambda^{(-j)} + \nu$$ where \(\beta\) is a column vector of constant coefficients and \(\lambda^{(-j)}_{(k-1) \times n}\) represents a subset of the matrix of scores, i.e. one that excludes at least the \(j\)th row of scores. The error is assumed to follow \(\nu_j \sim N(0, 1)\).

-- inner exogenous: $$Y_{l \times n} = \gamma_0 + \gamma \lambda + \xi$$ where \(\gamma\) is a column vector of constant coefficients and \(\gamma_0\) is the intercept. \(\lambda_{k \times n}\) is the matrix of scores and the error is assumed to follow \(\xi_l \sim N(0, \tau_l^2)\).
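
The three equations above can be illustrated by simulating from them in base R. This is a minimal sketch under stated assumptions: the dimensions, loadings, and coefficient values are hypothetical, only one inner path and one exogenous variable are used, and the coefficients are stored as a scalar and a 1 x k row so that the matrix products conform. It does not reproduce what bsem::simdata() does.

```r
set.seed(42)
n <- 50  # sample elements
p <- 6   # manifest variables
k <- 2   # latent variables

# Inner path: the scores of latent variable 2 are explained by latent variable 1.
beta <- 0.8                                   # hypothetical path coefficient
lambda <- matrix(0, nrow = k, ncol = n)
lambda[1, ] <- rnorm(n)
lambda[2, ] <- beta * lambda[1, ] + rnorm(n)  # nu ~ N(0, 1)

# Outer blocks: variables 1-3 load on factor 1, variables 4-6 on factor 2.
alpha <- matrix(0, nrow = p, ncol = k)
alpha[1:3, 1] <- c(1.0, 0.7, -0.5)
alpha[4:6, 2] <- c(0.9, 0.6, 0.8)
sigma2 <- rep(0.25, p)
# Column-major fill with a recycled sd gives row i the variance sigma2[i].
epsilon <- matrix(rnorm(p * n, sd = sqrt(sigma2)), nrow = p, ncol = n)
X <- alpha %*% lambda + epsilon               # p x n, variables in rows

# Inner exogenous: one observed response (l = 1) regressed on both scores.
gamma0 <- 0.5
gamma <- matrix(c(1.2, -0.4), nrow = 1)       # stored as 1 x k
tau2 <- 0.1
xi <- matrix(rnorm(n, sd = sqrt(tau2)), nrow = 1)
Y <- gamma0 + gamma %*% lambda + xi           # 1 x n

# The data argument of sem() has observations in rows, so X is transposed.
data <- t(X)
```

The transpose in the last line matters because the model is written with variables in the rows of \(X\), whereas the data argument has variables in the columns.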

Examples

dt <- bsem::simdata()
names(dt)
#> [1] "data" "real" "blocks" "signals" "paths" "exogenous"
if (FALSE) {
  semfit <- bsem::sem(
    data = dt$data,
    blocks = dt$blocks,
    paths = dt$paths,
    exogenous = dt$exogenous,
    signals = dt$signals,
    iter = 2000,
    warmup = 1000,
    chains = 4
  )
  summary(semfit)
}