Understanding t-tests, ANOVA, and MANOVA
Imagine you love baking cookies and invite your friends over for a cookie party. You want to know how many cookies you should make so you ask your friends about how many cookies they think they will each eat. They respond:
- Francesca: 5 cookies
- Sydney: 3 cookies
- Noelle: 1 cookie
- James: 7 cookies
- Brooke: 2 cookies
We take these numbers and add all of them together to estimate that about 18 cookies will be eaten in total at our party.
Understanding Polychoric Correlation
Polychoric correlation is a measure of association between two ordered categorical variables, each assumed to represent latent continuous variables that have a bivariate standard normal distribution. When we say two variables have a bivariate standard normal distribution, we mean they’re both normally distributed with mean 0 and standard deviation 1, and that they have linear correlation.
Power and Sample Size Calculations for Ordered Categorical Data
In this article we demonstrate how to calculate the sample size needed to achieve a desired power for experiments with an ordered categorical outcome. We assume the data will be analyzed with a proportional odds logistic regression model. We’ll use the R statistical computing environment and functions from the {Hmisc} package to implement the calculations.
Bootstrapped Association Rule Mining in R
Association rule mining is a machine learning technique designed to uncover associations between categorical variables in data. It produces association rules which reflect how frequently categories are found together. For instance, a common application of association rule mining is the “Frequently Bought Together” section of the checkout page from an online store. Through mining across transactions from that store, items that are frequently bought together have been identified (e.g., shaving cream and razors).
Spatial R: Using the sf package
The spread of disease, politics, the movement of animals, regions vulnerable to earthquakes and where people are most likely to buy frosted flakes are all informed by spatial data. Spatial data links information to specific positions on earth and can tell us about patterns that play out from location to location. We can use spatial data to uncover processes over space and tackle complex problems.
Assessing Model Assumptions with Lineup Plots
When fitting a linear model we make two assumptions about the distribution of residuals:
Bootstrapping Residuals for Linear Models with Heteroskedastic Errors Invites Trouble
Bootstrapping—resampling data with replacement and recomputing quantities of interest—lets analysts approximate sampling distributions for complex estimators and frees them of the reliably unmet assumptions of traditional, parametric inferential statistics. It’s an elegant, intuitive approach in which an analyst exploits the (often) parallel resample-to-sample and sample-to-population relationships to understand uncertainty in an estimate.
Power and Sample Size Estimation for Logistic Regression
In this article we demonstrate how to use simulation in R to estimate power and sample size for proposed logistic regression models that feature two binary predictors and their interaction.
Recall that logistic regression attempts to model the probability of an event conditional on the values of predictor variables. If we have a binary response, y, and two predictors, x and z, that interact, we specify the logistic regression model as follows:
Understanding Somers' D
When it comes to summarizing the association between two numeric variables, we can use Pearson or Spearman correlation. When accompanied with a scatterplot, they allow us to quantify association on a scale from -1 to 1. But what if we have two ordered categorical variables with just a few levels? How can we summarize their association? One approach is to calculate Somers’ Delta, or Somers’ D for short.
Starting with Non-Metric Multidimensional Scaling (NMDS)
Real world problems and data are complex and there are often situations where we want to simultaneously look at relationships between more than one variable. We call these analyses multivariate statistics. Non-metric multidimensional scaling, or NMDS, is one multivariate technique that allows us to visualize these complex relationships in less dimensions. In other words, NMDS takes complex, multivariate data and represents the relationships in a way that is easier for interpretation.