Introduction to Mediation Analysis
This post introduces the basics of mediation analysis without going into the statistical details. For those details, please refer to the articles at the end of this post.
What is mediation?
Let’s say previous studies have suggested that higher grades predict higher happiness: X (grades) → Y (happiness). (This research example is made up for illustration purposes. Please don’t consider it a scientific statement.)
Reading PDF Files into R for Text Mining
Let's say we're interested in text mining the opinions of the Supreme Court of the United States. At the time of this writing, the opinions are published as PDF files at the following web page in the section titled "Opinions of the Court": https://www.supremecourt.gov/opinions/opinions.aspx. For the purposes of this introductory tutorial, we'll look at just three opinions from the 2014 term: (1) Glossip v. Gross, (2) State Legislature v.
Understanding Two-Way Interactions
When doing linear modeling or ANOVA, it’s useful to examine whether the effect of one variable depends on the level of one or more other variables. If it does, then we have what is called an “interaction”. This means the variables combine or interact to affect the response. The simplest type of interaction is the one between two two-level categorical variables. Let’s say we have gender (male and female), treatment (yes or no), and a continuous response measure. If the response to treatment depends on gender, then we have an interaction.
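As a minimal sketch of that idea, the snippet below compares the treatment effect within each gender using cell means. The response values are made up purely for illustration, and the names (cells, treatment_effect, interaction_contrast) are hypothetical. If the two treatment effects differ, their difference (a "difference of differences") is nonzero, which is what an interaction looks like in a two-by-two design.

```python
from statistics import mean

# Hypothetical responses for each gender-by-treatment cell (made-up numbers).
cells = {
    ("female", "no"): [10.0, 12.0, 11.0],
    ("female", "yes"): [15.0, 17.0, 16.0],
    ("male", "no"): [10.0, 11.0, 12.0],
    ("male", "yes"): [11.0, 12.0, 13.0],
}

# Mean response in each of the four cells.
cell_means = {key: mean(values) for key, values in cells.items()}


def treatment_effect(gender):
    """Mean response with treatment minus mean response without, for one gender."""
    return cell_means[(gender, "yes")] - cell_means[(gender, "no")]


effect_female = treatment_effect("female")
effect_male = treatment_effect("male")

# Difference of differences: zero would mean no interaction in these cell means.
interaction_contrast = effect_female - effect_male

print("Treatment effect (female):", effect_female)
print("Treatment effect (male):", effect_male)
print("Interaction contrast:", interaction_contrast)
```

This only describes the sample cell means. Judging whether such a difference is more than noise would require a model, such as a linear model with an interaction term.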
Comparing Proportions with Relative Risk and Odds Ratios
The classic two-by-two table displays counts of what may be called “successes” and “failures” versus some two-level grouping variable, such as sex (male and female) or treatment (placebo and active drug). An example of one such table is given in the book An Introduction to Categorical Data Analysis (Agresti, 1996, p. 20). The table cross-classifies myocardial infarction (Yes/No) by treatment group (Placebo/Aspirin).
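As a rough sketch of the two measures named in the title, the function below computes a relative risk and an odds ratio from the four counts of a two-by-two table. The counts used here are hypothetical and are not the values from Agresti's table; the function and argument names are likewise assumptions made for illustration.

```python
def relative_risk_and_odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Return (relative risk, odds ratio) of group A versus group B."""
    # Risk is the proportion of each group that experienced the event.
    risk_a = events_a / (events_a + non_events_a)
    risk_b = events_b / (events_b + non_events_b)
    relative_risk = risk_a / risk_b
    # Odds ratio via the cross-product of the table's cells.
    odds_ratio = (events_a * non_events_b) / (non_events_a * events_b)
    return relative_risk, odds_ratio


# Hypothetical counts: group A has 20 events out of 100, group B has 10 out of 100.
rr, odds_ratio = relative_risk_and_odds_ratio(20, 80, 10, 90)
print("Relative risk:", round(rr, 3))
print("Odds ratio:", round(odds_ratio, 3))
```

With these made-up counts the odds ratio is somewhat larger than the relative risk. The two measures move closer together as the event becomes rarer in both groups.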
Using and Interpreting Cronbach's Alpha
Cronbach's alpha is a measure used to assess the reliability, or internal consistency, of a set of scale or test items. In other words, the reliability of any given measurement refers to the extent to which it is a consistent measure of a concept, and Cronbach’s alpha is one way of measuring the strength of that consistency.
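To make the idea concrete, here is a minimal sketch of the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The item scores are made up for illustration, and cronbach_alpha is a hypothetical helper, not a function from any particular package.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of per-respondent scores."""
    k = len(items)
    # Sample variance of each item, summed across items.
    sum_item_variances = sum(variance(scores) for scores in items)
    # Total score for each respondent across all items.
    totals = [sum(person) for person in zip(*items)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical data: 3 items answered by 5 respondents (made-up scores).
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 3],
]
alpha = cronbach_alpha(items)
print("Cronbach's alpha:", round(alpha, 3))
```

When respondents who score high on one item tend to score high on the others, the variance of the totals is large relative to the summed item variances, and alpha moves toward 1.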
Is R-squared Useless?
On Thursday, October 15, 2015, a disbelieving student posted on Reddit: My stats professor just went on a rant about how R-squared values are essentially useless, is there any truth to this? It attracted a fair amount of attention, at least compared to other posts about statistics on Reddit.
Fitting and Interpreting a Proportional Odds Model
Take a look at the following table. It is a cross tabulation of data taken from the 1991 General Social Survey that relates political party affiliation to political ideology (Agresti, An Introduction to Categorical Data Analysis, 1996).
Understanding Diagnostic Plots for Linear Regression Analysis
You ran a linear regression analysis and the stats software spit out a bunch of numbers. The results were significant (or not). You might think that you’re done with analysis. No, not yet. After running a regression analysis, you should check if the model works well for the data.
Getting Started with Quantile Regression
When we think of regression, we usually think of linear regression, the tried and true method for estimating a mean of some variable conditional on the levels or values of independent variables. In other words, we're pretty sure the mean of our variable of interest differs depending on other variables. For example, the mean weight of 1st-year UVA males is some unknown value. But we could in theory take a random sample and discover there is a relationship between weight and height.
Should I Always Transform My Variables to Make Them Normal?
When I first learned data analysis, I always checked each variable for normality and made sure it was normally distributed before running any analysis, such as a t-test, ANOVA, or linear regression. I thought that normally distributed variables were an important assumption to satisfy before proceeding with the analysis. That’s why stats textbooks show you, in their early chapters, how to draw histograms and QQ-plots to see whether variables are normally distributed, isn’t it?