Working with Dates and Times in R Using the lubridate Package
Sometimes we have data with dates and/or times that we want to manipulate or summarize. A common example in the health sciences is time-in-study. A subject may enter a study on February 12, 2008, and exit on November 4, 2009. How many days was the person in the study? (Don’t forget 2008 was a leap year; February had 29 days.) What was the median time-in-study for all subjects?
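The post works in R with lubridate. As a language-neutral check of the arithmetic in the question above, here is a minimal sketch using Python's standard `datetime` module. The three subjects used for the median are hypothetical, made up only to show the calculation.

```python
from datetime import date
from statistics import median

# Entry and exit dates from the example above
entry = date(2008, 2, 12)
exit_ = date(2009, 11, 4)

# Subtracting two dates gives a timedelta; the calendar, including
# February 29, 2008, is handled automatically.
days_in_study = (exit_ - entry).days
print(days_in_study)  # 631

# Median time-in-study across subjects. These (entry, exit) pairs are
# hypothetical, chosen only to illustrate the calculation.
subjects = [
    (date(2008, 2, 12), date(2009, 11, 4)),
    (date(2008, 3, 1), date(2008, 12, 31)),
    (date(2008, 6, 15), date(2009, 6, 15)),
]
durations = [(end - start).days for start, end in subjects]
print(durations)          # [631, 305, 365]
print(median(durations))  # 365
```

The leap day is why the first subject's stay is 631 days rather than 630.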
The Wilcoxon Rank Sum Test
The Wilcoxon Rank Sum Test is often described as the non-parametric version of the two-sample t-test. You sometimes see it in analysis flowcharts after a question such as "is your data normal?" A "no" branch off this question will recommend a Wilcoxon test if you're comparing two groups of continuous measures.
So what is this Wilcoxon test? What makes it non-parametric? What does that even mean? And how do we implement it and interpret it? Those are some of the questions we aim to address in this post.
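The post implements the test in R. To show where the statistic comes from, here is a minimal sketch in plain Python: pool both samples, rank them (tied values share the average of their ranks), and sum the ranks belonging to the first group. Only the ordering of the observations enters the calculation, not their raw values. Subtracting n1(n1+1)/2 from that rank sum gives the number of (x, y) pairs with x > y. As far as we know, this is the quantity R's `wilcox.test()` reports as W. The sample values are hypothetical, and this sketch computes no p-value.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; ties share their mean
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Return R1 (sum of x's ranks in the pooled sample) and W = R1 - n1(n1+1)/2."""
    ranks = average_ranks(list(x) + list(y))
    n1 = len(x)
    r1 = sum(ranks[:n1])
    return r1, r1 - n1 * (n1 + 1) / 2


# Hypothetical measurements for two groups
x = [1.8, 2.3, 3.1, 4.0]
y = [2.9, 3.6, 4.4, 5.2, 6.0]
r1, w = rank_sum(x, y)
print(r1, w)  # 13.0 3.0
```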
Pairwise comparisons of proportions
Pairwise comparison means comparing all pairs of something. If I have three items, A, B and C, that means comparing A to B, A to C, and B to C. Given n items, I can determine the number of possible pairs using the binomial coefficient: $$ \binom{n}{2} = \frac{n!}{2!\,(n - 2)!} $$ In R, we can use the choose() function to quickly calculate this.
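For readers outside R, a small sketch of the same count in Python: `math.comb` (Python 3.8 and later) plays the role of R's `choose()`, and `itertools.combinations` lists the pairs themselves, which makes it easy to confirm the count against the A, B, C example.

```python
from itertools import combinations
from math import comb, factorial

items = ["A", "B", "C"]

# Every unordered pair, matching the A-B, A-C, B-C list above
pairs = list(combinations(items, 2))
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# math.comb(n, k) corresponds to R's choose(n, k)
n = len(items)
print(comb(n, 2))   # 3
print(comb(10, 2))  # 45

# Same count from the factorial formula n! / (2! (n - 2)!)
assert comb(n, 2) == factorial(n) // (factorial(2) * factorial(n - 2))
```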
Stata Basics: foreach and forvalues
Sometimes we need to perform repetitive tasks during data preparation, analysis, or presentation. For instance, we may need to compute a set of variables in the same manner, rename or create a series of variables, or recode the values of several variables the same way. In this post, we show a few simple example "loops" using the Stata commands foreach, local, and forvalues to handle some common repetitive tasks.
Stata Basics: Reshape Data
In this post, we demonstrate how to convert datasets between wide form and long form, also known as "reshaping" data. Reshaping is often needed when you work with datasets containing variables that form some kind of sequence, such as time-series data. It is fairly easy to transform data between wide and long forms in Stata using the reshape command; however, you'll want to be careful when doing so to avoid introducing mistakes. First, let's see how the wide and long forms look.
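The post uses Stata's reshape command (for example, `reshape long inc, i(id) j(year)`). To illustrate what that transformation does to the rows, here is a minimal plain-Python sketch. The dataset, the stub name `inc`, and the helper functions are hypothetical, and the sketch assumes the column suffixes are integers. The final round-trip check is one simple way to catch mistakes introduced while reshaping.

```python
# Hypothetical wide-form data: one row per id, one income column per year
wide = [
    {"id": 1, "inc2019": 50, "inc2020": 52},
    {"id": 2, "inc2019": 61, "inc2020": 60},
]


def wide_to_long(rows, stub, i, j):
    """Split each stub<suffix> column into its own row with fields i, j, stub."""
    out = []
    for row in rows:
        for col, value in row.items():
            if col.startswith(stub) and col != stub:
                # Assumes the suffix after the stub is an integer such as a year
                out.append({i: row[i], j: int(col[len(stub):]), stub: value})
    return out


def long_to_wide(rows, stub, i, j):
    """Collect rows sharing the same i back into one record with stub<j> columns."""
    by_id = {}
    for row in rows:
        record = by_id.setdefault(row[i], {i: row[i]})
        record[f"{stub}{row[j]}"] = row[stub]
    return list(by_id.values())


long_rows = wide_to_long(wide, "inc", "id", "year")
for row in long_rows:
    print(row)

# Sanity check: reshaping back should reproduce the original data exactly
assert long_to_wide(long_rows, "inc", "id", "year") == wide
```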
Stata Basics: Combine Data (Append and Merge)
When we first start working with data, usually in a statistics class, we mostly use clean and complete datasets as examples. Later on, when doing research or data analysis for other purposes, we realize data are not always clean or complete. In reality, we often need to combine two or more datasets before we can begin the statistical analysis we would like to perform. In this post, we demonstrate how to combine datasets using append and merge, which combine data row-wise and column-wise, respectively.
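To make the row-wise versus column-wise distinction concrete, here is a minimal plain-Python sketch; it is not how Stata implements either command. The datasets and helper names are hypothetical. Stata's `merge 1:1 id using ...` keeps matched and unmatched observations by default and records the match status in `_merge`; this simplified sketch instead keeps only the master rows and reports keys that appear only in the using data.

```python
# Hypothetical datasets; "id" identifies the same person in each one
people_a = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
people_b = [{"id": 3, "age": 29}]
incomes = [{"id": 1, "income": 40}, {"id": 2, "income": 55}, {"id": 4, "income": 70}]


def append_rows(top, bottom):
    """Row-wise combine: stack the observations of two datasets."""
    return top + bottom


def merge_one_to_one(master, using, key):
    """Column-wise combine: add using's variables to master rows with the same key.

    Returns the merged master rows plus the keys found only in `using`, so
    unmatched observations are reported rather than silently dropped.
    """
    using_by_key = {row[key]: row for row in using}
    if len(using_by_key) != len(using):
        raise ValueError(f"{key} does not uniquely identify rows in the using data")
    merged = [{**row, **using_by_key.get(row[key], {})} for row in master]
    master_keys = {row[key] for row in master}
    using_only = sorted(k for k in using_by_key if k not in master_keys)
    return merged, using_only


stacked = append_rows(people_a, people_b)
merged, using_only = merge_one_to_one(people_a, incomes, "id")
print(stacked)
print(merged)
print(using_only)  # [4]
```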
Stata Basics: Subset Data
Sometimes only part of a dataset is relevant to your analysis. In this post, we show you how to subset a dataset in Stata by variables or by observations. We use the census.dta dataset installed with Stata as sample data.
Stata Basics: Create, Recode and Label Variables
In this article, we demonstrate how to create new variables, recode existing variables, and label variables and their values. We use the census.dta data included with Stata for the examples.
generate: create variables
Here we use the generate command to create a new variable representing the population younger than 18 years old. We do so by summing two existing variables: poplt5 (population under 5 years old) and pop5_17 (population 5 to 17 years old).
Stata Basics: Data Import, Use and Export
In Stata, the first step in analyzing a dataset is opening it so that Stata knows which data you are working with. Yes, you can simply double-click a Stata data file (one ending in .dta) to open it, but we prefer to write syntax so we can easily reproduce the same work or reuse the scripts for similar tasks. In this post, we introduce methods for reading in, using, and saving data files in Stata format and other formats.
Using Data.gov APIs in R
Data.gov catalogs US government data and makes it available on the web; you can find data on a variety of topics such as agriculture, business, climate, education, energy, finance, public safety, and many more. It is a good starting point if you don’t already know which particular data source to begin your search with; however, actually downloading the raw data you need can still be time-consuming. Fortunately, Data.gov also includes APIs from across the government, which can help with obtaining raw datasets.