I use SPSS and Stata quite a bit and want to know how to get .sav and .dta files into R.
Getting generic data files (e.g. .csv and .txt) into R is fairly simple and handled by functions that come with R itself. Getting SPSS and Stata files into R takes a bit more work (primarily the use of an additional package).
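For comparison, here is a minimal sketch of the "generic" case. The built-in mtcars data is written to temporary .csv and tab-delimited .txt files and then read back with read.csv() and read.table(). Both functions ship with R in the utils package, so nothing extra needs to be installed. The file names are throwaway temp files chosen for illustration.

```r
# Write a built-in data set to temporary files so the example is self-contained.
csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")
write.csv(mtcars, csv_file, row.names = FALSE)
write.table(mtcars, txt_file, sep = "\t", row.names = FALSE)

# read.csv() for comma-separated files; header = TRUE is its default.
cars_csv <- read.csv(csv_file)

# read.table() for other delimited text; here we tell it about the header and tabs.
cars_txt <- read.table(txt_file, header = TRUE, sep = "\t")

str(cars_csv)
```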
The foreign package is apparently the key to doing this. Packages are bundles of functions (and sometimes data) that the R community has written and maintained over the years. I think we're going to find these quite useful throughout the semester.
So, back to the task: getting an SPSS file into R. There are some working SPSS files in Blackboard under “Course Documents –> R –> R Data”. Feel free to use your own data or download the course data files. Either way, we need to either 1) know where this data is stored once we download it (or where your own data is) or 2) have R shortcuts mapped to the folder where this data is stored (see previous post).
If you haven't created shortcuts mapped to your working directory, you can always set your working directory once you're in R with setwd(). The path can be a simple, generic folder name, but you may also have something more elaborate like:
setwd("C:/Documents and Settings/Rwilliams/My Documents/R Data")
If you don’t set your working directory, you can still get data into R, but you’ll have to enter the entire path name each time (e.g. “C:/Documents and Settings/Rwilliams/My Documents/R Data/data.sav”).
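Here is a minimal sketch of both approaches. The folder is hypothetical: it is created inside R's temporary directory so the example runs anywhere, so substitute your own path. Note that R on Windows wants forward slashes, as in the setwd() call above, or doubled backslashes.

```r
# Hypothetical folder for illustration: an "R Data" folder inside the
# session's temporary directory, created here so the sketch is runnable.
data_dir <- file.path(tempdir(), "R Data")
dir.create(data_dir, showWarnings = FALSE)

# Option 1: set the working directory. setwd() invisibly returns the
# previous working directory, which we keep so we can switch back.
old_wd <- setwd(data_dir)
getwd()   # confirm where R is now looking for files

# Option 2: skip setwd() and build the full path to the file instead;
# file.path() joins the pieces with "/".
full_path <- file.path(data_dir, "data.sav")
full_path

# Restore the original working directory.
setwd(old_wd)
```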
You can also check the contents of your working directory from within R.
If you see your data in there, then you're ready to read it in.
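One way to run this check in base R is list.files() (dir() is an alias). The sketch below creates a scratch folder holding two empty placeholder files so the listing has something to show. The folder and the empty NELS88_student.sav are hypothetical stand-ins, not the real course file.

```r
# Hypothetical scratch folder with two empty placeholder files.
scratch <- file.path(tempdir(), "listing-demo")
dir.create(scratch, showWarnings = FALSE)
invisible(file.create(file.path(scratch, c("NELS88_student.sav", "notes.txt"))))

prev_wd <- setwd(scratch)
list.files()                      # every file in the working directory
list.files(pattern = "\\.sav$")   # only files ending in .sav (a regular expression)
setwd(prev_wd)                    # switch back
```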
The following code should get the SPSS data file into R. Before running it, load the foreign package from your R library so R can read in foreign datasets.
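Packages are loaded with library(). foreign is one of R's "recommended" packages, so standard R installations already include it. The install step below is only a fallback in case it is missing, for example in a stripped-down R build.

```r
# Install foreign only if it isn't already available (it normally is).
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}

# Load (attach) the package so read.spss(), read.dta(), etc. can be called directly.
library(foreign)
```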
data <- read.spss("NELS88_student.sav", use.value.labels = TRUE, max.value.labels = Inf, to.data.frame = TRUE)
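Since SPSS files often have value labels that represent categorical variables, use.value.labels = TRUE tells R to convert labeled variables into factors (this may not always be appropriate). max.value.labels = Inf puts no limit on how many distinct values a labeled variable can have and still be converted to a factor. Both of these are actually read.spss()'s defaults, so spelling them out just makes the intent explicit. Finally, to.data.frame = TRUE tells R to return a data frame; without it, read.spss() returns a list.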
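To see what to.data.frame changes, this sketch reads a small example SPSS file twice: once with the defaults and once with the same arguments as the NELS88 call. It assumes the foreign package ships an example file, electric.sav, in its "files" directory. If system.file() cannot find it, the check near the top stops the script. read.spss() may also print warnings about unrecognized record types, which are common and usually harmless.

```r
library(foreign)

# Assumption: foreign includes files/electric.sav; system.file() returns "" if not.
sav <- system.file("files", "electric.sav", package = "foreign")
stopifnot(nzchar(sav))

# Defaults: to.data.frame = FALSE, so the result is a plain list of variables.
as_list <- read.spss(sav)

# Same arguments as the NELS88 call: labeled variables become factors
# and the result is a data frame.
as_df <- read.spss(sav, use.value.labels = TRUE,
                   max.value.labels = Inf, to.data.frame = TRUE)

class(as_list)
class(as_df)

# Which columns were turned into factors because they carried value labels?
names(as_df)[vapply(as_df, is.factor, logical(1))]
```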
Your data should now be stored in the R object “data”. Depending on the size of the data you just imported, it may not be useful to take a look at all of the data at one time. A couple of things I typically do to make sure my data made it in correctly are below.
A summary of each variable (each column vector) in the data frame is a good first check. Sometimes this can be cumbersome, but if you know what you're looking for it can help.
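In base R, summary() produces this kind of per-variable overview. The sketch below uses a small hypothetical data frame standing in for the imported SPSS data, with one labeled (factor) variable and one numeric variable. It shows that summary() reports factors and numbers differently.

```r
# Hypothetical stand-in for the imported data.
data <- data.frame(
  sex   = factor(c("Male", "Female", "Female", "Male", "Female")),
  score = c(52.1, 61.4, 48.9, 57.3, 66.0)
)

# For factors, summary() counts each level; for numeric columns it gives
# the minimum, quartiles, median, mean, and maximum.
summary(data)

# It also works on a single variable.
summary(data$sex)
```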
Next, I look at the first six rows of the data frame.
Then I look at the last six rows.
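head() and tail() are the standard base R functions for pulling the first or last rows, and both default to six. This sketch uses the built-in mtcars data frame (32 rows) as a stand-in for your imported data.

```r
head(mtcars)       # first 6 rows (default n = 6)
tail(mtcars)       # last 6 rows
head(mtcars, 10)   # use n to ask for a different number of rows
```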
Finally, I check how many rows (cases) are in the data.
I also check how many columns (variables) there are.
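In base R, nrow() and ncol() report these counts, and dim() returns both at once. This sketch uses the built-in mtcars data frame as a stand-in. Comparing these numbers with the case and variable counts SPSS reports is a quick way to confirm nothing was dropped on import.

```r
nrow(mtcars)   # number of rows (cases): 32
ncol(mtcars)   # number of columns (variables): 11
dim(mtcars)    # rows, then columns: 32 11
```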
Datasets from Stata can be imported just as easily, because the foreign package reads .dta files too.
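This minimal sketch makes a round trip with foreign's write.dta() and read.dta(). It writes a small hypothetical data frame to a temporary .dta file and reads it back, showing that factor labels come back as factors. One caveat: foreign's read.dta() only reads files saved in Stata 12 or earlier formats. For files from Stata 13 and later, packages such as haven (read_dta()) or readstata13 (read.dta13()) are the usual alternatives.

```r
library(foreign)

# Hypothetical data; any data frame would do.
df <- data.frame(
  id    = 1:4,
  group = factor(c("control", "treatment", "treatment", "control")),
  score = c(10.5, 12.0, 9.75, 11.25)
)

# write.dta() stores factors as integer codes with Stata value labels.
dta_file <- tempfile(fileext = ".dta")
write.dta(df, dta_file)

# read.dta() always returns a data frame, and by default
# (convert.factors = TRUE) turns labeled variables back into factors.
stata_data <- read.dta(dta_file)
str(stata_data)
```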