Differences

This shows you the differences between two versions of the page.

--- cluster:r [2018/10/01 17:39] – mcloughlin
+++ cluster:r [2024/11/11 20:55] (current) – removed mcloughlin
@@ Line 1: / Line 1: @@
-====== R ======
-R is available on the cluster. R can also be installed on your computer for free by visiting the [[http://r-project.org/|R-project main page]].
-For more information about R, see the [[http://cran.r-project.org/doc/manuals/R-intro.html|R manual]] or [[http://rseek.org|RSeek]], a search engine for R-related resources.
-===== Running R on the Cluster =====
-You can run R interactively on the cluster with:<code>R</code>
-You can also run a .R file in batch mode with <code>R CMD BATCH Rfile.R</code> where Rfile.R is your R file. By default the output is written to Rfile.Rout.
-To run your R command in the background, see [[cluster:managing_jobs|Managing Jobs]].
-===== Introduction to R =====
-The following section originally comes from an introductory talk on R given by Paul Bailey in February 2011. The data used in the examples are located at [[http://terpconnect.umd.edu/~pdbailey/R/MDemp.csv|this link]].
-===== R Background =====
-  * Based on Bell Labs S
-  * Open source software
-    * Large group of contributors
-    * Most R code is written in R
-    * Computationally intensive code written in FORTRAN or C
-  * Datasets, matrices are native types
-  * Easy, customizable graphics
-==== R Pros ====
-  * Free
-  * Easy to get a sense of what is going on with data
-  * Excellent at simulation
-  * Interfaces with lots of other software (e.g. WinBUGS, SQL)
-==== R Cons ====
-  * Uses RAM to store data
-  * Support mainly via mailing lists
-  * Difficult to get started
-==== Read in Data ====
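-  * There are type-specific methods, such as <code>dat <- read.csv("MDemp.csv")</code>, and
-    general methods, such as <code>dat <- read.table("MDemp.csv",sep=",",header=TRUE)</code>. Unlike read.csv, read.table does not assume a header row, so it needs header=TRUE to give the same result.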
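A minimal sketch of the two ways of reading a CSV file described above. Because MDemp.csv may not be at hand, it writes a small stand-in file first; the column values are made up for illustration and only the column names echo the examples on this page.

```r
# Write a tiny stand-in for MDemp.csv (hypothetical values) to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age,year,weekly_earn",
             "25,2009,500",
             "40,2010,820",
             "33,2010,610"), csv_path)

# Type-specific reader: assumes a header row and comma separators.
dat <- read.csv(csv_path)

# General reader: separator and header must be given explicitly.
dat2 <- read.table(csv_path, sep = ",", header = TRUE)

# Compare the two results and inspect the column types.
all.equal(dat, dat2)
str(dat)
```

Leaving out `header = TRUE` in the `read.table` call would read the column names as the first data row, which is the usual reason the two calls disagree.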
-==Getting Help==
-* You can use the <code>?</code> command to get the help page for a command, for example <code>?lm</code>
-* To search the text of the help pages, use the <code>??</code> command
-==Summary==
-* Getting summaries is easy
- summary(dat)
-* You can also focus on one variable
- summary(dat$num_child)
- table(dat$num_child)
-* (Extended example)
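As a sketch of the summary commands above, the following uses a toy data frame instead of the real data. The values are invented; `num_child` and `weekly_earn` simply reuse column names from this page.

```r
# Toy data frame with made-up values.
dat <- data.frame(num_child   = c(0, 2, 1, 0, 3, 1, 0),
                  weekly_earn = c(500, 820, 610, 450, 900, 700, 560))

# Minimum, quartiles, mean and maximum for every column.
summary(dat)

# The same statistics for a single column.
summary(dat$num_child)

# Frequency of each distinct value of the column.
counts <- table(dat$num_child)
counts

# The same counts expressed as proportions.
prop.table(counts)
```

For a variable with few distinct values, such as a count of children, `table` is usually more informative than `summary`.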
-==Subset Data==
-* When you index a data frame with <code>[condition,]</code>, you select the rows for which the condition is true (the empty slot after the comma keeps all columns)
- dat.lf <- dat[dat$emp %in% c("emp","unemp"),]
- dat.hs <- dat.lf[dat.lf$educ==39,]
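The row selection above can be tried on a toy data frame. The values below are invented, including the `"nilf"` category, and `subset` is shown only as an alternative way to write the same filter.

```r
# Toy data frame with made-up values.
dat <- data.frame(emp  = c("emp", "unemp", "nilf", "emp", "nilf"),
                  educ = c(39, 39, 40, 43, 39))

# The condition is a logical vector with one entry per row.
keep <- dat$emp %in% c("emp", "unemp")
keep

# Keep the rows where the condition is TRUE, and all columns.
dat.lf <- dat[keep, ]

# Filter the already reduced data frame a second time.
dat.hs <- dat.lf[dat.lf$educ == 39, ]

# The same two filters written as one call to subset().
dat.hs2 <- subset(dat, emp %in% c("emp", "unemp") & educ == 39)
```

Note that the second filter uses `dat.lf$educ`, not `dat$educ`: the condition has to come from the data frame being indexed, or the rows will not line up.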
-==Linear Models==
-* The <code>lm</code> function fits linear models specified with a "formula"
- lm1 <- lm(weekly_earn ~ age + year,data=dat)
- summary(lm1)
-* You can also treat a variable as a "factor"
- dat$yearf <- as.factor(dat$year)
- lm2 <- lm(weekly_earn ~ age + yearf,data=dat)
- summary(lm2)
-* And change contrasts
- contrasts(dat$yearf) <- "contr.sum"
- lm3 <- lm(weekly_earn ~ age + yearf,data=dat)
- summary(lm3)
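A sketch of the three fits above on simulated data, to show what changes between them. The data-generating numbers are arbitrary assumptions; only the variable names follow the examples on this page.

```r
# Simulate a small data set (all numbers here are arbitrary).
set.seed(1)
n <- 200
dat <- data.frame(age  = sample(20:60, n, replace = TRUE),
                  year = sample(2008:2010, n, replace = TRUE))
dat$weekly_earn <- 300 + 8 * dat$age + 20 * (dat$year - 2008) + rnorm(n, sd = 50)

# year as a number: a single slope for the yearly trend.
lm1 <- lm(weekly_earn ~ age + year, data = dat)

# year as a factor: one coefficient per year relative to the first year.
dat$yearf <- as.factor(dat$year)
lm2 <- lm(weekly_earn ~ age + yearf, data = dat)

# Sum-to-zero contrasts: year effects are deviations from the overall mean.
contrasts(dat$yearf) <- "contr.sum"
lm3 <- lm(weekly_earn ~ age + yearf, data = dat)

coef(lm1)
coef(lm2)
coef(lm3)

# Changing contrasts changes the coefficients, not the fitted values.
all.equal(fitted(lm2), fitted(lm3))
```

The second and third models are the same model in two parametrizations, so the choice of contrasts affects only how the year coefficients are read.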
-==Aggregate==
-* Allows you to create summary statistics for groups
-* First argument is what you want to summarize
-* Second argument is what you want to group by
-* Third argument is what to do to the groups
- agg.hs <- aggregate(dat.hs$emps,by=list(dat.hs$yq),mean)
-* The names of the result columns are a little odd (Group.1 and x).
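A sketch of `aggregate` on a toy data frame, including two ways to get friendlier column names. The values are invented; `yq` and `emps` reuse the column names from the example above.

```r
# Toy data frame with made-up values.
dat.hs <- data.frame(yq   = c("2009Q1", "2009Q1", "2009Q2", "2009Q2", "2009Q2"),
                     emps = c(1, 0, 1, 1, 0))

# Unnamed grouping list: the result columns are called Group.1 and x.
agg.default <- aggregate(dat.hs$emps, by = list(dat.hs$yq), mean)
names(agg.default)

# Naming the list element names the grouping column.
agg.named <- aggregate(dat.hs$emps, by = list(yq = dat.hs$yq), FUN = mean)
names(agg.named)

# The formula interface keeps both original column names.
agg.formula <- aggregate(emps ~ yq, data = dat.hs, FUN = mean)
agg.formula
```

The value being summarized and the grouping variable must have the same length, so both should come from the same data frame.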
-==Merge==
-* Joins two datasets on their shared columns
-  merged <- merge(data.a,data.b)
-* Lots of options for this one
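A sketch of `merge` and two of its options on toy data frames. The `id` key and all values are hypothetical and stand in for whatever columns the two datasets share.

```r
# Two toy data frames sharing the column id (made-up values).
data.a <- data.frame(id = c(1, 2, 3), age = c(25, 40, 33))
data.b <- data.frame(id = c(2, 3, 4), weekly_earn = c(820, 610, 700))

# Default: join on all shared columns, keep only ids present in both.
merged <- merge(data.a, data.b)
merged

# Keep every row of data.a; unmatched rows get NA for weekly_earn.
left <- merge(data.a, data.b, by = "id", all.x = TRUE)
left

# Keep every row of both data frames.
full <- merge(data.a, data.b, by = "id", all = TRUE)
full
```

Stating `by` explicitly is a reasonable habit, since the default silently joins on every column name the two data frames have in common.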
-==Parallel==
-Some basic information can be found in the [http://cran.r-project.org/web/views/HighPerformanceComputing.html High Performance Computing CRAN view]. You can use the "parallel" package, which combines the functionality of both "snow" and "multicore".
-You can also use the [http://cran.r-project.org/web/packages/Rmpi/index.html Rmpi] and [http://cran.r-project.org/web/packages/npRmpi/index.html npRmpi] packages. You can choose between two MPI-2 libraries, OpenMPI and MPICH2. You will have to install the packages in your user space, which requires compilation.
-{| class="wikitable"
-|-
-!
-! OpenMPI
-! MPICH2
-|-
-| Before anything (installation or usage)
-| >module load openmpi-x86_64
-| >module load mpich2-x86_64
-|-
-| Installation
-| R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/lib64/openmpi/1.4-gcc/include --with-Rmpi-libpath=/usr/lib64/openmpi/1.4-gcc/lib --with-Rmpi-type=OPENMPI")
-| R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/include/mpich2-x86_64 --with-Rmpi-libpath=/usr/lib64/mpich2/lib --with-Rmpi-type=MPICH")
-|}
-A good introductory guide is [http://onlinelibrary.wiley.com/doi/10.1002/jae.1221/pdf npRmpi: A package for parallel distributed kernel estimation in R] in the ''Journal of Applied Econometrics''.
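For the "parallel" package mentioned above, a minimal single-machine sketch might look like the following. It does not use MPI; the worker count of 2 and the function `slow_square` are assumptions chosen for illustration.

```r
library(parallel)

# Hypothetical stand-in for an expensive computation.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Start 2 local worker processes (the count is an assumption; adjust it
# to the cores actually allocated to your job).
cl <- makeCluster(2)

# Apply the function to each input, spreading the work over the workers.
res <- parLapply(cl, 1:8, slow_square)

# Always shut the workers down when finished.
stopCluster(cl)

unlist(res)
```

`parLapply` comes from the "snow" side of the package. On Linux, `mclapply` from the "multicore" side is an alternative that forks the current R process instead of starting separate workers.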
-==Other functions==
-* <code>merge</code> merges datasets
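-* <code>glm</code> fits generalized linear models (e.g. logit, probit, Poisson), which cover many limited dependent variable models.
-* <code>optim</code> minimizes functions; <code>uniroot</code> finds zeros of one-dimensional functions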
-* [http://cran.r-project.org/web/views/Econometrics.html contributed econometrics packages]
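A sketch tying `glm` and `optim` together: a logit model is fitted with `glm`, then the same estimates are recovered by minimizing the negative log-likelihood with `optim`. The simulated data and the coefficient values -0.5 and 1.2 are arbitrary assumptions.

```r
# Simulate a binary outcome from a logit model (arbitrary true coefficients).
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Fit the logit model with glm.
fit <- glm(y ~ x, family = binomial)
coef(fit)

# Negative log-likelihood of the same model; optim minimizes by default.
negll <- function(b) {
  -sum(dbinom(y, size = 1, prob = plogis(b[1] + b[2] * x), log = TRUE))
}

# Minimize starting from (0, 0); convergence code 0 means success.
opt <- optim(c(0, 0), negll, method = "BFGS")
opt$convergence
opt$par
```

Because `optim` minimizes, maximum likelihood problems are passed to it as the negative of the log-likelihood, as done here.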