cluster:r

Differences

This shows you the differences between two versions of the page.

cluster:r [2018/10/01 17:48] – [Parallel] mcloughlincluster:r [2024/11/11 20:55] (current) – removed mcloughlin
Line 1: Line 1:
-====== R ====== 
-R is available on the cluster. R can also be installed on your computer for free by visiting the [[http://r-project.org/|R-project main page]].  
- 
-For more information about R, see the [[http://cran.r-project.org/doc/manuals/R-intro.html|R manual]] or [[http://rseek.org|RSeek]], a search engine for R-related resources. 
- 
-===== Running R on the Cluster ===== 
-You can run R interactively on the cluster with:<code>R</code> 
-You can also run a .R file in batch mode with <code>R CMD BATCH Rfile.R</code> where Rfile.R is your R file. By default, the output is written to Rfile.Rout. 
-To run your R command in the background, see [[cluster:managing_jobs|Managing Jobs]].
- 
-===== Introduction to R ===== 
-The following sections are adapted from an introductory talk on R given by Paul Bailey in February 2011. The data used in the examples are available at [[http://terpconnect.umd.edu/~pdbailey/R/MDemp.csv|this link]].
- 
-===== R Background ===== 
-  * Based on the S language from Bell Labs
-  * Open source software 
-    * Large group of contributors 
-    * Most R code is written in R 
-    * Computationally intensive code is written in Fortran or C
-  * Data frames and matrices are native types
-  * Easy, customizable graphics 
- 
-==== R Pros ==== 
-  * Free 
-  * Easy to get a sense of what is going on with data 
-  * Excellent at simulation 
-  * Interfaces with lots of other software (e.g. WinBUGS, SQL)
- 
-==== R Cons ==== 
-  * Uses RAM to store data 
-  * Support is mainly via mailing lists
-  * Difficult to get started 
- 
-==== Read in Data ==== 
-  * There are type-specific methods, such as <code>dat <- read.csv("MDemp.csv")</code> and a general method, which needs the separator and header spelled out: <code>dat <- read.table("MDemp.csv", sep=",", header=TRUE)</code>
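The two ways of reading a file can be compared with a minimal sketch. The file below is a hypothetical two-column stand-in written to a temporary location, not the real MDemp.csv; the column names are only borrowed from the later examples.

```r
# Hypothetical stand-in for MDemp.csv, written to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age,weekly_earn", "25,500", "40,820"), csv_path)

# Type-specific reader: assumes a header row and comma separators.
dat_csv <- read.csv(csv_path)

# General reader: the separator and header must be given explicitly.
dat_table <- read.table(csv_path, sep = ",", header = TRUE)

# For a simple file like this one, both calls should give the same data frame.
print(all.equal(dat_csv, dat_table))
```

Without header=TRUE, read.table treats the first row as data, so the column names would be lost.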
- 
-==== Getting Help ==== 
-  * To get the help page for a command, put a question mark in front of its name: <code>?lm</code>
-  * To search the help pages for some text, use two question marks: <code>??regression</code>
- 
-====Summary==== 
-  * Getting summaries is easy: <code>summary(dat)</code> 
-  * You can also focus on one variable:
-<code> 
-summary(dat$num_child) 
-table(dat$num_child) 
-</code> 
- 
-====Subset Data==== 
-  * Indexing a data frame with <code>[condition,]</code> selects the rows where the condition is true:
-<code> 
-dat.lf <- dat[dat$emp %in% c("emp","unemp"),] 
-dat.hs <- dat.lf[dat.lf$educ==39,] 
-</code> 
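The row selection above can be tried on a small made-up data frame. The values here, including the "nilf" category, are hypothetical and only mimic the column names used in the example.

```r
# Hypothetical data with the same column names as the example above.
dat <- data.frame(
  emp  = c("emp", "unemp", "nilf", "emp"),
  educ = c(39, 39, 40, 43)
)

# Keep rows whose emp value is one of the two labor-force categories.
dat.lf <- dat[dat$emp %in% c("emp", "unemp"), ]

# From those, keep rows where educ equals 39.
dat.hs <- dat.lf[dat.lf$educ == 39, ]

print(dat.lf)
print(dat.hs)
```

The comma inside the brackets matters: the condition goes before it (rows), and leaving the part after it empty keeps all columns.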
- 
-==== Linear Models ==== 
-  * The **lm** function fits linear models with a formula: 
-<code> 
-lm1 <- lm(weekly_earn ~ age + year,data=dat) 
-summary(lm1) 
-</code> 
-  * You can also treat a variable as a factor: 
-<code> 
-dat$yearf <- as.factor(dat$year) 
-lm2 <- lm(weekly_earn ~ age + yearf,data=dat) 
-summary(lm2) 
-</code> 
-  * And change the contrasts:
-<code> 
-contrasts(dat$yearf) <- "contr.sum" 
-lm3 <- lm(weekly_earn ~ age + yearf,data=dat) 
-summary(lm3) 
-</code> 
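As a rough illustration of what changing the contrasts does, the sketch below fits the factor model twice on simulated data. The data are generated here and are not the MDemp data; only the variable names follow the example.

```r
set.seed(1)

# Simulated data: 60 rows over three years (purely illustrative).
dat <- data.frame(
  age  = sample(20:60, 60, replace = TRUE),
  year = rep(2008:2010, each = 20)
)
dat$weekly_earn <- 300 + 10 * dat$age + 20 * (dat$year - 2008) +
  rnorm(60, sd = 50)

# Default treatment contrasts: each year is compared with the first year.
dat$yearf <- as.factor(dat$year)
lm2 <- lm(weekly_earn ~ age + yearf, data = dat)

# Sum-to-zero contrasts: year effects are deviations from the overall mean.
contrasts(dat$yearf) <- "contr.sum"
lm3 <- lm(weekly_earn ~ age + yearf, data = dat)

print(names(coef(lm2)))
print(names(coef(lm3)))

# The parameterization changes, but the fitted values should not.
print(all.equal(fitted(lm2), fitted(lm3)))
```

The two fits describe the same model; only the meaning of the year coefficients differs.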
- 
-==== Aggregate ==== 
-  * Allows you to create summary statistics for groups
-  * The first argument is what you want to summarize
-  * The second argument is what you want to group by
-  * The third argument is the function to apply to each group
-<code>
-agg.hs <- aggregate(dat.hs$emps,by=list(dat.hs$yq),mean)
-</code>
-  * The column names of the result are a little odd (Group.1 and x).
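The grouping call and its odd result names can be seen on a toy data frame. The values below are invented, and treating emps as a 0/1 indicator is an assumption made for this sketch.

```r
# Hypothetical data: yq is a year-quarter label, emps a 0/1 indicator.
dat.hs <- data.frame(
  yq   = c("2008Q1", "2008Q1", "2008Q2", "2008Q2"),
  emps = c(1, 0, 1, 1)
)

# Vector interface: what to summarize, what to group by, what to apply.
agg.hs <- aggregate(dat.hs$emps, by = list(dat.hs$yq), mean)
print(names(agg.hs))

# The formula interface keeps the original column names instead.
agg.named <- aggregate(emps ~ yq, data = dat.hs, FUN = mean)
print(names(agg.named))
```

Naming the grouping list, as in by=list(yq=dat.hs$yq), is another way to get a readable group column.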
-==== Merge ==== 
-  * Joins two datasets on their shared columns
-<code> 
-merged <- merge(data.a,data.b) 
-</code> 
-  * Lots of options for this one; see the help page for merge
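A small sketch of merge on two made-up data frames that share an id column. The data frames and their columns are hypothetical.

```r
# Two hypothetical datasets sharing the column "id".
data.a <- data.frame(id = c(1, 2, 3), age = c(25, 40, 31))
data.b <- data.frame(id = c(2, 3, 4), weekly_earn = c(820, 610, 700))

# Default: keep only rows whose id appears in both datasets.
merged <- merge(data.a, data.b)
print(merged)

# all = TRUE keeps unmatched rows from both sides, filling gaps with NA.
merged.all <- merge(data.a, data.b, all = TRUE)
print(merged.all)
```

The by, all.x and all.y arguments control which columns are matched and which unmatched rows are kept.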
- 
-==== Parallel ==== 
-Some basic information can be found in the [[http://cran.r-project.org/web/views/HighPerformanceComputing.html|High Performance Computing CRAN view]]. You can use the "parallel" package, which incorporates functionality from both "snow" and "multicore". 
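A minimal sketch of the "parallel" package, assuming two local worker processes are acceptable. The worker count and the toy function are arbitrary choices for illustration, not cluster recommendations.

```r
library(parallel)

# Start two local worker processes (snow-style cluster).
cl <- makeCluster(2)

# Apply a function to each element, spreading the work across the workers.
squares <- parLapply(cl, 1:4, function(i) i^2)

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(squares))
```

On Linux, mclapply from the same package offers a fork-based alternative that needs no explicit cluster object.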
- 
-You can also use [[http://cran.r-project.org/web/packages/Rmpi/index.html|Rmpi]] and [[http://cran.r-project.org/web/packages/npRmpi/index.html|npRmpi]] packages. You have your choice of MPI2 libraries (both OpenMPI and MPICH2). You will have to install the packages in your userspace (requiring compilation). 
- 
-^ ^ OpenMPI ^ MPICH2 ^
-| Before anything (installation or usage) | >module load openmpi-x86_64 | >module load mpich2-x86_64 |
-| Installation | R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/lib64/openmpi/1.4-gcc/include --with-Rmpi-libpath=/usr/lib64/openmpi/1.4-gcc/lib --with-Rmpi-type=OPENMPI") | R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/include/mpich2-x86_64 --with-Rmpi-libpath=/usr/lib64/mpich2/lib --with-Rmpi-type=MPICH") |
- 
-A good intro guide is [[http://onlinelibrary.wiley.com/doi/10.1002/jae.1221/pdf|npRmpi: A package for parallel distributed kernel estimation in R]]. 
- 
-==== Other functions ==== 
-  * Merge datasets with: <code>merge</code> 
-  * Fit limited dependent variable models with <code>glm</code> 
-  * Minimize functions with <code>optim</code> (to find zeros, see <code>uniroot</code>)
-  * [[http://cran.r-project.org/web/views/Econometrics.html|Contributed econometrics packages]]
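The glm and optim functions listed above can be sketched briefly. The example uses R's built-in mtcars dataset rather than the MDemp data, and the objective functions are toy choices; uniroot is base R's one-dimensional root finder, shown here as the usual tool for finding zeros.

```r
# Logit model for a binary outcome: am (0/1) as a function of weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit))

# Minimize a simple one-dimensional function, starting from 0.
opt <- optim(par = 0, fn = function(x) (x - 3)^2, method = "BFGS")
print(opt$par)

# Find a zero of x^2 - 2 on the interval [0, 2].
root <- uniroot(function(x) x^2 - 2, interval = c(0, 2))
print(root$root)
```

For glm, the family argument selects the model type; binomial gives a logit link by default.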
  
cluster/r.1538416133.txt.gz · Last modified: 2018/10/01 17:48 (external edit)