cluster:r
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| cluster:r [2018/10/01 17:50] – [Other functions] mcloughlin | cluster:r [2024/11/11 20:55] (current) – removed mcloughlin | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== R ====== | ||
| - | R is available on the cluster. R can also be installed on your computer for free by visiting the [[http:// | ||
| - | |||
| - | For more information about R, you might want to use the [[http:// | ||
| - | |||
| - | ===== Running R on the Cluster ===== | ||
| - | You can run R interactively on the cluster with:< | ||
| - | You can also run a .R file in batch mode with:< | ||
| - | To run your R command in the background, see [[cluster: | ||
| - | |||
| - | ===== Introduction to R ===== | ||
| - | The following section was originally drawn from an introductory talk on R given by Paul Bailey in February 2011. The data used in the examples is located at [[http:// | ||
| - | |||
| - | ===== R Background ===== | ||
| - | * Based on Bell Labs S | ||
| - | * Open source software | ||
| - | * Large group of contributors | ||
| - | * Most R code is written in R | ||
| - | * Computationally intensive code is written in Fortran or C | ||
| - | * Datasets and matrices are native types | ||
| - | * Easy, customizable graphics | ||
| - | |||
| - | ==== R Pros ==== | ||
| - | * Free | ||
| - | * Easy to get a sense of what is going on with data | ||
| - | * Excellent at simulation | ||
| - | * Interfaces with lots of other software (e.g. WinBUGS, SQL) | ||
| - | |||
| - | ==== R Cons ==== | ||
| - | * Uses RAM to store data | ||
| - | * Support is mainly via mailing lists (listservs) | ||
| - | * Difficult to get started | ||
| - | |||
| - | ==== Read in Data ==== | ||
| - | * There are some type-specific methods, and a general method < | ||
| - | |||
| - | ==== Getting Help ==== | ||
| - | * You can use the following command to get the help page for a command: < | ||
| - | * To search for text in help text use the following command: < | ||
| - | |||
| - | ==== Summary ==== | ||
| - | * Getting summaries is easy: < | ||
| - | * You can also focus on one variable | ||
| - | < | ||
| - | summary(dat$num_child) | ||
| - | table(dat$num_child) | ||
| - | </ | ||
| - | |||
| - | ==== Subset Data ==== | ||
| - | * When you reference something with < | ||
| - | < | ||
| - | dat.lf <- dat[dat$emp %in% c(" | ||
| - | dat.hs <- dat.lf[dat.lf$educ==39, | ||
| - | </ | ||
| - | |||
| - | ==== Linear Models ==== | ||
| - | * The **lm** function fits linear models with a formula: | ||
| - | < | ||
| - | lm1 <- lm(weekly_earn ~ age + year, | ||
| - | summary(lm1) | ||
| - | </ | ||
| - | * You can also treat a variable as a factor: | ||
| - | < | ||
| - | dat$yearf <- as.factor(dat$year) | ||
| - | lm2 <- lm(weekly_earn ~ age + yearf, | ||
| - | summary(lm2) | ||
| - | </ | ||
| - | * And change the contrasts: | ||
| - | < | ||
| - | contrasts(dat$yearf) <- " | ||
| - | lm3 <- lm(weekly_earn ~ age + yearf, | ||
| - | summary(lm3) | ||
| - | </ | ||
| - | |||
| - | ==== Aggregate ==== | ||
| - | * Allows you to create summary statistics for groups | ||
| - | * First argument is what you want to summarize | ||
| - | * Second argument is what you want to group by | ||
| - | * Third argument is the function to apply to each group | ||
| - | < | ||
| - | agg.hs <- aggregate(dat.hs$emps, | ||
| - | </ | ||
| - | * The column names of the result are a little odd. | ||
| - | |||
| - | ==== Merge ==== | ||
| - | * Joins two datasets on their shared columns | ||
| - | < | ||
| - | merged <- merge(data.a, | ||
| - | </ | ||
| - | * Lots of options for this one | ||
| - | |||
| - | ==== Parallel ==== | ||
| - | Some basic info can be found at the [[http:// | ||
| - | |||
| - | You can also use [[http:// | ||
| - | |||
| - | ^^OpenMPI^MPICH2^ | ||
| - | |Before anything (installation or usage)|> | ||
| - | |Installation| R> | ||
| - | |||
| - | A good intro guide is [[http:// | ||
| - | |||
| - | ==== Other functions ==== | ||
| - | * Merge datasets with: '' | ||
| - | * Fit limited dependent variable models with < | ||
| - | * Minimize functions / find zeros with < | ||
| - | * [http:// | ||
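
The Aggregate and Merge sections above describe the arguments in words, so a small worked sketch may help. It is not the code from the original talk: the data frames ''people'' and ''regions'' and all of their columns are hypothetical toy data invented for illustration. It assumes only base R.

```r
# Hypothetical toy data; these names are illustrative and are NOT
# the columns of the dataset used in the original talk.
people <- data.frame(
  id    = 1:6,
  state = c("MD", "MD", "VA", "VA", "VA", "DC"),
  earn  = c(500, 700, 400, 600, 800, 900)
)
regions <- data.frame(
  state  = c("MD", "VA", "DC"),
  region = c("Mid-Atlantic", "South", "Mid-Atlantic")
)

# aggregate(x, by, FUN):
#   x   - what you want to summarize
#   by  - what you want to group by (must be a list)
#   FUN - what to do to each group
# Naming the list elements replaces the default "Group.1" / "x"
# column names in the result.
agg <- aggregate(list(mean_earn = people$earn),
                 by  = list(state = people$state),
                 FUN = mean)

# merge() joins two data frames on the columns they share;
# "by" is spelled out here to make the key explicit.
merged <- merge(agg, regions, by = "state")
print(merged)
```

Passing named lists to ''aggregate'' is one way around the odd default column names. ''merge'' also takes options such as ''all.x'' and ''all.y'' for keeping rows that have no match.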
cluster/r.1538416217.txt.gz · Last modified: 2018/10/01 17:50 (external edit)
