Differences

This shows you the differences between two versions of the page.

--- cluster:software [2024/11/11 20:54] – [Running Stata on the Cluster] mcloughlin
+++ cluster:software [2024/11/14 14:47] (current) – external edit 127.0.0.1
@@ Line 2: / Line 2: @@
 The list of currently installed software on the cluster. If you wish to have additional software installed, please email econcluster@umd.edu.
-//Currently updating to reflect new OS refresh. --2024/11/11 bpmcln//
 ^Software ^Version ^Terminal Command ^
@@ Line 9: / Line 7: @@
 | Matlab | R2023a | matlab |
 | Python | 3.9 | python |
+| Python | 3.11 | python3.11 |
 | R | 4.4.2 | R |
 | Stata | 18 MP8 | stata-mp |
@@ Line 24: / Line 23: @@
 (you should now see the environment on the far left of the terminal line). After that, you can install any library using pip from the command line:
 <code>pip install pandas</code>
+=====R=====
+====Batch Mode====
+You can run an R file in batch mode with <code>R CMD BATCH filename.R</code> By default, the output is written to ''filename.Rout''.
+To run your R command in the background, see [[cluster:managing_jobs|Managing Jobs]].
+====Installing Packages====
+To install an R package, run the following in an interactive R session, replacing ''PACKAGE_NAME'' with the name of the package (in quotes): <code>install.packages("PACKAGE_NAME")</code>
+====Introduction to R====
+The following section originally comes from an introductory talk on R given by Paul Bailey in February 2011. The data used in the examples are available at [[http://terpconnect.umd.edu/~pdbailey/R/MDemp.csv|this link]].
+===R Background===
+  * Based on the S language developed at Bell Labs
+  * Open source software
+    * Large group of contributors
+    * Most R code is written in R
+    * Computationally intensive code written in FORTRAN or C
+  * Datasets (data frames) and matrices are native types
+  * Easy, customizable graphics
+===R Pros===
+  * Free
+  * Easy to get a sense of what is going on with data
+  * Excellent at simulation
+  * Interfaces with lots of other software (e.g. WinBUGS, SQL)
+===R Cons===
+  * Uses RAM to store data
+  * Support is mainly via mailing lists
+  * Difficult to get started
+===Read in Data===
+  * There are type-specific methods, such as <code>dat <- read.csv("MDemp.csv")</code> and general methods, such as <code>dat <- read.table("MDemp.csv",sep=",",header=TRUE)</code>
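+As a rough sketch of how the type-specific and general readers relate, the following writes a small made-up CSV to a temporary file and reads it both ways. The file contents and column names are hypothetical, not the real MDemp.csv. ''read.csv'' assumes a comma separator and a header row, while ''read.table'' needs both stated explicitly.

```r
# Hypothetical toy data standing in for MDemp.csv (column names are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,weekly_earn", "30,500", "45,800"), tmp)

# Type-specific reader: comma separator and header row are the defaults
dat1 <- read.csv(tmp)

# General reader: separator and header must be given explicitly
dat2 <- read.table(tmp, sep = ",", header = TRUE)

# Compare the two results
all.equal(dat1, dat2)
```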
+===Getting Help===
+  * To get the help page for a command, type ''?'' followed by the command name, for example: <code>?lm</code>
+  * To search the help text for a term, type ''??'' followed by the term, for example: <code>??regression</code>
+===Summary===
+  * Getting summaries is easy: ''summary(dat)''
+  * You can also focus on one variable:
+<code>
+summary(dat$num_child)
+table(dat$num_child)
+</code>
+===Subset Data===
+  * Indexing a data frame with ''[condition,]'' selects the rows where the condition is true:
+<code>
+dat.lf <- dat[dat$emp %in% c("emp","unemp"),]
+dat.hs <- dat.lf[dat.lf$educ==39,]
+</code>
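+A minimal sketch of row selection on a made-up data frame follows. The values and the columns ''emp'' and ''educ'' here are invented for illustration and may not match MDemp.csv. It also shows how conditions can be combined, and ''subset'' as an alternative for interactive use.

```r
# Hypothetical toy data; the real MDemp.csv columns may differ
dat <- data.frame(
  emp  = c("emp", "unemp", "nilf", "emp"),
  educ = c(39, 40, 39, 39)
)

# Keep rows in the labor force (employed or unemployed)
dat.lf <- dat[dat$emp %in% c("emp", "unemp"), ]

# Combine conditions with & (and) or | (or)
dat.hs.emp <- dat[dat$emp == "emp" & dat$educ == 39, ]

# subset() expresses the same selection without repeating dat$
dat.hs.emp2 <- subset(dat, emp == "emp" & educ == 39)
```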
+===Linear Models===
+  * The **lm** function fits linear models with a formula:
+<code>
+lm1 <- lm(weekly_earn ~ age + year,data=dat)
+summary(lm1)
+</code>
+  * You can also treat a variable as a factor:
+<code>
+dat$yearf <- as.factor(dat$year)
+lm2 <- lm(weekly_earn ~ age + yearf,data=dat)
+summary(lm2)
+</code>
+  * And change the contrasts:
+<code>
+contrasts(dat$yearf) <- "contr.sum"
+lm3 <- lm(weekly_earn ~ age + yearf,data=dat)
+summary(lm3)
+</code>
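+To see what changing the contrasts does, here is a sketch on simulated data (not the MDemp data; the coefficients used to generate it are arbitrary). Under these assumptions the two models give the same fitted values and differ only in how the year effects are parameterized.

```r
# Simulated data for illustration only (not the MDemp data)
set.seed(1)
n <- 200
dat <- data.frame(
  age  = sample(20:60, n, replace = TRUE),
  year = sample(2005:2008, n, replace = TRUE)
)
dat$weekly_earn <- 200 + 10 * dat$age + 5 * (dat$year - 2005) + rnorm(n, sd = 50)

dat$yearf <- as.factor(dat$year)

# Default treatment contrasts: each year is compared with the first year
lm2 <- lm(weekly_earn ~ age + yearf, data = dat)

# Sum-to-zero contrasts: year effects are deviations from their unweighted average
contrasts(dat$yearf) <- "contr.sum"
lm3 <- lm(weekly_earn ~ age + yearf, data = dat)

# The coefficients are labeled and scaled differently, but the fit is the same
all.equal(fitted(lm2), fitted(lm3))
names(coef(lm2))
names(coef(lm3))
```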
+===Aggregate===
+  * Allows you to create summary statistics for groups
+  * First argument is what you want to summarize
+  * Second argument is what you want to group by
+  * Third argument is what to do to the groups
+<code>
+agg.hs <- aggregate(dat.hs$emps,by=list(dat.hs$yq),mean)
+</code>
+  * The resulting column names are a little odd (''Group.1'' and ''x'').
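+The following sketch, on a made-up data frame with hypothetical columns ''yq'' and ''earn'', shows the default result names from ''aggregate'' and two ways to get more readable ones.

```r
# Hypothetical toy data; column names are made up for illustration
dat.hs <- data.frame(
  yq   = c("2008Q1", "2008Q1", "2008Q2", "2008Q2"),
  earn = c(400, 600, 500, 700)
)

# Unnamed grouping list: result columns are called Group.1 and x
agg1 <- aggregate(dat.hs$earn, by = list(dat.hs$yq), FUN = mean)
names(agg1)

# Naming the list element gives a readable grouping column
agg2 <- aggregate(dat.hs$earn, by = list(yq = dat.hs$yq), FUN = mean)

# The formula interface keeps both original column names
agg3 <- aggregate(earn ~ yq, data = dat.hs, FUN = mean)
agg3
```

+Note that the values being summarized and the grouping variable should come from the same data frame, so that they have the same number of rows.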
+===Merge===
+  * Joins two datasets on their shared columns
+<code>
+merged <- merge(data.a,data.b)
+</code>
+  * Many options are available for this one (for example ''by'', ''all'', ''all.x'' and ''all.y'')
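+A short sketch of the most common ''merge'' variants follows, using two made-up data frames that share a hypothetical ''id'' column.

```r
# Hypothetical toy data frames sharing the column "id"
data.a <- data.frame(id = c(1, 2, 3), age = c(30, 45, 52))
data.b <- data.frame(id = c(2, 3, 4), weekly_earn = c(800, 650, 900))

# Default: keep only rows that match on all shared column names (here, "id")
inner <- merge(data.a, data.b)

# Keep every row of data.a, filling unmatched values with NA
left <- merge(data.a, data.b, by = "id", all.x = TRUE)

# Keep all rows from both data frames
full <- merge(data.a, data.b, by = "id", all = TRUE)
```

+Stating ''by'' explicitly is a reasonable habit, since otherwise any column name that happens to appear in both data frames is used as a key.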
+===Parallel===
+Some basic info can be found at the [[http://cran.r-project.org/web/views/HighPerformanceComputing.html|High Performance Computing CRAN view]]. You can use the "parallel" package (which merges both "snow" and "multicore").
+You can also use [[http://cran.r-project.org/web/packages/Rmpi/index.html|Rmpi]] and [[http://cran.r-project.org/web/packages/npRmpi/index.html|npRmpi]] packages. You have your choice of MPI2 libraries (both OpenMPI and MPICH2). You will have to install the packages in your userspace (requiring compilation).
+^ ^OpenMPI^MPICH2^
+|Before anything (installation or usage)|>module load openmpi-x86_64|>module load mpich2-x86_64|
+|Installation|R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/lib64/openmpi/1.4-gcc/include --with-Rmpi-libpath=/usr/lib64/openmpi/1.4-gcc/lib --with-Rmpi-type=OPENMPI")|R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/include/mpich2-x86_64 --with-Rmpi-libpath=/usr/lib64/mpich2/lib --with-Rmpi-type=MPICH")|
+A good intro guide is [[http://onlinelibrary.wiley.com/doi/10.1002/jae.1221/pdf|npRmpi: A package for parallel distributed kernel estimation in R]].
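+For the ''parallel'' package mentioned above, a minimal sketch using a local socket cluster might look like the following. The function ''slow_square'' and the worker count of 2 are assumptions made for illustration; on the cluster, the number of workers should match what your job has been allocated.

```r
library(parallel)

# Hypothetical task: a slow function applied to several inputs
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Start 2 worker processes (an arbitrary choice for this sketch)
cl <- makeCluster(2)

# Apply the function to each input, spreading the work across the workers
res <- parLapply(cl, 1:8, slow_square)

# Always shut the workers down when finished
stopCluster(cl)

unlist(res)
```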
+===Other functions===
+  * ''merge'' merges datasets
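+  * ''glm'' fits generalized linear models, including limited dependent variable models such as logit and probit.
+  * ''optim'' minimizes a function (general-purpose optimization); ''uniroot'' finds zeros of a one-dimensional function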
+  * [[http://cran.r-project.org/web/views/Econometrics.html|Contributed econometrics packages]]
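+As a rough illustration of two of the functions listed above, the sketch below fits a logit model with ''glm'' and minimizes a simple function with ''optim''. The data are simulated and the objective function ''obj'' is made up for the example.

```r
# Simulated data for illustration only
set.seed(1)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(-0.5 + 1.0 * x))

# Logit model for a binary outcome
fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)

# optim minimizes by default: here the minimum of (b - 3)^2 + 1 is at b = 3
obj <- function(b) (b - 3)^2 + 1
opt <- optim(par = 0, fn = obj, method = "BFGS")
opt$par
```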
 =====Stata=====