cluster:software

Current revision: 2024/11/14 14:47 (external edit). Previous revision: 2024/11/11 20:54, mcloughlin ([Running Stata on the Cluster]).
  
Below is the list of software currently installed on the cluster. If you would like additional software installed, please email econcluster@umd.edu.
  
^Software ^Version ^Terminal Command ^
| Matlab | R2023a | matlab |
| Python | 3.9 | python |
| Python | 3.11 | python3.11 |
| R | 4.4.2 | R |
| Stata | 18 MP8 | stata-mp |
(you should now see the environment on the far left of the terminal line). After that, you can install any library using pip from the command line:
<code>pip install pandas</code>

=====R=====
====Batch Mode====
You can run an R file in batch mode with <code>R CMD BATCH filename.R</code>
By default, the output is written to ''filename.Rout''. To run an R job in the background, see [[cluster:managing_jobs|Managing Jobs]].

====Installing Packages====
To install an R package, run the following in an interactive R session (the package name must be a quoted string): <code>install.packages("PACKAGE_NAME")</code>

====Introduction to R====
The following section is adapted from an introductory talk on R given by Paul Bailey in February 2011. The data used in the examples is available at [[http://terpconnect.umd.edu/~pdbailey/R/MDemp.csv|this link]].

===R Background===
  * Based on the S language developed at Bell Labs
  * Open-source software
    * Large group of contributors
    * Most R code is written in R
    * Computationally intensive code is written in Fortran or C
  * Data frames and matrices are native types
  * Easy, customizable graphics

===R Pros===
  * Free
  * Easy to get a sense of what is going on with data
  * Excellent at simulation
  * Interfaces with lots of other software (e.g. WinBUGS, SQL)

===R Cons===
  * Stores data in RAM, so a dataset must fit in memory
  * Support is mainly via mailing lists (listservs)
  * Difficult to get started

===Read in Data===
  * There are type-specific functions such as <code>dat <- read.csv("MDemp.csv")</code> and a general function, ''read.table'', which needs the separator and the header row set explicitly to read the same file: <code>dat <- read.table("MDemp.csv", sep=",", header=TRUE)</code>
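As a quick check of the difference between the two functions, the sketch below writes a made-up two-line CSV to a temporary file and reads it both ways. The file is not the MDemp.csv data. ''read.csv'' assumes a header row, but ''read.table'' does not unless ''header=TRUE'' is given.

<code r>
# Made-up CSV for illustration only (not MDemp.csv)
path <- tempfile(fileext = ".csv")
writeLines(c("age,weekly_earn", "25,500", "40,900"), path)

dat.csv   <- read.csv(path)                              # header = TRUE by default
dat.table <- read.table(path, sep = ",", header = TRUE)  # header must be requested
dat.nohdr <- read.table(path, sep = ",")                 # header line is read as a data row

identical(dat.csv, dat.table)  # same data frame
nrow(dat.nohdr)                # one extra row; columns are named V1, V2
</code>

Without ''header=TRUE'', the column names end up in the first row. Both columns are then read as character.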

===Getting Help===
  * To open the help page for a command, put ''?'' in front of it: <code>?lm</code>
  * To search the text of all help pages, use ''??'': <code>??regression</code>

===Summary===
  * Getting summaries is easy: ''summary(dat)''
  * You can also focus on one variable:
<code>
summary(dat$num_child)  # numeric summary of one column
table(dat$num_child)    # count of each distinct value
</code>
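Here are the same two calls on a small made-up data frame with hypothetical values, not MDemp.csv. The ''prop.table'' step is an addition that is not from the talk. It turns the counts into shares.

<code r>
# Hypothetical toy data (not MDemp.csv)
dat <- data.frame(num_child = c(0, 2, 1, 0, 3, 0))

summary(dat$num_child)          # min, quartiles, mean, max
counts <- table(dat$num_child)  # frequency of each distinct value
counts
counts[["0"]]                   # number of rows with no children
prop.table(counts)              # the same counts as proportions
</code>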
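
===Subset Data===
  * Indexing a data frame with ''[condition,]'' selects the rows where the condition is TRUE. Leaving the column index empty keeps all columns:
<code>
dat.lf <- dat[dat$emp %in% c("emp","unemp"),]  # employed or unemployed (labor force)
dat.hs <- dat.lf[dat.lf$educ==39,]              # rows of dat.lf with educ == 39
</code>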
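Below is a self-contained version of the row selection above on made-up data. The label ''"nilf"'' is invented here to provide a row that the ''%in%'' test drops. The sketch also shows ''subset()'', a base-R alternative that the original talk does not use.

<code r>
# Hypothetical toy data; "nilf" is an invented label, not a value from MDemp.csv
dat <- data.frame(
  emp  = c("emp", "unemp", "nilf", "emp", "nilf"),
  educ = c(39, 39, 39, 43, 40)
)

in.lf <- dat$emp %in% c("emp", "unemp")  # logical vector, one entry per row
in.lf

dat.lf <- dat[in.lf, ]                   # keep rows where in.lf is TRUE, all columns
dat.hs <- dat.lf[dat.lf$educ == 39, ]

# subset() does the same selection, and also drops rows where the condition is NA
dat.hs2 <- subset(dat, emp %in% c("emp", "unemp") & educ == 39)
</code>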

===Linear Models===
  * The **lm** function fits linear models specified with a formula:
<code>
lm1 <- lm(weekly_earn ~ age + year, data=dat)   # year enters as a numeric trend
summary(lm1)
</code>
  * You can also treat a variable as a factor:
<code>
dat$yearf <- as.factor(dat$year)
lm2 <- lm(weekly_earn ~ age + yearf, data=dat)  # dummies for every year but the first
summary(lm2)
</code>
  * You can also change the contrasts, which control how the factor's levels are coded:
<code>
contrasts(dat$yearf) <- "contr.sum"             # sum-to-zero coding instead of a base year
lm3 <- lm(weekly_earn ~ age + yearf, data=dat)
summary(lm3)
</code>
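To see what changing the contrasts does, the sketch below fits ''lm2'' and ''lm3'' on a small simulated data set of made-up numbers, not MDemp.csv. The year coefficients change meaning, but the fitted values and the ''age'' slope stay the same, because both codings span the same columns.

<code r>
# Simulated toy data (made up for illustration)
set.seed(1)
dat <- data.frame(
  age  = c(25, 32, 47, 51, 38, 29, 44, 60, 35),
  year = rep(c(2008, 2009, 2010), times = 3)
)
year.effect <- c(0, 50, 80)[dat$year - 2007]  # hypothetical year shifts
dat$weekly_earn <- 300 + 10 * dat$age + year.effect + rnorm(9, sd = 20)

dat$yearf <- as.factor(dat$year)

# Treatment contrasts (default): year coefficients are differences from 2008
lm2 <- lm(weekly_earn ~ age + yearf, data = dat)

# Sum-to-zero contrasts: year coefficients are deviations from the unweighted average year
contrasts(dat$yearf) <- "contr.sum"
lm3 <- lm(weekly_earn ~ age + yearf, data = dat)

coef(lm2)
coef(lm3)

# Only the parameterization changes
all.equal(fitted(lm2), fitted(lm3))
all.equal(coef(lm2)[["age"]], coef(lm3)[["age"]])
</code>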

===Aggregate===
  * Computes summary statistics by group
  * First argument is what you want to summarize
  * Second argument is what you want to group by (a list)
  * Third argument is the function to apply to each group
<code>
agg.hs <- aggregate(dat.hs$emps, by=list(dat.hs$yq), FUN=mean)  # x and by must come from the same data frame
</code>
  * The column names of the result are a little odd (''Group.1'' and ''x'').

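Here is a small made-up example of the naming issue. The columns are named ''yq'' and ''emps'' to mirror the call above, and the values are hypothetical. Naming the ''by'' list, or using the formula interface, gives readable column names.

<code r>
# Hypothetical toy data mirroring the column names used above
dat.hs <- data.frame(
  yq   = c("2010q1", "2010q1", "2010q2", "2010q2", "2010q2"),
  emps = c(1, 0, 1, 1, 0)
)

# Unnamed 'by' list: columns come back as Group.1 and x
agg1 <- aggregate(dat.hs$emps, by = list(dat.hs$yq), FUN = mean)
names(agg1)

# Naming the 'by' list fixes the grouping column's name
agg2 <- aggregate(dat.hs$emps, by = list(yq = dat.hs$yq), FUN = mean)

# Formula interface: both columns are named after the variables
agg3 <- aggregate(emps ~ yq, data = dat.hs, FUN = mean)
agg3
</code>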
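===Merge===
  * Joins two datasets on the columns they share
<code>
merged <- merge(data.a, data.b)  # by default, joins on all common column names
</code>
  * Lots of options for this one (for example ''by'', ''all.x'', ''all.y'', ''all'')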
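The sketch below shows the default join and two common options, using two hypothetical data sets that share an ''id'' column.

<code r>
# Two hypothetical data sets sharing the column 'id'
data.a <- data.frame(id = c(1, 2, 3), age  = c(25, 40, 33))
data.b <- data.frame(id = c(2, 3, 4), educ = c(39, 43, 40))

# Default: inner join on all shared column names (here only 'id')
merged <- merge(data.a, data.b)

# Left join: keep unmatched rows of data.a; their educ becomes NA
merged.left <- merge(data.a, data.b, by = "id", all.x = TRUE)

# Full outer join: keep unmatched rows from both sides
merged.full <- merge(data.a, data.b, by = "id", all = TRUE)
</code>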

===Parallel===
Some basic information can be found in the [[http://cran.r-project.org/web/views/HighPerformanceComputing.html|High Performance Computing CRAN task view]]. You can use the ''parallel'' package, which is included with R and merges the functionality of the earlier ''snow'' and ''multicore'' packages.
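Here is a minimal ''parallel'' sketch that assumes two worker processes, purely for illustration. It starts a socket cluster, gives each worker its own reproducible random-number stream, and runs a toy simulation with ''parSapply''. Set the number of workers to match the cores your job actually has.

<code r>
library(parallel)

# Assumption: 2 workers for illustration only
n.workers <- 2
cl <- makeCluster(n.workers)         # separate R worker processes

clusterSetRNGStream(cl, iseed = 123) # independent, reproducible RNG streams

# Toy task: each call simulates one sample mean of 1000 standard normals
sim.mean <- function(i, n = 1000) mean(rnorm(n))
means <- parSapply(cl, 1:100, sim.mean)

stopCluster(cl)                      # shut the workers down when finished

length(means)
</code>

On Linux, ''mclapply(X, FUN, mc.cores = n)'' is a forking alternative that needs no cluster setup. Note that ''detectCores()'' reports the machine's cores, not necessarily the number allotted to your job.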

You can also use the [[http://cran.r-project.org/web/packages/Rmpi/index.html|Rmpi]] and [[http://cran.r-project.org/web/packages/npRmpi/index.html|npRmpi]] packages with either of two MPI-2 libraries, OpenMPI or MPICH2. You will need to install these packages in your own user library, which requires compiling them.

^ ^OpenMPI^MPICH2^
|Before anything else (installation or use)|>module load openmpi-x86_64|>module load mpich2-x86_64|
|Installation| R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/lib64/openmpi/1.4-gcc/include --with-Rmpi-libpath=/usr/lib64/openmpi/1.4-gcc/lib --with-Rmpi-type=OPENMPI")|R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/include/mpich2-x86_64 --with-Rmpi-libpath=/usr/lib64/mpich2/lib --with-Rmpi-type=MPICH")|

A good introductory guide is [[http://onlinelibrary.wiley.com/doi/10.1002/jae.1221/pdf|npRmpi: A package for parallel distributed kernel estimation in R]].

===Other functions===
  * ''merge'' merges datasets
  * ''glm'' fits generalized linear models, including limited dependent variable models such as logit and probit
  * ''optim'' minimizes functions numerically; ''uniroot'' finds a zero of a one-dimensional function
  * [[http://cran.r-project.org/web/views/Econometrics.html|Contributed econometrics packages]]
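Here is a short sketch of ''optim'' and ''uniroot'' on made-up functions. The first is a two-parameter quadratic with its minimum at (1, -2). The second is x^2 - 2, whose positive zero is the square root of 2.

<code r>
# Made-up objective with its minimum at (1, -2)
f <- function(p) (p[1] - 1)^2 + (p[2] + 2)^2

fit <- optim(par = c(0, 0), fn = f, method = "BFGS")
fit$par          # estimated minimizer
fit$convergence  # 0 means optim reports successful convergence

# Zero of a one-dimensional function on an interval where it changes sign
g <- function(x) x^2 - 2
root <- uniroot(g, interval = c(0, 2), tol = 1e-10)
root$root        # close to sqrt(2)
</code>

''optim'' minimizes by default. To maximize (e.g. a log-likelihood), either minimize its negative or pass ''control = list(fnscale = -1)''.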
  
=====Stata=====
cluster/software.1731358491.txt.gz · Last modified: 2024/11/11 20:54 (external edit)