cluster:software [2024/11/11 20:22] – [Cluster Software] mcloughlin
cluster:software [2024/11/14 14:47] (current) – external edit 127.0.0.1
====== Cluster Software ======

The table below lists the software currently installed on the cluster. If you wish to have additional software installed, please email econcluster@umd.edu.

^Software ^Version ^Terminal Command ^
| GCC | 11.4.1 | gcc |
| Matlab | R2023a | matlab |
| Python | 3.9 | python |
| Python | 3.11 | python3.11 |
| R | 4.4.2 | R |
| Stata | 18 MP8 | stata-mp |
  
=====Python=====

To run a pre-written Python script with the default Python 3.9, type <code>python script.py</code>
To use Python 3.11 instead, type <code>python3.11 script.py</code>

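As a quick check of which interpreter a command actually runs (the table above lists ''python'' as 3.9 and ''python3.11'' as 3.11), a script along the following lines could be saved as script.py and run with either command. It is a minimal sketch using only the standard library; the function name ''describe_interpreter'' is hypothetical.

```python
import sys


def describe_interpreter() -> str:
    """Return a one-line description of the Python interpreter running this script."""
    major, minor, micro = sys.version_info[:3]
    # sys.executable is the path of the interpreter binary that is running this file
    return f"Python {major}.{minor}.{micro} at {sys.executable}"


if __name__ == "__main__":
    print(describe_interpreter())
```

Running it once with ''python'' and once with ''python3.11'' should report two different versions.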
==== Installing Libraries via VENV ====

To install a library that doesn't come with the initial installation, first create a virtual environment (replace $NAME with the name you choose for your environment):
<code>python -m venv $NAME</code>
To base the environment on Python 3.11 instead of 3.9, use ''python3.11 -m venv $NAME''.

After creating it, activate the environment:
<code>source $NAME/bin/activate</code>
You should now see the environment's name at the far left of the terminal prompt. After that, you can install any library with pip from the command line, for example:
<code>pip install pandas</code>
When you are finished, type ''deactivate'' to leave the environment.
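If the prompt does not make it obvious, a script can also check whether it is running inside a virtual environment. The sketch below relies on the fact that, in an environment created by ''venv'', ''sys.prefix'' points at the environment directory while ''sys.base_prefix'' still points at the base installation; the function name ''in_virtualenv'' is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # venv sets sys.prefix to the environment directory;
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Active virtual environment: {sys.prefix}")
    else:
        print("No virtual environment active; run 'source $NAME/bin/activate' first.")
```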
  
=====R=====
====Batch Mode====
You can run an R file in batch mode with <code>R CMD BATCH filename.R</code>
By default, the output is written to ''filename.Rout''.
To run your R command in the background, see [[cluster:managing_jobs|Managing Jobs]].

====Installing Packages====
To install an R package, start R in interactive mode and type the following, where PACKAGE_NAME is the package's name in quotes: <code>install.packages("PACKAGE_NAME")</code>

====Introduction to R====
The following section is adapted from an introductory talk on R given by Paul Bailey in February 2011. The data used in the examples is located at [[http://terpconnect.umd.edu/~pdbailey/R/MDemp.csv|this link]].

===R Background===
  * Based on the S language from Bell Labs
  * Open source software
    * Large group of contributors
    * Most code is written in R
    * Computationally intensive code written in FORTRAN or C
  * Datasets (data frames) and matrices are native types
  * Easy, customizable graphics

===R Pros===
  * Free
  * Easy to get a sense of what is going on with data
  * Excellent at simulation
  * Interfaces with lots of other software (e.g. WinBUGS, SQL)

===R Cons===
  * Holds data in memory (RAM), so very large datasets can be a problem
  * Support mainly via mailing lists
  * Difficult to get started

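===Read in Data===
  * There are format-specific readers such as ''read.csv'': <code>dat <- read.csv("MDemp.csv")</code>
  * There is also the general-purpose ''read.table'', which needs the separator and header row specified: <code>dat <- read.table("MDemp.csv", sep=",", header=TRUE)</code>
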
===Getting Help===
  * To open the help page for a function, put ''?'' in front of its name, e.g. ''?lm''
  * To search all help text for a word or phrase, use ''??'', e.g. ''??regression''

===Summary===
  * Getting summaries is easy: ''summary(dat)''
  * You can also focus on one variable; ''table'' gives frequency counts:
<code>
summary(dat$num_child)
table(dat$num_child)
</code>

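===Subset Data===
  * Indexing with ''[condition, ]'' selects the rows where the condition is TRUE:
<code>
# keep people in the labor force (employed or unemployed)
dat.lf <- dat[dat$emp %in% c("emp","unemp"), ]
# among those, keep education code 39 (dat.hs: presumably high school graduates)
dat.hs <- dat.lf[dat.lf$educ==39, ]
</code>
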
===Linear Models===
  * The **lm** function fits linear models specified with a formula:
<code>
lm1 <- lm(weekly_earn ~ age + year, data=dat)
summary(lm1)
</code>
  * You can also treat a variable as a factor, so each year gets its own coefficient rather than a single linear trend:
<code>
dat$yearf <- as.factor(dat$year)
lm2 <- lm(weekly_earn ~ age + yearf, data=dat)
summary(lm2)
</code>
  * And change the contrasts (how the factor is coded), here to sum-to-zero contrasts instead of the default treatment coding:
<code>
contrasts(dat$yearf) <- "contr.sum"
lm3 <- lm(weekly_earn ~ age + yearf, data=dat)
summary(lm3)
</code>
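To see what ''contr.sum'' does to the design, the following minimal Python sketch builds the same coding table that R uses for a factor with k levels: each of the first k-1 levels gets an indicator column, and the last level is coded -1 in every column. The function ''contr_sum'' and the year labels are hypothetical; this only illustrates the coding and is not how R computes it.

```python
def contr_sum(levels: list) -> dict:
    """Return sum-to-zero codes for each factor level, laid out like R's contr.sum.

    Each of the first k-1 levels gets an indicator row; the last level is coded
    -1 in every column, so each of the k-1 columns sums to zero across levels.
    """
    k = len(levels)
    if k < 2:
        raise ValueError("a factor needs at least two levels")
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            # indicator: 1 in this level's own column, 0 elsewhere
            coding[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            # the last level carries the constraint: -1 in every column
            coding[level] = [-1] * (k - 1)
    return coding


if __name__ == "__main__":
    for level, row in contr_sum(["2009", "2010", "2011"]).items():
        print(level, row)
```

With this coding, each ''yearf'' coefficient in ''lm3'' reads as a deviation from the unweighted average of the year effects, rather than as a difference from an omitted base year as under R's default treatment coding.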
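
===Aggregate===
  * Allows you to create summary statistics for groups
  * The first argument is what you want to summarize
  * The second argument is what you want to group by, given as a list
  * The third argument is the function to apply to each group
<code>
agg.hs <- aggregate(dat.hs$emps, by=list(dat.hs$yq), mean)
</code>
  * The result's columns are named ''Group.1'' and ''x'' by default, which looks a little odd; naming the list element, e.g. ''by=list(yq=dat.hs$yq)'', gives the group column a more readable name.
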
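The grouping that ''aggregate'' performs can be illustrated outside R. The following is a minimal Python sketch (standard library only) of what ''aggregate(x, by=list(g), mean)'' computes for a single grouping variable; ''aggregate_mean'' and the sample quarter labels are hypothetical, and the groups are returned in sorted order here.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(values, groups):
    """Average `values` within each group label in `groups`.

    Returns (group, mean) pairs, one per distinct group, in sorted group order.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)  # collect each value under its group label
    return [(group, mean(buckets[group])) for group in sorted(buckets)]


if __name__ == "__main__":
    employed = [1, 0, 1, 1]  # hypothetical 0/1 employment indicator
    quarter = ["2010q1", "2010q1", "2010q2", "2010q2"]  # hypothetical labels
    print(aggregate_mean(employed, quarter))
```

Because each value is paired with the group label at the same position, the values and the grouping vector must have the same length, which is why both should come from the same data frame.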
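===Merge===
  * Joins two datasets on the columns they share (by default, all columns with the same name), keeping only rows that match in both
<code>
merged <- merge(data.a, data.b)
</code>
  * Lots of options for this one, e.g. ''by'' to choose the key columns and ''all=TRUE'' to keep unmatched rows; see ''?merge''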
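To make the default matching rule concrete, here is a minimal Python sketch (standard library only, rows as dictionaries) of an inner join on all shared column names, which is what ''merge(data.a, data.b)'' does with its default arguments. ''merge_rows'' and the sample rows are hypothetical, and unlike R's ''merge'' this sketch does not sort the result by the key columns.

```python
from collections import defaultdict


def merge_rows(rows_a, rows_b):
    """Inner-join two lists of dict rows on every column name they share.

    Assumes all rows within each list have the same keys, like a data frame.
    """
    if not rows_a or not rows_b:
        return []
    # analogous to merge()'s default: join on the column names both inputs have
    keys = sorted(set(rows_a[0]) & set(rows_b[0]))
    index = defaultdict(list)
    for row in rows_b:
        index[tuple(row[k] for k in keys)].append(row)
    merged = []
    for row in rows_a:
        # keep only rows of rows_a whose key values also appear in rows_b
        for match in index.get(tuple(row[k] for k in keys), []):
            merged.append({**row, **match})
    return merged


if __name__ == "__main__":
    a = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
    b = [{"id": 2, "y": 10}, {"id": 3, "y": 20}]
    print(merge_rows(a, b))
```

As in R, if the two inputs share no column names, every row of the first is paired with every row of the second (a Cartesian product).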

===Parallel===
Some basic info can be found at the [[http://cran.r-project.org/web/views/HighPerformanceComputing.html|High Performance Computing CRAN view]]. You can use the ''parallel'' package, which ships with R and incorporates the functionality of the earlier ''snow'' and ''multicore'' packages.

You can also use the [[http://cran.r-project.org/web/packages/Rmpi/index.html|Rmpi]] and [[http://cran.r-project.org/web/packages/npRmpi/index.html|npRmpi]] packages with either MPI-2 implementation, OpenMPI or MPICH2. You will have to install these packages in your own user library, which requires compiling them.

^ ^OpenMPI ^MPICH2 ^
|Before anything (installation or usage) |''module load openmpi-x86_64'' |''module load mpich2-x86_64'' |
|Installation (from the R prompt) |''install.packages("<package>", configure.args="--with-Rmpi-include=/usr/lib64/openmpi/1.4-gcc/include --with-Rmpi-libpath=/usr/lib64/openmpi/1.4-gcc/lib --with-Rmpi-type=OPENMPI")'' |''install.packages("<package>", configure.args="--with-Rmpi-include=/usr/include/mpich2-x86_64 --with-Rmpi-libpath=/usr/lib64/mpich2/lib --with-Rmpi-type=MPICH")'' |

A good intro guide is [[http://onlinelibrary.wiley.com/doi/10.1002/jae.1221/pdf|npRmpi: A package for parallel distributed kernel estimation in R]].
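
===Other functions===
  * ''merge'' merges datasets
  * ''glm'' fits generalized linear models, including limited dependent variable models such as logit and probit
  * ''optim'' performs general-purpose optimization (minimization by default); to find zeros of a one-variable function, use ''uniroot''
  * [[http://cran.r-project.org/web/views/Econometrics.html|Contributed econometrics packages]]
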
=====Stata=====
====Batch Mode====
You can run a .do file in batch mode with
<code>
stata-mp -b do dofile.do
</code>
The output is written to ''dofile.log'' in the current directory.

To allow your do-file to continue running after you log off from your terminal, prefix the command with "nohup" and end it with "&" so that it runs in the background. For example:
<code>
nohup stata-mp -b do dofile.do &
</code>

For more information on how to run your Stata command in the background, see [[cluster:managing_jobs|Managing Jobs]].

==== Temporary Files ====
By default, Stata saves temporary files (from -tempfile- or -preserve-) to /nfs/home/$USERNAME/stata-tmp/. If you would like Stata to save temporary files in a new location (e.g. $HOME/statatmp), create the directory and set STATATMP from the command line before starting Stata:
<code>
mkdir -p $HOME/statatmp
export STATATMP=$HOME/statatmp
</code>
The export only lasts for the current shell session; if your shell is bash, you can add the export line to your ~/.bashrc to make it permanent.

One reason you might want to do this is that files are removed from /home/stata-tmp/ if they haven't been touched for a day. If you have a Stata process that runs longer than that, this may cause problems when reading from tempfiles or running -restore-.

==== Installing Extra Packages ====
If you use extra (user-written) packages on your home or work computer and need them on the cluster, you can install them from within Stata via ssc:

<code>
ssc install outreg
</code>

You will then have a folder called "ado" in your home directory, which contains your newly installed commands.
cluster/software.1731356538.txt.gz · Last modified: 2024/11/11 20:22 by mcloughlin