Cluster Software

The table below lists the software currently installed on the cluster. If you would like additional software installed, please email econcluster@umd.edu.

Software	Version	Terminal Command
GCC	11.4.1	gcc
Matlab	R2023a	matlab
Python	3.9	python
Python	3.11	python3.11
R	4.4.2	R
Stata	18 MP8	stata-mp

Python

To run a pre-written Python script, type:

python script.py

Installing Libraries via VENV

To install a library that isn't included in the base installation, first create a virtual environment (where $NAME is the name you choose for it):

python -m venv $NAME

After creating the environment, activate it:

source $NAME/bin/activate

You should now see the environment name at the far left of the terminal prompt. After that, you can install any library with pip from the command line:

pip install pandas
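
To confirm which environment, if any, is active before installing libraries, one option is to inspect the VIRTUAL_ENV variable, which the activate script sets. The following is a minimal sketch; the function name venv_status is hypothetical and is not a command provided by the cluster.

```shell
# venv_status is a hypothetical helper name, not a cluster-provided command.
# The venv activate script exports VIRTUAL_ENV; deactivate unsets it.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment is active"
    fi
}

venv_status
```

When you are finished working in the environment, running deactivate returns the shell to the system Python.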

R

Batch Mode

You can run an R file in batch mode with

R CMD BATCH filename.R

To run your R command in the background, see Managing Jobs.
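
R CMD BATCH does not print results to the terminal; by default it writes the session output to a file named after the script with the extension .Rout. As a small sketch, the following creates a hypothetical script called example.R, runs it in batch mode if R is on the PATH, and displays the output file.

```shell
# example.R is a hypothetical script name used only for illustration.
printf 'x <- 1:10\nprint(mean(x))\n' > example.R

if command -v R >/dev/null 2>&1; then
    # Commands and their results are written to example.Rout
    R CMD BATCH example.R
    cat example.Rout
else
    echo "R was not found on the PATH"
fi
```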

Installing Packages

To install an R package, type the following in interactive mode (replace PACKAGE_NAME with the name of the package, keeping the quotes):

install.packages("PACKAGE_NAME")

Introduction to R

The following section is adapted from an introductory talk on R given by Paul Bailey in February 2011. The data used in the examples is located at this link.

R Background

Based on Bell Labs S
Open source software
- Large group of contributors
- Most R code is written in R
- Computationally intensive code written in FORTRAN or C
Datasets, matrices are native types
Easy, customizable graphics

R Pros

Free
Easy to get a sense of what is going on with data
Excellent at simulation
Interfaces with lots of other software (e.g. WinBUGS, SQL)

R Cons

Uses RAM to store data
Support mainly via mailing lists (listservs)
Difficult to get started

Read in Data

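There are type-specific methods, such as read.csv for CSV files:
```
dat <- read.csv("MDemp.csv")
```
and a general method, read.table, which needs to be told the separator and that the first row holds the column names:
```
dat <- read.table("MDemp.csv",sep=",",header=TRUE)
```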

Getting Help

Type ? followed by a function name to open the help page for that function:
```
?lm
```
Type ?? followed by a search term to search the text of the help pages:
```
??regression
```

Summary

Getting summaries is easy: summary(dat)
You can also focus on one variable:

summary(dat$num_child)
table(dat$num_child)

Subset Data

When you reference something with [condition,] you can select rows:

dat.lf <- dat[dat$emp %in% c("emp","unemp"),]
dat.hs <- dat.lf[dat.lf$educ==39,]

Linear Models

The lm function fits linear models with a formula:

lm1 <- lm(weekly_earn ~ age + year,data=dat)
summary(lm1)

You can also treat a variable as a factor:

dat$yearf <- as.factor(dat$year)
lm2 <- lm(weekly_earn ~ age + yearf,data=dat)
summary(lm2)

And change contrasts:

contrasts(dat$yearf) <- "contr.sum"
lm3 <- lm(weekly_earn ~ age + yearf,data=dat)
summary(lm3)

Aggregate

Allows you to create summary statistics for groups
First argument is what you want to summarize
Second argument is what you want to group by
Third argument is what to do to the groups

agg.hs <- aggregate(dat.hs$emps,by=list(dat.hs$yq),mean)

The resulting column names are a little odd.

Merge

Joins two datasets on their shared columns

merged <- merge(data.a,data.b)

* Lots of options for this one

Parallel

Some basic info can be found at the High Performance Computing CRAN view. You can use the “parallel” package (which merges both “snow” and “multicore”).

You can also use Rmpi and npRmpi packages. You have your choice of MPI2 libraries (both OpenMPI and MPICH2). You will have to install the packages in your userspace (requiring compilation).

Step	OpenMPI	MPICH2
Before anything (installation or usage)	>module load openmpi-x86_64	>module load mpich2-x86_64
Installation	R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/lib64/openmpi/1.4-gcc/include --with-Rmpi-libpath=/usr/lib64/openmpi/1.4-gcc/lib --with-Rmpi-type=OPENMPI")	R> install.packages("<package>", configure.args="--with-Rmpi-include=/usr/include/mpich2-x86_64 --with-Rmpi-libpath=/usr/lib64/mpich2/lib --with-Rmpi-type=MPICH")

A good intro guide is npRmpi: A package for parallel distributed kernel estimation in R.

Other functions

merge merges datasets
glm fits generalized linear models, including limited dependent variable models such as logit and probit
optim minimizes a function (uniroot finds zeros)
Contributed econometrics packages

Stata

Batch Mode

You can run a .do file in batch mode with

stata-mp -b do dofile.do

To allow your do-file to continue running when you log off from your terminal, preface the command with “nohup”. For example:

nohup stata-mp -b do dofile.do &

For more information on how to run your Stata command in the background, see Managing Jobs.
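
When a job is started this way, it can help to record its process ID so that you can check on it later. The sketch below uses sleep as a stand-in for the Stata command so that it can be tried anywhere; the file name job.out is an arbitrary choice.

```shell
# "sleep 2" stands in for a long job such as: stata-mp -b do dofile.do
# Redirecting output keeps nohup from writing to nohup.out.
nohup sleep 2 > job.out 2>&1 &
JOB_PID=$!                      # PID of the most recent background job
echo "started job with PID $JOB_PID"

# kill -0 sends no signal; it only checks whether the process exists
if kill -0 "$JOB_PID" 2>/dev/null; then
    echo "job is still running"
fi

wait "$JOB_PID"                 # only works in the shell that started the job
echo "job finished"
```

In batch mode Stata writes its log to dofile.log in the current directory, so that file is the place to look for results and errors.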

Temporary Files

By default, Stata saves tempfiles (from -tempfile- or -preserve-) to /nfs/home/$USERNAME/stata-tmp/. If you would like Stata to save temporary files in a different location (e.g. $HOME/statatmp), run the following from the command line before starting Stata:

export STATATMP=$HOME/statatmp

One reason you might want to do this is that files are removed from the default stata-tmp directory if they haven't been touched for a day. If you have a Stata process that runs longer than that, this may cause problems when reading from tempfiles or using -restore-.
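
A slightly fuller version of this setup creates the directory before pointing Stata at it. This is only a sketch: creating the directory first is a precaution, and the location is simply the example used above.

```shell
# $HOME/statatmp follows the example location used above
STATATMP="$HOME/statatmp"
mkdir -p "$STATATMP"    # precaution: make sure the directory exists
export STATATMP         # make the setting visible to Stata when it starts
echo "STATATMP is set to $STATATMP"
```

The export lasts only for the current shell session. Adding these lines to a shell startup file such as ~/.bashrc would make the setting persistent, assuming your login shell is bash.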

Installing Extra Packages

If you are using extra packages on your home/work computer and need them installed on the cluster, you can install them via ssc:

ssc install outreg

You will then have a folder in your home directory called “ado”, which contains the newly installed commands.
