NASA HPCC/ESS Cooperative Agreement:
Turbulent convection and Dynamos in Stars

Documentation for the HPS code

The HPS code is a hybrid finite-difference/pseudo-spectral code for the solution of the Navier-Stokes magnetohydrodynamic (MHD) equations in three-dimensions on parallel architecture supercomputers. Currently, the code is implemeted for the Cray T3E and the IBM SP2, although adaptation to other machines may be realtively simple, since the parallelisation is based around the portable Message Passing Interface (MPI) communication system.

Problems that can be solved

The HPS code is designed to solve problems in compressible magnetohydrodynamics and magnetoconvection. It was built with astrophysical applications in mind, hence the compressibility, and is currently used most heavily to address outstanding questions in solar interior physics. The code can address the pure convective dynamics of the solar convection zone, magnetoconvection in regions of the convection zone, including sunspots and flux expulsion from the convection zone into the stable interior, the motions of magnetic flux tubes in a compressible atmosphere, etc etc.

Physics

The HPS code solves the Navier-Stokes equations for compressible fluid flow coupled with the magnetic induction equation to produce a solver for magnetohydrodynamic systems. The equations solved are given below, and consist of an equation of continuity(the conservation of mass), the conservation of momentum, the conservation of energy formulated as a temperature equation, and the inductance of magnetic field. The equations are formulated around a polytropic mean profile and are non-dimensionalised in terms of the depth of the layer, the isothermal sound crossing time at the top of the box, the thermodynamic quantities at the top of the domain and the initial size of the magnetic field (unless a dynamo problem is to be solved). The shear viscosity, thermal conductivity, magnetic permittivity and specific heats are assumed to be constant. The parameter regime is then governed by a number of dimensionless parameters: the Rayleigh number (a measure of the supercriticality via Ck), the Prandtl number (the ratio of the viscous and thermal diffusivities), the magnetic Prandlt number (the ratio of the magnetic diffusivity to the thermal diffusivity). Rotation is implemented via a modified f-plane formulation, where both horizontal and vertical components of a constant rotation vector (independent of position within the domain) are taken into account, with the strength of the rotation (compared to advective influences) measured by the Taylor number, and the position of the domain on the sphere dictated by the latitudinal angle, phi. The code also allows for the domain to be divided vertically into a convective region and a stable region in what is known as penetrative convection, where motions may overshoot from the upper convectively unstable layer into the lower quiescent layer. The change from unstable to stable is enforced by piece-wise polytropic thermal conductivity profiles. Arbitrary shear velocity profiles may also be forced within the domain.

The domain is a Cartesian rectilinear box, periodic in the two horizontal directions, and bounded by stress-free plates in the vertical. The temperature is fixed at the upper boundary, whereas there is a fixed input thermal flux at the lower boundary. The magnetic field is allowed various configurations at the boundaries, including no field, no flux, vertical field and input flux options. All the boundary conditions in the vertical direction are easily changeable, whereas the condition of periodicity in the horizontal must be strictly retained.

Basic nature of code

The code is a hybrid finite-difference/pseudospectral model. The horizontal spatial discretisation is by a collocation pseudo-spectral method whereas the vertical spatial discretisation is via fourth order centred finite differences. The horizontal dependences of variables are transformed into Fourier space via a Fast Fourier Transform and the majority of computations are done in phase space. Nonlinear terms however are calculated in configuration space, and require a transform backwards and then forwards - hence the pseudospectral nature of the method. The Fourier nature of the horizontal dependences allow derivatives to be reduced to simple multiplications by a wavenumber. Vertical derivatives are calculated by fourth order centred finite differences. This direction is calculated in this manner to allow greater flexibility in the boundary conditions and the stretching of grids.

The time discretisation is different for the different equations and for the linear and nonlinear terms. All the nonlinear terms are calculated by a 3-level Adams-Bashforth (AB) update (except for the very first and second steps which are Euler and 2-level AB respectively). The linear terms of the momentum equation are updated via a simple forward Euler step which has the most lenient timestep restriction for an explicit diffusive step. The linear terms of the induction equation are either done this way, or may be changed via a flag such that the horizontal components of diffusion are instead done by an integrating factor method. The diffusive terms of the temperature equation are updated implicitly via a Crank-Nicolson formulation since this is the term that will most restrict the timestep (via the density denominator in the diffusion coefficient, which becomes very small at the top of the domain).

The magnetic field is stored and updated as poloidal and toroidal potential functions thus ensuring a divergence free field. The nonlinear terms, being calculated in configuration space, are calculated from the original field components and then suitable derivatives taken, so that poloidal and toroidal representations are only needed in phase space. It is not clear whether carrying around the higher derivatives required significantly affects the accuracy of the code. The mean fields are not representable via a poloidal and toroidal representation and so these must be carried around independently and are kept in configuration space. Physically however, they are kept in the k=0 components of the toroidal and poloidal phase space complex arrays.

The code is written in Fortran 90 and makes use of many of the advanced features of that language. The memory is dynamically assigned at run time in particular.

Parallel implementation

The code is implemented in parallel in a one-dimensional domain decomposition with communication in general implemented using MPI. The vertical direction is divided across processors such that a number of horizontal planes is sent to each processor. FFTs are carried out within the planes in the horizontal directions, and thus these operations remain local. In configuration space, the domain decomposition divides the z-planes across processors. After transform, in spectral space, each processor contains the full complement of z-planes, but only a subset of the spectral wavenumbers. That is

configuration space : X(my_z%min:my_z%max,:,:)
spectral space : X(:,my_k%min:my_k%max)

Vertical derivatives could be calculated across processors, but instead, the whole domain is transformed such that the vertical direction lies in processor before any vertical computations are completed.

The whole efficiency of the code depends on the implementation of the FFT and the global transpose associated with it. For this reason, these parts of the code are the most machine-specific. For each machine that the code currently operates on, there exists a subdirectory of the src directory with the machine name. This directory contains the fft routine for each machine, and other machines specific routines. This routine is sucked into the FourierTransform.F routine on compilation by the macros processors. Which machine to use is set by a C-preprocessor (cpp) flag in the file header.h

e.g.

#define T3E

The currently available machine types are:

T3E -- Cray T3E
O2K -- Cray/SGI Origin 2000
SP2 -- IBM SP series

The efficient implementation of the code is achieved by using the most efficient available library fft, and by doing the global transpose as well as possible. On the T3E and the O2K, the standard Craylib 1-D fft's are used. On the SP2, the ESSL 2-D fft is used. The Cray machines (T3E in particular) have optimal caching (E-registers) and direct memory transfer routines (shmem routines) which can speed up the transpose and other parts of the code. The code can be switched between pure MPI communication and the use of these machine-dependent routines via switches in header.h

e.g. #define SHMEM

E-registers (and shmem routines) are also used on the T3E to help the cache flow of the code in the other main sticking point for speed: in the formulation of the nonlinear terms. Since this type of code is limited by cache performance, careful attention has been paid to cache management thoughout the code.

It should be noted that since the ffts are machine-dependent, the ordering of the wavenumbers in spectral space can be different on the different machines. This can make direct comparison of the spectral arrays between say the T3E and the SP2 confusing to say the least! The orderings in one plane, in terms of (kx,ky) are as follows:
T3E:
(1,1) (1,2) (1,3) ... (1,ny_dealias)
(2,1) (2,2) (2,3) ... (2,ny_dealias)
...
SP2:
(1,1) (2,1) (3,1) ... (nx_dealias/2,1)
(1,2) (2,2) (3,2) ... (nx_dealias/2,2)
...

The code can run on any number of processors up to the total number of vertical points (horizontal planes). There is some load minor imbalance if the number of processors equals the numbers of planes (due to implementing the boundary conditions), but this does not seem to affect performance too badly.

The other main parallelisation implementation is that of input/output (IO). The code avoids parallel IO problems by caching all its IO through separate processors, assigned solely for that task. The code is this broken down across processors into compute nodes and IO nodes. All data is input and output via the IO nodes and thus file coherency is maintained easily.

How to use the code

Use of the code requires compilation, editing of the input parameters and available options, and running on the parallel architecture.

Compilation

The distribution of the source code contains two directories, one called 'obj' and the other 'src'. The src directory contains the source Fortran files for the code. It contains a subdirectory with the machine name either 'T3E', 'O2K' or 'SP2'. These directories contain the Fortran source code for the FFT routines for that machine. The two sets of code are different. The obj directory will contain the object code and the executable after compilation. Before compilation it contains a makefile. The first thing that has to be done, is to specify the machine type and machine dependent switches. This is done in the file 'header.h' in src. The file is very simple and might look like

#define T3E
#define SHMEM

Change the name of the machine in the define statement to the desired machine, either T3E, O2K or SP2. Then in the obj directory, type "make". The first stage of the compilation runs the cpp macro processor and converts the machine-dependendent source code from macro format (e.g. file.m4) into Fortran 90 code (e.g. file.F). The make then continues and compiles the whole source and produces an executable called 'f-plane'.

Editing the parameters

The code requires a number of environment variables to be set and parameters to be input for the control of its operation. The environment variables that need to be set mainly control the redirection of the IO. The code attempts to write IO first to a temporary disk from the IO nodes, and then, when sufficient IO has been collected, to a mass storage device. The user is not required to prefetch startup files, nor save files after the job is finished -- these actions are all performed internally from within the code. To enable these operations, the code picks up the directories to and from which to write, and the method of writing via environment variables. The variables are set by the following commands:

setenv RGET user_rget
setenv RPUT user_rput
setenv RMKDIR user_rmkdir
setenv TMPDIR user_tmpdir
setenv PERMANENT_DIR user_permanent_dir
setenv CASEDIR user_casedir
setenv N_IO_NODES user_n_io_nodes
setenv NCPUS user_ncpus

RGET, RPUT and RMKDIR are varibles that point to the pathnames of procedures that store and retrieve a file from the mass store and make a directory there. These are user written procedures whose usage is of the form "user_rget mass_store_file local_file" and "user_rput local_file mass_store_file". Some examples of these procedures are included in the code distribution under the directory 'samples'.

TMPDIR defines the path of a temporary working space directory that has enough space to store the intermediate files that the IO produces. This space usually is called something like '/tmp/username'.

PERMANENT_DIR defines the path of the top level mass storage directory

CASEDIR defines the path of the current subdirectory within the mass storage. Usually under a single username (PERMANENT_DIR) on the mass store, a user will want to run many different jobs. These are distinguished, and the data kept separate, via subdirectories (CASEDIR)

N_IO_NODES defines the number of io nodes that will be used for cached io.

NCPUS defines the total number of processors to be used. NCPUS = N_IO_NODES + no. of compute nodes.

Once the environment varibales are set, the user must then decide on and set the various parameters and flags that control the particular topic and case to be studied. These control parameters and options are all set via NAMELIST definitions in a text-editable file called 'input' which the code reads on startup. An example of 'input' files is included in the 'samples' directory but might look something like the following:

 &problem_size_namelist
   nx     = 64
   ny     = 64
   nz     = 64
   x_max  = 1.
   y_max  = 1.
   height = 1.
   cx     = 0.66
   cy     = 0.66
/
 ¶meters_namelist
   ray_ref          = -10000.
   gamma            = 1.6666666666666666666
   sigma            = 1.0
   theta            = 2.
   zeta             = 1.0
   Q                = 10.
   polytropic_index = 1.6
   reference_height = 0.5
   safety_factor    = 0.2
   mflx0            = 1.0
   mflx1            = 0.0
/
 &controls_namelist
   restart           = 'checkpoint'
   n_iterations      = 15001
   n_start           = -1
   restart_directory = 'Tubes/Shear/Test_1/Checkpoint/010000'
   penetrative       = F
   polytrope         = T
   reference_model   = F
   nonlinear         = T
   rotation          = F
   Radiative_Bottom_Boundary = F
   magnetism         = T
   add_magnetic_fields = T
   magnetic_diffusion  = T
   N_magnetic_layers   = 1
   magnetic_geometry = 'Horizontal'
   magnetic_boundary = 'Inoutflux'
   mag_layer_bot     = 2
   mag_layer_top       = -1 
   mag_elbow_width     = 0.01
   Forcing = T
   forcing_param1 = 0.5
   forcing_param2 = 0.5
   forcing_param3 = 1.0
/
 &io_namelist
   scalar%frequency           = 50
      scalar%Records_per_file = 10
   average%frequency          = 50
      average%Records_per_file = 10
      average%offset        = 0	
   layers%frequency      = 1000
      layers%Records_per_file  = 2
      n_xy_slices = 3
      xy_slices   = 10, 32, 54
   bulk%frequency        = 250
      bulk%offset        = 25
   checkpt%frequency     = 250
   Make_Movie            = F
/
 &bob_namelist
  bob_magnetic_energy%frequency  = 50
  bob_magnetic_energy%estimated_min   =  0.
  bob_magnetic_energy%estimated_max   =  100.0
/
 &initial_cond_nml
   thermal_noise    = 0.001
   noise_top        = 5
   noise_bottom     = 60
/
 &penetrative_namelist
   Penetration_S = 15.
   Elbow_Width   = 0.05
   Elbow_Height  = 1.0
   Relax_Flux    = F
   Relax_Flux_Frequency = 100
   Relax_Flux_Lower_Limit = 260
   Relax_Flux_Sound_time = 0.5
   Relax_Flux_Thermal_Time = 0.05
/

The options are grouped by type within input in separate NAMELISTs. Each of theses groups, the individual options available and a brief description of their function is given below.

PROBLEM_SIZE_NAMELIST - parameters determining the size of the computation

nx no. of collocation points in the horizontal x (East-West) direction.

ny no. of collocation points in the horizontal y (North-South) direction.

nz no. of grid points in the vertical z (depth) direction.

x_max aspect ratio in the x direction, compared to the vertical dimension (unity) of the unstable layer

y_max aspect ratio in the y direction, compared to the vertical dimension (unity) of the unstable layer

height if the calculation is penetrative (i.e. includes a lower stable layer), this is the aspect ratio of the total z depth compared to the vertical dimension

cx dealiasing factor in the x direction; wavenumbers up to cx*kx_max will be calculated. Normally set to 2/3.

cy dealiasing factor in the y direction; wavenumbers up to cx*kx_max will be calculated. Normally set to 2/3.

PROBLEM_SIZE_NAMELIST - parameters determining the size of the computation
nx	no. of collocation points in the horizontal x (East-West) direction.
ny	no. of collocation points in the horizontal y (North-South) direction.
nz	no. of grid points in the vertical z (depth) direction.
x_max	aspect ratio in the x direction, compared to the vertical dimension (unity) of the unstable layer
y_max	aspect ratio in the y direction, compared to the vertical dimension (unity) of the unstable layer
height	if the calculation is penetrative (i.e. includes a lower stable layer), this is the aspect ratio of the total z depth compared to the vertical dimension
cx	dealiasing factor in the x direction; wavenumbers up to cx*kx_max will be calculated. Normally set to 2/3.
cy	dealiasing factor in the y direction; wavenumbers up to cx*kx_max will be calculated. Normally set to 2/3.

CONTROLS_NAMELIST- controls the nature of the MHD computation performed.

Variables Default Other values Comments
n_iterations integer Total number of iterations for code to run
n_start 1 0/-1 1 => restart from an iteration number
0 => restart iteration number taken from last entry of .log files
-1 => restart file directory given explicitly by restart_directory
restart_directory character If nstart=-1, the directory from which to get restart files
Restart 'none' 'checkpoint' / 'spatial_start' Startup method:
'none' => start from scratch
'checkpoint' => start from previous solution via checkpoint files.
'spatial_start' => start from previous solution via a regular spatial dump files.

Nonlinear On (True) Off (False) Include nonlinear terms or not
Three_Dimensional On(True) Off (False) Three- or two-dimensional calculation.
Rotation Off(False) On(True) Include f-plane rotation terms or not.
Penetrative On(False) On(True) Include penetrative region or not.
Polytrope True(On) False (Off) If restart is 'none', use a polytropic model
Reference_Model False(Off) True (On) If restart is 'none', use another reference model

Explicit_Temperature_Diffusion Off(False) On(True) Include implicit temperature solver or not.
viscous_heating Off(False) On(True) Include viscous heating terms or not.
Radiative_Bottom_Boundary Not implemented

Magnetism Off(False) On(True) Include magnetic fields or not
Ohmic_Heating On(True) Off (False) Include ohmic heating terms or not if magnetism is on.
Magnetic_Diffusion On(True) Off (False) Include magnetic diffusion terms or not if magnetism is on.
Lorentz_Forces On(True) Off (False) Include nonlinear Lorentz force terms or not if magnetism is on.
Dynamo Off(False) On (True) Change scaling to dynamo scaling of eqns or not if magnetism is on.
Magnetic_Boundary 'B = 0' 'No Flux' /'Vertical'/'Inoutflux' Magnetic boundary conditions:
'B=0' => B=0 at both boundaries
'No flux' => Bx=By=0, dBz/dz=0 at both boundaries
'Inoutflux' => Bx=By=0, dBz/dz = mflx_i at boundary i=0,zmx
Add_Magnetic_Fields False (Off) True(On) Add magnetic fields to a pure convection simulation or not.
Magnetic_Geometry 'None' 'Horizontal'/'Vertical' If adding magnetic fields, determines configuration.
'None' => no precified form
'Horizontal' => layers of magnetic field added (up to 10 layers)
'Vertical' => vertical magnetic field added everywhere
n_magnetic_layers integer (0-10) If magnetic layers added, number of layers to add.
mag_elbow_width(10) 0 float If magnetic layers added, width of smoothed elbow at edge of layer
mag_layer_bot(10) 0 float If magnetic layers added, z-positions of bottom of magnetic layers
mag_layer_top(10) 0 float If magnetic layers added, z-positions of top of magnetic layers
mag_layer_amp 1.0 float Normally amplitude set by Q, but can boost on input from 'spatial_start' by changing this value. Must set mag_fix_density too.
mag_fix_density Off(False) On(True) Adjusts density such that pressure balance is achieved when mag field added

Forcing Off (False) On(True) Include shear forcing terms in momentum eqn or not. Forcing function has a particular form hardwired into code in subroutine forcing_terms in module Physics.F.
forcing_param1 0 float depth of top of forcing function
forcing_param2 0 float amplitude of forcing
forcing_param3 0 float depth of bottom of forcing function

CONTROLS_NAMELIST- controls the nature of the MHD computation performed.
Variables	Default	Other values	Comments
n_iterations		integer	Total number of iterations for code to run
n_start	1	0/-1	1 => restart from an iteration number
			0 => restart iteration number taken from last entry of .log files
			-1 => restart file directory given explicitly by restart_directory
restart_directory		character	If nstart=-1, the directory from which to get restart files
Restart	'none'	'checkpoint' / 'spatial_start'	Startup method:
			'none' => start from scratch
			'checkpoint' => start from previous solution via checkpoint files.
			'spatial_start' => start from previous solution via a regular spatial dump files.
Nonlinear	On (True)	Off (False)	Include nonlinear terms or not
Three_Dimensional	On(True)	Off (False)	Three- or two-dimensional calculation.
Rotation	Off(False)	On(True)	Include f-plane rotation terms or not.
Penetrative	On(False)	On(True)	Include penetrative region or not.
Polytrope	True(On)	False (Off)	If restart is 'none', use a polytropic model
Reference_Model	False(Off)	True (On)	If restart is 'none', use another reference model
Explicit_Temperature_Diffusion	Off(False)	On(True)	Include implicit temperature solver or not.
viscous_heating	Off(False)	On(True)	Include viscous heating terms or not.
Radiative_Bottom_Boundary	Not implemented
Magnetism	Off(False)	On(True)	Include magnetic fields or not
Ohmic_Heating	On(True)	Off (False)	Include ohmic heating terms or not if magnetism is on.
Magnetic_Diffusion	On(True)	Off (False)	Include magnetic diffusion terms or not if magnetism is on.
Lorentz_Forces	On(True)	Off (False)	Include nonlinear Lorentz force terms or not if magnetism is on.
Dynamo	Off(False)	On (True)	Change scaling to dynamo scaling of eqns or not if magnetism is on.
Magnetic_Boundary	'B = 0'	'No Flux' /'Vertical'/'Inoutflux'	Magnetic boundary conditions:
			'B=0' => B=0 at both boundaries
			'No flux' => Bx=By=0, dBz/dz=0 at both boundaries
			'Inoutflux' => Bx=By=0, dBz/dz = mflx_i at boundary i=0,zmx
Add_Magnetic_Fields	False (Off)	True(On)	Add magnetic fields to a pure convection simulation or not.
Magnetic_Geometry	'None'	'Horizontal'/'Vertical'	If adding magnetic fields, determines configuration.
			'None' => no precified form
			'Horizontal' => layers of magnetic field added (up to 10 layers)
			'Vertical' => vertical magnetic field added everywhere
n_magnetic_layers		integer (0-10)	If magnetic layers added, number of layers to add.
mag_elbow_width(10)	0	float	If magnetic layers added, width of smoothed elbow at edge of layer
mag_layer_bot(10)	0	float	If magnetic layers added, z-positions of bottom of magnetic layers
mag_layer_top(10)	0	float	If magnetic layers added, z-positions of top of magnetic layers
mag_layer_amp	1.0	float	Normally amplitude set by Q, but can boost on input from 'spatial_start' by changing this value. Must set mag_fix_density too.
mag_fix_density	Off(False)	On(True)	Adjusts density such that pressure balance is achieved when mag field added
Forcing	Off (False)	On(True)	Include shear forcing terms in momentum eqn or not. Forcing function has a particular form hardwired into code in subroutine forcing_terms in module Physics.F.
forcing_param1	0	float	depth of top of forcing function
forcing_param2	0	float	amplitude of forcing
forcing_param3	0	float	depth of bottom of forcing function

PARAMETERS_NAMELIST - the specific physics parameters of the calculation

gamma the ratio of the specific heats
sigma the Prandlt number
theta the applied heat flux at the lower boundary; the reference polytrope is specified by T_0 = (1+theta z)
polytropic_index the polytropic index
reference_height the reference height at which depth dependent parameters are set and quoted (e.g. the Rayleigh number is Ra(z))
tay_ref the Taylor number
ray_ref the Rayleigh number
phi the latitude (in degrees)
Q the Chandresekhar number
zeta the magnetic Prandlt number (ratio of magnetic permittivity to thermal diffusivity at the top of the domain)
mflx0 the flux of magnetic field at the upper boundary - dependent on size of field there; dB/dz = mflx0 B
mflx1 the flux of magnetic field in at the lower boundary - constant; dB/dz = mflx1
Safety_Factor the Courant number or factor by which the estimated min step size is multiplied.

PARAMETERS_NAMELIST - the specific physics parameters of the calculation
gamma	the ratio of the specific heats
sigma	the Prandlt number
theta	the applied heat flux at the lower boundary; the reference polytrope is specified by T_0 = (1+theta z)
polytropic_index	the polytropic index
reference_height	the reference height at which depth dependent parameters are set and quoted (e.g. the Rayleigh number is Ra(z))
tay_ref	the Taylor number
ray_ref	the Rayleigh number
phi	the latitude (in degrees)
Q	the Chandresekhar number
zeta	the magnetic Prandlt number (ratio of magnetic permittivity to thermal diffusivity at the top of the domain)
mflx0	the flux of magnetic field at the upper boundary - dependent on size of field there; dB/dz = mflx0 B
mflx1	the flux of magnetic field in at the lower boundary - constant; dB/dz = mflx1
Safety_Factor	the Courant number or factor by which the estimated min step size is multiplied.

INITIAL_COND_NML_NAMELIST - initial conditions namelist

thermal_noise if anything other than zero, then it is the amplitude of a noise applied to the initial condition.
noise_bottom the deepest z level that noise is to be applied to.
noise_top the highest z level that noise is to be applied to.

INITIAL_COND_NML_NAMELIST - initial conditions namelist
thermal_noise	if anything other than zero, then it is the amplitude of a noise applied to the initial condition.
noise_bottom	the deepest z level that noise is to be applied to.
noise_top	the highest z level that noise is to be applied to.

PENETRATIVE_NAMELIST - parameters for penetrative convection

Penetration_S the degree of stiffness, S, of the penetrative layer, in terms of the polytropic coefficients: S = (m_upper - m_a)/(m_a - m_lower)
Elbow_Width the width of the smoothing elbow between the unstable and stable regions.
elbow_height 1 float the height (depth) at which the elbow (division between unstable and stable) occurs.
Relax_Flux Off(False) On (True) Includes a relaxation of the fluxes in the stable region or not.
Relax_Flux_Frequency Huge(1) integer The frequency of the relaxation.
Relax_Flux_coef
Relax_Flux_lower_limit
Relax_Flux_upper_limit
Relax_Flux_sound_time
Relax_Flux_thermal_time

PENETRATIVE_NAMELIST - parameters for penetrative convection
Penetration_S			the degree of stiffness, S, of the penetrative layer, in terms of the polytropic coefficients: S = (m_upper - m_a)/(m_a - m_lower)
Elbow_Width			the width of the smoothing elbow between the unstable and stable regions.
elbow_height	1	float	the height (depth) at which the elbow (division between unstable and stable) occurs.
Relax_Flux	Off(False)	On (True)	Includes a relaxation of the fluxes in the stable region or not.
Relax_Flux_Frequency		Huge(1)	integer The frequency of the relaxation.
Relax_Flux_coef
Relax_Flux_lower_limit
Relax_Flux_upper_limit
Relax_Flux_sound_time
Relax_Flux_thermal_time

IO_NAMELIST - variables controlling the rate and type of IO. The first five variables are of a user programmed Fortran 90 "type" and therefore in this case each contain sub variables (name,logfile,frequency, records_per_file, n_records_current, offset). Only the frequency and records_per_file should be set in the namelist in general ; the others are used to keep track of the current state of the files.

In this case, the sub variables that can be set for each variable are as follows

variable%frequency how often dumps are made to the "variable" file, measured in no. of steps
variable%records_per_file how many dumps are collected into one scalar file before storing the file
variable%offset the offset in steps - usually left at zero
The file_control variables with these sub-variables that can be set are

scalar
average
layers
bulk
checkpt
Other variables in this namelist are:

n_xy_slices the number of horizontal slices to be saved (max nz)
xy_slices array; the z positions of the xy slices
Make_Movie False (Off) True (On) Whether to activate the file byte scaling and movie making procedure (BoB)

IO_NAMELIST - variables controlling the rate and type of IO. The first five variables are of a user programmed Fortran 90 "type" and therefore in this case each contain sub variables (name,logfile,frequency, records_per_file, n_records_current, offset). Only the frequency and records_per_file should be set in the namelist in general ; the others are used to keep track of the current state of the files.
In this case, the sub variables that can be set for each variable are as follows
variable%frequency	how often dumps are made to the "variable" file, measured in no. of steps
variable%records_per_file	how many dumps are collected into one scalar file before storing the file
variable%offset	the offset in steps - usually left at zero
The file_control variables with these sub-variables that can be set are
scalar
average
layers
bulk
checkpt
Other variables in this namelist are:
n_xy_slices			the number of horizontal slices to be saved (max nz)
xy_slices			array; the z positions of the xy slices
Make_Movie	False (Off)	True (On)	Whether to activate the file byte scaling and movie making procedure (BoB)

BOB_NAMELIST - parameters for making byte scaled files for movies. These variables are a user progammed Fortran 90 "type" and have sub variables which need to be set. i

In this case, the sub variables that can be set for each variable are as follows:

variable%frequency the no of steps between dumps
variable%offset the offset in steps - usually left at zero
variable%estmated_min an estimated lower bound of the variable required for byte scaling
variable%estimated_max an estimated upper bound of the variable required for byte scaling

The bob_file_control variables with these sub-variables that can be set are:

BoB_Enstrophy
BoB_Kinetic_Flux
BoB_Enthalpy_Flux
BoB_Acoustic_Flux
BoB_Buoyancy_Work
BoB_Temperature_Fluct
BoB_Convective_Flux
BoB_Magnetic_Energy
BoB_Magnetic_Flux
BoB_Bx_Flux
BoB_By_Flux
BoB_Bz_Flux
BoB_Current_Density

BOB_NAMELIST - parameters for making byte scaled files for movies. These variables are a user progammed Fortran 90 "type" and have sub variables which need to be set. i
In this case, the sub variables that can be set for each variable are as follows:
variable%frequency	the no of steps between dumps
variable%offset	the offset in steps - usually left at zero
variable%estmated_min	an estimated lower bound of the variable required for byte scaling
variable%estimated_max	an estimated upper bound of the variable required for byte scaling
The bob_file_control variables with these sub-variables that can be set are:
BoB_Enstrophy
BoB_Kinetic_Flux
BoB_Enthalpy_Flux
BoB_Acoustic_Flux
BoB_Buoyancy_Work
BoB_Temperature_Fluct
BoB_Convective_Flux
BoB_Magnetic_Energy
BoB_Magnetic_Flux
BoB_Bx_Flux
BoB_By_Flux
BoB_Bz_Flux
BoB_Current_Density

Running the code

Having compiled the code and edited the input file to control the computation as desired, all that remains is to run the code. This of course will vary with the machine on which you are running.

Running on the Cray T3E

To run on the T3E, type "mpprun -n cpus f-plane" where ncpus is the number of processors to run on; ncpus must be equal to the number of compute nodes plus the number of IO nodes and therefore equal to the user supplied environment variable NCPUS. The code calculates the number of compute nodes as the total number of processors, NCPUS, minus the number of IO processors, N_IO_NODES. The IO nodes are the last in the numerical sequence of nodes.

On the T3E it is often more convenient to submit the job to run in the background via the NQS queues. This involves putting the various commands previously mentioned - setting the environment variables and "mpprun" running the code - in an NQS wrapper which sets the NQS parameters and determines the queue. An example of this type of batch job is given in the 'samples' directory of the distribution. Here follows an example of a file that could be submitted using qsub:

{start of file}
#QSUB -s /bin/csh
#QSUB -q t3e_128
#QSUB -l mpp_p=129
#QSUB -l mpp_t=1200
#QSUB -me
#QSUB -eo
#QSUB -nr
#QSUB -x
#
cd $QSUB_WORKDIR
#
#set the environment variables
#
setenv CASEDIR Fdynamo/Test9
setenv NCPUS 129
setenv N_IO_NODES 1
setenv RGET 'jsimpget'
setenv RPUT 'jsimpput'
setenv PERMANENT_DIR /scr/brummell
setenv TMPDIR /tmp/brummell/$CASEDIR
setenv HOME_DIR /u4/brummell
#
setenv MPI_SM_POOL 1000000
if (-e precompute) mv precompute precompute.previous
#
mkdir -p $TMPDIR
#
$RGET $PERMANENT_DIR/Programs/f-plane $TMPDIR
#
# run
#
date
ja
cp restart.in input
touch_log
mpprun -n $NCPUS $TMPDIR/f-plane 
ja -st
{end of file}

This would then be submitted using qsub.

Running on the IBM SP2

On the IBM SP machines, the above type of scripts also work except that the heading code has to be adapted for the LoadLeveller queuing system instead of NQS, and the command to run the code invokes poe rather than mpprun. Here is an example (more in SAMPLES):

{start of file}
#!/usr/bin/csh
# @ environment = COPY_ALL; MP_EUILIB=us; MP_SHARED_MEMORY=yes; MP_PULSE=0; MP_INTRDELAY=100;
MP_STDOUTMODE=ordered
# @ output = hps_out.$(jobid)
# @ error = hps_err.$(jobid)
# @ initialdir = /rmount/paci/ucol/ux450250/Penetrative/S=3b_vvvhi
# @ job_type = parallel
# @ notify_user = brummell@solarz.colorado.edu
# @ network.MPI = css0,not_shared,US
# @ node = 72
# @ node_usage = not_shared
# @ tasks_per_node = 8
# @ notification = error
# @ notification = complete
# @ wall_clock_limit = 36:00:00
# @ class = high
# @ queue
#
setenv MP_PULSE 0
setenv MP_INTRDELAY 100
#
setenv HOME_DIR /rmount/paci/ucol/ux450250
setenv PERMANENT_DIR .
setenv CASEDIR Penetrative/S=3b_vvvhi
setenv TMPDIR /gpfs/ux450250/$CASEDIR
setenv NCPUS 576
setenv N_IO_NODES 1
setenv RGET "/paci/ucol/ux450250/bin/sdscget"
setenv RPUT "/paci/ucol/ux450250/bin/sdscput"
setenv RMKDIR "/paci/ucol/ux450250/bin/mymkdir"
#
mkdir -p $TMPDIR
echo "Time at start of job ..."
date
~/bin/touch_log
#
rm $TMPDIR/*
cp $HOME_DIR/Programs/f-plane_sp2 $TMPDIR/f-plane
chmod a+x $TMPDIR/f-plane
#
cp restart.in input
rm .running precompute
poe $TMPDIR/f-plane
{end of file}

This would then be submitted using llsubmit.

Output of the code: method of storage and results format

Monitoring

The code creates two files in the directory from which it was submitted. The first, called 'precompute' contains a list file of the parameters used for the current calculation and logs the progress of the simulation by writing a few statistics (time, step, energy values etc) out to that file every 50 steps or so. This file can be monitored to check on the progress of the simulation. Another file, anmes '.running', is created at the beginning of the job and contains a positive integer. The code checks this file periodically and if the integer has been changed to a negative value, then the simulation aborts. This is a clean way of aborting the simulation without simply killing the job. The code also has a subroutine called 'debug' which buffers informational output from within the code to a series of status files, one for each processor, named 'statusnnn', where nnn is the 3 digit integer number of the processor. These files are stored in the temporary directory TMPDIR and remain after the completion of the job. These files are intended for debugging purposes and can give information as to the status of all processors and the position of the code at the time of an abort.

Output

The code stores output by first accumulating it in TMPDIR and then internally writing it to the mass storage device using RPUT. Files are written in the mass storage under the top level directory given by PERMANENT_DIR in the CASEDIR subdirectory. IO is organised by type in subdirectories. The subdirectories are

scalar controlled by the 'scalar' file_control function
bulk controlled by the 'bulk' file_control function
checkpoint controlled by the 'checkpoint' file_control function
flux controlled by the 'averages' file_control function
x-y_layer controlled by the 'layers' file_control function
x-z_layer controlled by the 'layers' file_control function
y-z_layer controlled by the 'layers' file_control function

Under these subdirectories, the data is stored referenced by iteration number. For the 'bulk' and 'checkpoint' directories, the iteration number is a directory which contains the required information. For the other types of IO, the iteration number is a file itself containing the relevent information.

The data contained in each type is as follows, described assuming that magnetism is 'on' (if not then drop all the magnetic quantities):

Bulk

This directory contains full dump output of all the configuration space fields calculation. All files are stored (nx,ny,nz).

u horizontal velocity in the x (East-West) direction
v horizontal velocity in the y (North-South) direction
w vertical velocity
temperature temperature
density density
Bx horizontal component of the magnetic field in the x direction
By horizontal component of the magnetic field in the y direction
Bz vertical component of the magnetic field

Checkpoint

This directory contains files that are in native format to the running machine. The two previous levels of the AB3 scheme are stored and are files containing (nz,nx*ny) (T3E) or (nz,ny,nx) (SP2) complex numbers.

description a file containing the parameters required to restart the calculation
rho_u_force x direction AB3 level 2 momentum forces
rho_u_force2 x direction AB3 level 3 momentum forces
rho_v_force y direction AB3 level 2 momentum forces
rho_v_force2 y direction AB3 level 3 momentum forces
rho_w_force z direction AB3 level 2 momentum forces
rho_w_force2 z direction AB3 level 3 momentum forces
temperature_force AB3 level 2 temperature forces
temperature_force2 AB3 level 3 temperature forces
density_force AB3 level 2 density forces
density_force2 AB3 level 3 density forces
bx_force x direction AB3 level 2 magnetic field forces
bx_force2 x direction AB3 level 3 magnetic field forces
by_force y direction AB3 level 2 magnetic field forces
by_force2 y direction AB3 level 3 magnetic field forces
bz_force z direction AB3 level 2 magnetic field forces
bz_force2 z direction AB3 level 3 magnetic field forces

Layer sub-directories

Each iteration number stored, n*layers%frequency*layers%records_per_file, n=0,1,2, ... is a file (n1,n2,nvariables,nlayers,layers%records_per_file) containing planes of data for each variable calculated at a number of positions dumped every layers%frequency steps.

For x-y_layers, n1=nx,n2=ny, nlayers = n_xy_layers and the planes dumped are dictated by the array xy_planes.

For x-z_layers, n1=nx,n2=nz, nlayers = 1 and the plane dumped is y=0

For y-z_layers, n1=ny,n2=nz, nlayers = 1 and the plane dumped is x=0 The variables are

u
v
w
temperature
Bx
By
Bz

Scalar

Each iteration number stored, n*scalar%frequency*scalar%records_per_file, n=0,1,2, ... is a file (40,scalar%records_per_file) containing 40 scalar time traces dumped every scalar%frequency steps. The scalars currently stored are

nstep
total_time
u_min
u_max
momentum(1)
v_min
v_max
momentum(2)
w_min
w_max
 momentum(3) 
kinetic_energy
max_kinetic_energy
 mach_no(3)
bx_min
bx_max
b_avg(1)
by_min
by_max
b_avg(2)
bz_min
bz_max
b_avg(3)
Magnetic_Energy
Max_Magnetic_Pressure

Flux

Each iteration number stored, n*average%frequency*average%records_per_file, n=0,1,2, ... is a file (nz,40,6,average%records_per_file) containing 6 moments of 40 variables dumped every average%frequency steps. The 6 moments are the average, the second, third and fourth moments of the field, and the min and max of the field. The fluxes are

rho_u the x direction horizontal momentum
rho_v the y direction horizontal momentum
rho_ w the vertical momentum
t_ avg the mean temperature
rho_avg the mean density
u_avg the average x direction velocity
v_avg the average y direction velocity
w_avg the average vertical velocity
ke the kinetic energy
ke_flux the kinetic energy flux
P_flux the pressure flux
visc_flux the viscous flux
buoy_work the bouyancy work
P the pressure
P_work the pressure work
enth_flux the enthalpy flux
rad_flux the radiative flux
entropy the entropy
rho_u_w the x direction Reynolds stress
rho_v_w the y direction Reynolds stress
rho_u_T the x direction heat flux
rho_v_T the y direction heat flux
tot_flux the total flux
bx_avg the mean magnetic field in the x direction
by_avg the mean magnetic field in the y direction
bz_avg the mean vertical magnetic field
bx_flux the magnetic flux in the x direction
by_flux the magnetic flux in the y direction
bz_flux the magnetic flux in the vertical direction
me the mechanical energy
me_flux the mechanical energy flux
time the time
dtime the time step

List of routines and their function

scalar	controlled by the 'scalar' file_control function
bulk	controlled by the 'bulk' file_control function
checkpoint	controlled by the 'checkpoint' file_control function
flux	controlled by the 'averages' file_control function
x-y_layer	controlled by the 'layers' file_control function
x-z_layer	controlled by the 'layers' file_control function
y-z_layer	controlled by the 'layers' file_control function

u	horizontal velocity in the x (East-West) direction
v	horizontal velocity in the y (North-South) direction
w	vertical velocity
temperature	temperature
density	density
Bx	horizontal component of the magnetic field in the x direction
By	horizontal component of the magnetic field in the y direction
Bz	vertical component of the magnetic field

description	a file containing the parameters required to restart the calculation
rho_u_force	x direction AB3 level 2 momentum forces
rho_u_force2	x direction AB3 level 3 momentum forces
rho_v_force	y direction AB3 level 2 momentum forces
rho_v_force2	y direction AB3 level 3 momentum forces
rho_w_force	z direction AB3 level 2 momentum forces
rho_w_force2	z direction AB3 level 3 momentum forces
temperature_force	AB3 level 2 temperature forces
temperature_force2	AB3 level 3 temperature forces
density_force	AB3 level 2 density forces
density_force2	AB3 level 3 density forces
bx_force	x direction AB3 level 2 magnetic field forces
bx_force2	x direction AB3 level 3 magnetic field forces
by_force	y direction AB3 level 2 magnetic field forces
by_force2	y direction AB3 level 3 magnetic field forces
bz_force	z direction AB3 level 2 magnetic field forces
bz_force2	z direction AB3 level 3 magnetic field forces

rho_u	the x direction horizontal momentum
rho_v	the y direction horizontal momentum
rho_ w	the vertical momentum
t_ avg	the mean temperature
rho_avg	the mean density
u_avg	the average x direction velocity
v_avg	the average y direction velocity
w_avg	the average vertical velocity
ke	the kinetic energy
ke_flux	the kinetic energy flux
P_flux	the pressure flux
visc_flux	the viscous flux
buoy_work	the bouyancy work
P	the pressure
P_work	the pressure work
enth_flux	the enthalpy flux
rad_flux	the radiative flux
entropy	the entropy
rho_u_w	the x direction Reynolds stress
rho_v_w	the y direction Reynolds stress
rho_u_T	the x direction heat flux
rho_v_T	the y direction heat flux
tot_flux	the total flux
bx_avg	the mean magnetic field in the x direction
by_avg	the mean magnetic field in the y direction
bz_avg	the mean vertical magnetic field
bx_flux	the magnetic flux in the x direction
by_flux	the magnetic flux in the y direction
bz_flux	the magnetic flux in the vertical direction
me	the mechanical energy
me_flux	the mechanical energy flux
time	the time
dtime	the time step

NASA HPCC/ESS Cooperative Agreement: Turbulent convection and Dynamos in Stars Documentation for the HPS code