Documentation for the HPS code
The domain is a Cartesian rectilinear box, periodic in the two horizontal directions, and bounded by stress-free plates in the vertical. The temperature is fixed at the upper boundary, whereas there is a fixed input thermal flux at the lower boundary. The magnetic field is allowed various configurations at the boundaries, including no field, no flux, vertical field and input flux options. All the boundary conditions in the vertical direction are easily changeable, whereas the condition of periodicity in the horizontal must be strictly retained.
The time discretisation is different for the different equations and for the linear and nonlinear terms. All the nonlinear terms are calculated by a 3-level Adams-Bashforth (AB) update (except for the very first and second steps which are Euler and 2-level AB respectively). The linear terms of the momentum equation are updated via a simple forward Euler step which has the most lenient timestep restriction for an explicit diffusive step. The linear terms of the induction equation are either done this way, or may be changed via a flag such that the horizontal components of diffusion are instead done by an integrating factor method. The diffusive terms of the temperature equation are updated implicitly via a Crank-Nicolson formulation since this is the term that will most restrict the timestep (via the density denominator in the diffusion coefficient, which becomes very small at the top of the domain).
The magnetic field is stored and updated as poloidal and toroidal potential functions thus ensuring a divergence free field. The nonlinear terms, being calculated in configuration space, are calculated from the original field components and then suitable derivatives taken, so that poloidal and toroidal representations are only needed in phase space. It is not clear whether carrying around the higher derivatives required significantly affects the accuracy of the code. The mean fields are not representable via a poloidal and toroidal representation and so these must be carried around independently and are kept in configuration space. Physically however, they are kept in the k=0 components of the toroidal and poloidal phase space complex arrays.
The code is written in Fortran 90 and makes use of many of the advanced features of that language. The memory is dynamically assigned at run time in particular.
configuration space : X(my_z%min:my_z%max,:,:)
spectral space : X(:,my_k%min:my_k%max)
Vertical derivatives could be calculated across processors, but instead, the whole domain is transformed such that the vertical direction lies in processor before any vertical computations are completed.
The whole efficiency of the code depends on the implementation of the FFT and the global transpose associated with it. For this reason, these parts of the code are the most machine-specific. For each machine that the code currently operates on, there exists a subdirectory of the src directory with the machine name. This directory contains the fft routine for each machine, and other machines specific routines. This routine is sucked into the FourierTransform.F routine on compilation by the macros processors. Which machine to use is set by a C-preprocessor (cpp) flag in the file header.h
e.g.
#define T3E
The currently available machine types are:
T3E -- Cray T3E
O2K -- Cray/SGI Origin 2000
SP2 -- IBM SP series
The efficient implementation of the code is achieved by using the most efficient available library fft, and by doing the global transpose as well as possible. On the T3E and the O2K, the standard Craylib 1-D fft's are used. On the SP2, the ESSL 2-D fft is used. The Cray machines (T3E in particular) have optimal caching (E-registers) and direct memory transfer routines (shmem routines) which can speed up the transpose and other parts of the code. The code can be switched between pure MPI communication and the use of these machine-dependent routines via switches in header.h
e.g. #define SHMEM
E-registers (and shmem routines) are also used on the T3E to help the cache flow of the code in the other main sticking point for speed: in the formulation of the nonlinear terms. Since this type of code is limited by cache performance, careful attention has been paid to cache management thoughout the code.
It should be noted that since the ffts are machine-dependent, the
ordering of the wavenumbers in spectral space can be different on
the different machines. This can make direct comparison of the
spectral arrays between say the T3E and the SP2 confusing to say the
least! The orderings in one plane, in terms of (kx,ky) are as follows:
T3E:
(1,1) (1,2) (1,3) ... (1,ny_dealias)
(2,1) (2,2) (2,3) ... (2,ny_dealias)
...
SP2:
(1,1) (2,1) (3,1) ... (nx_dealias/2,1)
(1,2) (2,2) (3,2) ... (nx_dealias/2,2)
...
The code can run on any number of processors up to the total number of vertical points (horizontal planes). There is some load minor imbalance if the number of processors equals the numbers of planes (due to implementing the boundary conditions), but this does not seem to affect performance too badly.
The other main parallelisation implementation is that of input/output (IO). The code avoids parallel IO problems by caching all its IO through separate processors, assigned solely for that task. The code is this broken down across processors into compute nodes and IO nodes. All data is input and output via the IO nodes and thus file coherency is maintained easily.
#define T3E
#define SHMEM
Change the name of the machine in the define statement to the desired machine, either T3E, O2K or SP2. Then in the obj directory, type "make". The first stage of the compilation runs the cpp macro processor and converts the machine-dependendent source code from macro format (e.g. file.m4) into Fortran 90 code (e.g. file.F). The make then continues and compiles the whole source and produces an executable called 'f-plane'.
setenv RGET user_rget setenv RPUT user_rput setenv RMKDIR user_rmkdir setenv TMPDIR user_tmpdir setenv PERMANENT_DIR user_permanent_dir setenv CASEDIR user_casedir setenv N_IO_NODES user_n_io_nodes setenv NCPUS user_ncpusRGET, RPUT and RMKDIR are varibles that point to the pathnames of procedures that store and retrieve a file from the mass store and make a directory there. These are user written procedures whose usage is of the form "user_rget mass_store_file local_file" and "user_rput local_file mass_store_file". Some examples of these procedures are included in the code distribution under the directory 'samples'.
TMPDIR defines the path of a temporary working space directory that has enough space to store the intermediate files that the IO produces. This space usually is called something like '/tmp/username'.
PERMANENT_DIR defines the path of the top level mass storage directory
CASEDIR defines the path of the current subdirectory within the mass storage. Usually under a single username (PERMANENT_DIR) on the mass store, a user will want to run many different jobs. These are distinguished, and the data kept separate, via subdirectories (CASEDIR)
N_IO_NODES defines the number of io nodes that will be used for cached io.
NCPUS defines the total number of processors to be used. NCPUS = N_IO_NODES + no. of compute nodes.
Once the environment varibales are set, the user must then decide on and set the various parameters and flags that control the particular topic and case to be studied. These control parameters and options are all set via NAMELIST definitions in a text-editable file called 'input' which the code reads on startup. An example of 'input' files is included in the 'samples' directory but might look something like the following:
&problem_size_namelist nx = 64 ny = 64 nz = 64 x_max = 1. y_max = 1. height = 1. cx = 0.66 cy = 0.66 / ¶meters_namelist ray_ref = -10000. gamma = 1.6666666666666666666 sigma = 1.0 theta = 2. zeta = 1.0 Q = 10. polytropic_index = 1.6 reference_height = 0.5 safety_factor = 0.2 mflx0 = 1.0 mflx1 = 0.0 / &controls_namelist restart = 'checkpoint' n_iterations = 15001 n_start = -1 restart_directory = 'Tubes/Shear/Test_1/Checkpoint/010000' penetrative = F polytrope = T reference_model = F nonlinear = T rotation = F Radiative_Bottom_Boundary = F magnetism = T add_magnetic_fields = T magnetic_diffusion = T N_magnetic_layers = 1 magnetic_geometry = 'Horizontal' magnetic_boundary = 'Inoutflux' mag_layer_bot = 2 mag_layer_top = -1 mag_elbow_width = 0.01 Forcing = T forcing_param1 = 0.5 forcing_param2 = 0.5 forcing_param3 = 1.0 / &io_namelist scalar%frequency = 50 scalar%Records_per_file = 10 average%frequency = 50 average%Records_per_file = 10 average%offset = 0 layers%frequency = 1000 layers%Records_per_file = 2 n_xy_slices = 3 xy_slices = 10, 32, 54 bulk%frequency = 250 bulk%offset = 25 checkpt%frequency = 250 Make_Movie = F / &bob_namelist bob_magnetic_energy%frequency = 50 bob_magnetic_energy%estimated_min = 0. bob_magnetic_energy%estimated_max = 100.0 / &initial_cond_nml thermal_noise = 0.001 noise_top = 5 noise_bottom = 60 / &penetrative_namelist Penetration_S = 15. Elbow_Width = 0.05 Elbow_Height = 1.0 Relax_Flux = F Relax_Flux_Frequency = 100 Relax_Flux_Lower_Limit = 260 Relax_Flux_Sound_time = 0.5 Relax_Flux_Thermal_Time = 0.05 /The options are grouped by type within input in separate NAMELISTs. Each of theses groups, the individual options available and a brief description of their function is given below.
PROBLEM_SIZE_NAMELIST - parameters determining the size of the
computation |
|
---|---|
nx | no. of collocation points in the horizontal x (East-West) direction. |
ny | no. of collocation points in the horizontal y (North-South) direction. |
nz | no. of grid points in the vertical z (depth) direction. |
x_max | aspect ratio in the x direction, compared to the vertical dimension (unity) of the unstable layer |
y_max | aspect ratio in the y direction, compared to the vertical dimension (unity) of the unstable layer |
height | if the calculation is penetrative (i.e. includes a lower stable layer), this is the aspect ratio of the total z depth compared to the vertical dimension |
cx | dealiasing factor in the x direction; wavenumbers up to cx*kx_max will be calculated. Normally set to 2/3. |
cy | dealiasing factor in the y direction; wavenumbers up to cx*kx_max will be calculated. Normally set to 2/3. |
CONTROLS_NAMELIST- controls the nature of the MHD computation
performed. |
|||
---|---|---|---|
Variables | Default | Other values | Comments |
n_iterations | integer | Total number of iterations for code to run | |
n_start | 1 | 0/-1 | 1 => restart from an iteration number |
0 => restart iteration number taken from last entry of .log files | |||
-1 => restart file directory given explicitly by restart_directory | |||
restart_directory | character | If nstart=-1, the directory from which to get restart files | |
Restart | 'none' | 'checkpoint' / 'spatial_start' | Startup method: |
'none' => start from scratch | |||
'checkpoint' => start from previous solution via checkpoint files. | |||
'spatial_start' => start from previous solution via a regular spatial dump files. | |||
Nonlinear | On (True) | Off (False) | Include nonlinear terms or not |
Three_Dimensional | On(True) | Off (False) | Three- or two-dimensional calculation. |
Rotation | Off(False) | On(True) | Include f-plane rotation terms or not. |
Penetrative | On(False) | On(True) | Include penetrative region or not. |
Polytrope | True(On) | False (Off) | If restart is 'none', use a polytropic model |
Reference_Model | False(Off) | True (On) | If restart is 'none', use another reference model |
Explicit_Temperature_Diffusion | Off(False) | On(True) | Include implicit temperature solver or not. |
viscous_heating | Off(False) | On(True) | Include viscous heating terms or not. |
Radiative_Bottom_Boundary | Not implemented | ||
Magnetism | Off(False) | On(True) | Include magnetic fields or not |
Ohmic_Heating | On(True) | Off (False) | Include ohmic heating terms or not if magnetism is on. |
Magnetic_Diffusion | On(True) | Off (False) | Include magnetic diffusion terms or not if magnetism is on. |
Lorentz_Forces | On(True) | Off (False) | Include nonlinear Lorentz force terms or not if magnetism is on. |
Dynamo | Off(False) | On (True) | Change scaling to dynamo scaling of eqns or not if magnetism is on. |
Magnetic_Boundary | 'B = 0' | 'No Flux' /'Vertical'/'Inoutflux' | Magnetic boundary conditions: |
'B=0' => B=0 at both boundaries | |||
'No flux' => Bx=By=0, dBz/dz=0 at both boundaries | |||
'Inoutflux' => Bx=By=0, dBz/dz = mflx_i at boundary i=0,zmx | |||
Add_Magnetic_Fields | False (Off) | True(On) | Add magnetic fields to a pure convection simulation or not. |
Magnetic_Geometry | 'None' | 'Horizontal'/'Vertical' | If adding magnetic fields, determines configuration. |
'None' => no precified form | |||
'Horizontal' => layers of magnetic field added (up to 10 layers) | |||
'Vertical' => vertical magnetic field added everywhere | |||
n_magnetic_layers | integer (0-10) | If magnetic layers added, number of layers to add. | |
mag_elbow_width(10) | 0 | float | If magnetic layers added, width of smoothed elbow at edge of layer |
mag_layer_bot(10) | 0 | float | If magnetic layers added, z-positions of bottom of magnetic layers |
mag_layer_top(10) | 0 | float | If magnetic layers added, z-positions of top of magnetic layers |
mag_layer_amp | 1.0 | float | Normally amplitude set by Q, but can boost on input from 'spatial_start' by changing this value. Must set mag_fix_density too. |
mag_fix_density | Off(False) | On(True) | Adjusts density such that pressure balance is achieved when mag field added |
Forcing | Off (False) | On(True) | Include shear forcing terms in momentum eqn or not. Forcing function has a particular form hardwired into code in subroutine forcing_terms in module Physics.F. |
forcing_param1 | 0 | float | depth of top of forcing function |
forcing_param2 | 0 | float | amplitude of forcing |
forcing_param3 | 0 | float | depth of bottom of forcing function |
PARAMETERS_NAMELIST - the specific physics parameters of the calculation
|
|||
---|---|---|---|
gamma | the ratio of the specific heats | ||
sigma | the Prandlt number | ||
theta | the applied heat flux at the lower boundary; the reference polytrope is specified by T_0 = (1+theta z) | ||
polytropic_index | the polytropic index | ||
reference_height | the reference height at which depth dependent parameters are set and quoted (e.g. the Rayleigh number is Ra(z)) | ||
tay_ref | the Taylor number | ||
ray_ref | the Rayleigh number | ||
phi | the latitude (in degrees) | ||
Q | the Chandresekhar number | ||
zeta | the magnetic Prandlt number (ratio of magnetic permittivity to thermal diffusivity at the top of the domain) | ||
mflx0 | the flux of magnetic field at the upper boundary - dependent on size of field there; dB/dz = mflx0 B | ||
mflx1 | the flux of magnetic field in at the lower boundary - constant; dB/dz = mflx1 | ||
Safety_Factor | the Courant number or factor by which the estimated min step size is multiplied. |
INITIAL_COND_NML_NAMELIST - initial conditions namelist
|
|||
---|---|---|---|
thermal_noise | if anything other than zero, then it is the amplitude of a noise applied to the initial condition. | ||
noise_bottom | the deepest z level that noise is to be applied to. | ||
noise_top | the highest z level that noise is to be applied to. |
PENETRATIVE_NAMELIST - parameters for penetrative convection
|
|||
---|---|---|---|
Penetration_S | the degree of stiffness, S, of the penetrative layer, in terms of the polytropic coefficients: S = (m_upper - m_a)/(m_a - m_lower) | ||
Elbow_Width | the width of the smoothing elbow between the unstable and stable regions. | ||
elbow_height | 1 | float | the height (depth) at which the elbow (division between unstable and stable) occurs. |
Relax_Flux | Off(False) | On (True) | Includes a relaxation of the fluxes in the stable region or not. |
Relax_Flux_Frequency | Huge(1) | integer The frequency of the relaxation. | |
Relax_Flux_coef | |||
Relax_Flux_lower_limit | |||
Relax_Flux_upper_limit | |||
Relax_Flux_sound_time | |||
Relax_Flux_thermal_time | |||
IO_NAMELIST - variables controlling the rate and type of IO. The first five variables are of a user
programmed Fortran 90 "type" and therefore in this case each contain sub variables
(name,logfile,frequency, records_per_file, n_records_current, offset). Only the frequency and
records_per_file should be set in the namelist in general ; the others are used to keep track
of the current state of the files.
|
||||
---|---|---|---|---|
In this case, the sub variables that can be set for each variable are as follows | ||||
variable%frequency | how often dumps are made to the "variable" file, measured in no. of steps | |||
variable%records_per_file | how many dumps are collected into one scalar file before storing the file | |||
variable%offset | the offset in steps - usually left at zero | |||
The file_control variables with these sub-variables that can be set are | ||||
scalar | ||||
average | ||||
layers | ||||
bulk | ||||
checkpt | ||||
Other variables in this namelist are: | ||||
n_xy_slices | the number of horizontal slices to be saved (max nz) | |||
xy_slices | array; the z positions of the xy slices | |||
Make_Movie | False (Off) | True (On) | Whether to activate the file byte scaling and movie making procedure (BoB) |
BOB_NAMELIST - parameters for making byte scaled files for movies.
These variables are a user progammed Fortran 90 "type" and have sub variables which need to be set. i
|
||
---|---|---|
In this case, the sub variables that can be set for each variable are as follows: | ||
variable%frequency | the no of steps between dumps | |
variable%offset | the offset in steps - usually left at zero | |
variable%estmated_min | an estimated lower bound of the variable required for byte scaling | |
variable%estimated_max | an estimated upper bound of the variable required for byte scaling | |
The bob_file_control variables with these sub-variables that can be set are: | ||
BoB_Enstrophy | ||
BoB_Kinetic_Flux | ||
BoB_Enthalpy_Flux | ||
BoB_Acoustic_Flux | ||
BoB_Buoyancy_Work | ||
BoB_Temperature_Fluct | ||
BoB_Convective_Flux | ||
BoB_Magnetic_Energy | ||
BoB_Magnetic_Flux | ||
BoB_Bx_Flux | ||
BoB_By_Flux | ||
BoB_Bz_Flux | ||
BoB_Current_Density | ||
On the T3E it is often more convenient to submit the job to run in the background via the NQS queues. This involves putting the various commands previously mentioned - setting the environment variables and "mpprun" running the code - in an NQS wrapper which sets the NQS parameters and determines the queue. An example of this type of batch job is given in the 'samples' directory of the distribution. Here follows an example of a file that could be submitted using qsub:
{start of file} #QSUB -s /bin/csh #QSUB -q t3e_128 #QSUB -l mpp_p=129 #QSUB -l mpp_t=1200 #QSUB -me #QSUB -eo #QSUB -nr #QSUB -x # cd $QSUB_WORKDIR # #set the environment variables # setenv CASEDIR Fdynamo/Test9 setenv NCPUS 129 setenv N_IO_NODES 1 setenv RGET 'jsimpget' setenv RPUT 'jsimpput' setenv PERMANENT_DIR /scr/brummell setenv TMPDIR /tmp/brummell/$CASEDIR setenv HOME_DIR /u4/brummell # setenv MPI_SM_POOL 1000000 if (-e precompute) mv precompute precompute.previous # mkdir -p $TMPDIR # $RGET $PERMANENT_DIR/Programs/f-plane $TMPDIR # # run # date ja cp restart.in input touch_log mpprun -n $NCPUS $TMPDIR/f-plane ja -st {end of file}This would then be submitted using qsub.
{start of file} #!/usr/bin/csh # @ environment = COPY_ALL; MP_EUILIB=us; MP_SHARED_MEMORY=yes; MP_PULSE=0; MP_INTRDELAY=100; MP_STDOUTMODE=ordered # @ output = hps_out.$(jobid) # @ error = hps_err.$(jobid) # @ initialdir = /rmount/paci/ucol/ux450250/Penetrative/S=3b_vvvhi # @ job_type = parallel # @ notify_user = brummell@solarz.colorado.edu # @ network.MPI = css0,not_shared,US # @ node = 72 # @ node_usage = not_shared # @ tasks_per_node = 8 # @ notification = error # @ notification = complete # @ wall_clock_limit = 36:00:00 # @ class = high # @ queue # setenv MP_PULSE 0 setenv MP_INTRDELAY 100 # setenv HOME_DIR /rmount/paci/ucol/ux450250 setenv PERMANENT_DIR . setenv CASEDIR Penetrative/S=3b_vvvhi setenv TMPDIR /gpfs/ux450250/$CASEDIR setenv NCPUS 576 setenv N_IO_NODES 1 setenv RGET "/paci/ucol/ux450250/bin/sdscget" setenv RPUT "/paci/ucol/ux450250/bin/sdscput" setenv RMKDIR "/paci/ucol/ux450250/bin/mymkdir" # mkdir -p $TMPDIR echo "Time at start of job ..." date ~/bin/touch_log # rm $TMPDIR/* cp $HOME_DIR/Programs/f-plane_sp2 $TMPDIR/f-plane chmod a+x $TMPDIR/f-plane # cp restart.in input rm .running precompute poe $TMPDIR/f-plane {end of file}This would then be submitted using llsubmit.
scalar | controlled by the 'scalar' file_control function |
bulk | controlled by the 'bulk' file_control function |
checkpoint | controlled by the 'checkpoint' file_control function |
flux | controlled by the 'averages' file_control function |
x-y_layer | controlled by the 'layers' file_control function |
x-z_layer | controlled by the 'layers' file_control function |
y-z_layer | controlled by the 'layers' file_control function |
Under these subdirectories, the data is stored referenced by iteration number. For the 'bulk' and 'checkpoint' directories, the iteration number is a directory which contains the required information. For the other types of IO, the iteration number is a file itself containing the relevent information.
The data contained in each type is as follows, described assuming that magnetism is 'on' (if not then drop all the magnetic quantities):
u | horizontal velocity in the x (East-West) direction |
v | horizontal velocity in the y (North-South) direction |
w | vertical velocity |
temperature | temperature |
density | density |
Bx | horizontal component of the magnetic field in the x direction |
By | horizontal component of the magnetic field in the y direction |
Bz | vertical component of the magnetic field |
description | a file containing the parameters required to restart the calculation |
rho_u_force | x direction AB3 level 2 momentum forces |
rho_u_force2 | x direction AB3 level 3 momentum forces |
rho_v_force | y direction AB3 level 2 momentum forces |
rho_v_force2 | y direction AB3 level 3 momentum forces |
rho_w_force | z direction AB3 level 2 momentum forces |
rho_w_force2 | z direction AB3 level 3 momentum forces |
temperature_force | AB3 level 2 temperature forces |
temperature_force2 | AB3 level 3 temperature forces |
density_force | AB3 level 2 density forces |
density_force2 | AB3 level 3 density forces |
bx_force | x direction AB3 level 2 magnetic field forces |
bx_force2 | x direction AB3 level 3 magnetic field forces |
by_force | y direction AB3 level 2 magnetic field forces |
by_force2 | y direction AB3 level 3 magnetic field forces |
bz_force | z direction AB3 level 2 magnetic field forces |
bz_force2 | z direction AB3 level 3 magnetic field forces |
For x-y_layers, n1=nx,n2=ny, nlayers = n_xy_layers and the planes dumped are dictated by the array xy_planes.
For x-z_layers, n1=nx,n2=nz, nlayers = 1 and the plane dumped is y=0
For y-z_layers, n1=ny,n2=nz, nlayers = 1 and the plane dumped is x=0 The variables are
u v w temperature Bx By Bz
nstep total_time u_min u_max momentum(1) v_min v_max momentum(2) w_min w_max momentum(3) kinetic_energy max_kinetic_energy mach_no(3) bx_min bx_max b_avg(1) by_min by_max b_avg(2) bz_min bz_max b_avg(3) Magnetic_Energy Max_Magnetic_Pressure
rho_u | the x direction horizontal momentum |
rho_v | the y direction horizontal momentum |
rho_ w | the vertical momentum |
t_ avg | the mean temperature |
rho_avg | the mean density |
u_avg | the average x direction velocity |
v_avg | the average y direction velocity |
w_avg | the average vertical velocity |
ke | the kinetic energy |
ke_flux | the kinetic energy flux |
P_flux | the pressure flux |
visc_flux | the viscous flux |
buoy_work | the bouyancy work |
P | the pressure |
P_work | the pressure work |
enth_flux | the enthalpy flux |
rad_flux | the radiative flux |
entropy | the entropy |
rho_u_w | the x direction Reynolds stress |
rho_v_w | the y direction Reynolds stress |
rho_u_T | the x direction heat flux |
rho_v_T | the y direction heat flux |
tot_flux | the total flux |
bx_avg | the mean magnetic field in the x direction |
by_avg | the mean magnetic field in the y direction |
bz_avg | the mean vertical magnetic field |
bx_flux | the magnetic flux in the x direction |
by_flux | the magnetic flux in the y direction |
bz_flux | the magnetic flux in the vertical direction |
me | the mechanical energy |
me_flux | the mechanical energy flux |
time | the time |
dtime | the time step |