$Id: usage.txt,v 1.15 2007/09/15 14:38:43 pkapyla Exp $

The memory consumption of MPI jobs is tested using the top command and
looking at the RES output.

The column "should" in the tables below is the expected array memory per
processor in MB (14*4 bytes per mesh point). For example, for nxgrid=2048
and nproc=256 (IDL syntax):

  print,2048.^3*14.*4./256e6

In summary, the numbers suggest that the minimal memory consumption is equal
to a constant C times the number of points in one direction, but C is
different for different machines and/or compilers:

  machine                            C [MB]
  mpif90 (AMD Opteron, Murska)        0.25
  ifort/lam (Axel's laptop)           0.16
  ifort/lam (Petri's MacBook Pro)     0.17
  mpif90/g95 (QMUL cluster)           0.07

----------------------------------------------------------------------------
Murska: AMD Opteron 2.6 GHz dual core processors, HP model CP4000 BL ProLiant

The following numbers suggest that when the number of processors becomes
large, the memory consumption levels off at a value that depends linearly on
the number of mesh points in one direction. These extrapolations are
indicated below by the remark nproc=infty. All memory values are in MB.

  nxgrid  nproc    RSS  should  rest
    1024  infty    240       0
    2048    256   2250    1900
    1024    256    440     235
     512  infty   ?120       0
     512    256    179      29
     256  infty     60       0  53 = nx*C - nx^3/nproc, C=0.23 MB
     256     64     62      15
     256     32     68      29
     256     16     95      59
     256      8    162     117
     256      4    283     235
     128     32     30       4
     128     16     30       8  22 = nx*C - ..., C=0.23 MB
     128      8     39      15
     128      4     52      29
     128      2     79      59
     128      1    131     117
      64     16     18       1
      64      8     17       2
      64      4     17       4
      64      2     18       8
      64      1     20      15

----------------------------------------------------------------------------
Axel: intel 4, Dell laptop
ifort -O3 -I/usr/include/lam

  nxgrid  nproc    RSS  should
      64     16     10          C=0.16
      64      8     10
      64      8     12
      64      2     15       8
      64      1     21      15

----------------------------------------------------------------------------
cluster.maths.qmul.ac.uk
/home/dhruba/mpich/bin/mpif90 -f90=/home/dhruba/bin/g95

  nxgrid  nproc    RSS
     128     64    8.9          C=0.07
     128     32     12
     128     16     19
      64     64    4.4          C=0.07
      64     16    5.9 to 6.5
      64      8      8
      32     64    2.8          C=0.09

Processor layout matters a little: for nx=64, nproc=16, we get 5.9 MB with
4x4 and 6.5 MB with 8x2 procs in the y and z directions.
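
As a cross-check of the "should" column in the Murska table, the formula
from the print statement near the top of this file can be evaluated for any
nxgrid and nproc. The following is only a minimal Python sketch, assuming
that "should" is nxgrid^3*14*4/nproc bytes expressed in units of 1e6 bytes;
the function name expected_mb and its parameter names are hypothetical.

```python
def expected_mb(nxgrid, nproc, nvar=14, nbytes=4):
    """Expected array memory per processor in MB (1 MB = 1e6 bytes).

    Assumption: nvar*nbytes = 14*4 bytes are stored per mesh point,
    as in the print statement quoted in this file.
    """
    return nxgrid**3 * nvar * nbytes / (nproc * 1e6)


if __name__ == "__main__":
    # (nxgrid, nproc) pairs taken from the Murska table
    pairs = [(2048, 256), (1024, 256), (512, 256), (256, 64), (128, 1)]
    for nxgrid, nproc in pairs:
        print(f"{nxgrid:6d} {nproc:5d} {expected_mb(nxgrid, nproc):8.1f}")
```

The table entries appear to be rounded values of this expression, e.g.
1900 for nxgrid=2048, nproc=256.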
----------------------------------------------------------------------------
Petri: MacBook Pro, Intel Core 2 Duo 2.16GHz, Intel Fortran 10.0.016,
lam-mpi 7.1.2 libraries, test: interlocked-fluxrings

  grid  nprocs  rsize  vsize
    64       8     11     57    C=0.17
    64       4     12     72
    64       2     16    102
    64       1     21    144
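
The constants C quoted in this file can be approximated from the tables by
dividing the smallest measured memory by the number of points in one
direction. The following is only a minimal Python sketch under that
assumption; the function name estimate_c and the dictionary layout are
hypothetical, and the numbers are copied from the tables in this file.

```python
# nxgrid and smallest measured memory [MB] per machine, copied from the tables
measurements = {
    "Murska": (256, 60.0),  # nproc=infty extrapolation
    "Axel": (64, 10.0),     # nproc=16
    "QMUL": (128, 8.9),     # nproc=64
    "Petri": (64, 11.0),    # nprocs=8, rsize column
}


def estimate_c(nxgrid, min_mb):
    """Estimate C [MB], assuming minimal memory = C * nxgrid."""
    return min_mb / nxgrid


for machine, (nxgrid, min_mb) in measurements.items():
    print(f"{machine:8s} C = {estimate_c(nxgrid, min_mb):.2f} MB")
```

For Murska this gives the C=0.23 noted next to the table, slightly below
the 0.25 listed in the summary.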