Table of Contents
- Dependencies
- Building and Installation
Dependencies
Hilapp
| Dependencies | Minimum Version | Required |
|--------------|-----------------|----------|
| Clang        | 8 -             | Yes      |
Installing dependencies for HILA preprocessor:
NOTE:
- If one opts to use a singularity container, skip to the HILA applications dependencies.
- If one opts to use a docker container, skip to the installation section.
For building hilapp, you need clang development tools (actually, only include files). These can be found in most Linux distribution repos, e.g. in Ubuntu 22.04:
export LLVM_VERSION=15
sudo apt-get -y install clang-$LLVM_VERSION \
libclang-$LLVM_VERSION-dev
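As a quick sanity check (a sketch only, assuming LLVM_VERSION is still set as above), you can verify that the compiler and the libclang development headers are in place:
clang-$LLVM_VERSION --version
dpkg -L libclang-$LLVM_VERSION-dev | grep include | head -n 3   # should list clang header directories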
HILA applications
| Dependencies | Minimum Version | Required |
|--------------|-----------------|----------|
| Clang / GCC  | 8 - / x         | Yes      |
| FFTW3        | x               | Yes      |
| MPI          | x               | Yes      |
| OpenMP       | x               | No       |
| CUDA         | x               | No       |
| HIP          | x               | No       |
Installing dependencies for HILA applications:
NOTE: If one opts to use docker, skip directly to the installation section.
Installing non-GPU dependencies on Ubuntu:
sudo apt install build-essential \
libopenmpi-dev \
libfftw3-dev \
libomp-dev
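As an optional sanity check (a sketch only, not part of the official instructions), you can verify that the MPI compiler wrapper and the FFTW3 headers are visible by compiling a trivial program:
printf '#include <mpi.h>\n#include <fftw3.h>\nint main(void){return 0;}\n' > /tmp/deps_check.c
mpicc /tmp/deps_check.c -lfftw3 -o /tmp/deps_check && echo "MPI and FFTW3 found"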
CUDA:
See NVIDIA drivers and CUDA documentation: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html
HIP:
See ROCm and HIP documentation: https://docs.amd.com/, https://rocmdocs.amd.com/en/latest/Installation_Guide/HIP-Installation.html
Building and Installation
Begin by cloning the HILA repository:
git clone https://github.com/CFT-HY/HILA
The installation process is split into two parts: building the HILA preprocessor and compiling HILA applications. Both can be built from source if the proper dependencies are available, and both steps have containerized options. In certain situations, users may run into problems building the HILA preprocessor, usually caused by missing LLVM/Clang support. There are a few tested strategies for dealing with this; see the HILA home page for details. The variety of options exists to address the different platform-dependent issues that can arise.
When it comes to installing HILA applications, there are many avenues one can take depending on the platform. The available platforms and offered methods are listed below, with links to the relevant sections of the installation guide.
Platforms
LINUX Workstation
HILA was originally developed on Linux, hence all of the available options can be used. Most mainstream Linux distributions provide the dependencies needed for building from source directly through their package managers. Alternatively, the user can set up the pre-built singularity container of the HILA preprocessor to avoid dependency problems. Additionally, one can opt to use the docker container, which ships the HILA preprocessor directly.
If the user wishes to regularly use or switch between different versions of LLVM/Clang for new features, a reliable and efficient way is to build LLVM through Spack. This is particularly helpful for building and managing LLVM/Clang on HPC platforms. For the details of building with Spack, please jump to the HPC section.
NOTE: It is advised to use the docker container only for development purposes, since containerization can add computational overhead. This is especially evident in containerized MPI communication.
Containerization of hilapp, on the other hand, adds no computational overhead outside of the compilation process, so for production runs one can use the singularity container and reach maximal computational performance.
MAC
On Mac, installing the HILA preprocessor dependencies and HILA application dependencies can be tedious, and in some cases impossible, since the availability of the clang libtoolbox is uncertain. For this reason the best option is to use the available docker container.
On the other hand, the HPC library-building system Spack has quite good support on macOS, so the user may follow the same process as on HPC platforms to build LLVM/Clang once Spack is properly installed.
WINDOWS
On Windows, installing the HILA preprocessor dependencies and HILA application dependencies is untested. For this reason the best option is to use the available docker container. One can also opt to use WSL; in that case, see the Linux installation instructions.
HPC
On supercomputing platforms the HILA application dependencies are most likely available. However, the dependencies of the HILA preprocessor frequently turn out to be an issue, since out-of-the-box LLVM/Clang support is often lacking on HPC platforms.
As discussed on the HILA home page, there are a few tested strategies to deal with this problem. These strategies make use of features available on the given hardware architecture, carefully designed library-building systems, or containerization. Among them, the suggested solutions are a library building/management system and containerization of the HILA preprocessor. The former includes the Spack project as well as EasyBuild, while the best example of the latter on HPC systems is the singularity container.
Most properly maintained computing clusters have a pre-installed and pre-configured Spack system. If the user has experience with Spack, it works as-is, and different versions of LLVM/Clang can be built with a few commands once the Spack module is loaded. Moreover, every LLVM/Clang build appears as a loadable module which can be used in the same way as the system modules.
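As an illustration (a sketch only; the exact spec, version, and module integration depend on the site configuration), building and activating a given LLVM/Clang version with Spack looks roughly like:
spack install llvm@15      # build LLVM/Clang 15; this can take a while
spack load llvm@15         # or load the generated module, if module integration is enabled
clang --version            # verify the active compiler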
For a step-by-step guide to building LLVM/Clang with Spack, please read the article Use Spack Build and Install LLVM on Workstation or Supercomputer. On computing platforms without a module system, such as a Linux workstation, the command
spack find --path llvm
will return the installation location, which is needed for building the HILA preprocessor. In this case, the user may pass the absolute path through a shell variable or a GNU Make flag, as sketched below.
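For example (a hypothetical sketch; the exact variable consumed by the hilapp Makefile may differ), the Spack-installed prefix can be exported and made visible to the build:
export LLVM_PREFIX=$(spack location -i llvm)   # same path as reported by 'spack find --path llvm'
export PATH=$LLVM_PREFIX/bin:$PATH             # make clang and llvm-config visible to make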
After installing the HILA preprocessor with one of the above options, one can move on to the Building HILA applications section.
Containers
HILA comes with both a singularity and a docker container for different purposes. The aim is to make HILA easy to use on any platform, be it Linux, Mac, Windows, or a supercomputer.
Docker
The docker container is meant for developing and producing HILA applications, libraries, and hilapp with ease. One can develop HILA applications on a local machine and run them in a container without having to worry about dependencies. Note that there is overhead in MPI communication under docker, so one will not get optimal simulation performance when running highly parallelized code in a container. This is a non-issue for small-scale simulations or testing.
Docker container instructions
All commands are run in the docker folder.
Docker image for HILA
Create docker image:
docker build -t hila -f Dockerfile .
Launch the image interactively with docker compose:
docker compose run --rm hila-applications
Developing with docker
The applications folder is automatically mounted from the local host into the container when launching the service hila-applications:
../applications:/HILA/applications
This allows one to develop HILA applications directly from source and launch them in the docker image with ease.
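For instance (a sketch, assuming the mount above and that the container provides the full toolchain), one can build and run an application from inside the container:
docker compose run --rm hila-applications
# inside the container:
cd /HILA/applications/hila_healthcheck
make -j4
./build/hila_healthcheck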
When developing HILA libraries and hilapp, one can also launch the service hila-source, which mounts the HILA/libraries and HILA/hilapp/src folders into the container:
docker compose run --rm hila-source
Singularity
The singularity container offers a more packaged approach where one doesn't need to worry about clang libtoolbox support for compiling the HILA preprocessor. Hence, for HPC platforms where access to such compiler libraries can be tedious, one can simply opt to use the container version of hilapp. This approach is mainly meant for preprocessing applications on an HPC platform.
Singularity container instructions
One can download the singularity container hilapp.sif directly from this github repository's release page. If downloaded, skip directly to the Using singularity container section.
Installing singularity
The simplest way to install singularity is to download the latest .deb or .rpm from the github release page and install it directly with one's package manager.
Ubuntu:
dpkg -i singularity-ce_$(SINGULARITY_VERSION)-$(UBUNTU_VERSION)_amd64.deb
Building singularity container
NOTE: sudo privileges are required for building a singularity container
For building the container there are two options: one can either build using the release version of hilapp from github, or build using the local hilapp source. Especially when one is developing the HILA preprocessor and would like to test it on an HPC platform, building the singularity container from the local source is the preferred option. There is a separate singularity definition file for each case.
Building using release version:
sudo singularity build hilapp.sif hilapp_git.def
Building using local source
sudo singularity build hilapp.sif hilapp_local.def
Using singularity container
The hilapp.sif file acts as a singularity container and, equivalently, as the hilapp binary, and can be used as such when preprocessing HILA code. Thus you can move it to your HILA project's bin folder:
mkdir HILA/hilapp/bin
mv hilapp.sif HILA/hilapp/bin/hilapp
Now one can simply move the singularity container to any given supercomputer.
Note that on supercomputers the default paths aren't the same as on a standard Linux system, so one will need to mount the HILA source folder into singularity using the APPTAINER_BIND environment variable. Simply navigate to the base of your HILA source directory and run:
export APPTAINER_BIND=$(pwd)
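As a quick check (assuming hilapp.sif was renamed and moved to hilapp/bin/hilapp as above), the container should now respond like the native binary:
./hilapp/bin/hilapp --help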
Building HILA preprocessor
Before building the preprocessor one must first install the dependencies; see the Dependencies section above.
Compile hilapp:
cd hila/hilapp
make [-j4]
make install
This builds hilapp in hila/hilapp/build, and make install moves it to hila/hilapp/bin, which is the default location for the program. The build takes 1-2 minutes.
NOTE: clang dev libraries are not installed on most supercomputer systems. However, if the system has x86_64 processors (by far the most common), you can use the make static command to build a statically linked hilapp. Copy hila/hilapp/build/hilapp to the directory hila/hilapp/bin on the target machine. A simpler approach for HPC platforms is to use the singularity container.
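A minimal sketch of that static-build workflow (the hostname and destination path are placeholders):
cd hila/hilapp
make static [-j4]                                              # build a statically linked hilapp
scp build/hilapp user@hpc.example.org:hila/hilapp/bin/hilapp   # copy to the target machine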
Test that hilapp works:
./bin/hilapp --help
Expected output
$ ./bin/hilapp --help
USAGE: hilapp [options] <source files>
OPTIONS:
Generic Options:
--help - Display available options (--help-hidden for more)
--help-list - Display list of available options (--help-list-hidden for more)
--version - Display the version of this program
hilapp:
--AVXinfo=<int> - AVX vectorization information level 0-2. 0 quiet, 1 not vectorizable loops, 2 all loops
-D <macro[=value]> - Define name/macro for preprocessor
-I <directory> - Directory for include file search
--allow-func-globals - Allow using global or extern variables in functions called from site loops.
This will not work in kernelized code (for example GPUs)
--check-init - Insert checks that Field variables are appropriately initialized before use
--comment-pragmas - Comment out '#pragma hila' -pragmas in output
--dump-ast - Dump AST tree
--function-spec-no-inline - Do not mark generated function specializations "inline"
--gpu-slow-reduce - Use slow (but memory economical) reduction on gpus
--ident-functions - Comment function call types in output
--insert-includes - Insert all project #include files in .cpt -files (portable)
--method-spec-no-inline - Do not mark generated method specializations "inline"
--no-include - Do not insert any '#include'-files (for debug, may not compile)
--no-interleave - Do not interleave communications with computation
--no-output - No output file, for syntax check
-o <filename> - Output file (default: <file>.cpt, write to stdout: -o -
--syntax-only - Same as no-output
--target:AVX - Generate AVX vectorized loops
--target:AVX512 - Generate AVX512 vectorized loops
--target:CUDA - Generate CUDA kernels
--target:HIP - Generate HIP kernels
--target:openacc - Offload to GPU using openACC
--target:openmp - Hybrid OpenMP - MPI
--target:vanilla - Generate loops in place
--target:vectorize=<int> - Generate vectorized loops with given vector size
For example -target:vectorize=32 is equivalent to -target:AVX
--verbosity=<int> - Verbosity level 0-2. Default 0 (quiet)
Building HILA applications
First, we will try to build and run a health-check test application on the default computing platform, which is CPU with MPI enabled. To do so, navigate to the applications folder and try:
cd hila/applications/hila_healthcheck
make -j4
./build/hila_healthcheck
Expected output
$ ./build/hila_healthcheck
----- HILA ⩩ lattice framework ---------------------------
Running program ./build/hila_healthcheck
with command line arguments ''
Code version: git SHA d0222bca
Compiled Jun 1 2023 at 11:13:10
with options: EVEN_SITES_FIRST SPECIAL_BOUNDARY_CONDITIONS
Starting -- date Thu Jun 1 11:13:28 2023 run time 8.328e-05s
No runtime limit given
GNU c-library performance: not returning allocated memory
----- Reading file parameters ------------------------------
lattice size 256,256,256
random seed 0
------------------------------------------------------------
------------------------------------------------------------
LAYOUT: lattice size 256 x 256 x 256 = 16777216 sites
Dividing to 1 nodes
Sites on node: 256 x 256 x 256 = 16777216
Processor layout: 1 x 1 x 1 = 1 nodes
Node remapping: NODE_LAYOUT_BLOCK with blocksize 4
Node block size 1 1 1 block division 1 1 1
------------------------------------------------------------
Communication tests done -- date Thu Jun 1 11:13:31 2023 run time 3.11s
------------------------------------------------------------
Random seed from time: 3871436182438
Using node random numbers, seed for node 0: 3871436182438
--- Complex reduction value ( -2.7647453e-17, 5.5294928e-17 ) passed
--- Vector reduction, sum ( -7.1331829e-15, -1.4328816e-15 ) passed
--- Setting and reading a value at [ 37 211 27 ] passed
--- Setting and reading a value at [ 251 220 47 ] passed
--- Setting and reading a value at [ 250 249 134 ] passed
--- Maxloc is [ 112 117 164 ] passed
--- Max value 2 passed
--- Minloc is [ 192 135 27 ] passed
--- Min value -1 passed
--- Field set_elements and get_elements with 51 coordinates passed
--- SiteSelect size 51 passed
--- SiteValueSelect size 51 passed
--- SiteSelect content passed
--- SiteValueSelect content passed
--- SiteIndex passed
--- 2-dimensional slice size 65536 passed
--- slice content passed
--- 1-dimensional slice size 256 passed
--- slice content passed
--- FFT constant field passed
--- FFT inverse transform passed
--- FFT of wave vector [ 132 159 243 ] passed
--- FFT of wave vector [ 167 161 208 ] passed
--- FFT of wave vector [ 152 87 255 ] passed
--- FFT of wave vector [ 156 86 229 ] passed
--- FFT of wave vector [ 78 246 141 ] passed
--- FFT real to complex passed
--- FFT complex to real passed
--- Norm of field = 44434.862 and FFT = 44434.862 passed
--- Norm of binned FFT = 44434.862 passed
--- Binning test at vector [ 100 220 7 ] passed
--- Spectral density test with above vector passed
--- Binning test at vector [ 193 10 49 ] passed
--- Spectral density test with above vector passed
--- Binning test at vector [ 235 241 96 ] passed
--- Spectral density test with above vector passed
TIMER REPORT: total(sec) calls time/call fraction
---------------------------------------------------------------------------
MPI broadcast : 0.000 40 0.263 μs 0.0000
MPI reduction : 0.000 34 2.003 μs 0.0000
FFT total time : 44.544 14 3.182 s 0.6449
copy pencils : 3.261 15 0.217 s 0.0472
MPI for pencils : 0.000 90 1.298 μs 0.0000
FFT plan : 0.003 42 73.150 μs 0.0000
copy fft buffers : 2.412 5505024 0.438 μs 0.0349
FFT execute : 2.356 2752512 0.856 μs 0.0341
pencil reshuffle : 12.967 30 0.432 s 0.1878
save pencils : 26.043 15 1.736 s 0.3771
bin field time : 9.014 7 1.288 s 0.1305
---------------------------------------------------------------------------
No communications done from node 0
Finishing -- date Thu Jun 1 11:14:37 2023 run time 69.07s
------------------------------------------------------------
NOTE: Naturally, the run time depends on your system.
And for running with multiple processes:
mpirun -n 4 ./build/hila_healthcheck
Expected output
$ mpirun -n 4 ./build/hila_healthcheck
----- HILA ⩩ lattice framework ---------------------------
Running program ./build/hila_healthcheck
with command line arguments ''
Code version: git SHA d0222bca
Compiled Jun 1 2023 at 11:13:10
with options: EVEN_SITES_FIRST SPECIAL_BOUNDARY_CONDITIONS
Starting -- date Thu Jun 1 11:18:22 2023 run time 0.0001745s
No runtime limit given
GNU c-library performance: not returning allocated memory
----- Reading file parameters ------------------------------
lattice size 256,256,256
random seed 0
------------------------------------------------------------
------------------------------------------------------------
LAYOUT: lattice size 256 x 256 x 256 = 16777216 sites
Dividing to 4 nodes
Sites on node: 256 x 128 x 128 = 4194304
Processor layout: 1 x 2 x 2 = 4 nodes
Node remapping: NODE_LAYOUT_BLOCK with blocksize 4
Node block size 1 2 2 block division 1 1 1
------------------------------------------------------------
Communication tests done -- date Thu Jun 1 11:18:23 2023 run time 1.046s
------------------------------------------------------------
Random seed from time: 4184648360436
Using node random numbers, seed for node 0: 4184648360436
--- Complex reduction value ( -2.7539926e-17, 5.5079939e-17 ) passed
--- Vector reduction, sum ( 1.4328816e-15, -7.4627804e-15 ) passed
--- Setting and reading a value at [ 139 215 41 ] passed
--- Setting and reading a value at [ 231 44 102 ] passed
--- Setting and reading a value at [ 238 201 150 ] passed
--- Maxloc is [ 80 69 74 ] passed
--- Max value 2 passed
--- Minloc is [ 219 105 178 ] passed
--- Min value -1 passed
--- Field set_elements and get_elements with 51 coordinates passed
--- SiteSelect size 51 passed
--- SiteValueSelect size 51 passed
--- SiteSelect content passed
--- SiteValueSelect content passed
--- SiteIndex passed
--- 2-dimensional slice size 65536 passed
--- slice content passed
--- 1-dimensional slice size 256 passed
--- slice content passed
--- FFT constant field passed
--- FFT inverse transform passed
--- FFT of wave vector [ 239 139 86 ] passed
--- FFT of wave vector [ 218 12 247 ] passed
--- FFT of wave vector [ 94 206 99 ] passed
--- FFT of wave vector [ 34 78 96 ] passed
--- FFT of wave vector [ 221 224 199 ] passed
--- FFT real to complex passed
--- FFT complex to real passed
--- Norm of field = 44418.915 and FFT = 44418.915 passed
--- Norm of binned FFT = 44418.915 passed
--- Binning test at vector [ 106 69 123 ] passed
--- Spectral density test with above vector passed
--- Binning test at vector [ 240 142 174 ] passed
--- Spectral density test with above vector passed
--- Binning test at vector [ 226 28 118 ] passed
--- Spectral density test with above vector passed
TIMER REPORT: total(sec) calls time/call fraction
---------------------------------------------------------------------------
MPI broadcast : 0.002 40 49.358 μs 0.0001
MPI reduction : 0.289 34 8.508 ms 0.0120
MPI post receive : 0.000 4 1.782 μs 0.0000
MPI start send : 0.000 4 3.923 μs 0.0000
MPI wait receive : 0.001 4 0.277 ms 0.0000
MPI wait send : 0.002 4 0.404 ms 0.0001
MPI send field : 0.001 15 67.812 μs 0.0000
FFT total time : 14.922 14 1.066 s 0.6182
copy pencils : 1.941 15 0.129 s 0.0804
MPI for pencils : 1.644 90 18.263 ms 0.0681
FFT plan : 0.006 42 0.140 ms 0.0002
copy fft buffers : 1.164 1376256 0.846 μs 0.0482
FFT execute : 0.933 688128 1.355 μs 0.0386
pencil reshuffle : 7.246 30 0.242 s 0.3002
save pencils : 2.994 15 0.200 s 0.1240
bin field time : 2.792 7 0.399 s 0.1157
---------------------------------------------------------------------------
COMMS from node 0: 4 done, 0(0%) optimized away
Finishing -- date Thu Jun 1 11:18:46 2023 run time 24.14s
------------------------------------------------------------
NOTE: Naturally, the run time depends on your system.
Now we can try to perform the same health check targeting a different computing platform with:
make ARCH=<platform>
where ARCH can take the following values (an example follows the table):
| ARCH=   | Description |
|---------|-------------|
| vanilla | default CPU implementation |
| AVX2    | AVX vectorization optimized program using vectorclass |
| openmp  | OpenMP parallelized program |
| cuda    | Parallel CUDA program |
| hip     | Parallel HIP program |
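For example (a sketch; a clean rebuild is assumed to be needed when switching platforms, and the binary is assumed to land in build/ as in the default build):
make clean
make ARCH=AVX2 -j4
./build/hila_healthcheck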
For CUDA compilation one needs to define the CUDA version and architecture either as environment variables or during the make process:
export CUDA_VERSION=11.6
export CUDA_ARCH=61
make ARCH=cuda
or
make ARCH=cuda CUDA_VERSION=11.6 CUDA_ARCH=61
NOTE: The default CUDA version is 11.6 and the default compute architecture is sm_61.
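To find the compute capability of the local GPU (a sketch; this query field requires a reasonably recent NVIDIA driver), one can ask nvidia-smi:
nvidia-smi --query-gpu=compute_cap --format=csv,noheader   # e.g. 6.1 corresponds to CUDA_ARCH=61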
Now, if we execute the CUDA version, one should expect the following output:
Expected output
$ ./build/hila_healthcheck
GPU devices accessible from node 0: 1
----- HILA ⩩ lattice framework ---------------------------
Running program ./build/hila_healthcheck
with command line arguments ''
Code version: git SHA df945bff
Compiled Jun 2 2023 at 12:57:25
with options: EVEN_SITES_FIRST SPECIAL_BOUNDARY_CONDITIONS
Starting -- date Fri Jun 2 12:58:32 2023 run time 0.08375s
No runtime limit given
Using thread blocks of size 256 threads
Using GPU_AWARE_MPI
ReductionVector with atomic operations (GPU_VECTOR_REDUCTION_THREAD_BLOCKS=0)
CUDA driver version: 12010, runtime 12010
CUDART_VERSION 12010
Device on node rank 0 device 0:
NVIDIA GeForce GTX 1080 capability: 6.1
Global memory: 8113MB
Shared memory: 48kB
Constant memory: 64kB
Block registers: 65536
Warp size: 32
Threads per block: 1024
Max block dimensions: [ 1024, 1024, 64 ]
Max grid dimensions: [ 2147483647, 65535, 65535 ]
Threads in use: 256
OpenMPI library does not support CUDA-Aware MPI
GPU_AWARE_MPI is defined -- THIS MAY CRASH IN MPI
GNU c-library performance: not returning allocated memory
----- Reading file parameters ------------------------------
lattice size 256,256,256
random seed 0
------------------------------------------------------------
------------------------------------------------------------
LAYOUT: lattice size 256 x 256 x 256 = 16777216 sites
Dividing to 1 nodes
Sites on node: 256 x 256 x 256 = 16777216
Processor layout: 1 x 1 x 1 = 1 nodes
Node remapping: NODE_LAYOUT_BLOCK with blocksize 4
Node block size 1 1 1 block division 1 1 1
------------------------------------------------------------
Communication tests done -- date Fri Jun 2 12:58:34 2023 run time 1.827s
------------------------------------------------------------
Random seed from time: 7145975945297229
Using node random numbers, seed for node 0: 7145975945297229
GPU random number generator initialized
GPU random number thread blocks: 32 of size 256 threads
--- Complex reduction value ( -3.9112593e-17, -3.2797116e-18 ) passed
--- Vector reduction, sum ( -7.1331829e-15, -1.3218593e-15 ) passed
--- Setting and reading a value at [ 187 200 25 ] passed
--- Setting and reading a value at [ 70 161 70 ] passed
--- Setting and reading a value at [ 197 191 182 ] passed
--- Maxloc is [ 13 45 107 ] passed
--- Max value 2 passed
--- Minloc is [ 33 24 224 ] passed
--- Min value -1 passed
--- Field set_elements and get_elements with 51 coordinates passed
--- SiteSelect size 51 passed
--- SiteValueSelect size 51 passed
--- SiteSelect content passed
--- SiteValueSelect content passed
--- SiteIndex passed
--- 2-dimensional slice size 65536 passed
--- slice content passed
--- 1-dimensional slice size 256 passed
--- slice content passed
--- FFT constant field passed
--- FFT inverse transform passed
--- FFT of wave vector [ 202 153 38 ] passed
--- FFT of wave vector [ 185 196 66 ] passed
--- FFT of wave vector [ 222 252 82 ] passed
--- FFT of wave vector [ 214 47 8 ] passed
--- FFT of wave vector [ 108 142 205 ] passed
--- FFT real to complex passed
--- FFT complex to real passed
--- Norm of field = 44465.9 and FFT = 44465.9 passed
--- Norm of binned FFT = 44465.9 passed
--- Binning test at vector [ 175 117 16 ] passed
--- Spectral density test with above vector passed
--- Binning test at vector [ 129 107 153 ] passed
--- Spectral density test with above vector passed
--- Binning test at vector [ 237 7 157 ] passed
--- Spectral density test with above vector passed
TIMER REPORT: total(sec) calls time/call fraction
---------------------------------------------------------------------------
MPI broadcast : 0.000 40 0.095 μs 0.0000
MPI reduction : 0.000 34 0.767 μs 0.0000
FFT total time : 4.149 14 0.296 s 0.6021
copy pencils : 0.000 15 2.560 μs 0.0000
MPI for pencils : 4.490 90 49.885 ms 0.6515
FFT plan : 0.016 1 16.047 ms 0.0023
copy fft buffers : 0.001 84 6.129 μs 0.0001
FFT execute : 0.018 42 0.433 ms 0.0026
pencil reshuffle : 0.000 30 3.821 μs 0.0000
save pencils : 0.000 15 3.851 μs 0.0000
bin field time : 0.208 7 29.777 ms 0.0302
---------------------------------------------------------------------------
No communications done from node 0
GPU Memory pool statistics from node 0:
Total pool size 3459.87 MB
# of allocations 268 real allocs 17%
Average free list search 6.3 steps
Average free list size 16 items
Finishing -- date Fri Jun 2 12:58:39 2023 run time 6.891s
------------------------------------------------------------
NOTE: Naturally, the run time depends on your system.
Additionally we have some ARCH values tuned for specific HPC platforms:
| ARCH       | Description |
|------------|-------------|
| lumi       | CPU-MPI implementation for LUMI supercomputer |
| lumi-hip   | GPU-MPI implementation for LUMI supercomputer using HIP |
| mahti      | CPU-MPI implementation for MAHTI supercomputer |
| mahti-cuda | GPU-MPI implementation for MAHTI supercomputer using CUDA |
We will discuss the computing platforms more in the creating a HILA application guide.