libxphi - Buffered offloading for Xeon Phi

Libxphi lets you collect hotspot functions that benefit from being offloaded to Intel Xeon Phi coprocessors (an instance of the Intel Many Integrated Core Architecture, “MIC”). The library already comes with support for some BLAS level 3 functions (xGEMM), but it is not limited to LAPACK or BLAS. It leverages the offload usage model by intercepting dynamic calls to a library function that would otherwise run on the host. The call to the coprocessor’s code version can be guarded by a threshold that decides whether an offload is beneficial. The library was originally developed to speed up Quantum Espresso and works very well with it. I have also tested it with GNU Octave.

libxphi on GitHub

libxsmm - A small matrix-matrix multiplication library

Now led by Hans Pabst. I originally developed this library to speed up small matrix multiplications in CP2K on the Xeon Phi, for sizes M, N, and K smaller than 32. The implementation is written entirely with intrinsics and today also includes just-in-time generation of assembly kernels.

libxsmm on GitHub

cornelius - Lanczos method for the Hubbard model

Cornelius is my reference implementation of the Lanczos method for the Hubbard model. It features a fully thread-parallelized sparse-matrix implementation: 16-site systems take a matter of seconds, 18 sites require a few minutes, and 20 sites need a well-equipped shared-memory system.

cornelius on GitHub