Software Releases - Center for Domain-Specific Computing

Compiler package for mapping applications onto heterogeneous targets such as multi-core CPUs, GPUs and FPGAs (Convey HC-1ex system).
Based on a collection of production-quality compilers (GNU GCC, LLVM, NVIDIA CC), the LLNL ROSE compiler, the Habanero-C compiler and runtime, and research tools such as PolyOpt and SDSLc.
Implements a two-level programming model, where an application is decomposed into computation steps which can be implemented in any language (C, CUDA, etc.), and the orchestration of these steps is modeled using a dataflow language (either Intel CnC or CDSC-GR, a language we developed).
An asynchronous parallel language, Habanero-C, is used to implement the inter-step parallelism. It is associated with a dynamic run-time capable of work stealing between heterogeneous targets such as CPUs, GPUs and FPGAs inside the same system.
Includes a domain-specific language (DSL) that can be embedded in a host language such as C/C++ or Matlab to represent the segments of the computation that follow a stencil pattern. We provide target-specific compilers for this DSL for CPUs, GPUs and FPGAs, ensuring performance portability from a single input source.
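To make the term "stencil pattern" concrete, here is a minimal sketch in plain Python. It is not the DSL described above and does not use any CDSC tool; the function name `jacobi_1d` and the 3-point averaging rule are assumptions chosen only for illustration. Each interior cell is updated from a fixed neighborhood of the previous iteration, which is the access pattern a stencil compiler can exploit.

```python
def jacobi_1d(values, steps):
    """Apply a 3-point averaging stencil `steps` times (hypothetical example).

    Boundary cells (first and last) are held fixed.
    """
    current = list(values)
    for _ in range(steps):
        nxt = current[:]  # copy so boundary cells carry over unchanged
        for i in range(1, len(current) - 1):
            # Each output cell depends only on a fixed window of inputs.
            nxt[i] = (current[i - 1] + current[i] + current[i + 1]) / 3.0
        current = nxt
    return current


smoothed = jacobi_1d([0.0, 3.0, 0.0], steps=1)
```

Because every cell in one iteration reads only the previous iteration, the inner loop has no dependences between cells and can be tiled or parallelized differently per target.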

Java-based implementation that complements Intel’s C++-based CnC implementation.
CnC-HJ can execute step code written in Java, C, C++, Matlab or CUDA (any language callable from Java).
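The general idea of a dataflow coordination layer, where steps run once the data items they consume exist, can be sketched as follows. This is an illustrative toy in Python and not the CnC, CnC-HJ or CDSC-GR API; `run_dataflow`, the tuple layout of a step, and the item names are all hypothetical.

```python
def run_dataflow(steps, initial_items):
    """Run each step once all of its input items are available.

    steps: dict mapping step name -> (function, input item names, output item name).
    Returns the final item store and the order in which steps ran.
    """
    items = dict(initial_items)
    pending = dict(steps)
    order = []
    while pending:
        # A step is ready when every item it consumes has been produced.
        ready = [name for name, (_, inputs, _) in pending.items()
                 if all(key in items for key in inputs)]
        if not ready:
            raise ValueError("unsatisfied inputs: " + ", ".join(sorted(pending)))
        for name in ready:
            func, inputs, output = pending.pop(name)
            items[output] = func(*(items[key] for key in inputs))
            order.append(name)
    return items, order


# Steps are declared out of order on purpose; the data dependences decide the schedule.
example_steps = {
    "total": (lambda scaled: sum(scaled), ["scaled"], "sum"),
    "scale": (lambda raw: [2 * v for v in raw], ["raw"], "scaled"),
}
example_items, example_order = run_dataflow(example_steps, {"raw": [1, 2, 3]})
```

In this sketch the step bodies are ordinary Python callables; in a two-level model they could equally be wrappers around code written in another language, since the scheduler only looks at which items each step reads and writes.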

A wide range of hardware accelerators is available, covering medical imaging, encryption, linear algebra, vision, sequencing, random number generation, etc.
For each accelerator, the release includes the following files:
- Functional C-code
- RTL implementation of the accelerator
- C-code synthesizable by the Xilinx AutoESL HLS tool
- Input Tcl script for the Xilinx AutoESL HLS tool

An ultrafast and highly scalable aligner built on top of cloud infrastructure, including Spark and the Hadoop Distributed File System (HDFS).
It leverages the abundant computing resources in a public or private cloud to fully exploit the parallelism obtained from the enormous number of reads.
With CSBWAMEM, paired-end whole-genome reads (30x coverage) can be aligned within 80 minutes on a 25-node cluster with 300 cores.
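The read-level parallelism mentioned above comes from the fact that each read can be aligned independently of the others. The sketch below illustrates that pattern only: it uses Python threads instead of Spark, and `toy_align` is a placeholder exact-match search, not the BWA-MEM algorithm or any CSBWAMEM code. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_align(reference, sequence):
    """Placeholder aligner: offset of the first exact match, or -1 if none."""
    return reference.find(sequence)


def partition(reads, num_partitions):
    """Split reads round-robin into `num_partitions` independent chunks."""
    return [reads[i::num_partitions] for i in range(num_partitions)]


def align_partition(reference, chunk):
    """Align every (read_id, sequence) pair in one chunk."""
    return [(read_id, toy_align(reference, seq)) for read_id, seq in chunk]


def align_all(reference, reads, num_partitions=4):
    """Align chunks concurrently, then merge results keyed by read id."""
    chunks = partition(reads, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(lambda c: align_partition(reference, c), chunks))
    merged = {}
    for part in results:
        merged.update(part)
    return merged


positions = align_all(
    "ACGTACGGT",
    [("r1", "ACG"), ("r2", "CGG"), ("r3", "TTT")],
    num_partitions=2,
)
```

No chunk needs data from another chunk, so the same structure scales by adding workers; a cluster framework would distribute the chunks across nodes rather than threads.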