- Compiler package for heterogeneous mapping of applications on various targets such as multi-core CPUs, GPUs and FPGAs (Convey HC-1ex system)
- Based on a collection of production-quality compilers (GNU GCC, LLVM, NVidia CC), the LLNL ROSE compiler, the Habanero-C compiler and runtime, and research tools such as PolyOpt and SDSLc.
- Implements a two-level programming model, where an application is decomposed into computation steps which can be implemented in any language (C, CUDA, etc.), and the orchestration of these steps is modeled using a dataflow language (either Intel CnC or CDSC-GR, a language we developed).
- An asynchronous parallel language, Habanero-C, is used to implement the inter-step parallelism. It is associated with a dynamic run-time capable of work stealing between heterogeneous targets such as CPUs, GPUs and FPGAs inside the same system.
- A domain-specific language that can be embedded in a host language such as C/C++ or Matlab to represent the segments of the computation that follow a stencil pattern. We provide target-specific compilers for this DSL, targeting CPUs, GPUs and FPGAs, which ensures performance portability from a single input source.
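
As a rough illustration of the two-level model described above, the sketch below separates the computation steps (plain functions, one of them a small stencil) from their orchestration (a dependency table). It is a minimal sketch in Python and does not use Intel CnC or CDSC-GR syntax; the names `blur`, `scale`, `combine`, `GRAPH` and `run_graph` are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Level 1: computation steps. Plain functions stand in for kernels that
# could be written in C, CUDA, etc. (all names here are hypothetical).
def blur(row):
    """3-point stencil: average each interior element with its neighbours."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = (row[i - 1] + row[i] + row[i + 1]) / 3.0
    return out

def scale(row, factor=2.0):
    """Multiply every element by a constant factor."""
    return [factor * x for x in row]

def combine(a, b):
    """Element-wise sum of two equally sized rows."""
    return [x + y for x, y in zip(a, b)]

# Level 2: orchestration. Each produced item maps to the step that computes
# it and the items that step consumes. This table plays the role of the
# dataflow description; it is not CnC or CDSC-GR syntax.
GRAPH = {
    "blurred": (blur, ["input"]),
    "scaled": (scale, ["input"]),
    "result": (combine, ["blurred", "scaled"]),
}

def run_graph(graph, initial):
    """Run every step once its inputs exist; ready steps run concurrently."""
    values = dict(initial)
    pending = dict(graph)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in values for d in deps)]
            if not ready:
                raise ValueError("cycle or missing input in step graph")
            futures = {
                name: pool.submit(pending[name][0],
                                  *(values[d] for d in pending[name][1]))
                for name in ready
            }
            for name, future in futures.items():
                values[name] = future.result()
                del pending[name]
    return values

values = run_graph(GRAPH, {"input": [0.0, 3.0, 6.0, 9.0]})
print(values["result"])  # [0.0, 9.0, 18.0, 27.0]
```

Here `blurred` and `scaled` depend only on `input`, so they are submitted together, while `result` waits for both. A real runtime would schedule steps as soon as their inputs arrive rather than in waves, and could place them on different devices.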
- Java-based implementation to complement Intel’s CnC implementation based on C++
- CnC-HJ can execute step code written in Java, C, C++, Matlab or CUDA (in general, any language callable from Java)
- Extensions to Java for simplified task-parallel programming
- Pedagogical language used to teach COMP 322 (Rice)
- Implementation language for Habanero-CnC and CnC-CUDA
- Java-based tool to manipulate and display medical image formats (imageViewer)
- Graphical plugin for imageViewer to execute the image processing pipeline
- Source code for individual algorithms and variants
PARADE: Full-System Accelerator-Rich Architecture Simulator
- Full-system x86 support based on gem5
- Global Accelerator Management (GAM)
- Coherent cache/scratchpad with shared memory based on Ruby
- Customizable Network-on-Chip simulation based on Garnet
- Power/area simulation
- Automatic accelerator/application generation based on HLS
- A wide range of hardware accelerators available, including medical imaging, encryption, linear algebra, vision, sequencing, random number generator, etc.
- For each accelerator, it includes the following files:
  - Functional C-code
  - RTL implementation of the accelerator
  - C-code synthesizable by Xilinx AutoESL HLS tool
  - Input tcl script for Xilinx AutoESL HLS tool
- An ultrafast and highly scalable aligner built on top of cloud infrastructure, including Spark and the Hadoop Distributed File System (HDFS).
- It leverages the abundant computing resources in a public or private cloud to fully exploit the parallelism obtained from the enormous number of reads.
- With CSBWAMEM, paired-end whole-genome reads (30x coverage) can be aligned within 80 minutes on a 25-node cluster with 300 cores.
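
To put the reported figures in perspective, the short calculation below derives the cores per node and the total compute budget from the numbers stated above (25 nodes, 300 cores, 80 minutes). It uses only those three values and treats 80 minutes as the full runtime, which is an upper bound since the text says "within 80 minutes".

```python
# Figures quoted in the text above.
nodes = 25
total_cores = 300
runtime_minutes = 80  # upper bound: alignment finishes "within" this time

# Derived quantities.
cores_per_node = total_cores / nodes              # 12.0 cores on each node
core_hours = total_cores * runtime_minutes / 60   # 400.0 core-hours at most

print(f"{cores_per_node:.0f} cores per node, "
      f"at most {core_hours:.0f} core-hours per 30x genome")
```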