# 🚀 UniKernels
> **Warning**
> This project is in early development. Documentation will be updated soon.
A standalone C++ library for learning GPU programming through clean, well-documented kernel implementations across CUDA, Metal, and HIP.
## 📖 Vision
UniKernels is designed as a comprehensive resource for implementing compute kernels used in parallel computing. It serves as both an educational tool and a production-ready library, providing unified interfaces across languages and backends.
The project emphasizes shared programming patterns (threads, blocks, memory hierarchy, synchronization) and backend-specific optimizations (CUDA vs HIP vs Metal), while maintaining a stable C++ ABI.
## 🎯 Key Features
- Learn GPU programming with production-quality implementations of algorithms from *Programming Massively Parallel Processors*.
- Unified API across CUDA, Metal, and HIP with automatic backend selection and optimization.
- Version-stable C++ ABI for seamless integration into larger projects and ecosystems.
- Python bindings via nanobind, Rust bindings via cxx, and pure C exports.
## 🎓 Educational Resources
Each kernel implementation includes:
- Comprehensive documentation explaining algorithm design and GPU optimization strategies
- Performance analysis with roofline models and memory hierarchy considerations
- Code walkthroughs highlighting backend-specific optimizations
- Comparison studies showing performance characteristics across different hardware
This makes UniKernels an ideal companion for:
GPU programming courses and workshops
Self-study with Programming Massively Parallel Processors
Learning about high-performance computing techniques
## 📄 Citation
If you use UniKernels in your research, education, or production systems, please cite:
```bibtex
@software{unikernels2025,
  title   = {UniKernels: A Cross-Platform C++ GPU Computing Library for Parallel Algorithms},
  author  = {Rai, Ashish},
  url     = {https://github.com/raishish/unikernels},
  version = {1.0},
  year    = {2025},
  note    = {C++ library with Python and Rust bindings for CUDA, ROCm, and Metal GPU programming}
}
```