# 🚀 UniKernels
> **Warning**
> This project is in early development. Documentation will be updated soon.
A standalone C++ library for learning GPU programming through clean, well-documented kernel implementations across CUDA, Metal, and HIP.
## 📖 Vision
UniKernels is designed as a comprehensive resource for implementing compute kernels used in parallel computing. It serves as both an educational tool and a production-ready library, providing unified interfaces across languages and backends.
The project emphasizes shared programming patterns (threads, blocks, memory hierarchy, synchronization) and backend-specific optimizations (CUDA vs HIP vs Metal), while maintaining a stable C++ ABI.
## 🎯 Key Features
- Learn GPU programming with production-quality implementations of algorithms from *Programming Massively Parallel Processors*.
- Unified API across CUDA, Metal, and HIP with automatic backend selection and optimization.
- Version-stable C++ ABI for seamless integration into larger projects and ecosystems.
- Python bindings via nanobind, Rust bindings via cxx, and pure C exports.
## 🎓 Educational Resources
Each kernel implementation includes:
- Comprehensive documentation explaining algorithm design and GPU optimization strategies
- Performance analysis with roofline models and memory hierarchy considerations
- Code walkthroughs highlighting backend-specific optimizations
- Comparison studies showing performance characteristics across different hardware
This makes UniKernels an ideal companion for:
GPU programming courses and workshops
Self-study with Programming Massively Parallel Processors
Learning about high-performance computing techniques
## 📄 Citation
If you use UniKernels in your research, education, or production systems, please cite:
```bibtex
@software{unikernels2025,
  title   = {UniKernels: A Cross-Platform C++ GPU Computing Library for Parallel Algorithms},
  author  = {Rai, Ashish},
  url     = {https://github.com/raishish/unikernels},
  version = {1.0},
  year    = {2025},
  note    = {C++ library with Python and Rust bindings for CUDA, ROCm, and Metal GPU programming}
}
```