---
myst:
html_meta:
"description lang=en": "UniKernels: Multi-backend GPU kernel library for CUDA, Metal, and HIP"
"keywords": "GPU, CUDA, Metal, HIP, kernels, parallel computing, deep learning"
"property=og:title": "UniKernels Documentation"
"property=og:description": "Learn GPU programming with clean, well-documented kernel implementations across CUDA, Metal, and HIP"
---
# 🚀 UniKernels
```{warning}
This project is in early development. Documentation will be updated soon.
```
```{epigraph}
**A standalone C++ library** for learning GPU programming through **clean, well-documented kernel implementations** across **CUDA, Metal, and HIP**.
```
## 📖 Vision
UniKernels is designed as a comprehensive resource for implementing compute kernels used in **parallel computing**. It serves as both an educational tool and a production-ready library, providing **unified interfaces** across languages and backends.
The project emphasizes **shared programming patterns** (threads, blocks, memory hierarchy, synchronization) and **backend-specific optimizations** (CUDA vs HIP vs Metal), while maintaining a **stable C++ ABI**.
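To make the idea of a unified interface concrete, here is a minimal sketch of what a single front-end call could look like. All names (`uk::Backend`, `uk::vector_add`) are illustrative assumptions rather than the library's confirmed API, and the body is a CPU placeholder standing in for the backend dispatch:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

namespace uk {
enum class Backend { Auto, Cuda, Hip, Metal };

// One signature regardless of which GPU backend is compiled in;
// Backend::Auto stands in for automatic backend selection.
// Hypothetical API for illustration only.
inline void vector_add(const float* a, const float* b, float* out,
                       std::size_t n, Backend /*backend*/ = Backend::Auto) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];  // CPU placeholder
}
}  // namespace uk

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    uk::vector_add(a.data(), b.data(), c.data(), a.size());
    std::printf("c[0] = %.1f\n", c[0]);  // prints 3.0
}
```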
---
## 🎯 Key Features
::::{grid} 1 2 2 2
:gutter: 3
:::{grid-item-card} {octicon}`mortar-board;2em` **Educational Focus**
Learn GPU programming with production-quality implementations of algorithms from *Programming Massively Parallel Processors*.
:::
:::{grid-item-card} {octicon}`zap;2em` **Multi-Backend Support**
Unified API across CUDA, Metal, and HIP with automatic backend selection and optimization.
:::
:::{grid-item-card} {octicon}`cpu;2em` **Stable ABI**
Version-stable C++ ABI for seamless integration into larger projects and ecosystems.
:::
:::{grid-item-card} {octicon}`code-square;2em` **Multi-Language**
Python bindings via nanobind, Rust bindings via cxx, and pure C exports.
:::
::::
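The stable-ABI and pure-C-export features go hand in hand: if only built-in types cross the library boundary, Python, Rust, and C callers never depend on C++ name mangling or STL layout. The sketch below shows what such an export might look like; the symbol `uk_vector_add_f32` is an illustrative assumption, not a confirmed function in UniKernels:

```cpp
#include <cstddef>

extern "C" {
// Plain-C signature: Python (via nanobind/ctypes) and Rust (via cxx or
// bindgen) can call this without any C++ ABI assumptions.
// Hypothetical symbol name for illustration only.
int uk_vector_add_f32(const float* a, const float* b, float* out, size_t n);
}

// The C++ implementation behind the stable facade is free to evolve.
extern "C" int uk_vector_add_f32(const float* a, const float* b,
                                 float* out, size_t n) {
    if (!a || !b || !out) return -1;                        // error codes, not exceptions
    for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];    // CPU placeholder
    return 0;
}
```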
---
## 🎓 Educational Resources
Each kernel implementation includes:
- **Comprehensive documentation** explaining algorithm design and GPU optimization strategies
- **Performance analysis** with roofline models and memory hierarchy considerations (see the roofline sketch after this list)
- **Code walkthroughs** highlighting backend-specific optimizations
- **Comparison studies** showing performance characteristics across different hardware
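As a quick illustration of the roofline bound referenced above (the standard formulation, not anything specific to UniKernels):

```{math}
P = \min\left(P_{\text{peak}},\; I \times B_{\text{mem}}\right)
```

where {math}`P` is attainable performance (FLOP/s), {math}`P_{\text{peak}}` is peak compute throughput, {math}`I` is arithmetic intensity (FLOP/byte), and {math}`B_{\text{mem}}` is peak memory bandwidth. For example, a float32 vector add performs 1 FLOP per 12 bytes moved ({math}`I \approx 0.08` FLOP/byte), so it sits firmly on the memory-bound side of the roofline on essentially all GPUs.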
This makes UniKernels an ideal companion for:
- GPU programming courses and workshops
- Self-study with *Programming Massively Parallel Processors*
- Learning about high-performance computing techniques
---
## 📄 Citation
If you use UniKernels in your research, education, or production systems, please cite:
```bibtex
@software{unikernels2025,
  title   = {UniKernels: A Cross-Platform C++ GPU Computing Library for Parallel Algorithms},
  author  = {Rai, Ashish},
  url     = {https://github.com/raishish/unikernels},
  version = {1.0},
  year    = {2025},
  note    = {C++ library with Python and Rust bindings for CUDA, ROCm, and Metal GPU programming}
}
```