---
myst:
  html_meta:
    "description lang=en": "UniKernels: Multi-backend GPU kernel library for CUDA, Metal, and HIP"
    "keywords": "GPU, CUDA, Metal, HIP, kernels, parallel computing, deep learning"
    "property=og:title": "UniKernels Documentation"
    "property=og:description": "Learn GPU programming with clean, well-documented kernel implementations across CUDA, Metal, and HIP"
---

# 🚀 UniKernels

```{warning}
This project is in early development. Documentation will be updated soon.
```

```{epigraph}
**A standalone C++ library** for learning GPU programming through **clean, well-documented kernel implementations** across **CUDA, Metal, and HIP**.
```

## 📖 Vision

UniKernels is designed as a comprehensive resource for implementing the compute kernels used in **parallel computing**. It serves as both an educational tool and a production-ready library, providing **unified interfaces** across languages and backends.

The project emphasizes **shared programming patterns** (threads, blocks, memory hierarchy, synchronization) and **backend-specific optimizations** (CUDA vs. HIP vs. Metal), while maintaining a **stable C++ ABI**. A brief, illustrative sketch of what these interfaces could look like appears at the end of this page.

---

## 🎯 Key Features

::::{grid} 1 2 2 2
:gutter: 3

:::{grid-item-card} {octicon}`mortar-board;2em` **Educational Focus**
Learn GPU programming with production-quality implementations of algorithms from *Programming Massively Parallel Processors*.
:::

:::{grid-item-card} {octicon}`zap;2em` **Multi-Backend Support**
Unified API across CUDA, Metal, and HIP with automatic backend selection and optimization.
:::

:::{grid-item-card} {octicon}`cpu;2em` **Stable ABI**
Version-stable C++ ABI for seamless integration into larger projects and ecosystems.
:::

:::{grid-item-card} {octicon}`code-square;2em` **Multi-Language**
Python bindings via nanobind, Rust bindings via cxx, and pure C exports.
:::

::::

---

## 🎓 Educational Resources

Each kernel implementation includes:

- **Comprehensive documentation** explaining algorithm design and GPU optimization strategies
- **Performance analysis** with roofline models and memory hierarchy considerations
- **Code walkthroughs** highlighting backend-specific optimizations
- **Comparison studies** showing performance characteristics across different hardware

This makes UniKernels an ideal companion for:

- GPU programming courses and workshops
- Self-study with *Programming Massively Parallel Processors*
- Learning high-performance computing techniques

---

## 📄 Citation

If you use UniKernels in your research, education, or production systems, please cite:

```bibtex
@software{unikernels2025,
  title   = {UniKernels: A Cross-Platform C++ GPU Computing Library for Parallel Algorithms},
  author  = {Rai, Ashish},
  url     = {https://github.com/raishish/unikernels},
  version = {1.0},
  year    = {2025},
  note    = {C++ library with Python and Rust bindings for CUDA, ROCm, and Metal GPU programming}
}
```
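
---

## 🧩 Illustrative API Sketch

The feature cards above mention a unified API with automatic backend selection and a stable C++ ABI. The snippet below is a minimal, hypothetical sketch of what such a unified entry point could look like; every name in it (the `uk_sketch` namespace, `Backend`, `select_backend`, `saxpy`) is an assumption made for illustration rather than the library's actual interface, and a CPU fallback stands in for real CUDA/HIP/Metal launches so the example compiles and runs on its own.

```cpp
// Hypothetical sketch of a unified, backend-dispatching entry point.
// All names here are illustrative; they are not UniKernels' actual API.
#include <cstddef>
#include <cstdio>
#include <vector>

namespace uk_sketch {

enum class Backend { Auto, Cuda, Hip, Metal, CpuFallback };

// A real multi-backend library would probe for available CUDA/HIP/Metal
// devices here; this sketch always falls back to the CPU so it stays
// runnable on any machine.
inline Backend select_backend(Backend requested) {
  return requested == Backend::Auto ? Backend::CpuFallback : requested;
}

// Unified SAXPY front end: one signature, backend chosen at runtime.
inline void saxpy(float a, const float* x, float* y, std::size_t n,
                  Backend backend = Backend::Auto) {
  switch (select_backend(backend)) {
    // case Backend::Cuda:  launch a CUDA kernel (grid/block sizes, stream)
    // case Backend::Hip:   launch the equivalent HIP kernel
    // case Backend::Metal: encode a Metal compute command
    default:  // CPU fallback keeps the sketch self-contained.
      for (std::size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
  }
}

}  // namespace uk_sketch

int main() {
  std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
  uk_sketch::saxpy(2.0f, x.data(), y.data(), x.size());
  std::printf("y[0] = %.1f\n", y[0]);  // expected: 4.0
  return 0;
}
```

The point of the pattern is that callers see a single function signature, while the backend-specific launch logic (grid/block configuration for CUDA and HIP, command encoding for Metal) stays hidden behind the dispatch.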
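
The **Multi-Language** card lists Python bindings via nanobind. In the same illustrative spirit, a nanobind wrapper over such an entry point could look roughly like the following; the module name `unikernels_sketch` and the exposed function are hypothetical, a plain CPU loop stands in for the forwarded call, and building the extension requires nanobind and its CMake integration.

```cpp
// Hypothetical nanobind binding sketch; the module name and exposed function
// are illustrative assumptions, not the library's published Python API.
#include <cstddef>

#include <nanobind/nanobind.h>
#include <nanobind/ndarray.h>

namespace nb = nanobind;

NB_MODULE(unikernels_sketch, m) {
  // In a real binding this lambda would forward to the unified C++ entry
  // point (and hence to a CUDA/HIP/Metal kernel); here a plain CPU loop
  // keeps the sketch self-contained.
  m.def(
      "saxpy",
      [](float a, nb::ndarray<const float, nb::ndim<1>> x,
         nb::ndarray<float, nb::ndim<1>> y) {
        const float* xs = x.data();
        float* ys = y.data();
        for (std::size_t i = 0; i < y.size(); ++i) ys[i] = a * xs[i] + ys[i];
      },
      "In-place y = a * x + y (illustrative sketch)");
}
```

From Python this would be callable as `unikernels_sketch.saxpy(2.0, x, y)` on one-dimensional NumPy arrays, with the backend dispatch remaining entirely on the C++ side behind the stable ABI.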