gpuRDF2vec documentation
gpuRDF2Vec is a scalable, GPU-based implementation of RDF2Vec embeddings for large, dense knowledge graphs.
Note: Licensed under the MIT License.
Key engineering improvements over CPU-based RDF2Vec implementations:
GPU-native Walk Extraction:
  - Random walks and BFS run entirely on the GPU via cuGraph
  - Massively parallel node replication for walk creation
cuDF→PyTorch Handoff:
  - A cuDF-backed DataLoader
  - DLPack tensor conversions that keep data on the GPU and avoid round-trips through CPU memory
Optimized Word2Vec:
  - Batch size chosen automatically from available GPU memory
  - Kernel fusion and a C++ backend
Distributed Training:
  - Multi-GPU training via PyTorch Distributed and NCCL
  - all_reduce to synchronize gradients across GPUs
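The walk-extraction idea can be sketched in plain Python. This is an illustrative CPU stand-in, not the project's cuGraph code. The functions `build_adjacency` and `replicated_random_walks` are hypothetical, and the walks follow the usual RDF2Vec layout of alternating entities and predicates. Each root is replicated `walks_per_node` times so that every copy becomes an independent walker. All walkers then advance one hop per iteration, and on a GPU each iteration would be a single data-parallel step.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def replicated_random_walks(triples, roots, walks_per_node, depth, seed=0):
    """Hypothetical helper: RDF2Vec-style walks [entity, predicate, entity, ...]."""
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    # Node replication: every root appears walks_per_node times, one walker per copy.
    walks = [[root] for root in roots for _ in range(walks_per_node)]
    active = list(range(len(walks)))
    for _ in range(depth):
        still_active = []
        # On a GPU, this whole loop would be one data-parallel step over all walkers.
        for i in active:
            neighbors = adjacency.get(walks[i][-1])
            if not neighbors:
                continue  # dead end: this walker stops early
            predicate, obj = rng.choice(neighbors)
            walks[i].extend((predicate, obj))
            still_active.append(i)
        active = still_active
    return walks


if __name__ == "__main__":
    kg = [("a", "p", "b"), ("b", "q", "c"), ("a", "r", "c"), ("c", "s", "a")]
    for walk in replicated_random_walks(kg, ["a", "b"], walks_per_node=2, depth=2):
        print(" -> ".join(walk))
```

Replicating nodes up front turns "many walks per node" into one flat batch of independent walkers. This layout is what makes the work easy to spread across GPU threads.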
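Automatic batch sizing from GPU memory can be sketched as a small pure function. Several parts are assumptions rather than the project's actual heuristic: the cost model (center, positive and negative vectors multiplied by an overhead factor for gradients and temporaries), the 80% memory fraction, and the power-of-two rounding. In PyTorch, the free-memory figure could come from `torch.cuda.mem_get_info()`, which returns `(free, total)` in bytes for the current device.

```python
def estimate_batch_size(
    free_bytes: int,
    embedding_dim: int,
    negatives: int,
    bytes_per_float: int = 4,
    overhead_factor: float = 3.0,
    memory_fraction: float = 0.8,
    max_batch: int = 1 << 20,
) -> int:
    """Hypothetical heuristic: largest power-of-two batch that fits a memory budget."""
    # Each skip-gram pair gathers one center, one positive, and `negatives` negative vectors.
    vectors_per_pair = 2 + negatives
    # overhead_factor is an assumed multiplier covering gradients and temporaries.
    bytes_per_pair = vectors_per_pair * embedding_dim * bytes_per_float * overhead_factor
    # Leave headroom so other allocations do not cause out-of-memory errors.
    budget = free_bytes * memory_fraction
    batch = int(budget // bytes_per_pair)
    if batch < 1:
        raise MemoryError("not enough free GPU memory for a single training pair")
    # Round down to a power of two, then apply the upper cap.
    batch = 1 << (batch.bit_length() - 1)
    return min(batch, max_batch)


if __name__ == "__main__":
    # With PyTorch on a CUDA device (not executed here):
    # free_bytes, _total = torch.cuda.mem_get_info()
    print(estimate_batch_size(8 * 2**30, embedding_dim=128, negatives=5))  # 524288
```

Rounding down to a power of two keeps tensor shapes regular. The `memory_fraction` headroom guards against allocations that the cost model does not account for.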
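The following pure-Python simulation illustrates what `all_reduce` computes during gradient synchronization. It implements ring all-reduce, one of the algorithms NCCL can select (NCCL also uses tree-based schemes). This is a teaching sketch built around a hypothetical `ring_all_reduce` function, not the project's code. In practice each rank calls `torch.distributed.all_reduce(param.grad, op=torch.distributed.ReduceOp.SUM)`, which sums in place, and then divides by the world size to average.

```python
def ring_all_reduce(buffers):
    """Simulate ring all-reduce (sum): buffers[r] is rank r's gradient vector."""
    n = len(buffers)
    length = len(buffers[0])
    # Split every vector into n contiguous chunks (some may be empty).
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    chunks = [[buf[a:b] for a, b in bounds] for buf in buffers]

    # Phase 1, reduce-scatter: after n-1 steps rank r owns the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        # All ranks send simultaneously, so build every message before applying any.
        msgs = [((r + 1) % n, (r - step) % n, list(chunks[r][(r - step) % n])) for r in range(n)]
        for dst, c, data in msgs:
            chunks[dst][c] = [x + y for x, y in zip(chunks[dst][c], data)]

    # Phase 2, all-gather: circulate the fully reduced chunks around the ring.
    for step in range(n - 1):
        msgs = [((r + 1) % n, (r + 1 - step) % n, list(chunks[r][(r + 1 - step) % n])) for r in range(n)]
        for dst, c, data in msgs:
            chunks[dst][c] = data

    # Reassemble each rank's chunks into one flat vector.
    return [[x for chunk in rank_chunks for x in chunk] for rank_chunks in chunks]


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
    summed = ring_all_reduce(grads)
    # Divide by world size to get identical averaged gradients on every rank.
    averaged = [[x / len(grads) for x in rank] for rank in summed]
    print(averaged[0])
```

Each rank sends and receives about 2·(n−1)/n of the vector in total, regardless of how many GPUs take part. This bandwidth property is why ring-style all-reduce scales well for data-parallel training.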
Report Issues and Bugs
Please open an issue labeled Bug on the GitHub issue page and fill in the following template:
Environment: OS, Python, CUDA, PyTorch, and cuDF versions
Steps to reproduce: code or CLI input
Dataset: format and size
Observed vs. expected behavior
Error logs or stack traces
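To fill in the Environment field, a small script like the one below can collect the requested versions. It is a convenience sketch and not part of gpuRDF2Vec. It treats PyTorch and cuDF as optional and reports them as unavailable if importing them fails (for example, when no GPU is present).

```python
import platform


def collect_environment():
    """Gather OS, Python, PyTorch, CUDA, and cuDF versions for a bug report."""
    info = {"OS": platform.platform(), "Python": platform.python_version()}
    try:
        import torch

        info["PyTorch"] = torch.__version__
        # torch.version.cuda is None for CPU-only PyTorch builds.
        info["CUDA (PyTorch build)"] = torch.version.cuda or "CPU-only build"
    except Exception as exc:  # broad on purpose: report, don't crash
        info["PyTorch"] = f"unavailable ({type(exc).__name__})"
        info["CUDA (PyTorch build)"] = "unknown"
    try:
        import cudf

        info["cuDF"] = cudf.__version__
    except Exception as exc:
        info["cuDF"] = f"unavailable ({type(exc).__name__})"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the Environment item of the template.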
We aim to respond within 3 business days. If you have a fix, open a pull request (PR) that references the issue.
License
This project is licensed under the MIT License.