gpuRDF2Vec documentation

gpuRDF2Vec is a scalable GPU-based implementation of RDF2Vec embeddings for large and dense Knowledge Graphs.


Note

Licensed under the MIT License.

Key engineering improvements over CPU RDF2Vec:

  1. GPU-native Walk Extraction:

       • Fully GPU-side random walks and BFS via cuGraph

       • Massively parallel node replication for walk creation

  2. cuDF→PyTorch Handoff:

       • cuDF-backed DataLoader

       • DLPack tensor conversions eliminate CPU bottlenecks

  3. Optimized Word2Vec:

       • Auto-batch sizing based on GPU memory

       • Kernel fusion and C++ backend processing

  4. Distributed Training:

       • Multi-GPU via PyTorch Distributed and NCCL

       • all_reduce for synchronized gradient sharing
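To make the walk extraction step concrete, here is a minimal CPU-only sketch of what a random walk over RDF triples looks like. It uses only the Python standard library and is not gpuRDF2Vec's API or its cuGraph-based implementation. The function names (build_adjacency, random_walks), the parameters (walks_per_node, max_hops, seed), and the toy triples are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walks(triples, start, walks_per_node=4, max_hops=2, seed=0):
    """Return walks shaped [entity, predicate, entity, ...] from `start`.

    Hypothetical helper for illustration only; a GPU implementation would
    advance many walks in parallel instead of looping one hop at a time.
    """
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    walks = []
    for _ in range(walks_per_node):
        walk, node = [start], start
        for _ in range(max_hops):
            edges = adjacency.get(node)
            if not edges:  # dead end: this walk stops early
                break
            predicate, node = rng.choice(edges)
            walk += [predicate, node]
        walks.append(walk)
    return walks


# Toy graph; every name here is illustrative.
TRIPLES = [
    ("Berlin", "capitalOf", "Germany"),
    ("Berlin", "locatedIn", "Europe"),
    ("Germany", "memberOf", "EU"),
    ("Germany", "locatedIn", "Europe"),
]

if __name__ == "__main__":
    for walk in random_walks(TRIPLES, "Berlin"):
        print(" -> ".join(walk))
```

In RDF2Vec, each walk is treated as a sentence and the collection of walks becomes the training corpus for Word2Vec. The sketch loops over walks sequentially, which is the part a GPU implementation would parallelize.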

Report Issues and Bugs

Please open an issue with the label Bug on the GitHub issue page and provide the following information:

  • Environment: OS, Python, CUDA, PyTorch, cuDF versions

  • Reproduction steps: Code or CLI input

  • Dataset: Format & size

  • Observed behavior vs expected behavior

  • Error logs or stack traces

We aim to respond within 3 business days. For fixes, open a PR referencing the issue.
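The Environment line of the template can be filled in with a short script. The following is a minimal sketch using only the Python standard library; it is not a tool shipped with gpuRDF2Vec. The helper names (package_version, environment_report) are hypothetical, and the package names torch, cudf, and cugraph are assumptions that may differ on your system (for example, pip wheels of cuDF often carry a CUDA suffix in the distribution name).

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("torch", "cudf", "cugraph")):
    """Build a plain-text summary suitable for pasting into a bug report.

    The default package names are assumptions; pass the distribution
    names that match your installation.
    """
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The sketch does not detect the CUDA version; that can be copied from the output of nvcc --version or nvidia-smi.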

License

This project is licensed under the MIT License.
