.. _vectorretrieval: Retrieval of embeddings ========================= After training RDF2Vec embeddings using the ``gpuRDF2vec`` package, you can retrieve the vector representations for all entities used within the knowledge graph. Similarly to the GPU-based training process, the retrieval of embeddings is also optimized for performance by building on top of DLPack to extract the vectors directly from GPU memory. This allows you to handle large-scale knowledge graphs efficiently. The following example demonstrates how to perform this retrieval process: .. code-block:: python from rdf2vecgpu import GPU_RDF2Vec, RDF2VecConfig # Initialize the GPU_RDF2Vec pipeline config = RDF2VecConfig( walk_strategy="random", walk_depth=4, walk_number=100, embedding_model="skipgram", epochs=5, batch_size=None, vector_size=100, window_size=5, min_count=1, learning_rate=0.01, negative_samples=5, random_state=42, reproducible=False, multi_gpu=False, generate_artifact=False, cpu_count=20, ) gpu_rdf2vec_model = GPU_RDF2Vec(config=config) # Read the knowledge graph edge_data = gpu_rdf2vec_model.read_data("data/wikidata5m/wikidata5m_kg.parquet") # Train the RDF2Vec embeddings gpu_rdf2vec_model.fit(edge_df=edge_data, walk_vertices=None) # Retrieve the embeddings for all entities embeddings = gpu_rdf2vec_model.transform() The ``transform`` method returns a cuDF dataframe where the keys are the entity URIs, together with the internal integer-based ID and the embedding vectors. If you set ``generate_artifact=True`` during the configuration, the embeddings will also be saved to disk in the specified output directory as a Parquet file.