rdf2vecgpu.embedders package

Submodules

rdf2vecgpu.embedders.word2vec module

class CBOW(*args, **kwargs)

Bases: LightningModule

forward(context)

training_step(batch, batch_idx)

class OrderAwareCBOW(*args, **kwargs)

Bases: LightningModule

forward(context_words, context_distances, center, negative, neg_distances=None)

Parameters:

context_words – (batch_size, context_size) context word indices
context_distances – (batch_size, context_size) relative distances from center to each context word
center – (batch_size,) center word indices
negative – (batch_size, neg_samples) negative sample indices
neg_distances – (batch_size, neg_samples) distances for negative samples (not used in CBOW)
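As an illustration of the shapes listed above, the following sketch builds the three positive-sample inputs from a single walk, using plain Python lists in place of tensors. The helper name `cbow_samples`, the `window` parameter, and the use of signed offsets as distances are assumptions made for this example; this page does not state how rdf2vecgpu encodes distances internally.

```python
def cbow_samples(walk, window):
    """Build (context_words, context_distances, centers) from one walk.

    `walk` is a list of integer token ids (hypothetical input format).
    Only positions with a full window on both sides are used, so every
    sample has exactly 2 * window context entries.
    """
    # Signed offsets relative to the centre, e.g. [-1, 1] for window=1.
    # Assumption: the real library may encode distances differently.
    offsets = [d for d in range(-window, window + 1) if d != 0]

    context_words, context_distances, centers = [], [], []
    for i in range(window, len(walk) - window):
        context_words.append([walk[i + d] for d in offsets])
        context_distances.append(list(offsets))
        centers.append(walk[i])
    return context_words, context_distances, centers


walk = [10, 11, 12, 13, 14]
ctx, dist, center = cbow_samples(walk, window=1)
```

Each list has one entry per sample, so converting them to tensors would give the shapes (batch_size, context_size), (batch_size, context_size) and (batch_size,) described for `context_words`, `context_distances` and `center`.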

get_context_embeddings(distance=0)

Get context embeddings for a specific distance.

get_embeddings()

Return the center word embeddings for downstream tasks.

predict_center(context_words, context_distances, top_k=5)

Predict the most likely center words given the context.

training_step(batch, batch_idx)

class OrderAwareSkipgram(*args, **kwargs)

Bases: LightningModule

forward(center, context, distances, negative, neg_distances=None)

Parameters:

center – (batch_size,) center word indices
context – (batch_size,) context word indices
distances – (batch_size,) relative distances from center to context
negative – (batch_size, neg_samples) negative sample indices
neg_distances – (batch_size, neg_samples) distances for negative samples
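To make the per-pair layout above concrete, here is a minimal sketch that expands one walk into aligned (center, context, distance) lists. The function name `skipgram_triples`, the `window` parameter, and the signed-offset convention are assumptions for illustration only, not the library's documented behaviour.

```python
def skipgram_triples(walk, window):
    """Expand one walk into aligned center / context / distance lists.

    `walk` is a list of integer token ids (hypothetical input format).
    The distance is the signed offset from the centre position to the
    context position; the real library may use a different encoding.
    """
    centers, contexts, distances = [], [], []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # a word is not its own context
            centers.append(center)
            contexts.append(walk[j])
            distances.append(j - i)
    return centers, contexts, distances


centers, contexts, distances = skipgram_triples([10, 11, 12], window=1)
```

All three lists have the same length, one entry per training pair, which matches the (batch_size,) shapes given for `center`, `context` and `distances`.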

get_context_embeddings(distance=0)

Get context embeddings for a specific distance.

get_embeddings()

Return the center word embeddings for downstream tasks.

training_step(batch, batch_idx)

class SkipGram(*args, **kwargs)

Bases: LightningModule

forward(center, context, negative)

training_step(batch, batch_idx)

rdf2vecgpu.embedders.word2vec_loader module

class CBOWDataModule(*args, **kwargs)

Bases: LightningDataModule

Data loading optimised for a GPU-resident CBOW table.

Parameters:

context_tensor (torch.Tensor) – 2-D CUDA tensor of shape (n_samples, ctx_size), where each row holds the flattened context words for one target.
center_tensor (torch.Tensor) – 1-D CUDA tensor of length n_samples, the target (centre) word indices.
batch_size (int) – Number of (context_vec, center) samples per optimisation step.
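The idea of batching two index-aligned arrays can be sketched as follows. This is not the module's implementation: plain lists stand in for the CUDA tensors so the example runs without a GPU, and the function name `iter_batches` and its `seed` argument are hypothetical.

```python
import random


def iter_batches(context_rows, centers, batch_size, seed=0):
    """Yield (context_batch, center_batch) pairs in shuffled order.

    `context_rows[i]` and `centers[i]` belong to the same sample, so the
    same shuffled indices are used to pick from both containers.
    """
    if len(context_rows) != len(centers):
        raise ValueError("context_rows and centers must have equal length")
    order = list(range(len(centers)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [context_rows[i] for i in idx], [centers[i] for i in idx]


context_rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
centers = [100, 101, 102, 103, 104]
batches = list(iter_batches(context_rows, centers, batch_size=2))
```

With five samples and `batch_size=2`, the sketch yields three batches, the last one holding the single remaining sample. Whether the real data module keeps or drops such a partial batch is not stated here.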

setup(stage=None)

Parameters:

stage (str | None)

class OrderAwareCBOWDataModule(*args, **kwargs)

Bases: LightningDataModule

Data loading optimised for a GPU-resident order-aware CBOW table. Collates batches of (context_words, context_distances, center_word).

Parameters:

context_tensor (torch.Tensor)
context_distance_tensor (torch.Tensor)
center_tensor (torch.Tensor)
batch_size (int)

class OrderAwareSkipGramDataModule(*args, **kwargs)

Bases: LightningDataModule

Data loading optimised for a GPU-resident order-aware skip-gram table. Collates batches of (center, context, distance).

Parameters:

center_tensor (torch.Tensor)
context_tensor (torch.Tensor)
distance_tensor (torch.Tensor)
batch_size (int)

class ParquetSkipGramDataModule(*args, **kwargs)

Bases: LightningDataModule

DataModule that reads skip-gram pairs from partitioned parquet.

Designed for multi-GPU DDP training: each rank reads its own subset of parquet files and loads them onto its local GPU.

Parameters:

parquet_path (str) – Directory containing partitioned parquet files (written by dask).
batch_size (int) – Number of (centre, context) pairs per optimisation step.
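One common way to give each rank its own subset of files is a round-robin split over a sorted file list, sketched below. This is an assumption about how such sharding could work, not a description of what ParquetSkipGramDataModule does internally; the function name `files_for_rank` and the file names are illustrative.

```python
def files_for_rank(files, rank, world_size):
    """Return the files assigned to one rank under a round-robin split.

    Sorting first ensures every rank sees the same ordering, so the
    subsets are disjoint and together cover all files.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return sorted(files)[rank::world_size]


# Illustrative file names; the actual names depend on how the
# partitioned parquet directory was written.
files = [f"part.{i}.parquet" for i in range(5)]
shards = [files_for_rank(files, r, world_size=2) for r in range(2)]
```

With five files and two ranks, rank 0 receives three files and rank 1 receives two. An uneven split like this can leave ranks with different numbers of batches, which distributed training generally has to account for.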

setup(stage=None)

Parameters:

stage (str | None)

class SkipGramDataModule(*args, **kwargs)

Bases: LightningDataModule

Data loading optimised for a GPU-resident skip-gram table.

Parameters:

center_tensor (torch.Tensor) – 1-D CUDA tensor of centre word indices; must have the same length as context_tensor.
context_tensor (torch.Tensor) – 1-D CUDA tensor of context word indices; must have the same length as center_tensor.
batch_size (int) – Number of (centre, context) pairs per optimisation step.

setup(stage=None)

Parameters:

stage (str | None)

Module contents