rdf2vecgpu.embedders package

Submodules

rdf2vecgpu.embedders.word2vec module

class CBOW(*args, **kwargs)

Bases: LightningModule

forward(context)

training_step(batch, batch_idx)

class OrderAwareCBOW(*args, **kwargs)

Bases: LightningModule

forward(context_words, context_distances, center, negative, neg_distances=None)
Parameters:
  • context_words – (batch_size, context_size) context word indices

  • context_distances – (batch_size, context_size) relative distances from center to each context word

  • center – (batch_size,) center word indices

  • negative – (batch_size, neg_samples) negative sample indices

  • neg_distances – (batch_size, neg_samples) distances for negative samples (not used in CBOW)
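The shapes listed above may be easier to follow on a concrete walk. The following is a minimal sketch, not the library's own preprocessing: the helper `cbow_samples`, the padding index `0`, and the signed-offset convention for distances are all assumptions made for illustration.

```python
from typing import List, Tuple

PAD = 0  # assumed padding index; the library may treat walk edges differently


def cbow_samples(
    walk: List[int], window: int
) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Build (context_words, context_distances, center) rows from one walk.

    Hypothetical helper: every row has exactly 2 * window context slots,
    padded with PAD where the window runs past either end of the walk.
    """
    offsets = [d for d in range(-window, window + 1) if d != 0]
    context_words, context_distances, centers = [], [], []
    for i, center_word in enumerate(walk):
        row = []
        for d in offsets:
            j = i + d
            row.append(walk[j] if 0 <= j < len(walk) else PAD)
        context_words.append(row)
        context_distances.append(list(offsets))  # signed offset from the center
        centers.append(center_word)
    return context_words, context_distances, centers


# One walk over entity/predicate indices (real indices start at 1, 0 is PAD).
ctx, dist, center = cbow_samples([5, 9, 7, 3], window=1)
print(ctx)     # shape (batch_size, context_size) = (4, 2)
print(dist)    # same shape as ctx
print(center)  # shape (batch_size,) = (4,)
```

Wrapping each list in `torch.tensor(...)` would give tensors with the documented shapes. If the distances index an embedding table, signed offsets would first need shifting to non-negative values, for example `d + window`.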

get_context_embeddings(distance=0)

Get context embeddings for a specific distance

get_embeddings()

Return the center word embeddings for downstream tasks

predict_center(context_words, context_distances, top_k=5)

Predict the most likely center words given context
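As a rough illustration of top-k center prediction, the sketch below averages distance-aware context vectors and scores them against every candidate center word. It is not the library's implementation: the three embedding tables, the additive combination of word and distance vectors, the `max_dist` bound, and the function name `predict_center_sketch` are all assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, dim, max_dist = 20, 8, 2
word_embed = nn.Embedding(vocab_size, dim)        # context-side word vectors
dist_embed = nn.Embedding(2 * max_dist + 1, dim)  # one vector per signed offset
center_embed = nn.Embedding(vocab_size, dim)      # center-side word vectors


def predict_center_sketch(context_words, context_distances, top_k=5):
    """Return (scores, indices) of the top_k candidate center words per row."""
    # Shift signed offsets into [0, 2 * max_dist] so they can index an embedding.
    ctx = word_embed(context_words) + dist_embed(context_distances + max_dist)
    hidden = ctx.mean(dim=1)                 # (batch_size, dim)
    logits = hidden @ center_embed.weight.T  # (batch_size, vocab_size)
    return torch.topk(logits, k=top_k, dim=-1)


context_words = torch.tensor([[3, 7], [1, 4]])        # (batch_size, context_size)
context_distances = torch.tensor([[-1, 1], [-2, 2]])  # same shape, signed offsets
with torch.no_grad():
    scores, indices = predict_center_sketch(context_words, context_distances, top_k=5)
print(indices.shape)  # one row of top_k candidate indices per input row
```

`torch.topk` returns the scores sorted in descending order, so the first column of `indices` holds the most likely center word under these assumptions.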

training_step(batch, batch_idx)

class OrderAwareSkipgram(*args, **kwargs)

Bases: LightningModule

forward(center, context, distances, negative, neg_distances=None)
Parameters:
  • center – (batch_size,) center word indices

  • context – (batch_size,) context word indices

  • distances – (batch_size,) relative distances from center to context

  • negative – (batch_size, neg_samples) negative sample indices

  • neg_distances – (batch_size, neg_samples) distances for negative samples
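To make the (center, context, distance) inputs concrete, here is a small sketch that enumerates them for a single walk. The helper name `skipgram_triples` and the signed-offset distance convention are assumptions; how rdf2vecgpu actually derives distances is not stated on this page.

```python
from typing import List, Tuple


def skipgram_triples(walk: List[int], window: int) -> List[Tuple[int, int, int]]:
    """Enumerate (center, context, distance) for every in-window pair of one walk."""
    triples = []
    for i, center_word in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                # Distance is the signed offset of the context position (assumed).
                triples.append((center_word, walk[j], j - i))
    return triples


triples = skipgram_triples([5, 9, 7], window=1)
# Split into three parallel columns, each of length batch_size.
center, context, distances = (list(col) for col in zip(*triples))
print(center, context, distances)
```

Each column maps onto one of the `(batch_size,)` arguments above once converted to a tensor.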

get_context_embeddings(distance=0)

Get context embeddings for a specific distance

get_embeddings()

Return the center word embeddings for downstream tasks

training_step(batch, batch_idx)

class SkipGram(*args, **kwargs)

Bases: LightningModule

forward(center, context, negative)

training_step(batch, batch_idx)
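The `forward(center, context, negative)` signature matches the usual skip-gram with negative sampling setup. The sketch below shows that standard objective with a stand-in class, `TinySkipGram`, which is hypothetical and not rdf2vecgpu's `SkipGram`; the two embedding tables and the mean-reduced loss are assumptions about a typical implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySkipGram(nn.Module):
    """Illustrative stand-in, not rdf2vecgpu's SkipGram."""

    def __init__(self, vocab_size: int, dim: int) -> None:
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)   # center-word vectors
        self.out_embed = nn.Embedding(vocab_size, dim)  # context-word vectors

    def forward(self, center, context, negative):
        v = self.in_embed(center)         # (B, D)
        u_pos = self.out_embed(context)   # (B, D)
        u_neg = self.out_embed(negative)  # (B, K, D)
        pos_score = (v * u_pos).sum(dim=-1)                        # (B,)
        neg_score = torch.bmm(u_neg, v.unsqueeze(-1)).squeeze(-1)  # (B, K)
        # Negative-sampling loss: pull true pairs together, push sampled pairs apart.
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=-1))
        return loss.mean()


torch.manual_seed(0)
model = TinySkipGram(vocab_size=50, dim=8)
center = torch.randint(0, 50, (4,))      # (batch_size,)
context = torch.randint(0, 50, (4,))     # (batch_size,)
negative = torch.randint(0, 50, (4, 3))  # (batch_size, neg_samples)
loss = model(center, context, negative)
print(loss.shape)  # scalar loss
```

In a LightningModule, a `training_step` would typically unpack the batch, draw or receive the negative samples, and return a loss of this kind.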

rdf2vecgpu.embedders.word2vec_loader module

class CBOWDataModule(*args, **kwargs)

Bases: LightningDataModule

Dataloading optimised for a GPU-resident CBOW table.

Parameters:
  • context_tensor (torch.Tensor) – 2-D CUDA tensor of shape (n_samples, ctx_size), where each row is the flattened context words for one target.

  • center_tensor (torch.Tensor) – 1-D CUDA tensor of length n_samples, the target (centre) word indices.

  • batch_size (int) – Number of (context_vec, center) samples per optimisation step.

setup(stage=None)
Parameters:
  • stage (str | None)
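One plausible reading of "GPU-resident" dataloading is that batches are produced by indexing tensors that already live on the device, avoiding per-batch host-to-device copies. The sketch below illustrates that idea only; the function `iter_batches` is hypothetical, and the data module's real batching logic is not shown on this page.

```python
import torch


def iter_batches(context_tensor, center_tensor, batch_size, shuffle=True):
    """Yield (context_vec, center) batches by indexing device-resident tensors."""
    n = center_tensor.shape[0]
    device = center_tensor.device
    # Build the sample order on the same device so indexing never leaves it.
    if shuffle:
        order = torch.randperm(n, device=device)
    else:
        order = torch.arange(n, device=device)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield context_tensor[idx], center_tensor[idx]


device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU fallback for the demo
context_tensor = torch.arange(20, device=device).reshape(10, 2)  # (n_samples, ctx_size)
center_tensor = torch.arange(10, device=device)                  # (n_samples,)
batches = list(iter_batches(context_tensor, center_tensor, batch_size=4))
print([tuple(ctx.shape) for ctx, _ in batches])
```

Because the same index tensor selects rows from both inputs, each context row stays aligned with its center word after shuffling.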

class OrderAwareCBOWDataModule(*args, **kwargs)

Bases: LightningDataModule

Dataloading optimised for a GPU-resident order-aware CBOW table. Collates batches of (context_words, context_distances, center_word).

Parameters:
  • context_tensor (torch.Tensor)

  • context_distance_tensor (torch.Tensor)

  • center_tensor (torch.Tensor)

  • batch_size (int)

class OrderAwareSkipGramDataModule(*args, **kwargs)

Bases: LightningDataModule

Dataloading optimised for a GPU‑resident order-aware skip‑gram table. Collates batches of (center, context, distance).

Parameters:
  • center_tensor (torch.Tensor)

  • context_tensor (torch.Tensor)

  • distance_tensor (torch.Tensor)

  • batch_size (int)

class ParquetSkipGramDataModule(*args, **kwargs)

Bases: LightningDataModule

DataModule that reads skip-gram pairs from partitioned Parquet files.

Designed for multi-GPU DDP training: each rank reads its own subset of parquet files and loads them onto its local GPU.

Parameters:
  • parquet_path (str) – Directory containing partitioned Parquet files (written by Dask).

  • batch_size (int) – Number of (centre, context) pairs per optimisation step.

setup(stage=None)
Parameters:
  • stage (str | None)
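The per-rank split described for this class can be pictured as round-robin sharding over the sorted file list. This is a sketch under assumptions: the helper `files_for_rank` and the file names are hypothetical, and the page does not say which assignment scheme the data module actually uses.

```python
from typing import List


def files_for_rank(files: List[str], rank: int, world_size: int) -> List[str]:
    """Assign every world_size-th file to this rank (round-robin sharding)."""
    # Sorting first gives every rank the same view of the file order.
    return sorted(files)[rank::world_size]


files = [f"part.{i}.parquet" for i in range(5)]  # hypothetical file names
shards = [files_for_rank(files, r, world_size=2) for r in range(2)]
print(shards)
```

With this scheme no file is read twice and none is skipped. Shards can differ in size when the file count is not a multiple of the world size, which usually needs handling in DDP training so that all ranks run the same number of steps.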

class SkipGramDataModule(*args, **kwargs)

Bases: LightningDataModule

Dataloading optimised for a GPU‑resident skip‑gram table.

Parameters:
  • center_tensor (torch.Tensor) – 1‑D CUDA tensor of centre word indices; must have the same length as context_tensor.

  • context_tensor (torch.Tensor) – 1‑D CUDA tensor of context word indices; must have the same length as center_tensor.

  • batch_size (int) – Number of (centre, context) pairs per optimisation step.

setup(stage=None)
Parameters:
  • stage (str | None)

Module contents