Experiment Tracking¶

rdf2vecgpu ships with a pluggable experiment tracking layer so that runs can be logged to MLflow, Weights & Biases, or no backend at all. The tracker is selected via the tracker field on RDF2VecConfig and instantiated by the internal build_tracker() factory.

Installing tracker backends¶

The tracker backends are optional dependencies. Install only the ones you need:

# MLflow (uses mlflow-skinny under the hood)
pip install "rdf2vecgpu[mlflow]"

# Weights & Biases
pip install "rdf2vecgpu[wandb]"

Alternatively, when working from source with uv:

uv sync --extra mlflow
uv sync --extra wandb

Selecting a tracker¶

from rdf2vecgpu import GPU_RDF2Vec, RDF2VecConfig

config = RDF2VecConfig(
    walk_strategy="random",
    walk_depth=4,
    walk_number=100,
    embedding_model="skipgram",
    epochs=5,
    tracker="mlflow",  # "none" (default), "mlflow", or "wandb"
    tracker_run_name="wikidata5m-baseline",
    tracker_kwargs={
        "mlflow": {
            "tracking_uri": "http://mlflow.internal:5000",
            "experiment_name": "rdf2vecgpu",
        },
    },
)
model = GPU_RDF2Vec(config=config)

For Weights & Biases, use tracker="wandb" and provide backend-specific kwargs under the "wandb" key (for example project, entity, group).

Note

When tracker="none" (the default), a NoOpTracker is used. All tracker calls become no-ops, so the pipeline behaves identically to an untraced run.

Pipeline stages¶

The pipeline wraps each major step with a tracker stage context manager. Parameters and metrics logged inside a stage are associated with that stage in the tracking backend:

data_loading — edge count, column count, source path
Literal_Handling — literal strategy and predicates (when configured)
walk generation — walk strategy, depth, count, timing
vocabulary construction — vocabulary size
Word2Vec training — hyperparameters, loss curves, batch-size tuning results

Run metadata such as the library name and backend are recorded via tags when the pipeline starts.

Custom tracker backends¶

Implementing a custom backend means subclassing BaseTracker and implementing the methods relevant for your use case (start_pipeline, stage, log_params, log_metrics, log_artifact, …). Because each method has a no-op default, you only need to override what you want to capture.