rdf2vecgpu package
Subpackages
- rdf2vecgpu.corpus package
- rdf2vecgpu.embedders package
- rdf2vecgpu.helper package
- rdf2vecgpu.logger package
- Submodules
- rdf2vecgpu.logger.mlflow_logger module
  - MlflowTracker: close(), enabled(), log_artifact(), log_data(), log_figure(), log_metrics(), log_model_pytorch(), log_params(), log_pytorch(), set_tags(), stage(), start_pipeline()
- rdf2vecgpu.logger.wandb_logger module
  - WandbTracker: close(), enabled(), log_artifact(), log_data(), log_dict(), log_figure(), log_metrics(), log_model_pytorch(), log_params(), log_pytorch(), log_text(), set_tags(), stage(), start_pipeline()
- Module contents
- rdf2vecgpu.reader package
Submodules
rdf2vecgpu.gpu_rdf2vec module
rdf2vecgpu.config module
- class RDF2VecConfig(*, walk_strategy='random', walk_depth=4, walk_number=100, walk_weighted=False, embedding_model='skipgram', epochs=5, batch_size=None, vector_size=256, window_size=5, min_count=1, negative_samples=5, learning_rate=0.0001, backend='pytorch', random_state=42, reproducible=False, multi_gpu=False, generate_artifact=False, cpu_count=4, tune_batch_size=True, num_nodes=1, tracker='none', tracker_kwargs=None, tracker_run_name=None, literal_predicates=None, literal_strategy='drop', literal_n_bins=5, literal_bin_strategy='quantile')
Bases: pydantic.BaseModel
Configuration object for GPU-accelerated RDF2Vec.
- This Pydantic model centralizes all hyperparameters controlling:
  - walk generation
  - vocabulary construction
  - Word2Vec model architecture
  - training behavior (epochs, batch sizes, reproducibility)
  - training backend (PyTorch or gensim) and single- vs multi-GPU execution
  - artifact export settings
  - experiment tracking and handling of literal edges
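To make the walk-generation settings concrete, here is a minimal CPU-only sketch of RDF2Vec-style random walks in plain Python. It is not the package's cuGraph implementation; it assumes that walk_depth counts hops, that a walk alternates entity and predicate tokens, and that walks start from every vertex. The optional per-triple weights stand in for the "weights" column used when walk_weighted=True. The function name random_walks is hypothetical.

```python
import random
from collections import defaultdict


def random_walks(triples, walk_depth=4, walk_number=100, weights=None, seed=42):
    """Hypothetical helper: RDF2Vec-style walks (entity, predicate, entity, ...).

    `triples` is a list of (subject, predicate, object) strings; `weights` is an
    optional list with one non-negative weight per triple (mimics walk_weighted).
    """
    rng = random.Random(seed)  # fixed seed -> repeatable walks (cf. random_state)

    # Outgoing edges per subject: (predicate, object, weight).
    out_edges = defaultdict(list)
    for i, (s, p, o) in enumerate(triples):
        w = 1.0 if weights is None else weights[i]
        out_edges[s].append((p, o, w))

    # Start walk_number walks from every vertex that appears in the graph.
    vertices = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    walks = []
    for start in vertices:
        for _ in range(walk_number):
            walk, node = [start], start
            for _ in range(walk_depth):  # at most walk_depth hops
                edges = out_edges.get(node)
                if not edges:  # dead end: stop this walk early
                    break
                # Uniform choice when all weights are equal, biased otherwise.
                # The weights of one vertex must not all be zero.
                p, o, _weight = rng.choices(edges, weights=[e[2] for e in edges], k=1)[0]
                walk += [p, o]
                node = o
            walks.append(walk)
    return walks


triples = [("a", "p", "b"), ("b", "q", "c"), ("a", "r", "c")]
for w in random_walks(triples, walk_depth=2, walk_number=2):
    print(" -> ".join(w))
```

With walk_depth=2 every walk has at most five tokens, and walks that start at "c" contain only "c" because it has no outgoing edges.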
- Parameters:
walk_strategy ({"random", "bfs"}, default "random") – Strategy used to generate walks from the knowledge graph.
walk_depth (int, default 4) – Maximum depth of each walk.
walk_number (int, default 100) – Number of walks started per vertex.
walk_weighted (bool, default False) – If True, use edge weights for biased walk transitions via cuGraph’s biased_random_walks(). The input data must contain a "weights" column (the cuGraph standard name).
embedding_model ({"skipgram", "cbow"}, default "skipgram") – Word2Vec variant used for embedding training.
vector_size (int, default 256) – Dimensionality of the output embeddings.
window_size (int, default 5) – Context window size for Word2Vec.
min_count (int, default 1) – Minimum token frequency for inclusion in the vocabulary.
negative_samples (int, default 5) – Number of negative examples for negative sampling.
learning_rate (float, default 0.0001) – Learning rate used by the optimizer.
epochs (int, default 5) – Number of training epochs.
batch_size (int or None, default None) – Explicit batch size; if None, Lightning’s tuner may pick one.
tune_batch_size (bool, default True) – Whether to use PyTorch Lightning’s automatic batch size tuning.
random_state (int, default 42) – Seed for reproducible walk sampling and model initialization.
reproducible (bool, default False) – If True, enables deterministic modes in PyTorch and CUDA.
multi_gpu (bool, default False) – If True, enables multi-GPU walk generation and training using Dask.
cpu_count (int, default 4) – Number of CPU workers used.
generate_artifact (bool, default False) – If True, persist word2idx and embeddings to Parquet files.
num_nodes (int, default 1) – Number of nodes involved in multi-GPU setup.
literal_predicates (list[str] or None, default None) – Predicates that identify literal (numeric) edges. When set, edges with these predicates are handled according to literal_strategy. Predicate strings must match the values in the data exactly.
literal_strategy ({"drop", "bin"}, default "drop") – How to handle literal edges. "drop" removes them from the graph (the pyRDF2Vec default). "bin" discretizes the object values into bin tokens so the edge stays in the graph.
literal_n_bins (int, default 5) – Number of bins when literal_strategy="bin"; must be greater than 1.
literal_bin_strategy ({"quantile", "uniform"}, default "quantile") – Binning method. "quantile" creates equal-frequency bins (robust to skew); "uniform" creates equal-width bins.
backend ({"pytorch", "gensim"}, default "pytorch") – Word2Vec training backend.
tracker ({"mlflow", "wandb", "none"}, default "none") – Experiment tracker to log to (see rdf2vecgpu.logger); "none" disables tracking.
tracker_kwargs (dict or None, default None) – Additional options for the selected tracker.
tracker_run_name (str or None, default None) – Name of the tracked run.
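For literal_strategy="bin", the difference between the two literal_bin_strategy options can be shown with a small standard-library sketch. It is an illustration under assumptions, not the package's code: the cut points use a simple nearest-rank quantile rule, and both the helper names and the token format f"{predicate}_bin{k}" are hypothetical.

```python
import bisect
from collections import Counter


def bin_edges(values, n_bins=5, strategy="quantile"):
    """Hypothetical helper: the n_bins - 1 interior cut points of a numeric column."""
    if n_bins < 2:
        raise ValueError("n_bins must be greater than 1")
    xs = sorted(values)
    if strategy == "uniform":
        # Equal-width bins between the minimum and maximum value.
        width = (xs[-1] - xs[0]) / n_bins
        return [xs[0] + width * i for i in range(1, n_bins)]
    if strategy == "quantile":
        # Equal-frequency bins: cut at the i/n_bins quantiles (nearest rank).
        return [xs[min(len(xs) - 1, len(xs) * i // n_bins)] for i in range(1, n_bins)]
    raise ValueError(f"unknown strategy: {strategy!r}")


def bin_token(predicate, value, edges):
    """Hypothetical token format; values equal to a cut point go to the upper bin."""
    return f"{predicate}_bin{bisect.bisect_right(edges, value)}"


ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one large outlier
for strategy in ("uniform", "quantile"):
    edges = bin_edges(ages, n_bins=2, strategy=strategy)
    counts = Counter(bin_token("age", v, edges) for v in ages)
    print(strategy, edges, dict(counts))
```

Because of the outlier, equal-width bins place nine of the ten values in the same bin, while quantile bins split them five and five; this is the skew robustness the "quantile" option refers to.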
- backend: Literal['pytorch', 'gensim']
- batch_size: int | None
- classmethod construct(_fields_set=None, **values)
- Parameters:
_fields_set (set[str] | None)
values (Any)
- Return type:
Self
- copy(*, include=None, exclude=None, update=None, deep=False)
Returns a copy of the model.
- Deprecated:
This method is deprecated; use model_copy instead.
If you need include or exclude, use:
```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```
- Parameters:
include (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to include in the copied model.
exclude (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to exclude in the copied model.
update (Dict[str, Any] | None) – Optional dictionary of field-value pairs to override field values in the copied model.
deep (bool) – If True, the values of fields that are Pydantic models will be deep-copied.
- Returns:
A copy of the model with included, excluded and updated fields as specified.
- Return type:
Self
- cpu_count: int
- dict(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False)
- Parameters:
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)
- Return type:
Dict[str, Any]
- embedding_model: Literal['skipgram', 'cbow']
- epochs: int
- classmethod from_orm(obj)
- Parameters:
obj (Any)
- Return type:
Self
- generate_artifact: bool
- json(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=PydanticUndefined, models_as_dict=PydanticUndefined, **dumps_kwargs)
- Parameters:
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)
encoder (Callable[[Any], Any] | None)
models_as_dict (bool)
dumps_kwargs (Any)
- Return type:
str
- learning_rate: float
- literal_bin_strategy: Literal['quantile', 'uniform']
- literal_n_bins: int
- literal_predicates: list[str] | None
- literal_strategy: Literal['drop', 'bin']
- min_count: int
- model_computed_fields = {}
- model_config = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- classmethod model_construct(_fields_set=None, **values)
Creates a new instance of the Model class without running validation.
Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.
- Note:
model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.
- Parameters:
_fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values (Any) – Trusted or pre-validated data dictionary.
- Returns:
A new instance of the Model class, created without validation.
- Return type:
Self
- model_copy(*, update=None, deep=False)
- Usage documentation: “Model copy” section of the Pydantic Models concepts guide.
Returns a copy of the model.
- Note:
The underlying instance’s __dict__ attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties created with functools.cached_property).
- Parameters:
update (Mapping[str, Any] | None) – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep (bool) – Set to True to make a deep copy of the model.
- Returns:
New model instance.
- Return type:
Self
- model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False)
- Usage documentation: “Python mode” section of the Pydantic Serialization concepts guide.
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- Parameters:
mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to include in the output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to exclude from the output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a pydantic_core.PydanticSerializationError.
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
- Returns:
A dictionary representation of the model.
- Return type:
dict[str, Any]
- model_dump_json(*, indent=None, ensure_ascii=False, include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False)
- Usage documentation: “JSON mode” section of the Pydantic Serialization concepts guide.
Generates a JSON representation of the model using Pydantic’s to_json method.
- Parameters:
indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.
ensure_ascii (bool) – If True, the output is guaranteed to have all incoming non-ASCII characters escaped. If False (the default), these characters will be output as-is.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to include in the JSON output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to exclude from the JSON output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to serialize using field aliases.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a pydantic_core.PydanticSerializationError.
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
- Returns:
A JSON string representation of the model.
- Return type:
str
- property model_extra: dict[str, Any] | None
Get extra fields set during validation.
- Returns:
A dictionary of extra fields, or None if config.extra is not set to “allow”.
- model_fields = {field name: FieldInfo(...)}; every field has required=False. Annotations, defaults, and validation constraints:
  - backend: Literal['pytorch', 'gensim'], default 'pytorch'
  - batch_size: int | None, default None, must be > 0 when set
  - cpu_count: int, default 4, must be > 0
  - embedding_model: Literal['skipgram', 'cbow'], default 'skipgram'
  - epochs: int, default 5, must be > 0
  - generate_artifact: bool, default False
  - learning_rate: float, default 0.0001, must be > 0
  - literal_bin_strategy: Literal['quantile', 'uniform'], default 'quantile'
  - literal_n_bins: int, default 5, must be > 1
  - literal_predicates: list[str] | None, default None
  - literal_strategy: Literal['drop', 'bin'], default 'drop'
  - min_count: int, default 1, must be >= 0
  - multi_gpu: bool, default False
  - negative_samples: int, default 5, must be >= 0
  - num_nodes: int, default 1, must be > 0
  - random_state: int, default 42, must be >= 0
  - reproducible: bool, default False
  - tracker: Literal['mlflow', 'wandb', 'none'], default 'none'
  - tracker_kwargs: dict | None, default None
  - tracker_run_name: str | None, default None
  - tune_batch_size: bool, default True
  - vector_size: int, default 256, must be > 0
  - walk_depth: int, default 4, must be > 0
  - walk_number: int, default 100, must be > 0
  - walk_strategy: Literal['random', 'bfs'], default 'random'
  - walk_weighted: bool, default False
  - window_size: int, default 5, must be > 1
- property model_fields_set: set[str]
Returns the set of fields that have been explicitly set on this model instance.
- Returns:
A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
- classmethod model_json_schema(by_alias=True, ref_template='#/$defs/{model}', schema_generator=<class 'pydantic.json_schema.GenerateJsonSchema'>, mode='validation', *, union_format='any_of')
Generates a JSON schema for a model class.
- Parameters:
by_alias (bool) – Whether to use attribute aliases or not.
ref_template (str) – The reference template.
union_format (Literal['any_of', 'primitive_type_array']) – The format to use when combining schemas from unions together. Can be one of:
  - 'any_of': Use the anyOf keyword (https://json-schema.org/understanding-json-schema/reference/combining#anyOf) to combine schemas (the default).
  - 'primitive_type_array': Use the type keyword (https://json-schema.org/understanding-json-schema/reference/type) as an array of strings, containing each type of the combination. If any of the schemas is not a primitive type (string, boolean, null, integer or number) or contains constraints/metadata, falls back to any_of.
schema_generator (type[GenerateJsonSchema]) – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode (Literal['validation', 'serialization']) – The mode in which to generate the schema.
- Returns:
The JSON schema for the given model class.
- Return type:
dict[str, Any]
- classmethod model_parametrized_name(params)
Compute the class name for parametrizations of generic classes.
This method can be overridden to achieve a custom naming scheme for generic BaseModels.
- Parameters:
params (tuple[type[Any], ...]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
- Returns:
String representing the new class where params are passed to cls as type variables.
- Raises:
TypeError – Raised when trying to generate concrete names for non-generic models.
- Return type:
str
- model_post_init(context, /)
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- Parameters:
context (Any)
- Return type:
None
- classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)
Try to rebuild the pydantic-core schema for the model.
This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.
- Parameters:
force (bool) – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors (bool) – Whether to raise errors, defaults to True.
_parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.
_types_namespace (MappingNamespace | None) – The types namespace, defaults to None.
- Returns:
Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.
- Return type:
bool | None
- classmethod model_validate(obj, *, strict=None, extra=None, from_attributes=None, context=None, by_alias=None, by_name=None)
Validate a pydantic model instance.
- Parameters:
obj (Any) – The object to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
from_attributes (bool | None) – Whether to extract data from object attributes.
context (Any | None) – Additional context to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.
- Raises:
ValidationError – If the object could not be validated.
- Returns:
The validated model instance.
- Return type:
Self
- classmethod model_validate_json(json_data, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)
- Usage documentation: “JSON Parsing” section of the Pydantic JSON concepts guide.
Validate the given JSON data against the Pydantic model.
- Parameters:
json_data (str | bytes | bytearray) – The JSON data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.
- Returns:
The validated Pydantic model.
- Raises:
ValidationError – If json_data is not a JSON string or the object could not be validated.
- Return type:
Self
- classmethod model_validate_strings(obj, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)
Validate the given object with string data against the Pydantic model.
- Parameters:
obj (Any) – The object containing string data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.
- Returns:
The validated Pydantic model.
- Return type:
Self
- multi_gpu: bool
- negative_samples: int
- num_nodes: int
- classmethod parse_file(path, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
- Parameters:
path (str | Path)
content_type (str | None)
encoding (str)
proto (DeprecatedParseProtocol | None)
allow_pickle (bool)
- Return type:
Self
- classmethod parse_obj(obj)
- Parameters:
obj (Any)
- Return type:
Self
- classmethod parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
- Parameters:
b (str | bytes)
content_type (str | None)
encoding (str)
proto (DeprecatedParseProtocol | None)
allow_pickle (bool)
- Return type:
Self
- random_state: int
- reproducible: bool
- classmethod schema(by_alias=True, ref_template='#/$defs/{model}')
- Parameters:
by_alias (bool)
ref_template (str)
- Return type:
Dict[str, Any]
- classmethod schema_json(*, by_alias=True, ref_template='#/$defs/{model}', **dumps_kwargs)
- Parameters:
by_alias (bool)
ref_template (str)
dumps_kwargs (Any)
- Return type:
str
- tracker: Literal['mlflow', 'wandb', 'none']
- tracker_kwargs: dict | None
- tracker_run_name: str | None
- tune_batch_size: bool
- classmethod update_forward_refs(**localns)
- Parameters:
localns (Any)
- Return type:
None
- classmethod validate(value)
- Parameters:
value (Any)
- Return type:
Self
- vector_size: int
- walk_depth: int
- walk_number: int
- walk_strategy: Literal['random', 'bfs']
- walk_weighted: bool
- window_size: int
Module contents
- class RDF2VecConfig(*, walk_strategy='random', walk_depth=4, walk_number=100, walk_weighted=False, embedding_model='skipgram', epochs=5, batch_size=None, vector_size=256, window_size=5, min_count=1, negative_samples=5, learning_rate=0.0001, backend='pytorch', random_state=42, reproducible=False, multi_gpu=False, generate_artifact=False, cpu_count=4, tune_batch_size=True, num_nodes=1, tracker='none', tracker_kwargs=None, tracker_run_name=None, literal_predicates=None, literal_strategy='drop', literal_n_bins=5, literal_bin_strategy='quantile')
Bases: pydantic.BaseModel
Configuration object for GPU-accelerated RDF2Vec.
- This Pydantic model centralizes all hyperparameters controlling:
  - walk generation
  - vocabulary construction
  - Word2Vec model architecture
  - training behavior (epochs, batch sizes, reproducibility)
  - training backend (PyTorch or gensim) and single- vs multi-GPU execution
  - artifact export settings
  - experiment tracking and handling of literal edges
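Once walks exist, each walk is treated as a sentence. The sketch below is a simplified standard-library illustration, not the package's PyTorch or gensim code, of how min_count filters the vocabulary into a word2idx-style mapping and how window_size defines the training examples for the two embedding_model variants: skip-gram predicts each context token from the center token, while CBOW predicts the center token from its whole context. All function names are hypothetical, and details that real Word2Vec implementations add (such as frequent-word subsampling and randomly shrunk windows) are omitted.

```python
from collections import Counter


def build_vocab(walks, min_count=1):
    """Hypothetical helper: map each token seen >= min_count times to an id."""
    counts = Counter(tok for walk in walks for tok in walk)
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: idx for idx, tok in enumerate(kept)}


def context_windows(walk, window_size):
    """Yield (center, context_tokens) for every position in a walk."""
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window_size), min(len(walk), i + window_size + 1)
        yield center, [walk[j] for j in range(lo, hi) if j != i]


def skipgram_pairs(walk, window_size=5):
    # Skip-gram: one (center, context) example per context token.
    return [(c, ctx) for c, window in context_windows(walk, window_size) for ctx in window]


def cbow_examples(walk, window_size=5):
    # CBOW: the whole context window predicts the center token.
    return [(window, c) for c, window in context_windows(walk, window_size)]


walk = ["a", "p", "b", "q", "c"]  # entity, predicate, entity, ...
print(skipgram_pairs(walk, window_size=1))
print(cbow_examples(walk, window_size=1))
print(build_vocab([walk, ["a", "r", "c"]], min_count=2))
```

With min_count=2 only "a" and "c" survive, since every predicate and "b" occur once across the two walks.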
- Parameters:
walk_strategy ({"random", "bfs"}, default "random") – Strategy used to generate walks from the knowledge graph.
walk_depth (int, default 4) – Maximum depth of each walk.
walk_number (int, default 100) – Number of walks started per vertex.
walk_weighted (bool, default False) – If True, use edge weights for biased walk transitions via cuGraph’s biased_random_walks(). The input data must contain a "weights" column (the cuGraph standard name).
embedding_model ({"skipgram", "cbow"}, default "skipgram") – Word2Vec variant used for embedding training.
vector_size (int, default 256) – Dimensionality of the output embeddings.
window_size (int, default 5) – Context window size for Word2Vec.
min_count (int, default 1) – Minimum token frequency for inclusion in the vocabulary.
negative_samples (int, default 5) – Number of negative examples for negative sampling.
learning_rate (float, default 0.0001) – Learning rate used by the optimizer.
epochs (int, default 5) – Number of training epochs.
batch_size (int or None, default None) – Explicit batch size; if None, Lightning’s tuner may pick one.
tune_batch_size (bool, default True) – Whether to use PyTorch Lightning’s automatic batch size tuning.
random_state (int, default 42) – Seed for reproducible walk sampling and model initialization.
reproducible (bool, default False) – If True, enables deterministic modes in PyTorch and CUDA.
multi_gpu (bool, default False) – If True, enables multi-GPU walk generation and training using Dask.
cpu_count (int, default 4) – Number of CPU workers used.
generate_artifact (bool, default False) – If True, persist word2idx and embeddings to Parquet files.
num_nodes (int, default 1) – Number of nodes involved in multi-GPU setup.
literal_predicates (list[str] or None, default None) – Predicates that identify literal (numeric) edges. When set, edges with these predicates are handled according to literal_strategy. Predicate strings must match the values in the data exactly.
literal_strategy ({"drop", "bin"}, default "drop") – How to handle literal edges. "drop" removes them from the graph (the pyRDF2Vec default). "bin" discretizes the object values into bin tokens so the edge stays in the graph.
literal_n_bins (int, default 5) – Number of bins when literal_strategy="bin"; must be greater than 1.
literal_bin_strategy ({"quantile", "uniform"}, default "quantile") – Binning method. "quantile" creates equal-frequency bins (robust to skew); "uniform" creates equal-width bins.
backend ({"pytorch", "gensim"}, default "pytorch") – Word2Vec training backend.
tracker ({"mlflow", "wandb", "none"}, default "none") – Experiment tracker to log to (see rdf2vecgpu.logger); "none" disables tracking.
tracker_kwargs (dict or None, default None) – Additional options for the selected tracker.
tracker_run_name (str or None, default None) – Name of the tracked run.
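The negative_samples parameter sets how many "noise" tokens are contrasted with each positive (center, context) pair. The sketch below assumes the sampling distribution from the original word2vec work (unigram counts raised to the 0.75 power), which gensim also uses by default; this page does not state which distribution the PyTorch backend uses. Accidental draws of the positive token are redrawn here, a detail on which implementations differ. The helper name make_negative_sampler is hypothetical.

```python
import random
from collections import Counter


def make_negative_sampler(counts, power=0.75, seed=42):
    """Hypothetical helper: return sample(k, positive) drawing k noise tokens.

    Tokens are drawn with probability proportional to count ** power.
    """
    tokens = sorted(counts)
    weights = [counts[t] ** power for t in tokens]
    rng = random.Random(seed)

    def sample(k, positive):
        if k > 0 and all(t == positive for t in tokens):
            raise ValueError("no token other than the positive one to sample")
        out = []
        while len(out) < k:
            t = rng.choices(tokens, weights=weights, k=1)[0]
            if t != positive:  # redraw accidental hits on the true context token
                out.append(t)
        return out

    return sample


counts = Counter(["a", "a", "a", "a", "c", "c", "p", "q"])
sample = make_negative_sampler(counts)
# negative_samples=5: five noise tokens for a positive pair whose context is "c"
print(sample(5, positive="c"))
```

Raising counts to a power below 1 flattens the distribution, so rare tokens are drawn somewhat more often than their raw frequency would suggest.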
- backend: Literal['pytorch', 'gensim']
- batch_size: int | None
- classmethod construct(_fields_set=None, **values)
- Parameters:
_fields_set (set[str] | None)
values (Any)
- Return type:
Self
- copy(*, include=None, exclude=None, update=None, deep=False)
Returns a copy of the model.
- Deprecated:
This method is deprecated; use model_copy instead.
If you need include or exclude, use:
```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```
- Parameters:
include (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to include in the copied model.
exclude (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to exclude in the copied model.
update (Dict[str, Any] | None) – Optional dictionary of field-value pairs to override field values in the copied model.
deep (bool) – If True, the values of fields that are Pydantic models will be deep-copied.
- Returns:
A copy of the model with included, excluded and updated fields as specified.
- Return type:
Self
- cpu_count: int
- dict(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False)
- Parameters:
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)
- Return type:
Dict[str, Any]
- embedding_model: Literal['skipgram', 'cbow']
- epochs: int
- classmethod from_orm(obj)
- Parameters:
obj (Any)
- Return type:
Self
- generate_artifact: bool
- json(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=PydanticUndefined, models_as_dict=PydanticUndefined, **dumps_kwargs)
- Parameters:
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)
encoder (Callable[[Any], Any] | None)
models_as_dict (bool)
dumps_kwargs (Any)
- Return type:
str
- learning_rate: float¶
- literal_bin_strategy: Literal['quantile', 'uniform']¶
- literal_n_bins: int¶
- literal_predicates: list[str] | None¶
- literal_strategy: Literal['drop', 'bin']¶
- min_count: int¶
- model_computed_fields = {}¶
- model_config = {}¶
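Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict). It is empty here, so Pydantic's defaults apply (for example extra='ignore', which silently drops unknown keyword arguments).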
- classmethod model_construct(_fields_set=None, **values)¶
Creates a new instance of the model from trusted or pre-validated data, without running validation.
It sets __dict__ and __pydantic_fields_set__ directly from the given values. Default values are filled in for fields that are not passed, but no other validation is performed.
- Note:
model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.
- Parameters:
_fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is used directly for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument are used.
values (Any) – Trusted or pre-validated data dictionary.
- Returns:
A new instance of the model, built without validation.
- Return type:
Self
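The difference between normal construction and model_construct() shows up in a short sketch. ConfigSketch is a hypothetical stand-in mirroring three RDF2VecConfig fields and their constraints (walk_depth > 0, vector_size > 0); RDF2VecConfig is assumed to behave the same way, since it inherits this method.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


# Hypothetical stand-in mirroring a few RDF2VecConfig fields and constraints.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    walk_depth: int = Field(default=4, gt=0)
    vector_size: int = Field(default=256, gt=0)


# Normal construction validates: walk_depth must be > 0.
try:
    ConfigSketch(walk_depth=-1)
    rejected = False
except ValidationError:
    rejected = True

# model_construct() skips validation, so the invalid value is stored as-is.
unchecked = ConfigSketch.model_construct(walk_depth=-1)

# Defaults are still filled in for fields that were not passed.
print(rejected, unchecked.walk_depth, unchecked.vector_size)  # True -1 256
```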
- model_copy(*, update=None, deep=False)¶
- Usage documentation:
model_copy section of the Pydantic "Models" concepts page.
Returns a copy of the model.
- Note:
The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties created with functools.cached_property).
- Parameters:
update (Mapping[str, Any] | None) – Values to change or add in the new model. The data is not validated before the new model is created, so only pass trusted values.
deep (bool) – Set to True to make a deep copy of the model.
- Returns:
New model instance.
- Return type:
Self
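A sketch of model_copy() on a hypothetical stand-in with three RDF2VecConfig-like fields. It shows an override through update, the fact that update values are not validated, and the shallow versus deep copy behavior described in the note. The predicate IRIs are placeholders.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


# Hypothetical stand-in mirroring a few RDF2VecConfig fields.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    epochs: int = Field(default=5, gt=0)
    literal_predicates: Optional[List[str]] = None


base = ConfigSketch(literal_predicates=["http://example.org/age"])

# Override selected fields in the copy.
bfs_variant = base.model_copy(update={"walk_strategy": "bfs", "epochs": 10})

# Values in `update` are NOT validated: epochs=-1 is accepted here.
unchecked = base.model_copy(update={"epochs": -1})

# A shallow copy shares mutable containers with the original; deep=True does not.
shallow = base.model_copy()
deep = base.model_copy(deep=True)
base.literal_predicates.append("http://example.org/height")

print(bfs_variant.walk_strategy, bfs_variant.epochs)  # bfs 10
print(len(shallow.literal_predicates), len(deep.literal_predicates))  # 2 1
```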
- model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False)¶
- Usage documentation:
Python mode section of the Pydantic "Serialization" concepts page.
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- Parameters:
mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to include in the output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to exclude from the output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/'none' ignores them, True/'warn' logs errors, and 'error' raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
- Returns:
A dictionary representation of the model.
- Return type:
dict[str, Any]
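The include, exclude and exclude_none options can be illustrated on a hypothetical stand-in with four RDF2VecConfig-like fields, two of which default to None.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


# Hypothetical stand-in with four RDF2VecConfig-like fields.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    vector_size: int = Field(default=256, gt=0)
    literal_predicates: Optional[List[str]] = None
    tracker_kwargs: Optional[dict] = None


cfg = ConfigSketch(vector_size=128)

full = cfg.model_dump()
# Drop every field whose value is None.
no_nones = cfg.model_dump(exclude_none=True)
# Keep only the listed fields.
walk_only = cfg.model_dump(include={"walk_strategy"})
# Drop the listed fields.
without_tracker = cfg.model_dump(exclude={"tracker_kwargs"})

print(no_nones)  # {'walk_strategy': 'random', 'vector_size': 128}
```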
- model_dump_json(*, indent=None, ensure_ascii=False, include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, exclude_computed_fields=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False)¶
- Usage documentation:
JSON mode section of the Pydantic "Serialization" concepts page.
Generates a JSON representation of the model using Pydantic’s to_json method.
- Parameters:
indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.
ensure_ascii (bool) – If True, the output is guaranteed to have all incoming non-ASCII characters escaped. If False (the default), these characters will be output as-is.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to include in the JSON output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to exclude from the JSON output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to serialize using field aliases.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
exclude_computed_fields (bool) – Whether to exclude computed fields. While this can be useful for round-tripping, it is usually recommended to use the dedicated round_trip parameter instead.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/'none' ignores them, True/'warn' logs errors, and 'error' raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.
- Returns:
A JSON string representation of the model.
- Return type:
str
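A round-trip sketch: dump a config to JSON, then validate it back with model_validate_json(). ConfigSketch is a hypothetical stand-in with three RDF2VecConfig-like fields; writing the string to a file is left out.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


# Hypothetical stand-in with three RDF2VecConfig-like fields.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    walk_depth: int = Field(default=4, gt=0)
    window_size: int = Field(default=5, gt=1)


cfg = ConfigSketch(walk_strategy="bfs", window_size=3)

# Indented JSON with every field, e.g. for a human-readable config file.
pretty = cfg.model_dump_json(indent=2)
# Compact JSON holding only the non-default values.
compact = cfg.model_dump_json(exclude_defaults=True)
print(compact)  # {"walk_strategy":"bfs","window_size":3}

# The JSON text validates back into an equivalent config.
restored = ConfigSketch.model_validate_json(pretty)
print(restored.model_dump() == cfg.model_dump())  # True

# The output is plain JSON, so the standard library can read it too.
print(json.loads(pretty)["walk_depth"])  # 4
```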
- property model_extra: dict[str, Any] | None¶
Get extra fields set during validation.
- Returns:
A dictionary of extra fields, or None if model_config['extra'] is not set to 'allow'.
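Because RDF2VecConfig's model_config is empty, Pydantic's default extra='ignore' applies: model_extra is None and a misspelled keyword argument is dropped without an error. The sketch shows this on a hypothetical one-field stand-in, plus two subclasses that opt into extra='allow' and extra='forbid'; those settings are illustrations, not part of rdf2vecgpu.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


# Hypothetical stand-in; like RDF2VecConfig, it leaves model_config empty.
class ConfigSketch(BaseModel):
    walk_depth: int = Field(default=4, gt=0)


# Typo in the keyword: it is ignored and the default is kept.
cfg = ConfigSketch(walk_dept=8)
print(cfg.walk_depth, cfg.model_extra)  # 4 None


# extra="allow" keeps unknown keys, and model_extra exposes them.
class LooseSketch(ConfigSketch):
    model_config = ConfigDict(extra="allow")


loose = LooseSketch(walk_dept=8)
print(loose.model_extra)  # {'walk_dept': 8}


# extra="forbid" turns the typo into a validation error.
class StrictSketch(ConfigSketch):
    model_config = ConfigDict(extra="forbid")


error_type = None
try:
    StrictSketch(walk_dept=8)
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
print(error_type)  # extra_forbidden
```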
- model_fields¶
Mapping from field name to FieldInfo. No field is required (required=False for all); constraints come from each FieldInfo's metadata:
backend (Literal['pytorch', 'gensim']) – default 'pytorch'
batch_size (int | None) – default None; must be > 0 if an integer is given
cpu_count (int) – default 4; must be > 0
embedding_model (Literal['skipgram', 'cbow']) – default 'skipgram'
epochs (int) – default 5; must be > 0
generate_artifact (bool) – default False
learning_rate (float) – default 0.0001; must be > 0
literal_bin_strategy (Literal['quantile', 'uniform']) – default 'quantile'
literal_n_bins (int) – default 5; must be > 1
literal_predicates (list[str] | None) – default None
literal_strategy (Literal['drop', 'bin']) – default 'drop'
min_count (int) – default 1; must be >= 0
multi_gpu (bool) – default False
negative_samples (int) – default 5; must be >= 0
num_nodes (int) – default 1; must be > 0
random_state (int) – default 42; must be >= 0
reproducible (bool) – default False
tracker (Literal['mlflow', 'wandb', 'none']) – default 'none'
tracker_kwargs (dict | None) – default None
tracker_run_name (str | None) – default None
tune_batch_size (bool) – default True
vector_size (int) – default 256; must be > 0
walk_depth (int) – default 4; must be > 0
walk_number (int) – default 100; must be > 0
walk_strategy (Literal['random', 'bfs']) – default 'random'
walk_weighted (bool) – default False
window_size (int) – default 5; must be > 1
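The same information can be read programmatically from model_fields. This sketch uses a hypothetical stand-in with three of the fields listed above; on RDF2VecConfig the same loop is expected to cover every field.

```python
from typing import Literal

from pydantic import BaseModel, Field


# Hypothetical stand-in with three of the fields listed above.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    walk_number: int = Field(default=100, gt=0)
    min_count: int = Field(default=1, ge=0)


# Access model_fields on the class: one FieldInfo per field name.
fields = ConfigSketch.model_fields
defaults = {name: info.default for name, info in fields.items()}
required = [name for name, info in fields.items() if info.is_required()]
constraints = {name: info.metadata for name, info in fields.items()}

print(defaults)  # {'walk_strategy': 'random', 'walk_number': 100, 'min_count': 1}
print(required)  # []
print(constraints["walk_number"], constraints["min_count"])  # [Gt(gt=0)] [Ge(ge=0)]
```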
- property model_fields_set: set[str]¶
Returns the set of fields that have been explicitly set on this model instance.
- Returns:
A set of strings representing the fields that have been explicitly set, i.e. that were not filled from defaults.
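model_fields_set is what separates exclude_unset from exclude_defaults: a field passed explicitly counts as set even when its value equals the default. A sketch on a hypothetical stand-in with three RDF2VecConfig-like fields:

```python
from pydantic import BaseModel, Field


# Hypothetical stand-in with three RDF2VecConfig-like fields.
class ConfigSketch(BaseModel):
    epochs: int = Field(default=5, gt=0)
    multi_gpu: bool = False
    reproducible: bool = False


# epochs is passed explicitly even though 5 is also its default.
cfg = ConfigSketch(epochs=5, multi_gpu=True)

print(sorted(cfg.model_fields_set))           # ['epochs', 'multi_gpu']
print(cfg.model_dump(exclude_unset=True))     # {'epochs': 5, 'multi_gpu': True}
print(cfg.model_dump(exclude_defaults=True))  # {'multi_gpu': True}
```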
- classmethod model_json_schema(by_alias=True, ref_template='#/$defs/{model}', schema_generator=<class 'pydantic.json_schema.GenerateJsonSchema'>, mode='validation', *, union_format='any_of')¶
Generates a JSON schema for a model class.
- Parameters:
by_alias (bool) – Whether to use attribute aliases or not.
ref_template (str) – The reference template.
union_format (Literal['any_of', 'primitive_type_array']) – The format to use when combining schemas from unions together. Can be one of:
- 'any_of': use the anyOf keyword (https://json-schema.org/understanding-json-schema/reference/combining#anyOf) to combine schemas (the default).
- 'primitive_type_array': use the type keyword (https://json-schema.org/understanding-json-schema/reference/type) as an array of strings containing each type of the combination. If any of the schemas is not a primitive type (string, boolean, null, integer or number) or contains constraints or metadata, this falls back to any_of.
schema_generator (type[GenerateJsonSchema]) – To override the logic used to generate the JSON schema, pass a subclass of GenerateJsonSchema with your desired modifications.
mode (Literal['validation', 'serialization']) – The mode in which to generate the schema.
- Returns:
The JSON schema for the given model class.
- Return type:
dict[str, Any]
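A sketch of model_json_schema() on a hypothetical stand-in with two RDF2VecConfig-like fields: Literal choices become an enum and a gt=1 constraint becomes exclusiveMinimum. Serializing the schema with json.dumps is one way to publish it, for example so editors can validate config files.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


# Hypothetical stand-in with two RDF2VecConfig-like fields.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    window_size: int = Field(default=5, gt=1)


schema = ConfigSketch.model_json_schema()
props = schema["properties"]

# Literal choices become an enum; gt=1 becomes exclusiveMinimum.
print(props["walk_strategy"]["enum"])            # ['random', 'bfs']
print(props["window_size"]["exclusiveMinimum"])  # 1

# Serialize the schema, e.g. to ship it alongside config files.
schema_text = json.dumps(schema, indent=2)
```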
- classmethod model_parametrized_name(params)¶
Compute the class name for parametrizations of generic classes.
This method can be overridden to achieve a custom naming scheme for generic BaseModels.
- Parameters:
params (tuple[type[Any], ...]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
- Returns:
String representing the new class where params are passed to cls as type variables.
- Raises:
TypeError – Raised when trying to generate concrete names for non-generic models.
- Return type:
str
- model_post_init(context, /)¶
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- Parameters:
context (Any)
- Return type:
None
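A minimal sketch of overriding model_post_init() to derive a private helper value once validation has finished. The stand-in, its _literal_lookup attribute, and the idea of caching literal_predicates as a frozenset are all hypothetical and not taken from rdf2vecgpu; the predicate IRI is a placeholder.

```python
from typing import Any, FrozenSet, List, Optional

from pydantic import BaseModel, PrivateAttr


# Hypothetical stand-in; _literal_lookup is an illustrative private attribute.
class ConfigSketch(BaseModel):
    literal_predicates: Optional[List[str]] = None
    # Private attributes are not fields: they are neither validated nor dumped.
    _literal_lookup: FrozenSet[str] = PrivateAttr(default_factory=frozenset)

    def model_post_init(self, context: Any, /) -> None:
        # Runs after all fields are validated, so literal_predicates is ready.
        self._literal_lookup = frozenset(self.literal_predicates or ())


cfg = ConfigSketch(literal_predicates=["http://example.org/age", "http://example.org/age"])
print(cfg._literal_lookup)  # frozenset({'http://example.org/age'})
```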
- classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)¶
Try to rebuild the pydantic-core schema for the model.
This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.
- Parameters:
force (bool) – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors (bool) – Whether to raise errors, defaults to True.
_parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.
_types_namespace (MappingNamespace | None) – The types namespace, defaults to None.
- Returns:
Returns None if the schema is already complete and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.
- Return type:
bool | None
- classmethod model_validate(obj, *, strict=None, extra=None, from_attributes=None, context=None, by_alias=None, by_name=None)¶
Validate an object (for example a dict or another model instance) against this Pydantic model and return a model instance.
- Parameters:
obj (Any) – The object to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
from_attributes (bool | None) – Whether to extract data from object attributes.
context (Any | None) – Additional context to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.
- Raises:
ValidationError – If the object could not be validated.
- Returns:
The validated model instance.
- Return type:
Self
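A sketch of model_validate() on a plain dict, such as one loaded from a YAML or JSON config file, using a hypothetical two-field stand-in. It shows lax coercion of a numeric string, how strict=True turns that off, and that every failing field is reported in a single ValidationError.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


# Hypothetical stand-in with two RDF2VecConfig-like fields.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    walk_depth: int = Field(default=4, gt=0)


# Lax mode (the default) coerces the numeric string "8" to an int.
cfg = ConfigSketch.model_validate({"walk_depth": "8"})

# strict=True disables that coercion.
try:
    ConfigSketch.model_validate({"walk_depth": "8"}, strict=True)
    strict_ok = True
except ValidationError:
    strict_ok = False

# All failing fields are collected into one ValidationError.
failed_fields = []
try:
    ConfigSketch.model_validate({"walk_strategy": "dfs", "walk_depth": 0})
except ValidationError as exc:
    failed_fields = sorted(str(err["loc"][0]) for err in exc.errors())

print(cfg.walk_depth, strict_ok, failed_fields)  # 8 False ['walk_depth', 'walk_strategy']
```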
- classmethod model_validate_json(json_data, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)¶
- Usage documentation:
JSON parsing section of the Pydantic "JSON" concepts page.
Validate the given JSON data against the Pydantic model.
- Parameters:
json_data (str | bytes | bytearray) – The JSON data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.
- Returns:
The validated Pydantic model.
- Raises:
ValidationError – If json_data is not a JSON string or the object could not be validated.
- Return type:
Self
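model_validate_json() accepts bytes as well as str, and malformed JSON surfaces as a ValidationError rather than a json.JSONDecodeError. A sketch on a hypothetical two-field stand-in:

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


# Hypothetical stand-in with two RDF2VecConfig-like fields.
class ConfigSketch(BaseModel):
    walk_strategy: Literal["random", "bfs"] = "random"
    walk_depth: int = Field(default=4, gt=0)


# bytes work too, e.g. the contents of a file opened in "rb" mode.
cfg = ConfigSketch.model_validate_json(b'{"walk_strategy": "bfs", "walk_depth": 2}')

error_type = None
try:
    ConfigSketch.model_validate_json('{"walk_depth": 2,')  # truncated JSON
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]

print(cfg.walk_strategy, cfg.walk_depth, error_type)  # bfs 2 json_invalid
```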
- classmethod model_validate_strings(obj, *, strict=None, extra=None, context=None, by_alias=None, by_name=None)¶
Validate the given object with string data against the Pydantic model.
- Parameters:
obj (Any) – The object containing string data to validate.
strict (bool | None) – Whether to enforce types strictly.
extra (Literal['allow', 'ignore', 'forbid'] | None) – Whether to ignore, allow, or forbid extra data during model validation. See the extra configuration value (pydantic.ConfigDict.extra) for details.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.
- Returns:
The validated Pydantic model.
- Return type:
Self
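model_validate_strings() suits sources where every value arrives as a string, such as environment variables or command-line flags. A sketch on a hypothetical three-field stand-in; the raw dict stands in for whatever string source is actually used.

```python
from pydantic import BaseModel, Field


# Hypothetical stand-in with three RDF2VecConfig-like fields.
class ConfigSketch(BaseModel):
    walk_depth: int = Field(default=4, gt=0)
    learning_rate: float = Field(default=0.0001, gt=0)
    multi_gpu: bool = False


# Every value is a string, as it would be when read from the environment.
raw = {"walk_depth": "8", "learning_rate": "0.001", "multi_gpu": "true"}
cfg = ConfigSketch.model_validate_strings(raw)

print(cfg.walk_depth, cfg.learning_rate, cfg.multi_gpu)  # 8 0.001 True
```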
- multi_gpu: bool¶
- negative_samples: int¶
- num_nodes: int¶
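- classmethod parse_file(path, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)¶
Deprecated since Pydantic v2; read the file yourself, then use model_validate_json() for JSON data or model_validate() otherwise.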
- Parameters:
path (str | Path)
content_type (str | None)
encoding (str)
proto (DeprecatedParseProtocol | None)
allow_pickle (bool)
- Return type:
Self
- classmethod parse_obj(obj)¶
Deprecated since Pydantic v2; use model_validate() instead.
- Parameters:
obj (Any)
- Return type:
Self
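- classmethod parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)¶
Deprecated since Pydantic v2; use model_validate_json() for JSON data, or load the data and use model_validate() otherwise.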
- Parameters:
b (str | bytes)
content_type (str | None)
encoding (str)
proto (DeprecatedParseProtocol | None)
allow_pickle (bool)
- Return type:
Self
- random_state: int¶
- reproducible: bool¶
- classmethod schema(by_alias=True, ref_template='#/$defs/{model}')¶
Deprecated since Pydantic v2; use model_json_schema() instead.
- Parameters:
by_alias (bool)
ref_template (str)
- Return type:
Dict[str, Any]
- classmethod schema_json(*, by_alias=True, ref_template='#/$defs/{model}', **dumps_kwargs)¶
Deprecated since Pydantic v2; use model_json_schema() and json.dumps() instead.
- Parameters:
by_alias (bool)
ref_template (str)
dumps_kwargs (Any)
- Return type:
str
- tracker: Literal['mlflow', 'wandb', 'none']¶
- tracker_kwargs: dict | None¶
- tracker_run_name: str | None¶
- tune_batch_size: bool¶
- classmethod update_forward_refs(**localns)¶
Deprecated since Pydantic v2; use model_rebuild() instead.
- Parameters:
localns (Any)
- Return type:
None
- classmethod validate(value)¶
Deprecated since Pydantic v2; use model_validate() instead.
- Parameters:
value (Any)
- Return type:
Self
- vector_size: int¶
- walk_depth: int¶
- walk_number: int¶
- walk_strategy: Literal['random', 'bfs']¶
- walk_weighted: bool¶
- window_size: int¶