9. Neural Network Layers

9.1 BayesianModule

Abstract base for Bayesian-aware modules in TensorFlow. Provides mechanisms to track whether a module is Bayesian and to control parameter updates through freezing/unfreezing.

Notes

All derived classes must implement freeze and kl_cost to handle parameter management and compute the KL divergence cost.

Source code in illia/nn/tf/base.py
@saving.register_keras_serializable(package="illia", name="BayesianModule")
class BayesianModule(layers.Layer, ABC):
    """
    Abstract base for Bayesian-aware modules in TensorFlow.
    Provides mechanisms to track if a module is Bayesian and control
    parameter updates through freezing/unfreezing.

    Notes:
        All derived classes must implement `freeze` and `kl_cost` to
        handle parameter management and compute the KL divergence cost.
    """

    def __init__(self, **kwargs: Any) -> None:
        """
        Initialize the Bayesian module with default flags.
        Sets `frozen` to False and `is_bayesian` to True.

        Args:
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.
        """

        super().__init__(**kwargs)

        self.frozen: bool = False
        self.is_bayesian: bool = True

    @abstractmethod
    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.

        Notes:
            Must be implemented by all subclasses.
        """

    def unfreeze(self) -> None:
        """
        Unfreeze the module by setting its `frozen` flag to False.
        Allows parameters to be sampled and updated again.

        Returns:
            None.
        """

        self.frozen = False

    @abstractmethod
    def kl_cost(self) -> tuple[tf.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[tf.Tensor, int]: A tuple containing the KL divergence
                cost and the total number of parameters in the layer.

        Notes:
            Must be implemented by all subclasses.
        """

9.1.1 __init__(**kwargs)

Initialize the Bayesian module with default flags. Sets frozen to False and is_bayesian to True.

Parameters:

    **kwargs (Any): Extra arguments passed to the base class. Default: {}

Returns:

    None.

Source code in illia/nn/tf/base.py
def __init__(self, **kwargs: Any) -> None:
    """
    Initialize the Bayesian module with default flags.
    Sets `frozen` to False and `is_bayesian` to True.

    Args:
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.
    """

    super().__init__(**kwargs)

    self.frozen: bool = False
    self.is_bayesian: bool = True

9.1.2 freeze() abstractmethod

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.

Notes

Must be implemented by all subclasses.

Source code in illia/nn/tf/base.py
@abstractmethod
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.

    Notes:
        Must be implemented by all subclasses.
    """

9.1.3 kl_cost() abstractmethod

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[tf.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Notes

Must be implemented by all subclasses.

Source code in illia/nn/tf/base.py
@abstractmethod
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[tf.Tensor, int]: A tuple containing the KL divergence
            cost and the total number of parameters in the layer.

    Notes:
        Must be implemented by all subclasses.
    """

9.1.4 unfreeze()

Unfreeze the module by setting its frozen flag to False. Allows parameters to be sampled and updated again.

Returns:

    None.

Source code in illia/nn/tf/base.py
def unfreeze(self) -> None:
    """
    Unfreeze the module by setting its `frozen` flag to False.
    Allows parameters to be sampled and updated again.

    Returns:
        None.
    """

    self.frozen = False

9.2 Conv1d

Bayesian 1D convolutional layer with optional weight and bias priors. Behaves like a standard Conv1d but treats weights and bias as random variables sampled from specified distributions. Parameters become fixed when the layer is frozen.

Source code in illia/nn/tf/conv1d.py
@saving.register_keras_serializable(package="illia", name="Conv1d")
class Conv1d(BayesianModule):
    """
    Bayesian 1D convolutional layer with optional weight and bias priors.
    Behaves like a standard Conv1d but treats weights and bias as random
    variables sampled from specified distributions. Parameters become fixed
    when the layer is frozen.
    """

    bias_distribution: Optional[GaussianDistribution] = None

    def __init__(
        self,
        input_channels: int,
        output_channels: int,
        kernel_size: int,
        stride: int = 1,
        padding: str = "VALID",
        dilation: int = 1,
        groups: int = 1,
        data_format: Optional[str] = "NWC",
        weights_distribution: Optional[GaussianDistribution] = None,
        bias_distribution: Optional[GaussianDistribution] = None,
        use_bias: bool = True,
        **kwargs: Any,
    ) -> None:
        """
        Initializes a Bayesian 1D convolutional layer.

        Args:
            input_channels: Number of channels in the input.
            output_channels: Number of channels produced by the conv.
            kernel_size: Size of the convolution kernel.
            stride: Stride of the convolution.
            padding: Padding type, 'VALID' or 'SAME'.
            dilation: Spacing between kernel elements.
            groups: Number of blocked connections between input/output.
            data_format: 'NWC' or 'NCW' format for input data.
            weights_distribution: Distribution for weights sampling.
            bias_distribution: Distribution for bias sampling.
            use_bias: Whether to include a bias term.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        # Check data format
        self._check_params(kernel_size, groups, stride, dilation, data_format)

        self.input_channels = input_channels
        self.output_channels = output_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.use_bias = use_bias

        # Adjust the weights distribution based on the channel format
        self.data_format = (
            "NWC" if data_format is None or data_format == "NWC" else "NCW"
        )

        # Get the weights distribution shape, needs to be channel last
        self._weights_distribution_shape = (
            input_channels // groups,
            kernel_size,
            output_channels,
        )

        # Set weights distribution
        if weights_distribution is None:
            self.weights_distribution = GaussianDistribution(
                self._weights_distribution_shape
            )
        else:
            self.weights_distribution = weights_distribution

        # Set bias distribution
        if self.use_bias:
            if bias_distribution is None:
                self.bias_distribution = GaussianDistribution((output_channels,))
            else:
                self.bias_distribution = bias_distribution
        else:
            self.bias_distribution = None

    def _check_params(
        self,
        kernel_size: int,
        groups: int,
        stride: int,
        dilation: int,
        data_format: Optional[str],
    ) -> None:
        """
        Validates convolution parameters for correctness.

        Args:
            kernel_size: Convolution kernel size.
            groups: Number of blocked connections.
            stride: Convolution stride.
            dilation: Spacing between kernel elements.
            data_format: 'NWC' or 'NCW' for input tensor.

        Raises:
            ValueError: If any parameter is invalid.
        """

        if kernel_size is not None and (kernel_size <= 0 or kernel_size % groups != 0):
            raise ValueError(
                f"Invalid `kernel_size`: {kernel_size}. Must be > 0 "
                f"and divisible by `groups` {groups}."
            )
        if groups <= 0:
            raise ValueError(f"Invalid `groups`: {groups}. Must be > 0.")
        if isinstance(stride, list):
            if any(s == 0 for s in stride):
                raise ValueError(f"`stride` {stride} cannot contain 0.")
            if max(stride) > 1 and isinstance(dilation, list) and max(dilation) > 1:
                raise ValueError(
                    f"`stride` {stride} > 1 not allowed with `dilation` {dilation} > 1."
                )
        if data_format not in {"NWC", "NCW"}:
            raise ValueError(
                f"Invalid `data_format`: {data_format}. Must be 'NWC' or 'NCW'."
            )

    def build(self, input_shape: tf.TensorShape) -> None:
        """
        Build trainable and non-trainable parameters.

        Args:
            input_shape: Input shape used to trigger layer build.

        Returns:
            None
        """

        # Register non-trainable variables
        self.w = self.add_weight(
            name="weights",
            initializer=tf.constant_initializer(
                self.weights_distribution.sample().numpy()
            ),
            shape=self._weights_distribution_shape,
            trainable=False,
        )

        if self.use_bias and self.bias_distribution is not None:
            self.b = self.add_weight(
                name="bias",
                initializer=tf.constant_initializer(
                    self.bias_distribution.sample().numpy()
                ),
                shape=(self.output_channels,),
                trainable=False,
            )

        super().build(input_shape)

    def get_config(self) -> dict:
        """
        Return the configuration dictionary for serialization.

        Returns:
            dict: Dictionary with the layer configuration.
        """

        base_config = super().get_config()

        custom_config = {
            "input_channels": self.input_channels,
            "output_channels": self.output_channels,
            "kernel_size": self.kernel_size,
            "stride": self.stride,
            "padding": self.padding,
            "dilation": self.dilation,
            "groups": self.groups,
            "data_format": self.data_format,
        }

        return {**base_config, **custom_config}

    def _conv1d(
        self,
        inputs: tf.Tensor,
        weight: tf.Tensor,
        stride: int | list[int],
        padding: str,
        data_format: Optional[str] = "NWC",
        dilation: Optional[int | list[int]] = None,
    ) -> tf.Tensor:
        """
        Performs a 1D convolution using provided weights.

        Args:
            inputs: Input tensor.
            weight: Convolutional kernel tensor.
            stride: Convolution stride.
            padding: Padding strategy 'VALID' or 'SAME'.
            data_format: 'NWC' or 'NCW' input format.
            dilation: Spacing between kernel elements.

        Returns:
            Tensor after 1D convolution.
        """

        output: tf.Tensor = tf.nn.conv1d(
            input=inputs,
            filters=weight,
            stride=stride,
            padding=padding,
            data_format=data_format,
            dilations=dilation,
        )

        return output

    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Sample weights if they are undefined
        if self.w is None:
            self.w = self.weights_distribution.sample()

        # Sample bias if it is undefined
        if self.use_bias and self.b is None and self.bias_distribution is not None:
            self.b = self.bias_distribution.sample()

        # Stop gradient computation
        self.w = tf.stop_gradient(self.w)
        if self.use_bias:
            self.b = tf.stop_gradient(self.b)

    def kl_cost(self) -> tuple[tf.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[tf.Tensor, int]: A tuple containing the KL divergence
                cost and the total number of parameters in the layer.
        """

        # Compute log probs
        log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

        # Add bias log probs only if using bias
        if self.use_bias and self.b is not None and self.bias_distribution is not None:
            log_probs += self.bias_distribution.log_prob(self.b)

        # Compute number of parameters
        num_params: int = self.weights_distribution.num_params
        if self.use_bias and self.bias_distribution is not None:
            num_params += self.bias_distribution.num_params

        return log_probs, num_params

    def call(self, inputs: tf.Tensor) -> tf.Tensor:
        """
        Performs a forward pass through the Bayesian Convolution 1D
        layer. If the layer is not frozen, it samples weights and bias
        from their respective distributions. If the layer is frozen
        and the weights or bias are not initialized, it also performs
        sampling.

        Args:
            inputs: Input tensor to the layer with shape
                (batch, length, input_channels) if 'data_format' is
                'NWC' or (batch, input_channels, length) if
                'data_format' is 'NCW'.

        Returns:
            Output tensor after convolution with optional bias added.

        Raises:
            ValueError: If the layer is frozen but weights or bias are
                undefined.
        """

        # Check if layer is frozen
        if not self.frozen:
            self.w = self.weights_distribution.sample()

            # Sample bias only if using bias
            if self.use_bias and self.bias_distribution is not None:
                self.b = self.bias_distribution.sample()
        elif self.w is None or (self.use_bias and self.b is None):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Compute outputs
        outputs: tf.Tensor = self._conv1d(
            inputs=inputs,
            weight=self.w,
            stride=self.stride,
            padding=self.padding,
            data_format=self.data_format,
            dilation=self.dilation,
        )

        # Add bias only if using bias
        if self.use_bias and self.b is not None:
            outputs = tf.nn.bias_add(
                value=outputs,
                bias=self.b,
                data_format="N..C" if self.data_format == "NWC" else "NC..",
            )

        return outputs
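
A short usage sketch follows. The import path is inferred from the source location above and the input shapes are made up for illustration; treat both as assumptions.

import tensorflow as tf
from illia.nn.tf.conv1d import Conv1d  # assumed import path

# Bayesian Conv1d: 16 input channels, 32 output channels, kernel of size 3.
layer = Conv1d(input_channels=16, output_channels=32, kernel_size=3, padding="SAME")

x = tf.random.normal((8, 128, 16))   # NWC layout: (batch, length, channels)
y = layer(x)                         # weights and bias are sampled for this call
kl, n_params = layer.kl_cost()       # KL term and parameter count for the loss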

9.2.1 __init__(input_channels, output_channels, kernel_size, stride=1, padding='VALID', dilation=1, groups=1, data_format='NWC', weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

Initializes a Bayesian 1D convolutional layer.

Parameters:

    input_channels (int): Number of channels in the input. Required.
    output_channels (int): Number of channels produced by the conv. Required.
    kernel_size (int): Size of the convolution kernel. Required.
    stride (int): Stride of the convolution. Default: 1
    padding (str): Padding type, 'VALID' or 'SAME'. Default: 'VALID'
    dilation (int): Spacing between kernel elements. Default: 1
    groups (int): Number of blocked connections between input/output. Default: 1
    data_format (Optional[str]): 'NWC' or 'NCW' format for input data. Default: 'NWC'
    weights_distribution (Optional[GaussianDistribution]): Distribution for weights sampling. Default: None
    bias_distribution (Optional[GaussianDistribution]): Distribution for bias sampling. Default: None
    use_bias (bool): Whether to include a bias term. Default: True
    **kwargs (Any): Extra arguments passed to the base class. Default: {}

Returns:

    None.

Notes

Gaussian distributions are used by default if none are provided.

Source code in illia/nn/tf/conv1d.py
def __init__(
    self,
    input_channels: int,
    output_channels: int,
    kernel_size: int,
    stride: int = 1,
    padding: str = "VALID",
    dilation: int = 1,
    groups: int = 1,
    data_format: Optional[str] = "NWC",
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian 1D convolutional layer.

    Args:
        input_channels: Number of channels in the input.
        output_channels: Number of channels produced by the conv.
        kernel_size: Size of the convolution kernel.
        stride: Stride of the convolution.
        padding: Padding type, 'VALID' or 'SAME'.
        dilation: Spacing between kernel elements.
        groups: Number of blocked connections between input/output.
        data_format: 'NWC' or 'NCW' format for input data.
        weights_distribution: Distribution for weights sampling.
        bias_distribution: Distribution for bias sampling.
        use_bias: Whether to include a bias term.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    # Check data format
    self._check_params(kernel_size, groups, stride, dilation, data_format)

    self.input_channels = input_channels
    self.output_channels = output_channels
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.dilation = dilation
    self.groups = groups
    self.use_bias = use_bias

    # Adjust the weights distribution based on the channel format
    self.data_format = (
        "NWC" if data_format is None or data_format == "NWC" else "NCW"
    )

    # Get the weights distribution shape, needs to be channel last
    self._weights_distribution_shape = (
        input_channels // groups,
        kernel_size,
        output_channels,
    )

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            self._weights_distribution_shape
        )
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((output_channels,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None

9.2.2 call(inputs)

Performs a forward pass through the Bayesian Convolution 1D layer. If the layer is not frozen, it samples weights and bias from their respective distributions. If the layer is frozen and the weights or bias are not initialized, it also performs sampling.

Parameters:

    inputs (Tensor): Input tensor to the layer with shape (batch, length, input_channels) if 'data_format' is 'NWC' or (batch, input_channels, length) if 'data_format' is 'NCW'. Required.

Returns:

    Tensor: Output tensor after convolution with optional bias added.

Raises:

    ValueError: If the layer is frozen but weights or bias are undefined.

Source code in illia/nn/tf/conv1d.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs a forward pass through the Bayesian Convolution 1D
    layer. If the layer is not frozen, it samples weights and bias
    from their respective distributions. If the layer is frozen
    and the weights or bias are not initialized, it also performs
    sampling.

    Args:
        inputs: Input tensor to the layer with shape
            (batch, length, input_channels) if 'data_format' is
            'NWC' or (batch, input_channels, length) if
            'data_format' is 'NCW'.

    Returns:
        Output tensor after convolution with optional bias added.

    Raises:
        ValueError: If the layer is frozen but weights or bias are
            undefined.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution is not None:
            self.b = self.bias_distribution.sample()
    elif self.w is None or (self.use_bias and self.b is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    outputs: tf.Tensor = self._conv1d(
        inputs=inputs,
        weight=self.w,
        stride=self.stride,
        padding=self.padding,
        data_format=self.data_format,
        dilation=self.dilation,
    )

    # Add bias only if using bias
    if self.use_bias and self.b is not None:
        outputs = tf.nn.bias_add(
            value=outputs,
            bias=self.b,
            data_format="N..C" if self.data_format == "NWC" else "NC..",
        )

    return outputs
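
Because each unfrozen call re-samples the weights and bias, repeated forward passes give a simple Monte Carlo picture of the layer's predictive uncertainty. A sketch, reusing the construction from the earlier example; the import path, shapes, and the sample count of 20 are assumptions.

import tensorflow as tf
from illia.nn.tf.conv1d import Conv1d  # assumed import path

layer = Conv1d(input_channels=16, output_channels=32, kernel_size=3, padding="SAME")
x = tf.random.normal((8, 128, 16))

# Each call draws fresh weights, so stacking several passes approximates the
# predictive mean and spread of the Bayesian layer.
mc_outputs = tf.stack([layer(x) for _ in range(20)], axis=0)
pred_mean = tf.reduce_mean(mc_outputs, axis=0)
pred_std = tf.math.reduce_std(mc_outputs, axis=0)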

9.2.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.

Source code in illia/nn/tf/conv1d.py
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Sample bias if it is undefined
    if self.use_bias and self.b is None and self.bias_distribution is not None:
        self.b = self.bias_distribution.sample()

    # Stop gradient computation
    self.w = tf.stop_gradient(self.w)
    if self.use_bias:
        self.b = tf.stop_gradient(self.b)
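
Freezing is the intended way to obtain deterministic behaviour, for example at evaluation time. A small sketch continuing the Conv1d example above (variable names carried over from that sketch):

layer.freeze()      # keep the current sample and stop gradient computation
y1 = layer(x)
y2 = layer(x)       # identical to y1: parameters are no longer re-sampled

layer.unfreeze()    # inherited from BayesianModule; sampling resumes
y3 = layer(x)       # computed with a freshly sampled kernel and bias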

9.2.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[tf.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Source code in illia/nn/tf/conv1d.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[tf.Tensor, int]: A tuple containing the KL divergence
            cost and the total number of parameters in the layer.
    """

    # Compute log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Add bias log probs only if using bias
    if self.use_bias and self.b is not None and self.bias_distribution is not None:
        log_probs += self.bias_distribution.log_prob(self.b)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params
    if self.use_bias and self.bias_distribution is not None:
        num_params += self.bias_distribution.num_params

    return log_probs, num_params

9.3 Conv2d

Bayesian 2D convolutional layer with optional weight and bias priors. Behaves like a standard Conv2d but treats weights and bias as random variables sampled from specified distributions. Parameters become fixed when the layer is frozen.

Source code in illia/nn/tf/conv2d.py
@saving.register_keras_serializable(package="illia", name="Conv2d")
class Conv2d(BayesianModule):
    """
    Bayesian 2D convolutional layer with optional weight and bias priors.
    Behaves like a standard Conv2d but treats weights and bias as random
    variables sampled from specified distributions. Parameters become fixed
    when the layer is frozen.
    """

    bias_distribution: Optional[GaussianDistribution] = None

    def __init__(
        self,
        input_channels: int,
        output_channels: int,
        kernel_size: int | list[int],
        stride: int | list[int] = 1,
        padding: str | list[int] = "VALID",
        dilation: Optional[int | list[int]] = None,
        groups: int = 1,
        data_format: Optional[str] = "NHWC",
        weights_distribution: Optional[GaussianDistribution] = None,
        bias_distribution: Optional[GaussianDistribution] = None,
        use_bias: bool = True,
        **kwargs: Any,
    ) -> None:
        """
        Initializes a Bayesian 2D convolutional layer.

        Args:
            input_channels: Number of channels in the input image.
            output_channels: Number of channels produced by the conv.
            kernel_size: Convolution kernel size as int or list.
            stride: Convolution stride as int or list.
            padding: Padding type 'VALID', 'SAME', or list of ints.
            dilation: Spacing between kernel elements as int or list.
            groups: Number of blocked connections between input/output.
            data_format: 'NHWC' or 'NCHW' format for input data.
            weights_distribution: Distribution for weights sampling.
            bias_distribution: Distribution for bias sampling.
            use_bias: Whether to include a bias term.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        # Check data format
        self._check_params(kernel_size, groups, stride, dilation, data_format)

        self.input_channels = input_channels
        self.output_channels = output_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.use_bias = use_bias

        # Check if kernel_size is a list and unpack it if necessary
        kernel_shape = (
            kernel_size if isinstance(kernel_size, list) else [kernel_size, kernel_size]
        )

        # Adjust the weights distribution based on the channel format
        self.data_format = (
            "NHWC" if data_format is None or data_format == "NHWC" else "NCHW"
        )

        # Set the weights distribution shape
        self._weights_distribution_shape = (
            input_channels // groups,
            *kernel_shape,
            output_channels,
        )

        # Set weights distribution
        if weights_distribution is None:
            self.weights_distribution = GaussianDistribution(
                shape=self._weights_distribution_shape
            )
        else:
            self.weights_distribution = weights_distribution

        # Set bias distribution
        if self.use_bias:
            if bias_distribution is None:
                self.bias_distribution = GaussianDistribution((output_channels,))
            else:
                self.bias_distribution = bias_distribution
        else:
            self.bias_distribution = None

    def _check_params(
        self,
        kernel_size: int | list[int],
        groups: int,
        stride: int | list[int],
        dilation: Optional[int | list[int]],
        data_format: Optional[str],
    ) -> None:
        """
        Validates parameters for the 2D convolution operation.

        Args:
            kernel_size: Convolution kernel size.
            groups: Number of blocked connections.
            stride: Convolution stride as int or list.
            dilation: Kernel spacing as int or list.
            data_format: 'NHWC' or 'NCHW' for input tensor.

        Raises:
            ValueError: If any parameter is invalid.
        """

        if kernel_size is not None and isinstance(kernel_size, int):
            if kernel_size <= 0 or kernel_size % groups != 0:
                raise ValueError(
                    f"Invalid `kernel_size`: {kernel_size}. Must "
                    f"be > 0 and divisible by `groups` {groups}."
                )
        if groups <= 0:
            raise ValueError(f"Invalid `groups`: {groups}. Must be > 0.")
        if isinstance(stride, list):
            if any(s == 0 for s in stride):
                raise ValueError(f"`stride` {stride} cannot contain 0.")
            if max(stride) > 1 and isinstance(dilation, list) and max(dilation) > 1:
                raise ValueError(
                    f"`stride` {stride} > 1 not allowed with "
                    f"`dilation` {dilation} > 1."
                )
        if data_format not in {"NHWC", "NCHW"}:
            raise ValueError(
                f"Invalid `data_format`: {data_format}. Must be 'NHWC' or 'NCHW'."
            )

    def build(self, input_shape: tf.TensorShape) -> None:
        """
        Build trainable and non-trainable parameters.

        Args:
            input_shape: Input shape used to trigger layer build.

        Returns:
            None
        """

        # Register non-trainable variables
        self.w = self.add_weight(
            name="weights",
            initializer=tf.constant_initializer(
                self.weights_distribution.sample().numpy()
            ),
            shape=self._weights_distribution_shape,
            trainable=False,
        )

        if self.use_bias and self.bias_distribution is not None:
            self.b = self.add_weight(
                name="bias",
                initializer=tf.constant_initializer(
                    self.bias_distribution.sample().numpy()
                ),
                shape=(self.output_channels,),
                trainable=False,
            )

        super().build(input_shape)

    def get_config(self) -> dict:
        """
        Return the configuration dictionary for serialization.

        Returns:
            dict: Dictionary with the layer configuration.
        """

        base_config = super().get_config()

        custom_config = {
            "input_channels": self.input_channels,
            "output_channels": self.output_channels,
            "kernel_size": self.kernel_size,
            "stride": self.stride,
            "padding": self.padding,
            "dilation": self.dilation,
            "groups": self.groups,
            "data_format": self.data_format,
        }

        return {**base_config, **custom_config}

    def _conv2d(
        self,
        inputs: tf.Tensor,
        weight: tf.Tensor,
        stride: int | list[int],
        padding: str | list[int],
        data_format: Optional[str] = "NHWC",
        dilation: Optional[int | list[int]] = None,
    ) -> tf.Tensor:
        """
        Performs a 2D convolution using provided weights.

        Args:
            inputs: Input tensor.
            weight: Convolutional kernel tensor.
            stride: Convolution stride as int or list.
            padding: Padding type 'VALID', 'SAME', or list of ints.
            data_format: 'NHWC' or 'NCHW' input format.
            dilation: Kernel spacing as int or list.

        Returns:
            Tensor after 2D convolution.
        """

        output: tf.Tensor = tf.nn.conv2d(
            input=inputs,
            filters=weight,
            strides=stride,
            padding=padding,
            data_format=data_format,
            dilations=dilation,
        )

        return output

    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Sample weights if they are undefined
        if self.w is None:
            self.w = self.weights_distribution.sample()

        # Sample bias if it is undefined
        if self.use_bias and self.b is None and self.bias_distribution is not None:
            self.b = self.bias_distribution.sample()

        # Stop gradient computation
        self.w = tf.stop_gradient(self.w)
        if self.use_bias:
            self.b = tf.stop_gradient(self.b)

    def kl_cost(self) -> tuple[tf.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[tf.Tensor, int]: A tuple containing the KL divergence
                cost and the total number of parameters in the layer.
        """

        # Compute log probs
        log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

        # Add bias log probs only if using bias
        if self.use_bias and self.b is not None and self.bias_distribution is not None:
            log_probs += self.bias_distribution.log_prob(self.b)

        # Compute number of parameters
        num_params: int = self.weights_distribution.num_params
        if self.use_bias and self.bias_distribution is not None:
            num_params += self.bias_distribution.num_params

        return log_probs, num_params

    def call(self, inputs: tf.Tensor) -> tf.Tensor:
        """
        Performs a forward pass through the Bayesian Convolution 2D
        layer. If the layer is not frozen, it samples weights and bias
        from their respective distributions. If the layer is frozen
        and the weights or bias are not initialized, it also performs
        sampling.

        Args:
            inputs: Input tensor with shape [batch, height, width,
                channels] if NHWC or [batch, channels, height, width] if NCHW.

        Returns:
            Output tensor after convolution with optional bias added.

        Raises:
            ValueError: If the layer is frozen but weights or bias are
                undefined.
        """

        # Check if layer is frozen
        if not self.frozen:
            self.w = self.weights_distribution.sample()

            # Sample bias only if using bias
            if self.use_bias and self.bias_distribution is not None:
                self.b = self.bias_distribution.sample()
        elif self.w is None or (self.use_bias and self.b is None):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Compute outputs
        outputs: tf.Tensor = self._conv2d(
            inputs=inputs,
            weight=self.w,
            stride=self.stride,
            padding=self.padding,
            data_format=self.data_format,
            dilation=self.dilation,
        )

        # Add bias only if using bias
        if self.use_bias and self.b is not None:
            outputs = tf.nn.bias_add(
                value=outputs,
                bias=self.b,
                data_format="N..C" if self.data_format == "NHWC" else "NC..",
            )

        return outputs
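
A usage sketch analogous to the Conv1d example; the import path and the image shape are assumptions made for illustration.

import tensorflow as tf
from illia.nn.tf.conv2d import Conv2d  # assumed import path

# Bayesian 3x3 convolution over RGB images in NHWC layout.
layer = Conv2d(input_channels=3, output_channels=16, kernel_size=3, padding="SAME")

x = tf.random.normal((4, 32, 32, 3))   # (batch, height, width, channels)
y = layer(x)                           # weights and bias sampled for this call
kl, n_params = layer.kl_cost()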

9.3.1 __init__(input_channels, output_channels, kernel_size, stride=1, padding='VALID', dilation=None, groups=1, data_format='NHWC', weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

Initializes a Bayesian 2D convolutional layer.

Parameters:

    input_channels (int): Number of channels in the input image. Required.
    output_channels (int): Number of channels produced by the conv. Required.
    kernel_size (int | list[int]): Convolution kernel size as int or list. Required.
    stride (int | list[int]): Convolution stride as int or list. Default: 1
    padding (str | list[int]): Padding type 'VALID', 'SAME', or list of ints. Default: 'VALID'
    dilation (Optional[int | list[int]]): Spacing between kernel elements as int or list. Default: None
    groups (int): Number of blocked connections between input/output. Default: 1
    data_format (Optional[str]): 'NHWC' or 'NCHW' format for input data. Default: 'NHWC'
    weights_distribution (Optional[GaussianDistribution]): Distribution for weights sampling. Default: None
    bias_distribution (Optional[GaussianDistribution]): Distribution for bias sampling. Default: None
    use_bias (bool): Whether to include a bias term. Default: True
    **kwargs (Any): Extra arguments passed to the base class. Default: {}

Returns:

    None.

Notes

Gaussian distributions are used by default if none are provided.

Source code in illia/nn/tf/conv2d.py
def __init__(
    self,
    input_channels: int,
    output_channels: int,
    kernel_size: int | list[int],
    stride: int | list[int] = 1,
    padding: str | list[int] = "VALID",
    dilation: Optional[int | list[int]] = None,
    groups: int = 1,
    data_format: Optional[str] = "NHWC",
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian 2D convolutional layer.

    Args:
        input_channels: Number of channels in the input image.
        output_channels: Number of channels produced by the conv.
        kernel_size: Convolution kernel size as int or list.
        stride: Convolution stride as int or list.
        padding: Padding type 'VALID', 'SAME', or list of ints.
        dilation: Spacing between kernel elements as int or list.
        groups: Number of blocked connections between input/output.
        data_format: 'NHWC' or 'NCHW' format for input data.
        weights_distribution: Distribution for weights sampling.
        bias_distribution: Distribution for bias sampling.
        use_bias: Whether to include a bias term.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    # Check data format
    self._check_params(kernel_size, groups, stride, dilation, data_format)

    self.input_channels = input_channels
    self.output_channels = output_channels
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.dilation = dilation
    self.groups = groups
    self.use_bias = use_bias

    # Check if kernel_size is a list and unpack it if necessary
    kernel_shape = (
        kernel_size if isinstance(kernel_size, list) else [kernel_size, kernel_size]
    )

    # Adjust the weights distribution based on the channel format
    self.data_format = (
        "NHWC" if data_format is None or data_format == "NHWC" else "NCHW"
    )

    # Set the weights distribution shape
    self._weights_distribution_shape = (
        input_channels // groups,
        *kernel_shape,
        output_channels,
    )

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            shape=self._weights_distribution_shape
        )
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((output_channels,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None

9.3.2 call(inputs)

Performs a forward pass through the Bayesian Convolution 2D layer. If the layer is not frozen, it samples weights and bias from their respective distributions. If the layer is frozen and the weights or bias are not initialized, it also performs sampling.

Parameters:

    inputs (Tensor): Input tensor with shape [batch, height, width, channels] if NHWC or [batch, channels, height, width] if NCHW. Required.

Returns:

    Tensor: Output tensor after convolution with optional bias added.

Raises:

    ValueError: If the layer is frozen but weights or bias are undefined.

Source code in illia/nn/tf/conv2d.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs a forward pass through the Bayesian Convolution 2D
    layer. If the layer is not frozen, it samples weights and bias
    from their respective distributions. If the layer is frozen
    and the weights or bias are not initialized, it also performs
    sampling.

    Args:
        inputs: Input tensor with shape [batch, height, width,
            channels] if NHWC or [batch, channels, height, width] if NCHW.

    Returns:
        Output tensor after convolution with optional bias added.

    Raises:
        ValueError: If the layer is frozen but weights or bias are
            undefined.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution is not None:
            self.b = self.bias_distribution.sample()
    elif self.w is None or (self.use_bias and self.b is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    outputs: tf.Tensor = self._conv2d(
        inputs=inputs,
        weight=self.w,
        stride=self.stride,
        padding=self.padding,
        data_format=self.data_format,
        dilation=self.dilation,
    )

    # Add bias only if using bias
    if self.use_bias and self.b is not None:
        outputs = tf.nn.bias_add(
            value=outputs,
            bias=self.b,
            data_format="N..C" if self.data_format == "NHWC" else "NC..",
        )

    return outputs

9.3.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.

Source code in illia/nn/tf/conv2d.py
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Sample bias if it is undefined
    if self.use_bias and self.b is None and self.bias_distribution is not None:
        self.b = self.bias_distribution.sample()

    # Stop gradient computation
    self.w = tf.stop_gradient(self.w)
    if self.use_bias:
        self.b = tf.stop_gradient(self.b)

9.3.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[tf.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Source code in illia/nn/tf/conv2d.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[tf.Tensor, int]: A tuple containing the KL divergence
            cost and the total number of parameters in the layer.
    """

    # Compute log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Add bias log probs only if using bias
    if self.use_bias and self.b is not None and self.bias_distribution is not None:
        log_probs += self.bias_distribution.log_prob(self.b)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params
    if self.use_bias and self.bias_distribution is not None:
        num_params += self.bias_distribution.num_params

    return log_probs, num_params

9.4 Embedding

Bayesian embedding layer with optional padding and max-norm. Each embedding vector is sampled from a specified distribution. Can be frozen to fix embeddings and stop gradients.

Source code in illia/nn/tf/embedding.py
@saving.register_keras_serializable(package="illia", name="Embedding")
class Embedding(BayesianModule):
    """
    Bayesian embedding layer with optional padding and max-norm. Each
    embedding vector is sampled from a specified distribution. Can be
    frozen to fix embeddings and stop gradients.
    """

    def __init__(
        self,
        num_embeddings: int,
        embeddings_dim: int,
        padding_idx: Optional[int] = None,
        max_norm: Optional[float] = None,
        norm_type: float = 2.0,
        scale_grad_by_freq: bool = False,
        sparse: bool = False,
        weights_distribution: Optional[GaussianDistribution] = None,
        **kwargs: Any,
    ) -> None:
        """
        Initializes a Bayesian Embedding layer.

        Args:
            num_embeddings: Size of the embedding dictionary.
            embeddings_dim: Dimensionality of each embedding vector.
            padding_idx: Index to exclude from gradient computation.
            max_norm: Maximum norm for embedding vectors.
            norm_type: p of the p-norm for max_norm.
            scale_grad_by_freq: Scale gradient by inverse frequency.
            sparse: Use sparse gradient updates.
            weights_distribution: Distribution for embedding weights.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        # Set attributes
        self.num_embeddings = num_embeddings
        self.embeddings_dim = embeddings_dim
        self.padding_idx = padding_idx
        self.max_norm = max_norm
        self.norm_type = norm_type
        self.scale_grad_by_freq = scale_grad_by_freq
        self.sparse = sparse

        # Set weights distribution
        if weights_distribution is None:
            self.weights_distribution = GaussianDistribution(
                (num_embeddings, embeddings_dim)
            )
        else:
            self.weights_distribution = weights_distribution

    def build(self, input_shape: tf.TensorShape) -> None:
        """
        Build trainable and non-trainable parameters.

        Args:
            input_shape: Input shape used to trigger layer build.

        Returns:
            None
        """

        # Create a variable for weights
        self.w = self.add_weight(
            name="weights",
            initializer=tf.constant_initializer(
                self.weights_distribution.sample().numpy()
            ),
            shape=(self.num_embeddings, self.embeddings_dim),
            trainable=False,
        )

        super().build(input_shape)

    def get_config(self) -> dict:
        """
        Return the configuration dictionary for serialization.

        Returns:
            dict: Dictionary with the layer configuration.
        """

        base_config = super().get_config()

        config = {
            "num_embeddings": self.num_embeddings,
            "embeddings_dim": self.embeddings_dim,
            "padding_idx": self.padding_idx,
            "max_norm": self.max_norm,
            "norm_type": self.norm_type,
            "scale_grad_by_freq": self.scale_grad_by_freq,
            "sparse": self.sparse,
        }

        return {**base_config, **config}

    def _embedding(
        self,
        inputs: tf.Tensor,
        weight: tf.Tensor,
        padding_idx: Optional[int] = None,
        max_norm: Optional[float] = None,
        norm_type: Optional[float] = 2.0,
        sparse: bool = False,
    ) -> tf.Tensor:
        """
        Computes embedding lookup with optional padding and normalization.

        Args:
            inputs: Input tensor of indices.
            weight: Embedding weight tensor.
            padding_idx: Index to mask out.
            max_norm: Maximum norm for embeddings.
            norm_type: Norm type for max_norm.
            sparse: Use sparse lookup if True.

        Returns:
            Tensor of embeddings.
        """

        inputs = tf.cast(inputs, tf.int32)
        if not sparse:
            embeddings = tf.nn.embedding_lookup(weight, inputs)
        else:
            embeddings = tf.nn.embedding_lookup_sparse(weight, inputs, sp_weights=None)

        if padding_idx is not None:
            padding_mask = tf.not_equal(inputs, padding_idx)
            embeddings = tf.where(
                tf.expand_dims(padding_mask, -1), embeddings, tf.zeros_like(embeddings)
            )

        if max_norm is not None:
            norms = tf.norm(embeddings, ord=norm_type, axis=-1, keepdims=True)
            desired = tf.clip_by_value(norms, clip_value_min=0, clip_value_max=max_norm)
            scale = desired / (tf.maximum(norms, 1e-7))
            embeddings = embeddings * scale

        return embeddings

    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Sample weights if they are undefined
        if self.w is None:
            self.w = self.weights_distribution.sample()

        # Stop gradient computation
        self.w = tf.stop_gradient(self.w)

    def kl_cost(self) -> tuple[tf.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[tf.Tensor, int]: A tuple containing the KL divergence
                cost and the total number of parameters in the layer.
        """

        # Get log probs
        log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

        # Get number of parameters
        num_params: int = self.weights_distribution.num_params

        return log_probs, num_params

    def call(self, inputs: tf.Tensor) -> tf.Tensor:
        """
        Performs embedding lookup using current weights.

        Args:
            inputs: Input tensor of indices with shape [batch, *].

        Returns:
            Tensor of embeddings.

        Raises:
            ValueError: If the layer is frozen but weights are
                undefined.
        """

        # Check if layer is frozen
        if not self.frozen:
            self.w = self.weights_distribution.sample()
        elif self.w is None:
            raise ValueError("Module has been frozen with undefined weights.")

        # Compute outputs
        outputs: tf.Tensor = self._embedding(
            inputs,
            self.w,
            self.padding_idx,
            self.max_norm,
            self.norm_type,
            self.sparse,
        )

        return outputs
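
A minimal usage sketch of this layer follows; the import path is assumed from the source location shown on this page (illia/nn/tf/embedding.py) and may need adjusting for your installation.

import tensorflow as tf

from illia.nn.tf.embedding import Embedding  # assumed import path

# Bayesian embedding table: 1000 tokens, 64-dimensional vectors, index 0 masked out
embedding = Embedding(num_embeddings=1000, embeddings_dim=64, padding_idx=0)

# Batch of token indices; every unfrozen call samples a fresh weight matrix
tokens = tf.constant([[1, 5, 0], [7, 2, 0]])
vectors = embedding(tokens)  # shape: (2, 3, 64)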

9.4.1 __init__(num_embeddings, embeddings_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, weights_distribution=None, **kwargs)

Initializes a Bayesian Embedding layer.

Parameters:

num_embeddings (int): Size of the embedding dictionary. Required.
embeddings_dim (int): Dimensionality of each embedding vector. Required.
padding_idx (Optional[int]): Index to exclude from gradient computation. Default: None.
max_norm (Optional[float]): Maximum norm for embedding vectors. Default: None.
norm_type (float): p of the p-norm for max_norm. Default: 2.0.
scale_grad_by_freq (bool): Scale gradient by inverse frequency. Default: False.
sparse (bool): Use sparse gradient updates. Default: False.
weights_distribution (Optional[GaussianDistribution]): Distribution for embedding weights. Default: None.
**kwargs (Any): Extra arguments passed to the base class. Default: {}.

Returns:

None.

Notes

Gaussian distributions are used by default if none are provided.
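
If a different prior is wanted, a distribution can be passed explicitly. The sketch below assumes GaussianDistribution accepts the same shape tuple used in the source above; both import paths are assumptions.

from illia.nn.tf.embedding import Embedding           # assumed import path
from illia.distributions import GaussianDistribution  # assumed import path

# Same shape the layer would build by default: (num_embeddings, embeddings_dim)
weights_dist = GaussianDistribution((1000, 64))
embedding = Embedding(
    num_embeddings=1000,
    embeddings_dim=64,
    weights_distribution=weights_dist,
)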

Source code in illia/nn/tf/embedding.py
def __init__(
    self,
    num_embeddings: int,
    embeddings_dim: int,
    padding_idx: Optional[int] = None,
    max_norm: Optional[float] = None,
    norm_type: float = 2.0,
    scale_grad_by_freq: bool = False,
    sparse: bool = False,
    weights_distribution: Optional[GaussianDistribution] = None,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian Embedding layer.

    Args:
        num_embeddings: Size of the embedding dictionary.
        embeddings_dim: Dimensionality of each embedding vector.
        padding_idx: Index to exclude from gradient computation.
        max_norm: Maximum norm for embedding vectors.
        norm_type: p of the p-norm for max_norm.
        scale_grad_by_freq: Scale gradient by inverse frequency.
        sparse: Use sparse gradient updates.
        weights_distribution: Distribution for embedding weights.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    # Set attributes
    self.num_embeddings = num_embeddings
    self.embeddings_dim = embeddings_dim
    self.padding_idx = padding_idx
    self.max_norm = max_norm
    self.norm_type = norm_type
    self.scale_grad_by_freq = scale_grad_by_freq
    self.sparse = sparse

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            (num_embeddings, embeddings_dim)
        )
    else:
        self.weights_distribution = weights_distribution

9.4.2 call(inputs)

Performs embedding lookup using current weights.

Parameters:

inputs (Tensor): Input tensor of indices with shape [batch, *]. Required.

Returns:

Tensor: Tensor of embeddings.

Raises:

ValueError: If the layer is frozen but weights are undefined.
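
Because call resamples the weights whenever the layer is not frozen, stacking several forward passes gives a simple Monte Carlo view of the embedding uncertainty. A sketch, reusing the embedding layer and tokens built in the earlier example:

import tensorflow as tf

# Each pass draws a new weight sample, so the outputs differ between passes
samples = tf.stack([embedding(tokens) for _ in range(10)], axis=0)  # (10, batch, seq, dim)
mean = tf.reduce_mean(samples, axis=0)
std = tf.math.reduce_std(samples, axis=0)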

Source code in illia/nn/tf/embedding.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs embedding lookup using current weights.

    Args:
        inputs: Input tensor of indices with shape [batch, *].

    Returns:
        Tensor of embeddings.

    Raises:
        ValueError: If the layer is frozen but weights are
            undefined.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()
    elif self.w is None:
        raise ValueError("Module has been frozen with undefined weights.")

    # Compute outputs
    outputs: tf.Tensor = self._embedding(
        inputs,
        self.w,
        self.padding_idx,
        self.max_norm,
        self.norm_type,
        self.sparse,
    )

    return outputs

9.4.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

None.
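
After freeze the last sampled weights are reused and no longer receive gradients; unfreeze, inherited from the base class, restores sampling. A quick check, assuming the layer and tokens from the earlier example:

import tensorflow as tf

embedding.freeze()
out_a = embedding(tokens)
out_b = embedding(tokens)
# The frozen layer reuses the same weight sample, so both passes agree
assert tf.reduce_all(out_a == out_b)

embedding.unfreeze()  # sampling resumes on the next call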

Source code in illia/nn/tf/embedding.py
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Stop gradient computation
    self.w = tf.stop_gradient(self.w)

9.4.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

tuple[tf.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.
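
In practice this value is summed over every Bayesian layer of a model and added, suitably weighted, to the task loss. A minimal sketch; the per-parameter normalisation is an illustrative choice, not part of the library:

import tensorflow as tf

def total_kl(model: tf.keras.Model) -> tf.Tensor:
    """Sum kl_cost over all Bayesian layers, normalised by parameter count."""
    kl = tf.constant(0.0)
    n_params = 0
    for layer in model.layers:
        if getattr(layer, "is_bayesian", False):
            layer_kl, layer_params = layer.kl_cost()
            kl += layer_kl
            n_params += layer_params
    return kl / tf.cast(tf.maximum(n_params, 1), tf.float32)

# Call after a forward pass so each layer holds sampled weights, e.g.:
# loss = task_loss + kl_weight * total_kl(model)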

Source code in illia/nn/tf/embedding.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[tf.Tensor, int]: A tuple containing the KL divergence
            cost and the total number of parameters in the layer.
    """

    # Get log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Get number of parameters
    num_params: int = self.weights_distribution.num_params

    return log_probs, num_params

9.5 Linear

Bayesian linear layer (fully connected) with optional weight and bias distributions. Can be frozen to stop gradient updates and fix parameters.
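
A minimal usage sketch; the import path is assumed from the source location below.

import tensorflow as tf

from illia.nn.tf.linear import Linear  # assumed import path

layer = Linear(input_size=128, output_size=10)

x = tf.random.normal((32, 128))
y = layer(x)                    # (32, 10); weights and bias are sampled on each call
kl, n_params = layer.kl_cost()  # KL term for the currently sampled parameters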

Source code in illia/nn/tf/linear.py
@saving.register_keras_serializable(package="illia", name="Linear")
class Linear(BayesianModule):
    """
    Bayesian linear layer (fully connected) with optional weight and bias
    distributions. Can be frozen to stop gradient updates and fix
    parameters.
    """

    bias_distribution: Optional[GaussianDistribution] = None

    def __init__(
        self,
        input_size: int,
        output_size: int,
        weights_distribution: Optional[GaussianDistribution] = None,
        bias_distribution: Optional[GaussianDistribution] = None,
        use_bias: bool = True,
        **kwargs: Any,
    ) -> None:
        """
        Initializes a Bayesian Linear layer.

        Args:
            input_size: Number of input features.
            output_size: Number of output features.
            weights_distribution: Distribution for the weights.
            bias_distribution: Distribution for the bias.
            use_bias: Whether to include a bias term.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        # Set parameters
        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias

        # Set weights distribution
        if weights_distribution is None:
            self.weights_distribution = GaussianDistribution((output_size, input_size))
        else:
            self.weights_distribution = weights_distribution

        # Set bias distribution
        if self.use_bias:
            if bias_distribution is None:
                self.bias_distribution = GaussianDistribution((output_size,))
            else:
                self.bias_distribution = bias_distribution
        else:
            self.bias_distribution = None

    def build(self, input_shape: tf.TensorShape) -> None:
        """
        Build trainable and non-trainable parameters.

        Args:
            input_shape: Input shape used to trigger layer build.

        Returns:
            None
        """

        # Register non-trainable variables
        self.w = self.add_weight(
            name="weights",
            initializer=tf.constant_initializer(
                self.weights_distribution.sample().numpy()
            ),
            shape=(self.output_size, self.input_size),
            trainable=False,
        )

        if self.use_bias and self.bias_distribution is not None:
            self.b = self.add_weight(
                name="bias",
                initializer=tf.constant_initializer(
                    self.bias_distribution.sample().numpy()
                ),
                shape=(self.output_size,),
                trainable=False,
            )

        super().build(input_shape)

    def get_config(self) -> dict:
        """
        Return the configuration dictionary for serialization.

        Returns:
            dict: Dictionary with the layer configuration.
        """

        base_config = super().get_config()

        custom_config = {
            "input_size": self.input_size,
            "output_size": self.output_size,
        }

        return {**base_config, **custom_config}

    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Sample weights if they are undefined
        if self.w is None:
            self.w = self.weights_distribution.sample()

        # Sample bias if it is undefined
        if self.use_bias and self.b is None and self.bias_distribution is not None:
            self.b = self.bias_distribution.sample()

        # Stop gradient computation
        self.w = tf.stop_gradient(self.w)
        if self.use_bias:
            self.b = tf.stop_gradient(self.b)

    def kl_cost(self) -> tuple[tf.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[tf.Tensor, int]: A tuple containing the KL divergence
                cost and the total number of parameters in the layer.
        """

        # Compute log probs
        log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

        # Add bias log probs only if using bias
        if self.use_bias and self.b is not None and self.bias_distribution is not None:
            log_probs += self.bias_distribution.log_prob(self.b)

        # Compute number of parameters
        num_params: int = self.weights_distribution.num_params
        if self.use_bias and self.bias_distribution is not None:
            num_params += self.bias_distribution.num_params

        return log_probs, num_params

    def call(self, inputs: tf.Tensor) -> tf.Tensor:
        """
        Performs forward pass using current weights and bias.

        Samples parameters if layer is not frozen. Raises an error if
        frozen weights are undefined.

        Args:
            inputs: Input tensor of shape [batch, features].

        Returns:
            Output tensor after linear transformation.

        Raises:
            ValueError: If the layer is frozen but weights or bias are
                undefined.
        """

        # Check if layer is frozen
        if not self.frozen:
            self.w = self.weights_distribution.sample()

            # Sample bias only if using bias
            if self.use_bias and self.bias_distribution is not None:
                self.b = self.bias_distribution.sample()
        elif self.w is None or (self.use_bias and self.b is None):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Compute outputs
        outputs: tf.Tensor = tf.linalg.matmul(inputs, self.w, transpose_b=True)

        # Add bias only if using bias
        if self.use_bias and self.b is not None:
            outputs = tf.nn.bias_add(outputs, self.b)

        return outputs

9.5.1 __init__(input_size, output_size, weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

Initializes a Bayesian Linear layer.

Parameters:

input_size (int): Number of input features. Required.
output_size (int): Number of output features. Required.
weights_distribution (Optional[GaussianDistribution]): Distribution for the weights. Default: None.
bias_distribution (Optional[GaussianDistribution]): Distribution for the bias. Default: None.
use_bias (bool): Whether to include a bias term. Default: True.
**kwargs (Any): Extra arguments passed to the base class. Default: {}.

Returns:

None.

Notes

Gaussian distributions are used by default if none are provided.
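
To override the defaults, pass distributions whose shapes match those built in __init__: (output_size, input_size) for the weights and (output_size,) for the bias. Import paths are assumptions.

from illia.nn.tf.linear import Linear                 # assumed import path
from illia.distributions import GaussianDistribution  # assumed import path

w_dist = GaussianDistribution((10, 128))  # (output_size, input_size)
b_dist = GaussianDistribution((10,))      # (output_size,)
layer = Linear(
    input_size=128,
    output_size=10,
    weights_distribution=w_dist,
    bias_distribution=b_dist,
)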

Source code in illia/nn/tf/linear.py
def __init__(
    self,
    input_size: int,
    output_size: int,
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian Linear layer.

    Args:
        input_size: Number of input features.
        output_size: Number of output features.
        weights_distribution: Distribution for the weights.
        bias_distribution: Distribution for the bias.
        use_bias: Whether to include a bias term.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    # Set parameters
    self.input_size = input_size
    self.output_size = output_size
    self.use_bias = use_bias

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution((output_size, input_size))
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((output_size,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None

9.5.2 call(inputs)

Performs forward pass using current weights and bias.

Samples parameters if layer is not frozen. Raises an error if frozen weights are undefined.

Parameters:

inputs (Tensor): Input tensor of shape [batch, features]. Required.

Returns:

Tensor: Output tensor after linear transformation.

Raises:

ValueError: If the layer is frozen but weights or bias are undefined.
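
The sampled weights are stored as (output_size, input_size) and applied with transpose_b=True, so the forward pass is the usual x @ W^T + b. A shape check, assuming the layer from the earlier sketch:

import tensorflow as tf

x = tf.random.normal((4, 128))
y = layer(x)  # (4, 10)

# Equivalent explicit computation with the parameters sampled for this call
y_manual = tf.matmul(x, layer.w, transpose_b=True) + layer.b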

Source code in illia/nn/tf/linear.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs forward pass using current weights and bias.

    Samples parameters if layer is not frozen. Raises an error if
    frozen weights are undefined.

    Args:
        inputs: Input tensor of shape [batch, features].

    Returns:
        Output tensor after linear transformation.

    Raises:
        ValueError: If the layer is frozen but weights or bias are
            undefined.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution is not None:
            self.b = self.bias_distribution.sample()
    elif self.w is None or (self.use_bias and self.b is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    outputs: tf.Tensor = tf.linalg.matmul(inputs, self.w, transpose_b=True)

    # Add bias only if using bias
    if self.use_bias and self.b is not None:
        outputs = tf.nn.bias_add(outputs, self.b)

    return outputs

9.5.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

None.
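
To fix every Bayesian layer in a model before evaluation or export, iterate over its layers and use the is_bayesian flag defined on the base class. A sketch:

def freeze_bayesian(model) -> None:
    """Freeze all Bayesian layers so inference becomes deterministic."""
    for layer in model.layers:
        if getattr(layer, "is_bayesian", False):
            layer.freeze()

def unfreeze_bayesian(model) -> None:
    """Re-enable weight sampling on all Bayesian layers."""
    for layer in model.layers:
        if getattr(layer, "is_bayesian", False):
            layer.unfreeze()

# Call after the model has been built (e.g. after one forward pass).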

Source code in illia/nn/tf/linear.py
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Sample bias if it is undefined
    if self.use_bias and self.b is None and self.bias_distribution is not None:
        self.b = self.bias_distribution.sample()

    # Stop gradient computation
    self.w = tf.stop_gradient(self.w)
    if self.use_bias:
        self.b = tf.stop_gradient(self.b)

9.5.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

tuple[tf.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Source code in illia/nn/tf/linear.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[tf.Tensor, int]: A tuple containing the KL divergence
            cost and the total number of parameters in the layer.
    """

    # Compute log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Add bias log probs only if using bias
    if self.use_bias and self.b is not None and self.bias_distribution is not None:
        log_probs += self.bias_distribution.log_prob(self.b)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params
    if self.use_bias and self.bias_distribution is not None:
        num_params += self.bias_distribution.num_params

    return log_probs, num_params

9.6 LSTM

Bayesian LSTM layer with embedding and probabilistic weights. All weights and biases are sampled from Gaussian distributions. Freezing the layer fixes parameters and stops gradient computation.
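
A minimal usage sketch; the import path is assumed from the source location below, and the inputs are token indices with a trailing singleton dimension as described in call.

import tensorflow as tf

from illia.nn.tf.lstm import LSTM  # assumed import path

lstm = LSTM(num_embeddings=1000, embeddings_dim=64, hidden_size=128, output_size=10)

# Token indices of shape [batch, seq_len, 1]
tokens = tf.random.uniform((8, 20, 1), maxval=1000, dtype=tf.int32)
logits, (h_t, c_t) = lstm(tokens)  # logits: (8, 10); h_t and c_t: (8, 128)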

Source code in illia/nn/tf/lstm.py
@saving.register_keras_serializable(package="illia", name="LSTM")
class LSTM(BayesianModule):
    """
    Bayesian LSTM layer with embedding and probabilistic weights.
    All weights and biases are sampled from Gaussian distributions.
    Freezing the layer fixes parameters and stops gradient computation.
    """

    def __init__(
        self,
        num_embeddings: int,
        embeddings_dim: int,
        hidden_size: int,
        output_size: int,
        padding_idx: Optional[int] = None,
        max_norm: Optional[float] = None,
        norm_type: float = 2.0,
        scale_grad_by_freq: bool = False,
        sparse: bool = False,
        **kwargs: Any,
    ) -> None:
        """
        Initializes the Bayesian LSTM layer.

        Args:
            num_embeddings: Size of the embedding dictionary.
            embeddings_dim: Dimensionality of each embedding vector.
            hidden_size: Number of hidden units in the LSTM.
            output_size: Size of the final output.
            padding_idx: Index to ignore in embeddings.
            max_norm: Maximum norm for embedding vectors.
            norm_type: Norm type used for max_norm.
            scale_grad_by_freq: Scale gradient by inverse frequency.
            sparse: Use sparse embedding updates.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        self.num_embeddings = num_embeddings
        self.embeddings_dim = embeddings_dim
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.padding_idx = padding_idx
        self.max_norm = max_norm
        self.norm_type = norm_type
        self.scale_grad_by_freq = scale_grad_by_freq
        self.sparse = sparse

        # Define the Embedding layer
        self.embedding = Embedding(
            num_embeddings=self.num_embeddings,
            embeddings_dim=self.embeddings_dim,
            padding_idx=self.padding_idx,
            max_norm=self.max_norm,
            norm_type=self.norm_type,
            scale_grad_by_freq=self.scale_grad_by_freq,
            sparse=self.sparse,
        )

        # Initialize weight distributions
        # Forget gate
        self.wf_distribution = GaussianDistribution(
            (self.embeddings_dim + self.hidden_size, self.hidden_size)
        )
        self.bf_distribution = GaussianDistribution((self.hidden_size,))

        # Input gate
        self.wi_distribution = GaussianDistribution(
            (self.embeddings_dim + self.hidden_size, self.hidden_size)
        )
        self.bi_distribution = GaussianDistribution((self.hidden_size,))

        # Candidate gate
        self.wc_distribution = GaussianDistribution(
            (self.embeddings_dim + self.hidden_size, self.hidden_size)
        )
        self.bc_distribution = GaussianDistribution((self.hidden_size,))

        # Output gate
        self.wo_distribution = GaussianDistribution(
            (self.embeddings_dim + self.hidden_size, self.hidden_size)
        )
        self.bo_distribution = GaussianDistribution((self.hidden_size,))

        # Final output layer
        self.wv_distribution = GaussianDistribution(
            (self.hidden_size, self.output_size)
        )
        self.bv_distribution = GaussianDistribution((self.output_size,))

    def build(self, input_shape: tf.TensorShape) -> None:
        """
        Build trainable and non-trainable parameters.

        Args:
            input_shape: Input shape used to trigger layer build.

        Returns:
            None
        """

        # Forget gate weights and bias
        self.wf = self.add_weight(
            name="forget_gate_weights",
            initializer=tf.constant_initializer(self.wf_distribution.sample().numpy()),
            shape=(self.embeddings_dim + self.hidden_size, self.hidden_size),
            trainable=False,
        )

        self.bf = self.add_weight(
            name="forget_gate_bias",
            initializer=tf.constant_initializer(self.bf_distribution.sample().numpy()),
            shape=(self.hidden_size,),
            trainable=False,
        )

        # Input gate weights and bias
        self.wi = self.add_weight(
            name="input_gate_weights",
            initializer=tf.constant_initializer(self.wi_distribution.sample().numpy()),
            shape=(self.embeddings_dim + self.hidden_size, self.hidden_size),
            trainable=False,
        )

        self.bi = self.add_weight(
            name="input_gate_bias",
            initializer=tf.constant_initializer(self.bi_distribution.sample().numpy()),
            shape=(self.hidden_size,),
            trainable=False,
        )

        # Candidate gate weights and bias
        self.wc = self.add_weight(
            name="candidate_gate_weights",
            initializer=tf.constant_initializer(self.wc_distribution.sample().numpy()),
            shape=(self.embeddings_dim + self.hidden_size, self.hidden_size),
            trainable=False,
        )

        self.bc = self.add_weight(
            name="candidate_gate_bias",
            initializer=tf.constant_initializer(self.bc_distribution.sample().numpy()),
            shape=(self.hidden_size,),
            trainable=False,
        )

        # Output gate weights and bias
        self.wo = self.add_weight(
            name="output_gate_weights",
            initializer=tf.constant_initializer(self.wo_distribution.sample().numpy()),
            shape=(self.embeddings_dim + self.hidden_size, self.hidden_size),
            trainable=False,
        )

        self.bo = self.add_weight(
            name="output_gate_bias",
            initializer=tf.constant_initializer(self.bo_distribution.sample().numpy()),
            shape=(self.hidden_size,),
            trainable=False,
        )

        # Final output layer weights and bias
        self.wv = self.add_weight(
            name="final_output_weights",
            initializer=tf.constant_initializer(self.wv_distribution.sample().numpy()),
            shape=(self.hidden_size, self.output_size),
            trainable=False,
        )

        self.bv = self.add_weight(
            name="final_output_bias",
            initializer=tf.constant_initializer(self.bv_distribution.sample().numpy()),
            shape=(self.output_size,),
            trainable=False,
        )

        super().build(input_shape)

    def get_config(self) -> dict:
        """
        Return the configuration dictionary for serialization.

        Returns:
            dict: Dictionary with the layer configuration.
        """

        base_config = super().get_config()

        custom_config = {
            "num_embeddings": self.num_embeddings,
            "embeddings_dim": self.embeddings_dim,
            "hidden_size": self.hidden_size,
            "output_size": self.output_size,
            "padding_idx": self.padding_idx,
            "max_norm": self.max_norm,
            "norm_type": self.norm_type,
            "scale_grad_by_freq": self.scale_grad_by_freq,
            "sparse": self.sparse,
        }

        return {**base_config, **custom_config}

    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Freeze embedding layer
        self.embedding.freeze()

        # Forget gate
        if self.wf is None:
            self.wf = self.wf_distribution.sample()
        if self.bf is None:
            self.bf = self.bf_distribution.sample()
        self.wf = tf.stop_gradient(self.wf)
        self.bf = tf.stop_gradient(self.bf)

        # Input gate
        if self.wi is None:
            self.wi = self.wi_distribution.sample()
        if self.bi is None:
            self.bi = self.bi_distribution.sample()
        self.wi = tf.stop_gradient(self.wi)
        self.bi = tf.stop_gradient(self.bi)

        # Candidate gate
        if self.wc is None:
            self.wc = self.wc_distribution.sample()
        if self.bc is None:
            self.bc = self.bc_distribution.sample()
        self.wc = tf.stop_gradient(self.wc)
        self.bc = tf.stop_gradient(self.bc)

        # Output gate
        if self.wo is None:
            self.wo = self.wo_distribution.sample()
        if self.bo is None:
            self.bo = self.bo_distribution.sample()
        self.wo = tf.stop_gradient(self.wo)
        self.bo = tf.stop_gradient(self.bo)

        # Final output layer
        if self.wv is None:
            self.wv = self.wv_distribution.sample()
        if self.bv is None:
            self.bv = self.bv_distribution.sample()
        self.wv = tf.stop_gradient(self.wv)
        self.bv = tf.stop_gradient(self.bv)

    def kl_cost(self) -> tuple[tf.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[tf.Tensor, int]: A tuple containing the KL divergence
                cost and the total number of parameters in the layer.
        """

        # Compute log probs for each pair of weights and bias
        log_probs_f = self.wf_distribution.log_prob(
            self.wf
        ) + self.bf_distribution.log_prob(self.bf)

        log_probs_i = self.wi_distribution.log_prob(
            self.wi
        ) + self.bi_distribution.log_prob(self.bi)

        log_probs_c = self.wc_distribution.log_prob(
            self.wc
        ) + self.bc_distribution.log_prob(self.bc)

        log_probs_o = self.wo_distribution.log_prob(
            self.wo
        ) + self.bo_distribution.log_prob(self.bo)

        log_probs_v = self.wv_distribution.log_prob(
            self.wv
        ) + self.bv_distribution.log_prob(self.bv)

        # Compute the total loss
        log_probs = log_probs_f + log_probs_i + log_probs_c + log_probs_o + log_probs_v

        # Compute number of parameters
        num_params = (
            self.wf_distribution.num_params
            + self.bf_distribution.num_params
            + self.wi_distribution.num_params
            + self.bi_distribution.num_params
            + self.wc_distribution.num_params
            + self.bc_distribution.num_params
            + self.wo_distribution.num_params
            + self.bo_distribution.num_params
            + self.wv_distribution.num_params
            + self.bv_distribution.num_params
        )

        return log_probs, num_params

    def call(
        self,
        inputs: tf.Tensor,
        init_states: Optional[tuple[tf.Tensor, tf.Tensor]] = None,
    ) -> tuple[tf.Tensor, tuple[tf.Tensor, tf.Tensor]]:
        """
        Performs a forward pass through the Bayesian LSTM.

        Args:
            inputs: Input tensor of token indices. Shape: [batch, seq_len, 1].
            init_states: Optional tuple of initial (hidden, cell) states.

        Returns:
            Tuple containing:
                - Output tensor after final linear transformation.
                - Tuple of final hidden and cell states.

        Raises:
            ValueError: If the layer is frozen but weights are
                undefined.
        """

        # Sample weights if not frozen
        if not self.frozen:
            self.wf = self.wf_distribution.sample()
            self.bf = self.bf_distribution.sample()
            self.wi = self.wi_distribution.sample()
            self.bi = self.bi_distribution.sample()
            self.wc = self.wc_distribution.sample()
            self.bc = self.bc_distribution.sample()
            self.wo = self.wo_distribution.sample()
            self.bo = self.bo_distribution.sample()
            self.wv = self.wv_distribution.sample()
            self.bv = self.bv_distribution.sample()
        elif any(
            p is None
            for p in [
                self.wf,
                self.bf,
                self.wi,
                self.bi,
                self.wc,
                self.bc,
                self.wo,
                self.bo,
                self.wv,
                self.bv,
            ]
        ):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Apply embedding layer to input indices
        inputs = tf.squeeze(inputs, axis=-1)
        inputs = self.embedding(inputs)
        batch_size = tf.shape(inputs)[0]
        seq_len = tf.shape(inputs)[1]

        # Initialize h_t and c_t if init_states is None
        if init_states is None:
            h_t = tf.zeros([batch_size, self.hidden_size], dtype=inputs.dtype)
            c_t = tf.zeros([batch_size, self.hidden_size], dtype=inputs.dtype)
        else:
            h_t, c_t = init_states[0], init_states[1]

        # Process sequence
        for t in range(seq_len):
            # Shape: (batch_size, embedding_dim)
            x_t = inputs[:, t, :]

            # Concatenate input and hidden state
            # Shape: (batch_size, embedding_dim + hidden_size)
            z_t = tf.concat([x_t, h_t], axis=1)

            # Forget gate
            ft = tf.sigmoid(tf.matmul(z_t, self.wf) + self.bf)

            # Input gate
            it = tf.sigmoid(tf.matmul(z_t, self.wi) + self.bi)

            # Candidate cell state
            can = tf.tanh(tf.matmul(z_t, self.wc) + self.bc)

            # Output gate
            ot = tf.sigmoid(tf.matmul(z_t, self.wo) + self.bo)

            # Update cell state
            c_t = c_t * ft + can * it

            # Update hidden state
            h_t = ot * tf.tanh(c_t)

        # Compute final output
        y_t = tf.matmul(h_t, self.wv) + self.bv

        return y_t, (h_t, c_t)

9.6.1 __init__(num_embeddings, embeddings_dim, hidden_size, output_size, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, **kwargs)

Initializes the Bayesian LSTM layer.

Parameters:

num_embeddings (int): Size of the embedding dictionary. Required.
embeddings_dim (int): Dimensionality of each embedding vector. Required.
hidden_size (int): Number of hidden units in the LSTM. Required.
output_size (int): Size of the final output. Required.
padding_idx (Optional[int]): Index to ignore in embeddings. Default: None.
max_norm (Optional[float]): Maximum norm for embedding vectors. Default: None.
norm_type (float): Norm type used for max_norm. Default: 2.0.
scale_grad_by_freq (bool): Scale gradient by inverse frequency. Default: False.
sparse (bool): Use sparse embedding updates. Default: False.
**kwargs (Any): Extra arguments passed to the base class. Default: {}.

Returns:

None.

Notes

Gaussian distributions are used by default if none are provided.

Source code in illia/nn/tf/lstm.py
def __init__(
    self,
    num_embeddings: int,
    embeddings_dim: int,
    hidden_size: int,
    output_size: int,
    padding_idx: Optional[int] = None,
    max_norm: Optional[float] = None,
    norm_type: float = 2.0,
    scale_grad_by_freq: bool = False,
    sparse: bool = False,
    **kwargs: Any,
) -> None:
    """
    Initializes the Bayesian LSTM layer.

    Args:
        num_embeddings: Size of the embedding dictionary.
        embeddings_dim: Dimensionality of each embedding vector.
        hidden_size: Number of hidden units in the LSTM.
        output_size: Size of the final output.
        padding_idx: Index to ignore in embeddings.
        max_norm: Maximum norm for embedding vectors.
        norm_type: Norm type used for max_norm.
        scale_grad_by_freq: Scale gradient by inverse frequency.
        sparse: Use sparse embedding updates.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    self.num_embeddings = num_embeddings
    self.embeddings_dim = embeddings_dim
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.padding_idx = padding_idx
    self.max_norm = max_norm
    self.norm_type = norm_type
    self.scale_grad_by_freq = scale_grad_by_freq
    self.sparse = sparse

    # Define the Embedding layer
    self.embedding = Embedding(
        num_embeddings=self.num_embeddings,
        embeddings_dim=self.embeddings_dim,
        padding_idx=self.padding_idx,
        max_norm=self.max_norm,
        norm_type=self.norm_type,
        scale_grad_by_freq=self.scale_grad_by_freq,
        sparse=self.sparse,
    )

    # Initialize weight distributions
    # Forget gate
    self.wf_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bf_distribution = GaussianDistribution((self.hidden_size,))

    # Input gate
    self.wi_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bi_distribution = GaussianDistribution((self.hidden_size,))

    # Candidate gate
    self.wc_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bc_distribution = GaussianDistribution((self.hidden_size,))

    # Output gate
    self.wo_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bo_distribution = GaussianDistribution((self.hidden_size,))

    # Final output layer
    self.wv_distribution = GaussianDistribution(
        (self.hidden_size, self.output_size)
    )
    self.bv_distribution = GaussianDistribution((self.output_size,))

9.6.2 call(inputs, init_states=None)

Performs a forward pass through the Bayesian LSTM.

Parameters:

inputs (Tensor): Input tensor of token indices with shape [batch, seq_len, 1]. Required.
init_states (Optional[tuple[Tensor, Tensor]]): Optional tuple of initial (hidden, cell) states. Default: None.

Returns:

tuple[Tensor, tuple[Tensor, Tensor]]: Tuple containing the output tensor after the final linear transformation and a tuple of the final hidden and cell states.

Raises:

ValueError: If the layer is frozen but weights are undefined.
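
The returned states can be passed back as init_states to carry context across chunks of a long sequence; freezing after the first chunk keeps the same weight sample for the rest. A sketch, reusing lstm from the earlier example:

import tensorflow as tf

chunk_a = tf.random.uniform((8, 20, 1), maxval=1000, dtype=tf.int32)
chunk_b = tf.random.uniform((8, 20, 1), maxval=1000, dtype=tf.int32)

_, states = lstm(chunk_a)  # first chunk; weights are sampled here
lstm.freeze()              # reuse that sample for the remaining chunks
logits, states = lstm(chunk_b, init_states=states)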

Source code in illia/nn/tf/lstm.py
def call(
    self,
    inputs: tf.Tensor,
    init_states: Optional[tuple[tf.Tensor, tf.Tensor]] = None,
) -> tuple[tf.Tensor, tuple[tf.Tensor, tf.Tensor]]:
    """
    Performs a forward pass through the Bayesian LSTM.

    Args:
        inputs: Input tensor of token indices. Shape: [batch, seq_len, 1].
        init_states: Optional tuple of initial (hidden, cell) states.

    Returns:
        Tuple containing:
            - Output tensor after final linear transformation.
            - Tuple of final hidden and cell states.

    Raises:
        ValueError: If the layer is frozen but weights are
            undefined.
    """

    # Sample weights if not frozen
    if not self.frozen:
        self.wf = self.wf_distribution.sample()
        self.bf = self.bf_distribution.sample()
        self.wi = self.wi_distribution.sample()
        self.bi = self.bi_distribution.sample()
        self.wc = self.wc_distribution.sample()
        self.bc = self.bc_distribution.sample()
        self.wo = self.wo_distribution.sample()
        self.bo = self.bo_distribution.sample()
        self.wv = self.wv_distribution.sample()
        self.bv = self.bv_distribution.sample()
    elif any(
        p is None
        for p in [
            self.wf,
            self.bf,
            self.wi,
            self.bi,
            self.wc,
            self.bc,
            self.wo,
            self.bo,
            self.wv,
            self.bv,
        ]
    ):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Apply embedding layer to input indices
    inputs = tf.squeeze(inputs, axis=-1)
    inputs = self.embedding(inputs)
    batch_size = tf.shape(inputs)[0]
    seq_len = tf.shape(inputs)[1]

    # Initialize h_t and c_t if init_states is None
    if init_states is None:
        h_t = tf.zeros([batch_size, self.hidden_size], dtype=inputs.dtype)
        c_t = tf.zeros([batch_size, self.hidden_size], dtype=inputs.dtype)
    else:
        h_t, c_t = init_states[0], init_states[1]

    # Process sequence
    for t in range(seq_len):
        # Shape: (batch_size, embedding_dim)
        x_t = inputs[:, t, :]

        # Concatenate input and hidden state
        # Shape: (batch_size, embedding_dim + hidden_size)
        z_t = tf.concat([x_t, h_t], axis=1)

        # Forget gate
        ft = tf.sigmoid(tf.matmul(z_t, self.wf) + self.bf)

        # Input gate
        it = tf.sigmoid(tf.matmul(z_t, self.wi) + self.bi)

        # Candidate cell state
        can = tf.tanh(tf.matmul(z_t, self.wc) + self.bc)

        # Output gate
        ot = tf.sigmoid(tf.matmul(z_t, self.wo) + self.bo)

        # Update cell state
        c_t = c_t * ft + can * it

        # Update hidden state
        h_t = ot * tf.tanh(c_t)

    # Compute final output
    y_t = tf.matmul(h_t, self.wv) + self.bv

    return y_t, (h_t, c_t)

9.6.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

None.

Source code in illia/nn/tf/lstm.py
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Freeze embedding layer
    self.embedding.freeze()

    # Forget gate
    if self.wf is None:
        self.wf = self.wf_distribution.sample()
    if self.bf is None:
        self.bf = self.bf_distribution.sample()
    self.wf = tf.stop_gradient(self.wf)
    self.bf = tf.stop_gradient(self.bf)

    # Input gate
    if self.wi is None:
        self.wi = self.wi_distribution.sample()
    if self.bi is None:
        self.bi = self.bi_distribution.sample()
    self.wi = tf.stop_gradient(self.wi)
    self.bi = tf.stop_gradient(self.bi)

    # Candidate gate
    if self.wc is None:
        self.wc = self.wc_distribution.sample()
    if self.bc is None:
        self.bc = self.bc_distribution.sample()
    self.wc = tf.stop_gradient(self.wc)
    self.bc = tf.stop_gradient(self.bc)

    # Output gate
    if self.wo is None:
        self.wo = self.wo_distribution.sample()
    if self.bo is None:
        self.bo = self.bo_distribution.sample()
    self.wo = tf.stop_gradient(self.wo)
    self.bo = tf.stop_gradient(self.bo)

    # Final output layer
    if self.wv is None:
        self.wv = self.wv_distribution.sample()
    if self.bv is None:
        self.bv = self.bv_distribution.sample()
    self.wv = tf.stop_gradient(self.wv)
    self.bv = tf.stop_gradient(self.bv)

9.6.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

tuple[tf.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Source code in illia/nn/tf/lstm.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[tf.Tensor, int]: A tuple containing the KL divergence
            cost and the total number of parameters in the layer.
    """

    # Compute log probs for each pair of weights and bias
    log_probs_f = self.wf_distribution.log_prob(
        self.wf
    ) + self.bf_distribution.log_prob(self.bf)

    log_probs_i = self.wi_distribution.log_prob(
        self.wi
    ) + self.bi_distribution.log_prob(self.bi)

    log_probs_c = self.wc_distribution.log_prob(
        self.wc
    ) + self.bc_distribution.log_prob(self.bc)

    log_probs_o = self.wo_distribution.log_prob(
        self.wo
    ) + self.bo_distribution.log_prob(self.bo)

    log_probs_v = self.wv_distribution.log_prob(
        self.wv
    ) + self.bv_distribution.log_prob(self.bv)

    # Compute the total loss
    log_probs = log_probs_f + log_probs_i + log_probs_c + log_probs_o + log_probs_v

    # Compute number of parameters
    num_params = (
        self.wf_distribution.num_params
        + self.bf_distribution.num_params
        + self.wi_distribution.num_params
        + self.bi_distribution.num_params
        + self.wc_distribution.num_params
        + self.bc_distribution.num_params
        + self.wo_distribution.num_params
        + self.bo_distribution.num_params
        + self.wv_distribution.num_params
        + self.bv_distribution.num_params
    )

    return log_probs, num_params