6. Neural Network Layers

6.1 BayesianModule

Abstract base for Bayesian-aware modules in PyTorch. Provides mechanisms to track if a module is Bayesian and control parameter updates through freezing/unfreezing.

Notes

All derived classes must implement freeze and kl_cost to handle parameter management and compute the KL divergence cost.
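
As a rough illustration, a subclass only needs to manage the frozen flag in freeze and return its own KL term from kl_cost. The layer below is a hypothetical sketch, not part of the library:

import torch

from illia.nn.torch.base import BayesianModule


class ScaleLayer(BayesianModule):  # hypothetical example, not part of illia
    """Multiplies its input by a single random scalar weight."""

    def __init__(self) -> None:
        super().__init__()
        # A single scalar weight stored as a buffer, resampled manually
        self.register_buffer("weight", torch.randn(()))

    @torch.jit.export
    def freeze(self) -> None:
        # Keep the current sample fixed and stop gradient flow through it
        self.frozen = True
        self.weight = self.weight.detach()

    @torch.jit.export
    def kl_cost(self) -> tuple[torch.Tensor, int]:
        # Toy stand-in for a KL term; a real layer would query its
        # weight distribution here
        return self.weight.pow(2).sum(), 1

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        if not self.frozen:
            self.weight = torch.randn(())  # draw a fresh sample per call
        return inputs * self.weight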

Source code in illia/nn/torch/base.py
class BayesianModule(torch.nn.Module, ABC):
    """
    Abstract base for Bayesian-aware modules in PyTorch.
    Provides mechanisms to track if a module is Bayesian and control
    parameter updates through freezing/unfreezing.

    Notes:
        All derived classes must implement `freeze` and `kl_cost` to
        handle parameter management and compute the KL divergence cost.
    """

    def __init__(self, **kwargs: Any) -> None:
        """
        Initialize the Bayesian module with default flags.
        Sets `frozen` to False and `is_bayesian` to True.

        Args:
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.
        """

        super().__init__(**kwargs)

        self.frozen: bool = False
        self.is_bayesian: bool = True

    @torch.jit.export
    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.

        Notes:
            Must be implemented by all subclasses.
        """

        # Set frozen indicator to true for current layer
        self.frozen = True

    @torch.jit.export
    def unfreeze(self) -> None:
        """
        Unfreeze the module by setting its `frozen` flag to False.
        Allows parameters to be sampled and updated again.

        Returns:
            None.
        """

        # Set frozen indicator to false for current layer
        self.frozen = False

    @torch.jit.export
    def kl_cost(self) -> tuple[torch.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[torch.Tensor, int]: A tuple containing the KL
                divergence cost and the total number of parameters in
                the layer.

        Notes:
            Must be implemented by all subclasses.
        """

        return torch.tensor(0.0), 0

6.1.1 __init__(**kwargs)

Initialize the Bayesian module with default flags. Sets frozen to False and is_bayesian to True.

Parameters:

    **kwargs (Any): Extra arguments passed to the base class. Default: {}.

Returns:

    None.

Source code in illia/nn/torch/base.py
def __init__(self, **kwargs: Any) -> None:
    """
    Initialize the Bayesian module with default flags.
    Sets `frozen` to False and `is_bayesian` to True.

    Args:
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.
    """

    super().__init__(**kwargs)

    self.frozen: bool = False
    self.is_bayesian: bool = True

6.1.2 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.

Notes

Must be implemented by all subclasses.

Source code in illia/nn/torch/base.py
@torch.jit.export
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.

    Notes:
        Must be implemented by all subclasses.
    """

    # Set frozen indicator to true for current layer
    self.frozen = True

6.1.3 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[torch.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Notes

Must be implemented by all subclasses.
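
For context, here is one way the per-layer KL cost could be folded into a training loss. This is a sketch under assumptions: it uses the Linear layer documented below for the model, and the per-parameter averaging and the 1e-3 weighting are arbitrary choices rather than library defaults.

import torch
import torch.nn.functional as F

from illia.nn.torch.linear import Linear

model = torch.nn.Sequential(Linear(16, 8), torch.nn.ReLU(), Linear(8, 1))


def total_kl_cost(module: torch.nn.Module) -> torch.Tensor:
    """Sum kl_cost over every Bayesian submodule, averaged per parameter."""
    total = torch.tensor(0.0)
    count = 0
    for child in module.modules():
        if getattr(child, "is_bayesian", False):
            cost, num_params = child.kl_cost()
            total = total + cost
            count += num_params
    return total / max(count, 1)


inputs, targets = torch.randn(4, 16), torch.randn(4, 1)
loss = F.mse_loss(model(inputs), targets) + 1e-3 * total_kl_cost(model)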

Source code in illia/nn/torch/base.py
@torch.jit.export
def kl_cost(self) -> tuple[torch.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[torch.Tensor, int]: A tuple containing the KL
            divergence cost and the total number of parameters in
            the layer.

    Notes:
        Must be implemented by all subclasses.
    """

    return torch.tensor(0.0), 0

6.1.4 unfreeze()

Unfreeze the module by setting its frozen flag to False. Allows parameters to be sampled and updated again.

Returns:

    None.

Source code in illia/nn/torch/base.py
@torch.jit.export
def unfreeze(self) -> None:
    """
    Unfreeze the module by setting its `frozen` flag to False.
    Allows parameters to be sampled and updated again.

    Returns:
        None.
    """

    # Set frozen indicator to false for current layer
    self.frozen = False

6.2 Conv1d

Bayesian 1D convolutional layer with optional weight and bias priors. Behaves like a standard Conv1d but treats weights and bias as random variables sampled from specified distributions. Parameters become fixed when the layer is frozen.
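
A short usage sketch with the default Gaussian distributions (the tensor sizes below are arbitrary):

import torch

from illia.nn.torch.conv1d import Conv1d

layer = Conv1d(input_channels=3, output_channels=8, kernel_size=5, padding=2)

x = torch.randn(2, 3, 32)          # (batch, channels, length)
y = layer(x)                       # a fresh weight sample is drawn for this call
kl, num_params = layer.kl_cost()   # KL contribution of this layer

layer.freeze()                     # keep the current sample for deterministic inference
y_fixed = layer(x)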

Source code in illia/nn/torch/conv1d.py
class Conv1d(BayesianModule):
    """
    Bayesian 1D convolutional layer with optional weight and bias priors.
    Behaves like a standard Conv1d but treats weights and bias as random
    variables sampled from specified distributions. Parameters become fixed
    when the layer is frozen.
    """

    weights: torch.Tensor
    bias: torch.Tensor

    def __init__(
        self,
        input_channels: int,
        output_channels: int,
        kernel_size: int,
        stride: int = 1,
        padding: int = 0,
        dilation: int = 1,
        groups: int = 1,
        weights_distribution: Optional[GaussianDistribution] = None,
        bias_distribution: Optional[GaussianDistribution] = None,
        use_bias: bool = True,
        **kwargs: Any,
    ) -> None:
        """
        Initializes a Bayesian 1D convolutional layer.

        Args:
            input_channels: Number of input channels.
            output_channels: Number of output channels.
            kernel_size: Size of the convolution kernel.
            stride: Stride of the convolution.
            padding: Padding added to both sides of the input.
            dilation: Spacing between kernel elements.
            groups: Number of blocked connections.
            weights_distribution: Distribution for the weights.
            bias_distribution: Distribution for the bias.
            use_bias: Whether to include a bias term.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        self.input_channels = input_channels
        self.output_channels = output_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.use_bias = use_bias

        # Set weights distribution
        if weights_distribution is None:
            # Define weights distribution
            self.weights_distribution = GaussianDistribution(
                (
                    self.output_channels,
                    self.input_channels // self.groups,
                    self.kernel_size,
                )
            )
        else:
            self.weights_distribution = weights_distribution

        # Set bias distribution
        if self.use_bias:
            if bias_distribution is None:
                self.bias_distribution = GaussianDistribution((self.output_channels,))
            else:
                self.bias_distribution = bias_distribution
        else:
            self.bias_distribution = None  # type: ignore

        # Sample initial weights
        weights = self.weights_distribution.sample()

        # Register buffers
        self.register_buffer("weights", weights)

        if self.use_bias and self.bias_distribution is not None:
            bias = self.bias_distribution.sample()
            self.register_buffer("bias", bias)
        else:
            self.bias = None  # type: ignore

    @torch.jit.export
    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Sample weights if they are undefined
        if self.weights is None:
            self.weights = self.weights_distribution.sample()

        # Sample bias if they are undefined and bias is used
        if self.use_bias and self.bias_distribution is not None:
            if not hasattr(self, "bias") or self.bias is None:
                self.bias = self.bias_distribution.sample()
            self.bias = self.bias.detach()

        # Detach weights and bias
        self.weights = self.weights.detach()

    @torch.jit.export
    def kl_cost(self) -> tuple[torch.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[torch.Tensor, int]: A tuple containing the KL
                divergence cost and the total number of parameters in
                the layer.
        """

        # Compute log probs
        log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

        # Add bias log probs if bias is used
        if self.use_bias and self.bias_distribution is not None:
            log_probs += self.bias_distribution.log_prob(self.bias)

        # Compute number of parameters
        num_params: int = self.weights_distribution.num_params()
        if self.use_bias and self.bias_distribution is not None:
            num_params += self.bias_distribution.num_params()

        return log_probs, num_params

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        """
        Performs a forward pass through the Bayesian Convolution 1D
        layer. If the layer is not frozen, it samples weights and bias
        from their respective distributions. If the layer is frozen
        and the weights or bias are not initialized, it also performs
        sampling.

        Args:
            inputs: Input tensor to the layer with shape (batch,
                input channels, input length).

        Returns:
            Output tensor after passing through the layer with shape
                (batch, output channels, output length).

        Raises:
            ValueError: If the layer is frozen but weights or bias are
                undefined.
        """

        # Check if layer is frozen
        if not self.frozen:
            self.weights = self.weights_distribution.sample()

            # Sample bias only if using bias
            if self.use_bias and self.bias_distribution is not None:
                self.bias = self.bias_distribution.sample()
        elif self.weights is None or (self.use_bias and self.bias is None):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Compute outputs
        # pylint: disable=E1102
        outputs: torch.Tensor = F.conv1d(
            input=inputs,
            weight=self.weights,
            stride=self.stride,
            padding=self.padding,
            dilation=self.dilation,
            groups=self.groups,
        )

        # Add bias only if using bias
        if self.use_bias and self.bias is not None:
            outputs += torch.reshape(
                input=self.bias, shape=(1, self.output_channels, 1)
            )

        return outputs

6.2.1 __init__(input_channels, output_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

Initializes a Bayesian 1D convolutional layer.

Parameters:

    input_channels (int): Number of input channels. Required.
    output_channels (int): Number of output channels. Required.
    kernel_size (int): Size of the convolution kernel. Required.
    stride (int): Stride of the convolution. Default: 1.
    padding (int): Padding added to both sides of the input. Default: 0.
    dilation (int): Spacing between kernel elements. Default: 1.
    groups (int): Number of blocked connections. Default: 1.
    weights_distribution (Optional[GaussianDistribution]): Distribution for the weights. Default: None.
    bias_distribution (Optional[GaussianDistribution]): Distribution for the bias. Default: None.
    use_bias (bool): Whether to include a bias term. Default: True.
    **kwargs (Any): Extra arguments passed to the base class. Default: {}.

Returns:

    None.

Notes

Gaussian distributions are used by default if none are provided.

Source code in illia/nn/torch/conv1d.py
def __init__(
    self,
    input_channels: int,
    output_channels: int,
    kernel_size: int,
    stride: int = 1,
    padding: int = 0,
    dilation: int = 1,
    groups: int = 1,
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian 1D convolutional layer.

    Args:
        input_channels: Number of input channels.
        output_channels: Number of output channels.
        kernel_size: Size of the convolution kernel.
        stride: Stride of the convolution.
        padding: Padding added to both sides of the input.
        dilation: Spacing between kernel elements.
        groups: Number of blocked connections.
        weights_distribution: Distribution for the weights.
        bias_distribution: Distribution for the bias.
        use_bias: Whether to include a bias term.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    self.input_channels = input_channels
    self.output_channels = output_channels
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.dilation = dilation
    self.groups = groups
    self.use_bias = use_bias

    # Set weights distribution
    if weights_distribution is None:
        # Define weights distribution
        self.weights_distribution = GaussianDistribution(
            (
                self.output_channels,
                self.input_channels // self.groups,
                self.kernel_size,
            )
        )
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((self.output_channels,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None  # type: ignore

    # Sample initial weights
    weights = self.weights_distribution.sample()

    # Register buffers
    self.register_buffer("weights", weights)

    if self.use_bias and self.bias_distribution is not None:
        bias = self.bias_distribution.sample()
        self.register_buffer("bias", bias)
    else:
        self.bias = None  # type: ignore

6.2.2 forward(inputs)

Performs a forward pass through the Bayesian Convolution 1D layer. If the layer is not frozen, it samples weights and bias from their respective distributions. If the layer is frozen and the weights or bias are not initialized, it also performs sampling.
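
Because an unfrozen layer resamples its weights on every call, repeated forward passes give a simple Monte Carlo estimate of the predictive mean and spread. This is a sketch, not a library utility:

import torch

from illia.nn.torch.conv1d import Conv1d

layer = Conv1d(input_channels=3, output_channels=8, kernel_size=3)
x = torch.randn(4, 3, 16)

# Each call draws a new weight and bias sample while the layer is unfrozen
samples = torch.stack([layer(x) for _ in range(10)])
mean, std = samples.mean(dim=0), samples.std(dim=0)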

Parameters:

    inputs (Tensor): Input tensor to the layer with shape (batch, input channels, input length). Required.

Returns:

    Tensor: Output tensor after passing through the layer with shape (batch, output channels, output length).

Raises:

    ValueError: If the layer is frozen but weights or bias are undefined.

Source code in illia/nn/torch/conv1d.py
def forward(self, inputs: torch.Tensor) -> torch.Tensor:
    """
    Performs a forward pass through the Bayesian Convolution 1D
    layer. If the layer is not frozen, it samples weights and bias
    from their respective distributions. If the layer is frozen
    and the weights or bias are not initialized, it also performs
    sampling.

    Args:
        inputs: Input tensor to the layer with shape (batch,
            input channels, input length).

    Returns:
        Output tensor after passing through the layer with shape
            (batch, output channels, output length).

    Raises:
        ValueError: If the layer is frozen but weights or bias are
            undefined.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.weights = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution is not None:
            self.bias = self.bias_distribution.sample()
    elif self.weights is None or (self.use_bias and self.bias is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    # pylint: disable=E1102
    outputs: torch.Tensor = F.conv1d(
        input=inputs,
        weight=self.weights,
        stride=self.stride,
        padding=self.padding,
        dilation=self.dilation,
        groups=self.groups,
    )

    # Add bias only if using bias
    if self.use_bias and self.bias is not None:
        outputs += torch.reshape(
            input=self.bias, shape=(1, self.output_channels, 1)
        )

    return outputs

6.2.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.

Source code in illia/nn/torch/conv1d.py
@torch.jit.export
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.weights is None:
        self.weights = self.weights_distribution.sample()

    # Sample bias if they are undefined and bias is used
    if self.use_bias and self.bias_distribution is not None:
        if not hasattr(self, "bias") or self.bias is None:
            self.bias = self.bias_distribution.sample()
        self.bias = self.bias.detach()

    # Detach weights and bias
    self.weights = self.weights.detach()

6.2.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[torch.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Source code in illia/nn/torch/conv1d.py
@torch.jit.export
def kl_cost(self) -> tuple[torch.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[torch.Tensor, int]: A tuple containing the KL
            divergence cost and the total number of parameters in
            the layer.
    """

    # Compute log probs
    log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

    # Add bias log probs if bias is used
    if self.use_bias and self.bias_distribution is not None:
        log_probs += self.bias_distribution.log_prob(self.bias)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params()
    if self.use_bias and self.bias_distribution is not None:
        num_params += self.bias_distribution.num_params()

    return log_probs, num_params

6.3 Conv2d

Bayesian 2D convolutional layer with optional weight and bias priors. Behaves like a standard Conv2d but treats weights and bias as random variables sampled from specified distributions. Parameters become fixed when the layer is frozen.
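
Usage mirrors Conv1d; a small sketch with the default distributions and arbitrary sizes:

import torch

from illia.nn.torch.conv2d import Conv2d

layer = Conv2d(input_channels=3, output_channels=16, kernel_size=3, padding=1)

x = torch.randn(2, 3, 28, 28)      # (batch, channels, height, width)
y = layer(x)                       # stochastic while the layer is unfrozen
kl, num_params = layer.kl_cost()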

Source code in illia/nn/torch/conv2d.py
class Conv2d(BayesianModule):
    """
    Bayesian 2D convolutional layer with optional weight and bias priors.
    Behaves like a standard Conv2d but treats weights and bias as random
    variables sampled from specified distributions. Parameters become fixed
    when the layer is frozen.
    """

    weights: torch.Tensor
    bias: torch.Tensor

    def __init__(
        self,
        input_channels: int,
        output_channels: int,
        kernel_size: int | tuple[int, int],
        stride: int | tuple[int, int] = 1,
        padding: int | tuple[int, int] = 0,
        dilation: int | tuple[int, int] = 1,
        groups: int = 1,
        weights_distribution: Optional[GaussianDistribution] = None,
        bias_distribution: Optional[GaussianDistribution] = None,
        use_bias: bool = True,
        **kwargs: Any,
    ) -> None:
        """
        Initializes a Bayesian 2D convolutional layer.

        Args:
            input_channels: Number of input channels.
            output_channels: Number of output channels.
            kernel_size: Size of the convolving kernel.
            stride: Stride of the convolution. Defaults to 1.
            padding: Padding added to all four sides of the input.
            dilation: Spacing between kernel elements.
            groups: Number of blocked connections from input channels
                to output channels.
            weights_distribution: The distribution for the weights.
            bias_distribution: The distribution for the bias.
            use_bias: Whether to include a bias term.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        self.input_channels = input_channels
        self.output_channels = output_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.use_bias = use_bias

        # Set weights distribution
        if weights_distribution is None:
            # Extend kernel if we only have 1 value
            if isinstance(self.kernel_size, int):
                self.kernel_size = (self.kernel_size, self.kernel_size)

            # Define weights distribution
            self.weights_distribution = GaussianDistribution(
                (
                    self.output_channels,
                    self.input_channels // self.groups,
                    *self.kernel_size,
                )
            )
        else:
            self.weights_distribution = weights_distribution

        # Set bias distribution
        if self.use_bias:
            if bias_distribution is None:
                self.bias_distribution = GaussianDistribution((self.output_channels,))
            else:
                self.bias_distribution = bias_distribution
        else:
            self.bias_distribution = None  # type: ignore

        # Sample initial weights
        weights = self.weights_distribution.sample()

        # Register buffers
        self.register_buffer("weights", weights)

        if self.use_bias and self.bias_distribution is not None:
            bias = self.bias_distribution.sample()
            self.register_buffer("bias", bias)
        else:
            self.bias = None  # type: ignore

    @torch.jit.export
    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Sample weights if they are undefined
        if self.weights is None:
            self.weights = self.weights_distribution.sample()

        # Sample bias if they are undefined and bias is used
        if self.use_bias and self.bias_distribution is not None:
            if not hasattr(self, "bias") or self.bias is None:
                self.bias = self.bias_distribution.sample()
            self.bias = self.bias.detach()

        # Detach weights and bias
        self.weights = self.weights.detach()

    @torch.jit.export
    def kl_cost(self) -> tuple[torch.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[torch.Tensor, int]: A tuple containing the KL
                divergence cost and the total number of parameters in
                the layer.
        """

        # Compute log probs
        log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

        # Add bias log probs if bias is used
        if self.use_bias and self.bias_distribution is not None:
            log_probs += self.bias_distribution.log_prob(self.bias)

        # Compute number of parameters
        num_params: int = self.weights_distribution.num_params()
        if self.use_bias and self.bias_distribution is not None:
            num_params += self.bias_distribution.num_params()

        return log_probs, num_params

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        """
        Performs a forward pass through the Bayesian Convolution 2D
        layer. If the layer is not frozen, it samples weights and bias
        from their respective distributions. If the layer is frozen
        and the weights or bias are not initialized, it also performs
        sampling.

        Args:
            inputs: Input tensor to the layer with shape (batch,
                input channels, input width, input height).

        Returns:
            Output tensor after passing through the layer with shape
                (batch, output channels, output width, output height).

        Raises:
            ValueError: If the layer is frozen but weights or bias are
                undefined.
        """

        # Check if layer is frozen
        if not self.frozen:
            self.weights = self.weights_distribution.sample()

            # Sample bias only if using bias
            if self.use_bias and self.bias_distribution is not None:
                self.bias = self.bias_distribution.sample()
        elif self.weights is None or (self.use_bias and self.bias is None):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Compute outputs
        # pylint: disable=E1102
        outputs: torch.Tensor = F.conv2d(
            input=inputs,
            weight=self.weights,
            stride=self.stride,
            padding=self.padding,
            dilation=self.dilation,
            groups=self.groups,
        )

        # Add bias only if using bias
        if self.use_bias and self.bias is not None:
            outputs += torch.reshape(
                input=self.bias, shape=(1, self.output_channels, 1, 1)
            )

        return outputs

6.3.1 __init__(input_channels, output_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

Initializes a Bayesian 2D convolutional layer.

Parameters:

    input_channels (int): Number of input channels. Required.
    output_channels (int): Number of output channels. Required.
    kernel_size (int | tuple[int, int]): Size of the convolving kernel. Required.
    stride (int | tuple[int, int]): Stride of the convolution. Default: 1.
    padding (int | tuple[int, int]): Padding added to all four sides of the input. Default: 0.
    dilation (int | tuple[int, int]): Spacing between kernel elements. Default: 1.
    groups (int): Number of blocked connections from input channels to output channels. Default: 1.
    weights_distribution (Optional[GaussianDistribution]): The distribution for the weights. Default: None.
    bias_distribution (Optional[GaussianDistribution]): The distribution for the bias. Default: None.
    use_bias (bool): Whether to include a bias term. Default: True.
    **kwargs (Any): Extra arguments passed to the base class. Default: {}.

Returns:

    None.

Notes

Gaussian distributions are used by default if none are provided.

Source code in illia/nn/torch/conv2d.py
def __init__(
    self,
    input_channels: int,
    output_channels: int,
    kernel_size: int | tuple[int, int],
    stride: int | tuple[int, int] = 1,
    padding: int | tuple[int, int] = 0,
    dilation: int | tuple[int, int] = 1,
    groups: int = 1,
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian 2D convolutional layer.

    Args:
        input_channels: Number of input channels.
        output_channels: Number of output channels.
        kernel_size: Size of the convolving kernel.
        stride: Stride of the convolution. Defaults to 1.
        padding: Padding added to all four sides of the input.
        dilation: Spacing between kernel elements.
        groups: Number of blocked connections from input channels
            to output channels.
        weights_distribution: The distribution for the weights.
        bias_distribution: The distribution for the bias.
        use_bias: Whether to include a bias term.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    self.input_channels = input_channels
    self.output_channels = output_channels
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.dilation = dilation
    self.groups = groups
    self.use_bias = use_bias

    # Set weights distribution
    if weights_distribution is None:
        # Extend kernel if we only have 1 value
        if isinstance(self.kernel_size, int):
            self.kernel_size = (self.kernel_size, self.kernel_size)

        # Define weights distribution
        self.weights_distribution = GaussianDistribution(
            (
                self.output_channels,
                self.input_channels // self.groups,
                *self.kernel_size,
            )
        )
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((self.output_channels,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None  # type: ignore

    # Sample initial weights
    weights = self.weights_distribution.sample()

    # Register buffers
    self.register_buffer("weights", weights)

    if self.use_bias and self.bias_distribution is not None:
        bias = self.bias_distribution.sample()
        self.register_buffer("bias", bias)
    else:
        self.bias = None  # type: ignore

6.3.2 forward(inputs)

Performs a forward pass through the Bayesian Convolution 2D layer. If the layer is not frozen, it samples weights and bias from their respective distributions. If the layer is frozen and the weights or bias are not initialized, it also performs sampling.

Parameters:

    inputs (Tensor): Input tensor to the layer with shape (batch, input channels, input width, input height). Required.

Returns:

    Tensor: Output tensor after passing through the layer with shape (batch, output channels, output width, output height).

Raises:

    ValueError: If the layer is frozen but weights or bias are undefined.

Source code in illia/nn/torch/conv2d.py
def forward(self, inputs: torch.Tensor) -> torch.Tensor:
    """
    Performs a forward pass through the Bayesian Convolution 2D
    layer. If the layer is not frozen, it samples weights and bias
    from their respective distributions. If the layer is frozen
    and the weights or bias are not initialized, it also performs
    sampling.

    Args:
        inputs: Input tensor to the layer with shape (batch,
            input channels, input width, input height).

    Returns:
        Output tensor after passing through the layer with shape
            (batch, output channels, output width, output height).

    Raises:
        ValueError: If the layer is frozen but weights or bias are
            undefined.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.weights = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution is not None:
            self.bias = self.bias_distribution.sample()
    elif self.weights is None or (self.use_bias and self.bias is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    # pylint: disable=E1102
    outputs: torch.Tensor = F.conv2d(
        input=inputs,
        weight=self.weights,
        stride=self.stride,
        padding=self.padding,
        dilation=self.dilation,
        groups=self.groups,
    )

    # Add bias only if using bias
    if self.use_bias and self.bias is not None:
        outputs += torch.reshape(
            input=self.bias, shape=(1, self.output_channels, 1, 1)
        )

    return outputs

6.3.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.

Source code in illia/nn/torch/conv2d.py
@torch.jit.export
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.weights is None:
        self.weights = self.weights_distribution.sample()

    # Sample bias if they are undefined and bias is used
    if self.use_bias and self.bias_distribution is not None:
        if not hasattr(self, "bias") or self.bias is None:
            self.bias = self.bias_distribution.sample()
        self.bias = self.bias.detach()

    # Detach weights and bias
    self.weights = self.weights.detach()

6.3.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[torch.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Source code in illia/nn/torch/conv2d.py
@torch.jit.export
def kl_cost(self) -> tuple[torch.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[torch.Tensor, int]: A tuple containing the KL
            divergence cost and the total number of parameters in
            the layer.
    """

    # Compute log probs
    log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

    # Add bias log probs if bias is used
    if self.use_bias and self.bias_distribution is not None:
        log_probs += self.bias_distribution.log_prob(self.bias)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params()
    if self.use_bias and self.bias_distribution is not None:
        num_params += self.bias_distribution.num_params()

    return log_probs, num_params

6.4 Embedding

Bayesian implementation of the Embedding layer. The embedding weights are treated as random variables sampled from a specified distribution and become fixed when the layer is frozen.
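
A small usage sketch with the default Gaussian distribution (vocabulary size and dimensions are arbitrary):

import torch

from illia.nn.torch.embedding import Embedding

layer = Embedding(num_embeddings=1000, embeddings_dim=64)

token_ids = torch.randint(0, 1000, (2, 10))
embeddings = layer(token_ids)      # shape: (2, 10, 64), new weight sample per call
kl, num_params = layer.kl_cost()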

Source code in illia/nn/torch/embedding.py
class Embedding(BayesianModule):
    """
    Bayesian implementation of the Embedding layer. The embedding
    weights are treated as random variables sampled from a specified
    distribution and become fixed when the layer is frozen.
    """

    weights: torch.Tensor

    def __init__(
        self,
        num_embeddings: int,
        embeddings_dim: int,
        padding_idx: Optional[int] = None,
        max_norm: Optional[float] = None,
        norm_type: float = 2.0,
        scale_grad_by_freq: bool = False,
        sparse: bool = False,
        weights_distribution: Optional[GaussianDistribution] = None,
        **kwargs: Any,
    ) -> None:
        """
        Initializes an Embedding layer.

        Args:
            num_embeddings: size of the dictionary of embeddings.
            embeddings_dim: the size of each embedding vector.
            padding_idx: If specified, the entries at padding_idx do
                not contribute to the gradient.
            max_norm: If given, each embedding vector with norm larger
                than max_norm is renormalized to have norm max_norm.
            norm_type: The p of the p-norm to compute for the max_norm
                option.
            scale_grad_by_freq: If given, this will scale gradients by
                the inverse of frequency of the words in the
                mini-batch.
            sparse: If True, gradient w.r.t. weight matrix will be a
                sparse tensor.
            weights_distribution: distribution for the weights of the
                layer.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        # Set embedding attributes
        self.num_embeddings = num_embeddings
        self.embeddings_dim = embeddings_dim
        self.padding_idx = padding_idx
        self.max_norm = max_norm
        self.norm_type = norm_type
        self.scale_grad_by_freq = scale_grad_by_freq
        self.sparse = sparse

        # Set weights distribution
        if weights_distribution is None:
            self.weights_distribution = GaussianDistribution(
                (self.num_embeddings, self.embeddings_dim)
            )
        else:
            self.weights_distribution = weights_distribution

        # Sample initial weights
        weights = self.weights_distribution.sample()

        # Register buffers
        self.register_buffer("weights", weights)

    @torch.jit.export
    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # set indicator
        self.frozen = True

        # sample weights if they are undefined
        if self.weights is None:
            self.weights = self.weights_distribution.sample()

        # detach weights
        self.weights = self.weights.detach()

    @torch.jit.export
    def kl_cost(self) -> tuple[torch.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[torch.Tensor, int]: A tuple containing the KL
                divergence cost and the total number of parameters in
                the layer.
        """

        # get log posterior and log prior
        log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

        # get number of parameters
        num_params: int = self.weights_distribution.num_params()

        return log_probs, num_params

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        """
        This method is the forward pass of the layer.

        Args:
            inputs: input tensor. Dimensions: [*].

        Returns:
            outputs tensor. Dimension: [*, embedding dim].

        Raises:
            ValueError: If the layer is frozen but weights are
                undefined.
        """

        # Forward depending on the frozen state
        if not self.frozen:
            self.weights = self.weights_distribution.sample()
        elif self.weights is None:
            raise ValueError("Module has been frozen with undefined weights")

        # Run torch forward
        return F.embedding(
            input=inputs,
            weight=self.weights,
            padding_idx=self.padding_idx,
            max_norm=self.max_norm,
            norm_type=self.norm_type,
            scale_grad_by_freq=self.scale_grad_by_freq,
            sparse=self.sparse,
        )

6.4.1 __init__(num_embeddings, embeddings_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, weights_distribution=None, **kwargs)

Initializes an Embedding layer.

Parameters:

    num_embeddings (int): Size of the dictionary of embeddings. Required.
    embeddings_dim (int): The size of each embedding vector. Required.
    padding_idx (Optional[int]): If specified, the entries at padding_idx do not contribute to the gradient. Default: None.
    max_norm (Optional[float]): If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Default: None.
    norm_type (float): The p of the p-norm to compute for the max_norm option. Default: 2.0.
    scale_grad_by_freq (bool): If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default: False.
    sparse (bool): If True, gradient w.r.t. weight matrix will be a sparse tensor. Default: False.
    weights_distribution (Optional[GaussianDistribution]): Distribution for the weights of the layer. Default: None.
    **kwargs (Any): Extra arguments passed to the base class. Default: {}.

Returns:

    None.

Notes

Gaussian distributions are used by default if none are provided.

Source code in illia/nn/torch/embedding.py
def __init__(
    self,
    num_embeddings: int,
    embeddings_dim: int,
    padding_idx: Optional[int] = None,
    max_norm: Optional[float] = None,
    norm_type: float = 2.0,
    scale_grad_by_freq: bool = False,
    sparse: bool = False,
    weights_distribution: Optional[GaussianDistribution] = None,
    **kwargs: Any,
) -> None:
    """
    Initializes an Embedding layer.

    Args:
        num_embeddings: size of the dictionary of embeddings.
        embeddings_dim: the size of each embedding vector.
        padding_idx: If specified, the entries at padding_idx do
            not contribute to the gradient.
        max_norm: If given, each embedding vector with norm larger
            than max_norm is renormalized to have norm max_norm.
        norm_type: The p of the p-norm to compute for the max_norm
            option.
        scale_grad_by_freq: If given, this will scale gradients by
            the inverse of frequency of the words in the
            mini-batch.
        sparse: If True, gradient w.r.t. weight matrix will be a
            sparse tensor.
        weights_distribution: distribution for the weights of the
            layer.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    # Set embedding attributes
    self.num_embeddings = num_embeddings
    self.embeddings_dim = embeddings_dim
    self.padding_idx = padding_idx
    self.max_norm = max_norm
    self.norm_type = norm_type
    self.scale_grad_by_freq = scale_grad_by_freq
    self.sparse = sparse

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            (self.num_embeddings, self.embeddings_dim)
        )
    else:
        self.weights_distribution = weights_distribution

    # Sample initial weights
    weights = self.weights_distribution.sample()

    # Register buffers
    self.register_buffer("weights", weights)

6.4.2 forward(inputs)

This method is the forward pass of the layer.

Parameters:

    inputs (Tensor): Input tensor. Dimensions: [*]. Required.

Returns:

    Tensor: Output tensor. Dimensions: [*, embedding dim].

Raises:

    ValueError: If the layer is frozen but weights are undefined.

Source code in illia/nn/torch/embedding.py
def forward(self, inputs: torch.Tensor) -> torch.Tensor:
    """
    This method is the forward pass of the layer.

    Args:
        inputs: input tensor. Dimensions: [*].

    Returns:
        outputs tensor. Dimension: [*, embedding dim].

    Raises:
        ValueError: If the layer is frozen but weights are
            undefined.
    """

    # Forward depending on the frozen state
    if not self.frozen:
        self.weights = self.weights_distribution.sample()
    elif self.weights is None:
        raise ValueError("Module has been frozen with undefined weights")

    # Run torch forward
    return F.embedding(
        input=inputs,
        weight=self.weights,
        padding_idx=self.padding_idx,
        max_norm=self.max_norm,
        norm_type=self.norm_type,
        scale_grad_by_freq=self.scale_grad_by_freq,
        sparse=self.sparse,
    )

6.4.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.

Source code in illia/nn/torch/embedding.py
@torch.jit.export
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # set indicator
    self.frozen = True

    # sample weights if they are undefined
    if self.weights is None:
        self.weights = self.weights_distribution.sample()

    # detach weights
    self.weights = self.weights.detach()

6.4.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[torch.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.

Source code in illia/nn/torch/embedding.py
@torch.jit.export
def kl_cost(self) -> tuple[torch.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[torch.Tensor, int]: A tuple containing the KL
            divergence cost and the total number of parameters in
            the layer.
    """

    # get log posterior and log prior
    log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

    # get number of parameters
    num_params: int = self.weights_distribution.num_params()

    return log_probs, num_params

6.5 Linear

Bayesian implementation of the torch Linear layer. The weights and bias are treated as random variables sampled from specified distributions and become fixed when the layer is frozen.
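
A small sketch showing the stochastic and frozen behaviour (sizes are arbitrary):

import torch

from illia.nn.torch.linear import Linear

layer = Linear(input_size=32, output_size=10)
x = torch.randn(8, 32)

y_stochastic = layer(x)            # a new weight and bias sample per call

layer.freeze()                     # reuse the current sample from now on
assert torch.allclose(layer(x), layer(x))

layer.unfreeze()                   # resume sampling on each call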

Source code in illia/nn/torch/linear.py
class Linear(BayesianModule):
    """
    Bayesian implementation of the torch Linear layer. The weights and
    bias are treated as random variables sampled from specified
    distributions and become fixed when the layer is frozen.
    """

    weights: torch.Tensor
    bias: torch.Tensor

    def __init__(
        self,
        input_size: int,
        output_size: int,
        weights_distribution: Optional[GaussianDistribution] = None,
        bias_distribution: Optional[GaussianDistribution] = None,
        use_bias: bool = True,
        **kwargs: Any,
    ) -> None:
        """
        Initializes a Linear layer.

        Args:
            input_size: Input size of the linear layer.
            output_size: Output size of the linear layer.
            weights_distribution: GaussianDistribution for the weights of the
                layer. Defaults to None.
            bias_distribution: GaussianDistribution for the bias of the layer.
                Defaults to None.
            use_bias: Whether to include a bias term.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias

        # Set weights distribution
        if weights_distribution is None:
            self.weights_distribution = GaussianDistribution(
                (self.output_size, self.input_size)
            )
        else:
            self.weights_distribution = weights_distribution

        # Set bias distribution
        if self.use_bias:
            if bias_distribution is None:
                self.bias_distribution = GaussianDistribution((self.output_size,))
            else:
                self.bias_distribution = bias_distribution
        else:
            self.bias_distribution = None  # type: ignore

        # Sample initial weights
        weights = self.weights_distribution.sample()

        # Register buffers
        self.register_buffer("weights", weights)

        if self.use_bias and self.bias_distribution is not None:
            bias = self.bias_distribution.sample()
            self.register_buffer("bias", bias)
        else:
            self.bias = None  # type: ignore

    @torch.jit.export
    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Sample weights if they are undefined
        if self.weights is None:
            self.weights = self.weights_distribution.sample()

        # Sample bias if they are undefined and bias is used
        if self.use_bias and self.bias_distribution is not None:
            if not hasattr(self, "bias") or self.bias is None:
                self.bias = self.bias_distribution.sample()
            self.bias = self.bias.detach()

        # Detach weights and bias
        self.weights = self.weights.detach()

    @torch.jit.export
    def kl_cost(self) -> tuple[torch.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[torch.Tensor, int]: A tuple containing the KL
                divergence cost and the total number of parameters in
                the layer.
        """

        # Compute log probs
        log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

        # Add bias log probs if bias is used
        if self.use_bias and self.bias_distribution is not None:
            log_probs += self.bias_distribution.log_prob(self.bias)

        # Compute number of parameters
        num_params: int = self.weights_distribution.num_params()
        if self.use_bias and self.bias_distribution is not None:
            num_params += self.bias_distribution.num_params()

        return log_probs, num_params

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        """
        This method is the forward pass of the layer.

        Args:
            inputs: input tensor. Dimensions: [batch, *].

        Returns:
            outputs tensor. Dimensions: [batch, *].

        Raises:
            ValueError: If the layer is frozen but weights or bias are
                undefined.
        """

        # Check if layer is frozen
        if not self.frozen:
            self.weights = self.weights_distribution.sample()

            # Sample bias only if using bias
            if self.use_bias and self.bias_distribution is not None:
                self.bias = self.bias_distribution.sample()
        elif self.weights is None or (self.use_bias and self.bias is None):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Compute outputs
        # pylint: disable=E1102
        outputs: torch.Tensor = F.linear(input=inputs, weight=self.weights)

        # Add bias only if using bias
        if self.use_bias and self.bias is not None:
            outputs += torch.reshape(input=self.bias, shape=(1, self.output_size))

        return outputs
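
A minimal usage sketch of the layer follows. The import path is assumed from the source location shown for this class (illia/nn/torch/linear.py) and may need to be adapted to your installation.

import torch

from illia.nn.torch.linear import Linear  # assumed import path

# Bayesian linear layer mapping 16 input features to 4 outputs
layer = Linear(input_size=16, output_size=4)

x = torch.randn(8, 16)            # [batch, input_size]
y = layer(x)                      # a fresh weight sample is drawn on this call
print(y.shape)                    # torch.Size([8, 4])

kl, num_params = layer.kl_cost()  # KL term and parameter count for the ELBO
layer.freeze()                    # fix the current sample and stop resampling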

6.5.1 __init__(input_size, output_size, weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

Initializes a Linear layer.

Parameters:

    input_size (int): Input size of the linear layer. Required.
    output_size (int): Output size of the linear layer. Required.
    weights_distribution (Optional[GaussianDistribution]): GaussianDistribution for the weights of the layer. Defaults to None.
    bias_distribution (Optional[GaussianDistribution]): GaussianDistribution for the bias of the layer. Defaults to None.
    use_bias (bool): Whether to include a bias term. Defaults to True.
    **kwargs (Any): Extra arguments passed to the base class.

Returns:

    None.

Notes

Gaussian distributions are used by default if none are provided.
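
For illustration, the sketch below passes explicit distributions instead of relying on the defaults. The import path of GaussianDistribution is an assumption; the shapes follow the defaults used by the layer: (output_size, input_size) for the weights and (output_size,) for the bias.

from illia.nn.torch.linear import Linear              # assumed import path
from illia.distributions import GaussianDistribution  # assumed import path

# Custom distributions; shapes must match the layer dimensions.
weights_distribution = GaussianDistribution((4, 16))  # (output_size, input_size)
bias_distribution = GaussianDistribution((4,))        # (output_size,)

layer = Linear(
    input_size=16,
    output_size=4,
    weights_distribution=weights_distribution,
    bias_distribution=bias_distribution,
)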

Source code in illia/nn/torch/linear.py (lines 21-84)
def __init__(
    self,
    input_size: int,
    output_size: int,
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Linear layer.

    Args:
        input_size: Input size of the linear layer.
        output_size: Output size of the linear layer.
        weights_distribution: GaussianDistribution for the weights of the
            layer. Defaults to None.
        bias_distribution: GaussianDistribution for the bias of the layer.
            Defaults to None.
        use_bias: Whether to include a bias term.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    self.input_size = input_size
    self.output_size = output_size
    self.use_bias = use_bias

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            (self.output_size, self.input_size)
        )
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((self.output_size,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None  # type: ignore

    # Sample initial weights
    weights = self.weights_distribution.sample()

    # Register buffers
    self.register_buffer("weights", weights)

    if self.use_bias and self.bias_distribution is not None:
        bias = self.bias_distribution.sample()
        self.register_buffer("bias", bias)
    else:
        self.bias = None  # type: ignore

6.5.2 forward(inputs)

This method is the forward pass of the layer.

Parameters:

    inputs (Tensor): Input tensor. Dimensions: [batch, *]. Required.

Returns:

    Tensor: Output tensor. Dimensions: [batch, *].

Raises:

    ValueError: If the layer is frozen but weights or bias are undefined.
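
Because a new weight sample is drawn on every call while the layer is unfrozen, repeated forward passes on the same input give different outputs. A sketch of Monte Carlo prediction, with the same assumed import path as above:

import torch

from illia.nn.torch.linear import Linear  # assumed import path

layer = Linear(input_size=16, output_size=4)
x = torch.randn(2, 16)

# Each call resamples the weights, so stacking several passes gives a
# Monte Carlo estimate of the predictive distribution.
samples = torch.stack([layer(x) for _ in range(10)])  # [10, batch, output_size]
mean = samples.mean(dim=0)
std = samples.std(dim=0)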

Source code in illia/nn/torch/linear.py (lines 138-173)
def forward(self, inputs: torch.Tensor) -> torch.Tensor:
    """
    This method is the forward pass of the layer.

    Args:
        inputs: input tensor. Dimensions: [batch, *].

    Returns:
        outputs tensor. Dimensions: [batch, *].

    Raises:
        ValueError: If the layer is frozen but weights or bias are
            undefined.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.weights = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution is not None:
            self.bias = self.bias_distribution.sample()
    elif self.weights is None or (self.use_bias and self.bias is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    # pylint: disable=E1102
    outputs: torch.Tensor = F.linear(input=inputs, weight=self.weights)

    # Add bias only if using bias
    if self.use_bias and self.bias is not None:
        outputs += torch.reshape(input=self.bias, shape=(1, self.output_size))

    return outputs

6.5.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.
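
A sketch of the effect of freezing, with the same assumed import path as above: once frozen, the layer reuses the detached sample, so repeated calls are deterministic until unfreeze is called.

import torch

from illia.nn.torch.linear import Linear  # assumed import path

layer = Linear(input_size=16, output_size=4)
x = torch.randn(2, 16)

layer.freeze()              # detach and keep the current weight sample
y1 = layer(x)
y2 = layer(x)
assert torch.equal(y1, y2)  # frozen: no resampling between calls

layer.unfreeze()            # sampling resumes on the next forward pass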

Source code in illia/nn/torch/linear.py (lines 86-111)
@torch.jit.export
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.weights is None:
        self.weights = self.weights_distribution.sample()

    # Sample bias if they are undefined and bias is used
    if self.use_bias and self.bias_distribution is not None:
        if not hasattr(self, "bias") or self.bias is None:
            self.bias = self.bias_distribution.sample()
        self.bias = self.bias.detach()

    # Detach weights and bias
    self.weights = self.weights.detach()

6.5.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[torch.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.
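
A sketch of how the returned pair can enter an ELBO-style objective. It assumes the distribution parameters are exposed through layer.parameters() so they receive gradients; the KL weighting and the normalisation by the parameter count are modelling choices, not part of the API.

import torch
import torch.nn.functional as F

from illia.nn.torch.linear import Linear  # assumed import path

layer = Linear(input_size=16, output_size=4)
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3)

x = torch.randn(32, 16)
target = torch.randint(0, 4, (32,))

optimizer.zero_grad()
logits = layer(x)
kl, num_params = layer.kl_cost()

# Negative log-likelihood plus a scaled KL term (ELBO-style loss).
loss = F.cross_entropy(logits, target) + 1e-3 * kl / num_params
loss.backward()
optimizer.step()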

Source code in illia/nn/torch/linear.py (lines 113-136)
@torch.jit.export
def kl_cost(self) -> tuple[torch.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[torch.Tensor, int]: A tuple containing the KL
            divergence cost and the total number of parameters in
            the layer.
    """

    # Compute log probs
    log_probs: torch.Tensor = self.weights_distribution.log_prob(self.weights)

    # Add bias log probs if bias is used
    if self.use_bias and self.bias_distribution is not None:
        log_probs += self.bias_distribution.log_prob(self.bias)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params()
    if self.use_bias and self.bias_distribution is not None:
        num_params += self.bias_distribution.num_params()

    return log_probs, num_params

6.6 LSTM

Bayesian LSTM layer with embedding and probabilistic weights. All weights and biases are sampled from Gaussian distributions. Freezing the layer fixes parameters and stops gradient computation.
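
A minimal usage sketch. The import path is assumed from the source location below; inputs are integer token indices, which the layer embeds internally.

import torch

from illia.nn.torch.lstm import LSTM  # assumed import path

layer = LSTM(
    num_embeddings=1000,  # vocabulary size
    embeddings_dim=32,
    hidden_size=64,
    output_size=4,
)

tokens = torch.randint(0, 1000, (8, 20))  # [batch, seq_len] token indices
output, (h_t, c_t) = layer(tokens)
print(output.shape)  # torch.Size([8, 4])
print(h_t.shape)     # torch.Size([8, 64])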

Source code in illia/nn/torch/lstm.py (lines 13-368)
class LSTM(BayesianModule):
    """
    Bayesian LSTM layer with embedding and probabilistic weights.
    All weights and biases are sampled from Gaussian distributions.
    Freezing the layer fixes parameters and stops gradient computation.
    """

    # Forget gate
    wf: torch.Tensor
    bf: torch.Tensor

    # Input gate
    wi: torch.Tensor
    bi: torch.Tensor

    # Candidate gate
    wc: torch.Tensor
    bc: torch.Tensor

    # Output gate
    wo: torch.Tensor
    bo: torch.Tensor

    # Final output layer
    wv: torch.Tensor
    bv: torch.Tensor

    def __init__(
        self,
        num_embeddings: int,
        embeddings_dim: int,
        hidden_size: int,
        output_size: int,
        padding_idx: Optional[int] = None,
        max_norm: Optional[float] = None,
        norm_type: float = 2.0,
        scale_grad_by_freq: bool = False,
        sparse: bool = False,
        **kwargs: Any,
    ) -> None:
        """
        Initializes the Bayesian LSTM layer.

        Args:
            num_embeddings: Size of the embedding dictionary.
            embeddings_dim: Dimensionality of each embedding vector.
            hidden_size: Number of hidden units in the LSTM.
            output_size: Size of the final output.
            padding_idx: Index to ignore in embeddings.
            max_norm: Maximum norm for embedding vectors.
            norm_type: Norm type used for max_norm.
            scale_grad_by_freq: Scale gradient by inverse frequency.
            sparse: Use sparse embedding updates.
            **kwargs: Extra arguments passed to the base class.

        Returns:
            None.

        Notes:
            Gaussian distributions are used by default if none are
            provided.
        """

        super().__init__(**kwargs)

        self.num_embeddings = num_embeddings
        self.embeddings_dim = embeddings_dim
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.padding_idx = padding_idx
        self.max_norm = max_norm
        self.norm_type = norm_type
        self.scale_grad_by_freq = scale_grad_by_freq
        self.sparse = sparse

        # Define the Embedding layer
        self.embedding = Embedding(
            num_embeddings=self.num_embeddings,
            embeddings_dim=self.embeddings_dim,
            padding_idx=self.padding_idx,
            max_norm=self.max_norm,
            norm_type=self.norm_type,
            scale_grad_by_freq=self.scale_grad_by_freq,
            sparse=self.sparse,
        )

        # Initialize weights
        # Forget gate
        self.wf_distribution = GaussianDistribution(
            (self.hidden_size, self.embeddings_dim + self.hidden_size)
        )
        self.bf_distribution = GaussianDistribution((self.hidden_size,))

        # Input gate
        self.wi_distribution = GaussianDistribution(
            (self.hidden_size, self.embeddings_dim + self.hidden_size)
        )
        self.bi_distribution = GaussianDistribution((self.hidden_size,))

        # Candidate gate
        self.wc_distribution = GaussianDistribution(
            (self.hidden_size, self.embeddings_dim + self.hidden_size)
        )
        self.bc_distribution = GaussianDistribution((self.hidden_size,))

        # Output gate
        self.wo_distribution = GaussianDistribution(
            (self.hidden_size, self.embeddings_dim + self.hidden_size)
        )
        self.bo_distribution = GaussianDistribution((self.hidden_size,))

        # Final gate
        self.wv_distribution = GaussianDistribution(
            (self.output_size, self.hidden_size)
        )
        self.bv_distribution = GaussianDistribution((self.output_size,))

        # Sample initial weights and register buffers
        # Forget gate
        wf = self.wf_distribution.sample()
        bf = self.bf_distribution.sample()
        self.register_buffer("wf", wf)
        self.register_buffer("bf", bf)

        # Input gate
        wi = self.wi_distribution.sample()
        bi = self.bi_distribution.sample()
        self.register_buffer("wi", wi)
        self.register_buffer("bi", bi)

        # Candidate gate
        wc = self.wc_distribution.sample()
        bc = self.bc_distribution.sample()
        self.register_buffer("wc", wc)
        self.register_buffer("bc", bc)

        # Output gate
        wo = self.wo_distribution.sample()
        bo = self.bo_distribution.sample()
        self.register_buffer("wo", wo)
        self.register_buffer("bo", bo)

        # Final output layer
        wv = self.wv_distribution.sample()
        bv = self.bv_distribution.sample()
        self.register_buffer("wv", wv)
        self.register_buffer("bv", bv)

    @torch.jit.export
    def freeze(self) -> None:
        """
        Freeze the module's parameters to stop gradient computation.
        If weights or biases are not sampled yet, they are sampled first.
        Once frozen, parameters are not resampled or updated.

        Returns:
            None.
        """

        # Set indicator
        self.frozen = True

        # Freeze embedding layer
        self.embedding.freeze()

        # Forget gate
        if self.wf is None:
            self.wf = self.wf_distribution.sample()
        if self.bf is None:
            self.bf = self.bf_distribution.sample()
        self.wf = self.wf.detach()
        self.bf = self.bf.detach()

        # Input gate
        if self.wi is None:
            self.wi = self.wi_distribution.sample()
        if self.bi is None:
            self.bi = self.bi_distribution.sample()
        self.wi = self.wi.detach()
        self.bi = self.bi.detach()

        # Candidate gate
        if self.wc is None:
            self.wc = self.wc_distribution.sample()
        if self.bc is None:
            self.bc = self.bc_distribution.sample()
        self.wc = self.wc.detach()
        self.bc = self.bc.detach()

        # Output gate
        if self.wo is None:
            self.wo = self.wo_distribution.sample()
        if self.bo is None:
            self.bo = self.bo_distribution.sample()
        self.wo = self.wo.detach()
        self.bo = self.bo.detach()

        # Final output layer
        if self.wv is None:
            self.wv = self.wv_distribution.sample()
        if self.bv is None:
            self.bv = self.bv_distribution.sample()
        self.wv = self.wv.detach()
        self.bv = self.bv.detach()

    @torch.jit.export
    def kl_cost(self) -> tuple[torch.Tensor, int]:
        """
        Compute the KL divergence cost for all Bayesian parameters.

        Returns:
            tuple[torch.Tensor, int]: A tuple containing the KL
                divergence cost and the total number of parameters in
                the layer.
        """

        # Compute log probs for each pair of weights and bias
        # Forget gate
        log_probs_f: torch.Tensor = self.wf_distribution.log_prob(
            self.wf
        ) + self.bf_distribution.log_prob(self.bf)
        # Input gate
        log_probs_i: torch.Tensor = self.wi_distribution.log_prob(
            self.wi
        ) + self.bi_distribution.log_prob(self.bi)
        # Candidate gate
        log_probs_c: torch.Tensor = self.wc_distribution.log_prob(
            self.wc
        ) + self.bc_distribution.log_prob(self.bc)
        # Output gate
        log_probs_o: torch.Tensor = self.wo_distribution.log_prob(
            self.wo
        ) + self.bo_distribution.log_prob(self.bo)
        # Final output layer
        log_probs_v: torch.Tensor = self.wv_distribution.log_prob(
            self.wv
        ) + self.bv_distribution.log_prob(self.bv)

        # Compute the total loss
        log_probs = log_probs_f + log_probs_i + log_probs_c + log_probs_o + log_probs_v

        # Compute number of parameters
        num_params: int = (
            self.wf_distribution.num_params()
            + self.bf_distribution.num_params()
            + self.wi_distribution.num_params()
            + self.bi_distribution.num_params()
            + self.wc_distribution.num_params()
            + self.bc_distribution.num_params()
            + self.wo_distribution.num_params()
            + self.bo_distribution.num_params()
            + self.wv_distribution.num_params()
            + self.bv_distribution.num_params()
        )

        return log_probs, num_params

    def forward(
        self,
        inputs: torch.Tensor,
        init_states: Optional[tuple[torch.Tensor, torch.Tensor]] = None,
    ) -> tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]]:
        """
        Performs a forward pass through the Bayesian LSTM layer.
        If the layer is not frozen, fresh weights and biases are
        sampled from their distributions on every call. If the layer
        is frozen but weights or biases are undefined, an error is
        raised.

        Args:
            inputs: Tensor of token indices with shape
                [batch, seq_len]; a trailing singleton dimension is
                squeezed away before embedding.
            init_states: Optional initial hidden and cell states,
                each with shape [batch, hidden_size]. Defaults to
                zero tensors.

        Returns:
            Tuple of the output tensor with shape [batch, output_size]
                and the final hidden and cell states, each with shape
                [batch, hidden_size].

        Raises:
            ValueError: If the layer is frozen but weights are
                undefined.
        """

        # Sample weights if not frozen
        if not self.frozen:
            self.wf = self.wf_distribution.sample()
            self.bf = self.bf_distribution.sample()
            self.wi = self.wi_distribution.sample()
            self.bi = self.bi_distribution.sample()
            self.wc = self.wc_distribution.sample()
            self.bc = self.bc_distribution.sample()
            self.wo = self.wo_distribution.sample()
            self.bo = self.bo_distribution.sample()
            self.wv = self.wv_distribution.sample()
            self.bv = self.bv_distribution.sample()
        elif any(
            p is None
            for p in [
                self.wf,
                self.bf,
                self.wi,
                self.bi,
                self.wc,
                self.bc,
                self.wo,
                self.bo,
                self.wv,
                self.bv,
            ]
        ):
            raise ValueError(
                "Module has been frozen with undefined weights and/or bias."
            )

        # Apply embedding layer to input indices
        inputs = inputs.squeeze(dim=-1)
        inputs = self.embedding(inputs)
        batch_size, seq_len, _ = inputs.size()

        # Initialize h_t and c_t if init_states is None
        if init_states is None:
            device = inputs.device
            h_t = torch.zeros(batch_size, self.hidden_size, device=device)
            c_t = torch.zeros(batch_size, self.hidden_size, device=device)
        else:
            h_t, c_t = init_states[0], init_states[1]

        for t in range(seq_len):
            # Shape: (batch_size, embedding_dim)
            x_t = inputs[:, t, :]

            # Concatenate input and hidden state
            # Shape: (batch_size, embedding_dim + hidden_size)
            z_t = torch.cat([x_t, h_t], dim=1)

            # Forget gate
            ft = torch.sigmoid(z_t @ self.wf.t() + self.bf)

            # Input gate
            it = torch.sigmoid(z_t @ self.wi.t() + self.bi)

            # Candidate cell state
            can = torch.tanh(z_t @ self.wc.t() + self.bc)

            # Output gate
            ot = torch.sigmoid(z_t @ self.wo.t() + self.bo)

            # Update cell state
            c_t = c_t * ft + can * it

            # Update hidden state
            h_t = ot * torch.tanh(c_t)

        # Compute final output
        y_t = h_t @ self.wv.t() + self.bv

        return y_t, (h_t, c_t)

6.6.1 __init__(num_embeddings, embeddings_dim, hidden_size, output_size, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, **kwargs)

Initializes the Bayesian LSTM layer.

Parameters:

    num_embeddings (int): Size of the embedding dictionary. Required.
    embeddings_dim (int): Dimensionality of each embedding vector. Required.
    hidden_size (int): Number of hidden units in the LSTM. Required.
    output_size (int): Size of the final output. Required.
    padding_idx (Optional[int]): Index to ignore in embeddings. Defaults to None.
    max_norm (Optional[float]): Maximum norm for embedding vectors. Defaults to None.
    norm_type (float): Norm type used for max_norm. Defaults to 2.0.
    scale_grad_by_freq (bool): Scale gradient by inverse frequency. Defaults to False.
    sparse (bool): Use sparse embedding updates. Defaults to False.
    **kwargs (Any): Extra arguments passed to the base class.

Returns:

    None.

Notes

Gaussian distributions are used for all gate weights and biases and for the final output projection.
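
For reference, the sketch below shows the buffer shapes that result from the constructor arguments: each gate holds a weight matrix of shape (hidden_size, embeddings_dim + hidden_size) and a bias of shape (hidden_size,), and the final projection is (output_size, hidden_size). The import path is assumed.

from illia.nn.torch.lstm import LSTM  # assumed import path

layer = LSTM(
    num_embeddings=1000,
    embeddings_dim=32,
    hidden_size=64,
    output_size=4,
    padding_idx=0,  # treat index 0 as padding in the embedding
)

print(layer.wf.shape)  # torch.Size([64, 96]) -- forget gate, 96 = 32 + 64
print(layer.bf.shape)  # torch.Size([64])
print(layer.wv.shape)  # torch.Size([4, 64]) -- final output projection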

Source code in illia/nn/torch/lstm.py (lines 40-159)
def __init__(
    self,
    num_embeddings: int,
    embeddings_dim: int,
    hidden_size: int,
    output_size: int,
    padding_idx: Optional[int] = None,
    max_norm: Optional[float] = None,
    norm_type: float = 2.0,
    scale_grad_by_freq: bool = False,
    sparse: bool = False,
    **kwargs: Any,
) -> None:
    """
    Initializes the Bayesian LSTM layer.

    Args:
        num_embeddings: Size of the embedding dictionary.
        embeddings_dim: Dimensionality of each embedding vector.
        hidden_size: Number of hidden units in the LSTM.
        output_size: Size of the final output.
        padding_idx: Index to ignore in embeddings.
        max_norm: Maximum norm for embedding vectors.
        norm_type: Norm type used for max_norm.
        scale_grad_by_freq: Scale gradient by inverse frequency.
        sparse: Use sparse embedding updates.
        **kwargs: Extra arguments passed to the base class.

    Returns:
        None.

    Notes:
        Gaussian distributions are used by default if none are
        provided.
    """

    super().__init__(**kwargs)

    self.num_embeddings = num_embeddings
    self.embeddings_dim = embeddings_dim
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.padding_idx = padding_idx
    self.max_norm = max_norm
    self.norm_type = norm_type
    self.scale_grad_by_freq = scale_grad_by_freq
    self.sparse = sparse

    # Define the Embedding layer
    self.embedding = Embedding(
        num_embeddings=self.num_embeddings,
        embeddings_dim=self.embeddings_dim,
        padding_idx=self.padding_idx,
        max_norm=self.max_norm,
        norm_type=self.norm_type,
        scale_grad_by_freq=self.scale_grad_by_freq,
        sparse=self.sparse,
    )

    # Initialize weights
    # Forget gate
    self.wf_distribution = GaussianDistribution(
        (self.hidden_size, self.embeddings_dim + self.hidden_size)
    )
    self.bf_distribution = GaussianDistribution((self.hidden_size,))

    # Input gate
    self.wi_distribution = GaussianDistribution(
        (self.hidden_size, self.embeddings_dim + self.hidden_size)
    )
    self.bi_distribution = GaussianDistribution((self.hidden_size,))

    # Candidate gate
    self.wc_distribution = GaussianDistribution(
        (self.hidden_size, self.embeddings_dim + self.hidden_size)
    )
    self.bc_distribution = GaussianDistribution((self.hidden_size,))

    # Output gate
    self.wo_distribution = GaussianDistribution(
        (self.hidden_size, self.embeddings_dim + self.hidden_size)
    )
    self.bo_distribution = GaussianDistribution((self.hidden_size,))

    # Final gate
    self.wv_distribution = GaussianDistribution(
        (self.output_size, self.hidden_size)
    )
    self.bv_distribution = GaussianDistribution((self.output_size,))

    # Sample initial weights and register buffers
    # Forget gate
    wf = self.wf_distribution.sample()
    bf = self.bf_distribution.sample()
    self.register_buffer("wf", wf)
    self.register_buffer("bf", bf)

    # Input gate
    wi = self.wi_distribution.sample()
    bi = self.bi_distribution.sample()
    self.register_buffer("wi", wi)
    self.register_buffer("bi", bi)

    # Candidate gate
    wc = self.wc_distribution.sample()
    bc = self.bc_distribution.sample()
    self.register_buffer("wc", wc)
    self.register_buffer("bc", bc)

    # Output gate
    wo = self.wo_distribution.sample()
    bo = self.bo_distribution.sample()
    self.register_buffer("wo", wo)
    self.register_buffer("bo", bo)

    # Final output layer
    wv = self.wv_distribution.sample()
    bv = self.bv_distribution.sample()
    self.register_buffer("wv", wv)
    self.register_buffer("bv", bv)

6.6.2 forward(inputs, init_states=None)

Performs a forward pass through the Bayesian LSTM layer. If the layer is not frozen, fresh weights and biases are sampled from their distributions on every call. If the layer is frozen but its weights or biases are undefined, a ValueError is raised.

Parameters:

    inputs (Tensor): Tensor of token indices with shape [batch, seq_len]; a trailing singleton dimension is squeezed away before embedding. Required.
    init_states (Optional[tuple[Tensor, Tensor]]): Initial hidden and cell states, each with shape [batch, hidden_size]. Defaults to None, in which case both are initialized to zeros.

Returns:

    tuple[Tensor, tuple[Tensor, Tensor]]: The output tensor with shape [batch, output_size] together with the final hidden and cell states, each with shape [batch, hidden_size].

Raises:

    ValueError: If the layer is frozen but weights are undefined.
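
A sketch of carrying state across calls via init_states, with the same assumed import path as above. Note that while the layer is unfrozen, a fresh weight sample is drawn on every call, so the two chunks are processed with different samples; call freeze first if a single sample should be used throughout.

import torch

from illia.nn.torch.lstm import LSTM  # assumed import path

layer = LSTM(num_embeddings=1000, embeddings_dim=32, hidden_size=64, output_size=4)

chunk_a = torch.randint(0, 1000, (8, 10))  # first half of a sequence
chunk_b = torch.randint(0, 1000, (8, 10))  # second half

_, state = layer(chunk_a)                  # state is the tuple (h_t, c_t)
output, state = layer(chunk_b, init_states=state)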

Source code in illia/nn/torch/lstm.py (lines 270-368)
def forward(
    self,
    inputs: torch.Tensor,
    init_states: Optional[tuple[torch.Tensor, torch.Tensor]] = None,
) -> tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]]:
    """
    Performs a forward pass through the Bayesian LSTM layer.
    If the layer is not frozen, fresh weights and biases are
    sampled from their distributions on every call. If the layer
    is frozen but weights or biases are undefined, an error is
    raised.

    Args:
        inputs: Tensor of token indices with shape
            [batch, seq_len]; a trailing singleton dimension is
            squeezed away before embedding.
        init_states: Optional initial hidden and cell states,
            each with shape [batch, hidden_size]. Defaults to
            zero tensors.

    Returns:
        Tuple of the output tensor with shape [batch, output_size]
            and the final hidden and cell states, each with shape
            [batch, hidden_size].

    Raises:
        ValueError: If the layer is frozen but weights are
            undefined.
    """

    # Sample weights if not frozen
    if not self.frozen:
        self.wf = self.wf_distribution.sample()
        self.bf = self.bf_distribution.sample()
        self.wi = self.wi_distribution.sample()
        self.bi = self.bi_distribution.sample()
        self.wc = self.wc_distribution.sample()
        self.bc = self.bc_distribution.sample()
        self.wo = self.wo_distribution.sample()
        self.bo = self.bo_distribution.sample()
        self.wv = self.wv_distribution.sample()
        self.bv = self.bv_distribution.sample()
    elif any(
        p is None
        for p in [
            self.wf,
            self.bf,
            self.wi,
            self.bi,
            self.wc,
            self.bc,
            self.wo,
            self.bo,
            self.wv,
            self.bv,
        ]
    ):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Apply embedding layer to input indices
    inputs = inputs.squeeze(dim=-1)
    inputs = self.embedding(inputs)
    batch_size, seq_len, _ = inputs.size()

    # Initialize h_t and c_t if init_states is None
    if init_states is None:
        device = inputs.device
        h_t = torch.zeros(batch_size, self.hidden_size, device=device)
        c_t = torch.zeros(batch_size, self.hidden_size, device=device)
    else:
        h_t, c_t = init_states[0], init_states[1]

    for t in range(seq_len):
        # Shape: (batch_size, embedding_dim)
        x_t = inputs[:, t, :]

        # Concatenate input and hidden state
        # Shape: (batch_size, embedding_dim + hidden_size)
        z_t = torch.cat([x_t, h_t], dim=1)

        # Forget gate
        ft = torch.sigmoid(z_t @ self.wf.t() + self.bf)

        # Input gate
        it = torch.sigmoid(z_t @ self.wi.t() + self.bi)

        # Candidate cell state
        can = torch.tanh(z_t @ self.wc.t() + self.bc)

        # Output gate
        ot = torch.sigmoid(z_t @ self.wo.t() + self.bo)

        # Update cell state
        c_t = c_t * ft + can * it

        # Update hidden state
        h_t = ot * torch.tanh(c_t)

    # Compute final output
    y_t = h_t @ self.wv.t() + self.bv

    return y_t, (h_t, c_t)

6.6.3 freeze()

Freeze the module's parameters to stop gradient computation. If weights or biases are not sampled yet, they are sampled first. Once frozen, parameters are not resampled or updated.

Returns:

    None.
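
Since freeze is defined on every Bayesian layer (see BayesianModule in 6.1), an entire model can be switched to a single fixed weight sample by iterating over its submodules. A sketch, with assumed import paths:

import torch

from illia.nn.torch.base import BayesianModule  # assumed import path
from illia.nn.torch.linear import Linear        # assumed import path
from illia.nn.torch.lstm import LSTM            # assumed import path

class Classifier(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.lstm = LSTM(
            num_embeddings=1000, embeddings_dim=32, hidden_size=64, output_size=64
        )
        self.head = Linear(input_size=64, output_size=4)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        features, _ = self.lstm(tokens)
        return self.head(features)

model = Classifier()

# Freeze every Bayesian submodule, e.g. before evaluation with one fixed sample.
for module in model.modules():
    if isinstance(module, BayesianModule):
        module.freeze()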

Source code in illia/nn/torch/lstm.py (lines 161-216)
@torch.jit.export
def freeze(self) -> None:
    """
    Freeze the module's parameters to stop gradient computation.
    If weights or biases are not sampled yet, they are sampled first.
    Once frozen, parameters are not resampled or updated.

    Returns:
        None.
    """

    # Set indicator
    self.frozen = True

    # Freeze embedding layer
    self.embedding.freeze()

    # Forget gate
    if self.wf is None:
        self.wf = self.wf_distribution.sample()
    if self.bf is None:
        self.bf = self.bf_distribution.sample()
    self.wf = self.wf.detach()
    self.bf = self.bf.detach()

    # Input gate
    if self.wi is None:
        self.wi = self.wi_distribution.sample()
    if self.bi is None:
        self.bi = self.bi_distribution.sample()
    self.wi = self.wi.detach()
    self.bi = self.bi.detach()

    # Candidate gate
    if self.wc is None:
        self.wc = self.wc_distribution.sample()
    if self.bc is None:
        self.bc = self.bc_distribution.sample()
    self.wc = self.wc.detach()
    self.bc = self.bc.detach()

    # Output gate
    if self.wo is None:
        self.wo = self.wo_distribution.sample()
    if self.bo is None:
        self.bo = self.bo_distribution.sample()
    self.wo = self.wo.detach()
    self.bo = self.bo.detach()

    # Final output layer
    if self.wv is None:
        self.wv = self.wv_distribution.sample()
    if self.bv is None:
        self.bv = self.bv_distribution.sample()
    self.wv = self.wv.detach()
    self.bv = self.bv.detach()

6.6.4 kl_cost()

Compute the KL divergence cost for all Bayesian parameters.

Returns:

    tuple[torch.Tensor, int]: A tuple containing the KL divergence cost and the total number of parameters in the layer.
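
A sketch of aggregating the KL term over all Bayesian layers of a model, using the is_bayesian flag from BayesianModule (6.1). Normalising the summed KL by the total parameter count is a modelling choice, not part of the API.

import torch

def total_kl(model: torch.nn.Module) -> tuple[torch.Tensor, int]:
    """Sum kl_cost over every Bayesian submodule of a model."""
    kl_terms: list[torch.Tensor] = []
    num_params = 0
    for module in model.modules():
        if getattr(module, "is_bayesian", False):
            kl, n = module.kl_cost()
            kl_terms.append(kl)
            num_params += n
    if not kl_terms:
        return torch.tensor(0.0), 0
    return torch.stack(kl_terms).sum(), num_params

# Usage: kl, n = total_kl(model); loss = nll + kl_weight * kl / n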

Source code in illia/nn/torch/lstm.py (lines 218-268)
@torch.jit.export
def kl_cost(self) -> tuple[torch.Tensor, int]:
    """
    Compute the KL divergence cost for all Bayesian parameters.

    Returns:
        tuple[torch.Tensor, int]: A tuple containing the KL
            divergence cost and the total number of parameters in
            the layer.
    """

    # Compute log probs for each pair of weights and bias
    # Forget gate
    log_probs_f: torch.Tensor = self.wf_distribution.log_prob(
        self.wf
    ) + self.bf_distribution.log_prob(self.bf)
    # Input gate
    log_probs_i: torch.Tensor = self.wi_distribution.log_prob(
        self.wi
    ) + self.bi_distribution.log_prob(self.bi)
    # Candidate gate
    log_probs_c: torch.Tensor = self.wc_distribution.log_prob(
        self.wc
    ) + self.bc_distribution.log_prob(self.bc)
    # Output gate
    log_probs_o: torch.Tensor = self.wo_distribution.log_prob(
        self.wo
    ) + self.bo_distribution.log_prob(self.bo)
    # Final output layer
    log_probs_v: torch.Tensor = self.wv_distribution.log_prob(
        self.wv
    ) + self.bv_distribution.log_prob(self.bv)

    # Compute the total loss
    log_probs = log_probs_f + log_probs_i + log_probs_c + log_probs_o + log_probs_v

    # Compute number of parameters
    num_params: int = (
        self.wf_distribution.num_params()
        + self.bf_distribution.num_params()
        + self.wi_distribution.num_params()
        + self.bi_distribution.num_params()
        + self.wc_distribution.num_params()
        + self.bc_distribution.num_params()
        + self.wo_distribution.num_params()
        + self.bo_distribution.num_params()
        + self.wv_distribution.num_params()
        + self.bv_distribution.num_params()
    )

    return log_probs, num_params