
9. Neural Network

This module contains the code for the Bayesian Conv1d.

9.1 Conv1d(input_channels, output_channels, kernel_size, stride=1, padding='VALID', dilation=1, groups=1, data_format='NWC', weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

This class is the Bayesian implementation of the Conv1d class.

Initializes a Bayesian Conv1d layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_channels | int | The number of channels in the input image. | required |
| output_channels | int | The number of channels produced by the convolution. | required |
| kernel_size | int | The size of the convolving kernel. | required |
| stride | int | The stride of the convolution. | 1 |
| padding | str | The padding added to both sides of the input. Can be 'VALID' or 'SAME'. | 'VALID' |
| dilation | int | The spacing between kernel elements. | 1 |
| groups | int | The number of blocked connections from input channels to output channels. | 1 |
| data_format | Optional[str] | The data format for the convolution, either 'NWC' or 'NCW'. | 'NWC' |
| weights_distribution | Optional[GaussianDistribution] | The Gaussian distribution for the weights, if applicable. | None |
| bias_distribution | Optional[GaussianDistribution] | The Gaussian distribution for the bias, if applicable. | None |
| use_bias | bool | Whether to include a bias term. | True |
| **kwargs | Any | Additional keyword arguments. | {} |
Source code in illia/nn/tf/conv1d.py
def __init__(
    self,
    input_channels: int,
    output_channels: int,
    kernel_size: int,
    stride: int = 1,
    padding: str = "VALID",
    dilation: int = 1,
    groups: int = 1,
    data_format: Optional[str] = "NWC",
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian Conv1d layer.

    Args:
        input_channels: The number of channels in the input image.
        output_channels: The number of channels produced by the
            convolution.
        kernel_size: The size of the convolving kernel.
        stride: The stride of the convolution.
        padding: The padding added to both sides of the input.
            Can be 'VALID' or 'SAME'.
        dilation: The spacing between kernel elements.
        groups: The number of blocked connections from input
            channels to output channels.
        data_format: The data format for the convolution, either
            'NWC' or 'NCW'.
        weights_distribution: The Gaussian distribution for the
            weights, if applicable.
        bias_distribution: The Gaussian distribution for the bias,
            if applicable.
        use_bias: Whether to include a bias term.
        **kwargs: Additional keyword arguments.
    """

    # Call super class constructor
    super().__init__(**kwargs)

    # Check data format
    self._check_params(kernel_size, groups, stride, dilation, data_format)

    # Set attributes
    self.input_channels = input_channels
    self.output_channels = output_channels
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.dilation = dilation
    self.groups = groups
    self.use_bias = use_bias

    # Adjust the weights distribution based on the channel format
    self.data_format = (
        "NWC" if data_format is None or data_format == "NWC" else "NCW"
    )

    # Get the weights distribution shape, needs to be channel last
    self._weights_distribution_shape = (
        input_channels // groups,
        kernel_size,
        output_channels,
    )

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            self._weights_distribution_shape
        )
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((output_channels,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None
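
A minimal usage sketch, assuming the class is importable from the source path shown above (illia.nn.tf.conv1d); layer sizes and input shapes are illustrative only:

import tensorflow as tf

from illia.nn.tf.conv1d import Conv1d

# Bayesian 1D convolution: 8 input channels, 16 output channels, kernel of size 3.
layer = Conv1d(input_channels=8, output_channels=16, kernel_size=3)

# Dummy input in the default 'NWC' format: [batch, width, channels].
x = tf.random.normal((4, 100, 8))

# Each call samples weights (and bias) from their distributions.
y = layer(x)
print(y.shape)  # (4, 98, 16) with 'VALID' padding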

9.1.1 call(inputs)

Performs a forward pass through the Bayesian Convolution 1D layer. If the layer is not frozen, it samples weights and bias from their respective distributions. If the layer is frozen and the weights or bias are not initialized, a ValueError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | Tensor | Input tensor to the layer. | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Output tensor after passing through the layer. |

Source code in illia/nn/tf/conv1d.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs a forward pass through the Bayesian Convolution 1D
    layer. If the layer is not frozen, it samples weights and bias
    from their respective distributions. If the layer is frozen
    and the weights or bias are not initialized, a ValueError is
    raised.

    Args:
        inputs: Input tensor to the layer.

    Returns:
        Output tensor after passing through the layer.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution:
            self.b = self.bias_distribution.sample()

    elif self.w is None or (self.use_bias and self.b is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    outputs: tf.Tensor = self._conv1d(
        inputs=inputs,
        weight=self.w,
        stride=self.stride,
        padding=self.padding,
        data_format=self.data_format,
        dilation=self.dilation,
    )

    # Add bias only if using bias
    if self.use_bias:
        outputs = tf.nn.bias_add(
            value=outputs,
            bias=self.b,
            data_format="N..C" if self.data_format == "NWC" else "NC..",
        )

    return outputs
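
A quick check of the sampling behaviour, continuing from the constructor example above and assuming the layer has not been frozen:

y1 = layer(x)
y2 = layer(x)

# New weights are drawn on each call, so the two outputs differ (almost surely).
print(bool(tf.reduce_any(y1 != y2)))  # True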

9.1.2 freeze()

Freezes the current module and all submodules that are instances of BayesianModule. Sets the frozen state to True.

Source code in illia/nn/tf/conv1d.py
def freeze(self) -> None:
    """
    Freezes the current module and all submodules that are instances
    of BayesianModule. Sets the frozen state to True.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Sample bias if it is undefined
    if self.b is None and self.use_bias and self.bias_distribution:
        self.b = self.bias_distribution.sample()

    # Stop gradient computation
    self.w = tf.stop_gradient(self.w)
    if self.use_bias:
        self.b = tf.stop_gradient(self.b)
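
Continuing the same sketch, freezing the layer pins the current weight and bias samples, so repeated calls give identical outputs:

layer.freeze()

# The frozen layer reuses the same sampled weights, so calls are now deterministic.
y1 = layer(x)
y2 = layer(x)
print(bool(tf.reduce_all(y1 == y2)))  # True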

9.1.3 kl_cost()

Computes the Kullback-Leibler (KL) divergence cost for the layer's weights and bias.

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, int] | Tuple containing KL divergence cost and total number of parameters. |

Source code in illia/nn/tf/conv1d.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Computes the Kullback-Leibler (KL) divergence cost for the
    layer's weights and bias.

    Returns:
        Tuple containing KL divergence cost and total number of
        parameters.
    """

    # Compute log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Add bias log probs only if using bias
    if self.use_bias and self.bias_distribution:
        log_probs += self.bias_distribution.log_prob(self.b)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params
    if self.use_bias and self.bias_distribution:
        num_params += self.bias_distribution.num_params

    return log_probs, num_params
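
A hypothetical training step that adds the KL term to a data-fit loss, reusing the Conv1d import and tf from the examples above. The MSE objective, the dummy targets and the 1/num_params weighting are illustrative choices, not part of the library, and gradient flow assumes the distribution parameters are registered as trainable variables of the layer:

# A fresh, unfrozen layer.
bayesian_conv = Conv1d(input_channels=8, output_channels=16, kernel_size=3)
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((4, 100, 8))
targets = tf.random.normal((4, 98, 16))  # dummy regression targets

with tf.GradientTape() as tape:
    predictions = bayesian_conv(x)
    kl, num_params = bayesian_conv.kl_cost()
    # Weight the KL term by 1 / num_params; the right weighting is problem dependent.
    loss = loss_fn(targets, predictions) + kl / num_params

grads = tape.gradient(loss, bayesian_conv.trainable_variables)
optimizer.apply_gradients(zip(grads, bayesian_conv.trainable_variables))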

This module contains the code for the Bayesian Conv2d.

9.2 Conv2d(input_channels, output_channels, kernel_size, stride=1, padding='VALID', dilation=None, groups=1, data_format='NHWC', weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

This class is the Bayesian implementation of the Conv2d class.

Initializes a Bayesian Conv2d layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_channels | int | The number of channels in the input image. | required |
| output_channels | int | The number of channels produced by the convolution. | required |
| kernel_size | int or list[int] | The size of the convolving kernel. | required |
| stride | int or list[int] | The stride of the convolution. | 1 |
| padding | str or list[int] | The padding added to all four sides of the input. Can be 'VALID' or 'SAME'. | 'VALID' |
| dilation | Optional[int or list[int]] | The spacing between kernel elements. | None |
| groups | int | The number of blocked connections from input channels to output channels. | 1 |
| data_format | Optional[str] | The data format for the convolution, either 'NHWC' or 'NCHW'. | 'NHWC' |
| weights_distribution | Optional[GaussianDistribution] | The Gaussian distribution for the weights, if applicable. | None |
| bias_distribution | Optional[GaussianDistribution] | The Gaussian distribution for the bias, if applicable. | None |
| use_bias | bool | Whether to include a bias term. | True |
| **kwargs | Any | Additional keyword arguments. | {} |
Source code in illia/nn/tf/conv2d.py
def __init__(
    self,
    input_channels: int,
    output_channels: int,
    kernel_size: int | list[int],
    stride: int | list[int] = 1,
    padding: str | list[int] = "VALID",
    dilation: Optional[int | list[int]] = None,
    groups: int = 1,
    data_format: Optional[str] = "NHWC",
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    Initializes a Bayesian Conv2d layer.

    Args:
        input_channels: The number of channels in the input image.
        output_channels: The number of channels produced by the
            convolution.
        kernel_size: The size of the convolving kernel.
        stride: The stride of the convolution.
        padding: The padding added to all four sides of the input.
            Can be 'VALID' or 'SAME'.
        dilation: The spacing between kernel elements.
        groups: The number of blocked connections from input channels
            to output channels.
        data_format: The data format for the convolution, either
            'NHWC' or 'NCHW'.
        weights_distribution: The Gaussian distribution for the
            weights, if applicable.
        bias_distribution: The Gaussian distribution for the bias,
            if applicable.
        use_bias: Whether to include a bias term.
        **kwargs: Additional keyword arguments.
    """

    # Call super class constructor
    super().__init__(**kwargs)

    # Check data format
    self._check_params(kernel_size, groups, stride, dilation, data_format)

    # Set attributes
    self.input_channels = input_channels
    self.output_channels = output_channels
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.dilation = dilation
    self.groups = groups
    self.use_bias = use_bias

    # Check if kernel_size is a list and unpack it if necessary
    kernel_shape = (
        kernel_size if isinstance(kernel_size, list) else [kernel_size, kernel_size]
    )

    # Adjust the weights distribution based on the channel format
    self.data_format = (
        "NHWC" if data_format is None or data_format == "NHWC" else "NCHW"
    )

    # Set the weights distribution shape
    self._weights_distribution_shape = (
        input_channels // groups,
        *kernel_shape,
        output_channels,
    )

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            shape=self._weights_distribution_shape
        )
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((output_channels,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None
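
A minimal usage sketch, analogous to the Conv1d example and under the same import-path assumption (illia.nn.tf.conv2d):

import tensorflow as tf

from illia.nn.tf.conv2d import Conv2d

# Bayesian 2D convolution with a 3x3 kernel.
layer = Conv2d(input_channels=3, output_channels=16, kernel_size=3)

# Dummy input in the default 'NHWC' format: [batch, height, width, channels].
x = tf.random.normal((2, 32, 32, 3))

y = layer(x)
print(y.shape)  # (2, 30, 30, 16) with 'VALID' padding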

9.2.1 call(inputs)

Performs a forward pass through the Bayesian Convolution 2D layer. If the layer is not frozen, it samples weights and bias from their respective distributions. If the layer is frozen and the weights or bias are not initialized, a ValueError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | Tensor | Input tensor to the layer. Dimensions: [batch, input channels, input width, input height]. | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Output tensor after passing through the layer. Dimensions: [batch, output channels, output width, output height]. |

Source code in illia/nn/tf/conv2d.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs a forward pass through the Bayesian Convolution 2D
    layer. If the layer is not frozen, it samples weights and bias
    from their respective distributions. If the layer is frozen
    and the weights or bias are not initialized, a ValueError is
    raised.

    Args:
        inputs: Input tensor to the layer. Dimensions: [batch,
            input channels, input width, input height].

    Returns:
        Output tensor after passing through the layer. Dimensions:
            [batch, output channels, output width, output height].
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution:
            self.b = self.bias_distribution.sample()
    elif self.w is None or (self.use_bias and self.b is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    outputs: tf.Tensor = self._conv2d(
        inputs=inputs,
        weight=self.w,
        stride=self.stride,
        padding=self.padding,
        data_format=self.data_format,
        dilation=self.dilation,
    )

    # Add bias only if using bias
    if self.use_bias:
        outputs = tf.nn.bias_add(
            value=outputs,
            bias=self.b,
            data_format="N..C" if self.data_format == "NHWC" else "NC..",
        )

    return outputs

9.2.2 freeze()

Freezes the current module and all submodules that are instances of BayesianModule. Sets the frozen state to True.

Source code in illia/nn/tf/conv2d.py
def freeze(self) -> None:
    """
    Freezes the current module and all submodules that are instances
    of BayesianModule. Sets the frozen state to True.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Sample bias if it is undefined
    if self.b is None and self.use_bias and self.bias_distribution:
        self.b = self.bias_distribution.sample()

    # Stop gradient computation
    self.w = tf.stop_gradient(self.w)
    if self.use_bias:
        self.b = tf.stop_gradient(self.b)

9.2.3 kl_cost()

Computes the Kullback-Leibler (KL) divergence cost for the layer's weights and bias.

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, int] | Tuple containing KL divergence cost and total number of parameters. |

Source code in illia/nn/tf/conv2d.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Computes the Kullback-Leibler (KL) divergence cost for the
    layer's weights and bias.

    Returns:
        Tuple containing KL divergence cost and total number of
        parameters.
    """

    # Compute log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Add bias log probs only if using bias
    if self.use_bias and self.bias_distribution:
        log_probs += self.bias_distribution.log_prob(self.b)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params
    if self.use_bias and self.bias_distribution:
        num_params += self.bias_distribution.num_params

    return log_probs, num_params

This module contains the code for the Bayesian Embedding layer.

9.3 Embedding(num_embeddings, embeddings_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, weights_distribution=None, **kwargs)

This class is the Bayesian implementation of the Embedding class.

This method is the constructor of the embedding class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_embeddings | int | Size of the dictionary of embeddings. | required |
| embeddings_dim | int | The size of each embedding vector. | required |
| padding_idx | Optional[int] | If specified, the entries at padding_idx do not contribute to the gradient. | None |
| max_norm | Optional[float] | If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. | None |
| norm_type | float | The p of the p-norm to compute for the max_norm option. | 2.0 |
| scale_grad_by_freq | bool | If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. | False |
| sparse | bool | If True, gradient w.r.t. weight matrix will be a sparse tensor. | False |
| weights_distribution | Optional[GaussianDistribution] | The Gaussian distribution for the weights, if applicable. | None |
| **kwargs | Any | Additional keyword arguments. | {} |
Source code in illia/nn/tf/embedding.py
def __init__(
    self,
    num_embeddings: int,
    embeddings_dim: int,
    padding_idx: Optional[int] = None,
    max_norm: Optional[float] = None,
    norm_type: float = 2.0,
    scale_grad_by_freq: bool = False,
    sparse: bool = False,
    weights_distribution: Optional[GaussianDistribution] = None,
    **kwargs: Any,
) -> None:
    """
    This method is the constructor of the embedding class.

    Args:
        num_embeddings: Size of the dictionary of embeddings.
        embeddings_dim: The size of each embedding vector.
        padding_idx: If specified, the entries at padding_idx do
            not contribute to the gradient.
        max_norm: If given, each embedding vector with norm larger
            than max_norm is renormalized to have norm max_norm.
        norm_type: The p of the p-norm to compute for the max_norm
            option.
        scale_grad_by_freq: If given, this will scale gradients by
            the inverse of frequency of the words in the
            mini-batch.
        sparse: If True, gradient w.r.t. weight matrix will be a
            sparse tensor.
        weights_distribution: The Gaussian distribution for the
            weights, if applicable.
        **kwargs: Additional keyword arguments.
    """

    # Call super class constructor
    super().__init__(**kwargs)

    # Set attributes
    self.num_embeddings = num_embeddings
    self.embeddings_dim = embeddings_dim
    self.padding_idx = padding_idx
    self.max_norm = max_norm
    self.norm_type = norm_type
    self.scale_grad_by_freq = scale_grad_by_freq
    self.sparse = sparse

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution(
            (num_embeddings, embeddings_dim)
        )
    else:
        self.weights_distribution = weights_distribution
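
A minimal usage sketch, assuming the class is importable from illia.nn.tf.embedding as suggested by the source path above:

import tensorflow as tf

from illia.nn.tf.embedding import Embedding

# Bayesian embedding table: 1000 entries, each a 64-dimensional vector.
embedding = Embedding(num_embeddings=1000, embeddings_dim=64)

# Dummy batch of token indices.
tokens = tf.constant([[1, 5, 42], [7, 0, 999]])

# Each call samples the embedding weights from their distribution.
vectors = embedding(tokens)
print(vectors.shape)  # (2, 3, 64)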

9.3.1 call(inputs)

Performs a forward pass through the Bayesian Embedding layer.

Samples weights from their posterior distribution if the layer is not frozen. If the layer is frozen with undefined weights, a ValueError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | Tensor | Input tensor. Dimensions: [batch, *]. | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | Module has been frozen with undefined weights. |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Output tensor after the embedding lookup. |

Source code in illia/nn/tf/embedding.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs a forward pass through the Bayesian Embedding layer.

    Samples weights from their posterior distributions if
    the layer is not frozen. If the layer is frozen with undefined
    weights, a ValueError is raised.

    Args:
        inputs: input tensor. Dimensions: [batch, *].

    Raises:
        ValueError: Module has been frozen with undefined weights.

    Returns:
        Output tensor after the embedding lookup.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()
    elif self.w is None:
        raise ValueError("Module has been frozen with undefined weights.")

    # Compute outputs
    outputs: tf.Tensor = self._embedding(
        inputs,
        self.w,
        self.padding_idx,
        self.max_norm,
        self.norm_type,
        self.sparse,
    )

    return outputs

9.3.2 freeze()

Freezes the current module and all submodules that are instances of BayesianModule. Sets the frozen state to True.

Source code in illia/nn/tf/embedding.py
def freeze(self) -> None:
    """
    Freezes the current module and all submodules that are instances
    of BayesianModule. Sets the frozen state to True.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Stop gradient computation
    self.w = tf.stop_gradient(self.w)

9.3.3 kl_cost()

Computes the Kullback-Leibler (KL) divergence cost for the layer's weights.

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, int] | Tuple containing KL divergence cost and total number of parameters. |

Source code in illia/nn/tf/embedding.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Computes the Kullback-Leibler (KL) divergence cost for the
    layer's weights.

    Returns:
        Tuple containing KL divergence cost and total number of
        parameters.
    """

    # Get log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Get number of parameters
    num_params: int = self.weights_distribution.num_params

    return log_probs, num_params

This module contains the code for the Bayesian Linear layer.

9.4 Linear(input_size, output_size, weights_distribution=None, bias_distribution=None, use_bias=True, **kwargs)

This class is the Bayesian implementation of the Linear class.

This is the constructor of the Linear class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_size | int | Input size of the linear layer. | required |
| output_size | int | Output size of the linear layer. | required |
| weights_distribution | Optional[GaussianDistribution] | The Gaussian distribution for the weights, if applicable. | None |
| bias_distribution | Optional[GaussianDistribution] | The Gaussian distribution for the bias, if applicable. | None |
| use_bias | bool | Whether to include a bias term. | True |
| **kwargs | Any | Additional keyword arguments. | {} |
Source code in illia/nn/tf/linear.py
def __init__(
    self,
    input_size: int,
    output_size: int,
    weights_distribution: Optional[GaussianDistribution] = None,
    bias_distribution: Optional[GaussianDistribution] = None,
    use_bias: bool = True,
    **kwargs: Any,
) -> None:
    """
    This is the constructor of the Linear class.

    Args:
        input_size: Input size of the linear layer.
        output_size: Output size of the linear layer.
        weights_distribution: The Gaussian distribution for the
            weights, if applicable.
        bias_distribution: The Gaussian distribution for the bias,
            if applicable.
        use_bias: Whether to include a bias term.
        **kwargs: Additional keyword arguments.
    """

    # Call super-class constructor
    super().__init__(**kwargs)

    # Set parameters
    self.input_size = input_size
    self.output_size = output_size
    self.use_bias = use_bias

    # Set weights distribution
    if weights_distribution is None:
        self.weights_distribution = GaussianDistribution((output_size, input_size))
    else:
        self.weights_distribution = weights_distribution

    # Set bias distribution
    if self.use_bias:
        if bias_distribution is None:
            self.bias_distribution = GaussianDistribution((output_size,))
        else:
            self.bias_distribution = bias_distribution
    else:
        self.bias_distribution = None
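
A minimal usage sketch, under the same import-path assumption (illia.nn.tf.linear):

import tensorflow as tf

from illia.nn.tf.linear import Linear

# Bayesian fully connected layer mapping 128 features to 10 outputs.
layer = Linear(input_size=128, output_size=10)

x = tf.random.normal((32, 128))
y = layer(x)
print(y.shape)  # (32, 10)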

9.4.1 call(inputs)

Performs a forward pass through the Bayesian Linear layer.

Samples weights and bias from their posterior distributions if the layer is not frozen. If the layer is frozen with undefined weights or bias, a ValueError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | Tensor | Input tensor. Dimensions: [batch, *]. | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | Module has been frozen with undefined weights. |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Output tensor after linear transformation. |

Source code in illia/nn/tf/linear.py
def call(self, inputs: tf.Tensor) -> tf.Tensor:
    """
    Performs a forward pass through the Bayesian Linear layer.

    Samples weights and bias from their posterior distributions if
    the layer is not frozen. If the layer is frozen with undefined
    weights or bias, a ValueError is raised.

    Args:
        inputs: input tensor. Dimensions: [batch, *].

    Raises:
        ValueError: Module has been frozen with undefined weights.

    Returns:
        Output tensor after linear transformation.
    """

    # Check if layer is frozen
    if not self.frozen:
        self.w = self.weights_distribution.sample()

        # Sample bias only if using bias
        if self.use_bias and self.bias_distribution:
            self.b = self.bias_distribution.sample()
    elif self.w is None or (self.use_bias and self.b is None):
        raise ValueError(
            "Module has been frozen with undefined weights and/or bias."
        )

    # Compute outputs
    outputs: tf.Tensor = tf.linalg.matmul(inputs, self.w, transpose_b=True)

    # Add bias only if using bias
    if self.use_bias:
        outputs = tf.nn.bias_add(outputs, self.b)

    return outputs

9.4.2 freeze()

Freezes the current module and all submodules that are instances of BayesianModule. Sets the frozen state to True.

Source code in illia/nn/tf/linear.py
def freeze(self) -> None:
    """
    Freezes the current module and all submodules that are instances
    of BayesianModule. Sets the frozen state to True.
    """

    # Set indicator
    self.frozen = True

    # Sample weights if they are undefined
    if self.w is None:
        self.w = self.weights_distribution.sample()

    # Sample bias if it is undefined
    if self.b is None and self.use_bias and self.bias_distribution:
        self.b = self.bias_distribution.sample()

    # Stop gradient computation (similar to detach) for weights and bias
    self.w = tf.stop_gradient(self.w)
    if self.use_bias:
        self.b = tf.stop_gradient(self.b)

9.4.3 kl_cost()

Computes the Kullback-Leibler (KL) divergence cost for the layer's weights and bias.

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, int] | Tuple containing KL divergence cost and total number of parameters. |

Source code in illia/nn/tf/linear.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Computes the Kullback-Leibler (KL) divergence cost for the
    layer's weights and bias.

    Returns:
        Tuple containing KL divergence cost and total number of
        parameters.
    """

    # Compute log probs
    log_probs: tf.Tensor = self.weights_distribution.log_prob(self.w)

    # Add bias log probs only if using bias
    if self.use_bias and self.bias_distribution:
        log_probs += self.bias_distribution.log_prob(self.b)

    # Compute number of parameters
    num_params: int = self.weights_distribution.num_params
    if self.use_bias and self.bias_distribution:
        num_params += self.bias_distribution.num_params

    return log_probs, num_params
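
All layers on this page expose the same kl_cost() interface, so their contributions can be accumulated into one regularisation term. A hypothetical helper (total_kl_cost is not part of the library):

import tensorflow as tf

def total_kl_cost(layers):
    """Sums kl_cost() over an iterable of Bayesian layers."""
    total_kl = tf.constant(0.0)
    total_params = 0
    for bayesian_layer in layers:
        kl, num_params = bayesian_layer.kl_cost()
        total_kl += kl
        total_params += num_params
    return total_kl, total_params

# Example: total_kl_cost([linear_layer, conv_layer]) over every Bayesian layer in a model.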

This module contains the code for the Bayesian LSTM.

9.5 LSTM(num_embeddings, embeddings_dim, hidden_size, output_size, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, **kwargs)

This class is the Bayesian implementation of the TensorFlow LSTM layer.

Source code in illia/nn/tf/lstm.py
def __init__(
    self,
    num_embeddings: int,
    embeddings_dim: int,
    hidden_size: int,
    output_size: int,
    padding_idx: Optional[int] = None,
    max_norm: Optional[float] = None,
    norm_type: float = 2.0,
    scale_grad_by_freq: bool = False,
    sparse: bool = False,
    **kwargs: Any,
) -> None:
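    """
    This method is the constructor of the LSTM class.

    Args:
        num_embeddings: Size of the dictionary of embeddings.
        embeddings_dim: The size of each embedding vector.
        hidden_size: The number of features in the hidden state.
        output_size: The size of the final output layer.
        padding_idx: If specified, the entries at padding_idx do
            not contribute to the gradient.
        max_norm: If given, each embedding vector with norm larger
            than max_norm is renormalized to have norm max_norm.
        norm_type: The p of the p-norm to compute for the max_norm
            option.
        scale_grad_by_freq: If given, this will scale gradients by
            the inverse of frequency of the words in the
            mini-batch.
        sparse: If True, gradient w.r.t. weight matrix will be a
            sparse tensor.
        **kwargs: Additional keyword arguments.
    """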

    # Call super-class constructor
    super().__init__(**kwargs)

    # Set attributes
    self.num_embeddings = num_embeddings
    self.embeddings_dim = embeddings_dim
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.padding_idx = padding_idx
    self.max_norm = max_norm
    self.norm_type = norm_type
    self.scale_grad_by_freq = scale_grad_by_freq
    self.sparse = sparse

    # Define the Embedding layer
    self.embedding = Embedding(
        num_embeddings=self.num_embeddings,
        embeddings_dim=self.embeddings_dim,
        padding_idx=self.padding_idx,
        max_norm=self.max_norm,
        norm_type=self.norm_type,
        scale_grad_by_freq=self.scale_grad_by_freq,
        sparse=self.sparse,
    )

    # Initialize weight distributions
    # Forget gate
    self.wf_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bf_distribution = GaussianDistribution((self.hidden_size,))

    # Input gate
    self.wi_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bi_distribution = GaussianDistribution((self.hidden_size,))

    # Candidate gate
    self.wc_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bc_distribution = GaussianDistribution((self.hidden_size,))

    # Output gate
    self.wo_distribution = GaussianDistribution(
        (self.embeddings_dim + self.hidden_size, self.hidden_size)
    )
    self.bo_distribution = GaussianDistribution((self.hidden_size,))

    # Final output layer
    self.wv_distribution = GaussianDistribution(
        (self.hidden_size, self.output_size)
    )
    self.bv_distribution = GaussianDistribution((self.output_size,))
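
A minimal usage sketch, assuming the class is importable from illia.nn.tf.lstm as suggested by the source path above; the shapes follow the call() documentation below:

import tensorflow as tf

from illia.nn.tf.lstm import LSTM

# Bayesian LSTM over a vocabulary of 1000 tokens.
lstm = LSTM(
    num_embeddings=1000,
    embeddings_dim=64,
    hidden_size=128,
    output_size=10,
)

# Token indices with shape [batch, seq_len, 1], as expected by call().
tokens = tf.random.uniform((8, 20, 1), maxval=1000, dtype=tf.int32)

output, (h_t, c_t) = lstm(tokens)
print(output.shape, h_t.shape, c_t.shape)  # (8, 10) (8, 128) (8, 128)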

9.5.1 call(inputs, init_states=None)

Performs a forward pass through the Bayesian LSTM layer. If the layer is not frozen, it samples weights and bias from their respective distributions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | Tensor | Input tensor with token indices. Shape: [batch, seq_len, 1]. | required |
| init_states | Optional[tuple[Tensor, Tensor]] | Optional initial hidden and cell states. | None |

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, tuple[Tensor, Tensor]] | Tuple of (output, (hidden_state, cell_state)). |

Source code in illia/nn/tf/lstm.py
def call(
    self,
    inputs: tf.Tensor,
    init_states: Optional[tuple[tf.Tensor, tf.Tensor]] = None,
) -> tuple[tf.Tensor, tuple[tf.Tensor, tf.Tensor]]:
    """
    Performs a forward pass through the Bayesian LSTM layer.
    If the layer is not frozen, it samples weights and bias
    from their respective distributions.

    Args:
        inputs: Input tensor with token indices. Shape: [batch, seq_len, 1]
        init_states: Optional initial hidden and cell states

    Returns:
        Tuple of (output, (hidden_state, cell_state))
    """

    # Sample weights if not frozen
    if not self.frozen:
        self.wf = self.wf_distribution.sample()
        self.bf = self.bf_distribution.sample()
        self.wi = self.wi_distribution.sample()
        self.bi = self.bi_distribution.sample()
        self.wc = self.wc_distribution.sample()
        self.bc = self.bc_distribution.sample()
        self.wo = self.wo_distribution.sample()
        self.bo = self.bo_distribution.sample()
        self.wv = self.wv_distribution.sample()
        self.bv = self.bv_distribution.sample()
    else:
        if any(w is None for w in [self.wf, self.wi, self.wc, self.wo, self.wv]):
            self.wf = self.wf_distribution.sample()
            self.bf = self.bf_distribution.sample()
            self.wi = self.wi_distribution.sample()
            self.bi = self.bi_distribution.sample()
            self.wc = self.wc_distribution.sample()
            self.bc = self.bc_distribution.sample()
            self.wo = self.wo_distribution.sample()
            self.bo = self.bo_distribution.sample()
            self.wv = self.wv_distribution.sample()
            self.bv = self.bv_distribution.sample()

    # Apply embedding layer to input indices
    inputs = tf.squeeze(inputs, axis=-1)
    inputs = self.embedding(inputs)
    batch_size = tf.shape(inputs)[0]
    seq_len = tf.shape(inputs)[1]

    # Initialize h_t and c_t if init_states is None
    if init_states is None:
        h_t = tf.zeros([batch_size, self.hidden_size], dtype=inputs.dtype)
        c_t = tf.zeros([batch_size, self.hidden_size], dtype=inputs.dtype)
    else:
        h_t, c_t = init_states[0], init_states[1]

    # Process sequence
    for t in range(seq_len):
        # Shape: (batch_size, embedding_dim)
        x_t = inputs[:, t, :]

        # Concatenate input and hidden state
        # Shape: (batch_size, embedding_dim + hidden_size)
        z_t = tf.concat([x_t, h_t], axis=1)

        # Forget gate
        ft = tf.sigmoid(tf.matmul(z_t, self.wf) + self.bf)

        # Input gate
        it = tf.sigmoid(tf.matmul(z_t, self.wi) + self.bi)

        # Candidate cell state
        can = tf.tanh(tf.matmul(z_t, self.wc) + self.bc)

        # Output gate
        ot = tf.sigmoid(tf.matmul(z_t, self.wo) + self.bo)

        # Update cell state
        c_t = c_t * ft + can * it

        # Update hidden state
        h_t = ot * tf.tanh(c_t)

    # Compute final output
    y_t = tf.matmul(h_t, self.wv) + self.bv

    return y_t, (h_t, c_t)
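
The returned state tuple can seed a subsequent call through init_states, for example when a long sequence is processed in chunks. A sketch continuing from the constructor example above; note that an unfrozen layer still resamples its weights on every call:

chunk_a = tf.random.uniform((8, 10, 1), maxval=1000, dtype=tf.int32)
chunk_b = tf.random.uniform((8, 10, 1), maxval=1000, dtype=tf.int32)

# Carry the hidden and cell states from the first chunk into the second.
_, states = lstm(chunk_a)
output, states = lstm(chunk_b, init_states=states)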

9.5.2 freeze()

Freezes the current module and all submodules that are instances of BayesianModule. Sets the frozen state to True.

Source code in illia/nn/tf/lstm.py
def freeze(self) -> None:
    """
    Freezes the current module and all submodules that are instances
    of BayesianModule. Sets the frozen state to True.
    """

    # Set indicator
    self.frozen = True

    # Freeze embedding layer
    self.embedding.freeze()

    # Forget gate
    if self.wf is None:
        self.wf = self.wf_distribution.sample()
    if self.bf is None:
        self.bf = self.bf_distribution.sample()
    self.wf = tf.stop_gradient(self.wf)
    self.bf = tf.stop_gradient(self.bf)

    # Input gate
    if self.wi is None:
        self.wi = self.wi_distribution.sample()
    if self.bi is None:
        self.bi = self.bi_distribution.sample()
    self.wi = tf.stop_gradient(self.wi)
    self.bi = tf.stop_gradient(self.bi)

    # Candidate gate
    if self.wc is None:
        self.wc = self.wc_distribution.sample()
    if self.bc is None:
        self.bc = self.bc_distribution.sample()
    self.wc = tf.stop_gradient(self.wc)
    self.bc = tf.stop_gradient(self.bc)

    # Output gate
    if self.wo is None:
        self.wo = self.wo_distribution.sample()
    if self.bo is None:
        self.bo = self.bo_distribution.sample()
    self.wo = tf.stop_gradient(self.wo)
    self.bo = tf.stop_gradient(self.bo)

    # Final output layer
    if self.wv is None:
        self.wv = self.wv_distribution.sample()
    if self.bv is None:
        self.bv = self.bv_distribution.sample()
    self.wv = tf.stop_gradient(self.wv)
    self.bv = tf.stop_gradient(self.bv)

9.5.3 kl_cost()

Computes the Kullback-Leibler (KL) divergence cost for the layer's weights and bias.

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, int] | Tuple containing KL divergence cost and total number of parameters. |

Source code in illia/nn/tf/lstm.py
def kl_cost(self) -> tuple[tf.Tensor, int]:
    """
    Computes the Kullback-Leibler (KL) divergence cost for the
    layer's weights and bias.

    Returns:
        tuple containing KL divergence cost and total number of
        parameters.
    """

    # Compute log probs for each pair of weights and bias
    log_probs_f = self.wf_distribution.log_prob(
        self.wf
    ) + self.bf_distribution.log_prob(self.bf)

    log_probs_i = self.wi_distribution.log_prob(
        self.wi
    ) + self.bi_distribution.log_prob(self.bi)

    log_probs_c = self.wc_distribution.log_prob(
        self.wc
    ) + self.bc_distribution.log_prob(self.bc)

    log_probs_o = self.wo_distribution.log_prob(
        self.wo
    ) + self.bo_distribution.log_prob(self.bo)

    log_probs_v = self.wv_distribution.log_prob(
        self.wv
    ) + self.bv_distribution.log_prob(self.bv)

    # Compute the total loss
    log_probs = log_probs_f + log_probs_i + log_probs_c + log_probs_o + log_probs_v

    # Compute number of parameters
    num_params = (
        self.wf_distribution.num_params
        + self.bf_distribution.num_params
        + self.wi_distribution.num_params
        + self.bi_distribution.num_params
        + self.wc_distribution.num_params
        + self.bc_distribution.num_params
        + self.wo_distribution.num_params
        + self.bo_distribution.num_params
        + self.wv_distribution.num_params
        + self.bv_distribution.num_params
    )

    return log_probs, num_params