Integration

The bertblocks.integration package converts HuggingFace models into BertBlocks equivalents.

Loading Models

bertblocks.integration.from_huggingface(
pretrained_model_name_or_path: str,
load_weights: bool = True,
add_pooling_layer: bool = False,
) -> BertBlocksModel

Instantiate an equivalent BertBlocksModel from a pretrained HuggingFace model.

Automatically detects the model type and routes to the appropriate conversion function. Supports BERT-like encoder models available on the HuggingFace Hub.

Parameters:
  • pretrained_model_name_or_path (str) – HuggingFace model identifier (e.g., “bert-base-uncased”, “modernbert-base”) or local path to a pretrained model directory.

  • load_weights (bool, optional) – Whether to transfer weights from the pretrained HuggingFace model. If True, copies all layer parameters. If False, only loads the configuration and initializes a fresh model with random weights. Defaults to True.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.

Returns:

A BertBlocks model with architecture matched to the source HuggingFace model, optionally loaded with pretrained weights.

Return type:

BertBlocksModel

Raises:
  • ValueError – If the model type is not supported or cannot be detected.

  • OSError – If the model path does not exist or is not accessible.

  • ImportError – If the required HuggingFace transformers package is not installed.

Supported Models:
  • BERT and variants (bert-base-uncased, bert-large-uncased, etc.)

  • ModernBERT (modernbert-base, modernbert-large, etc.)

  • Other BERT-like encoder models compatible with HuggingFace
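The routing by detected model type can be sketched as a dispatch on the HuggingFace config's model_type field. The registry and loader names below are illustrative stand-ins, not the actual bertblocks internals:

```python
# Illustrative sketch of model-type dispatch; the real loaders live in
# bertblocks.integration.load_bert / load_modernbert.
def _load_bert(name, **kwargs):        # stand-in for from_bert_model
    return f"BertBlocksModel(bert, {name})"

def _load_modernbert(name, **kwargs):  # stand-in for from_modernbert_model
    return f"BertBlocksModel(modernbert, {name})"

# Hypothetical registry keyed by the HuggingFace config's `model_type`.
_LOADERS = {"bert": _load_bert, "modernbert": _load_modernbert}

def from_huggingface_sketch(name, model_type, **kwargs):
    """Route to the matching converter, or raise ValueError as documented."""
    try:
        loader = _LOADERS[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type!r}")
    return loader(name, **kwargs)
```

A registry like this keeps the top-level entry point stable while new model families are added as new loader modules.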

Example

>>> from bertblocks.integration import from_huggingface
>>> # Load BERT model
>>> bert_model = from_huggingface("bert-base-uncased")
>>> # Load ModernBERT model
>>> modernbert_model = from_huggingface("modernbert-base")
>>> # Load without transferring weights
>>> fresh_model = from_huggingface("bert-base-uncased", load_weights=False)
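For intuition, add_pooling_layer=True adds a BERT-style pooler: a dense projection plus tanh applied to the [CLS] (first-token) hidden state. A minimal pure-Python sketch of that computation; the weights here are made up, and the real pooler is a learned module:

```python
import math

def cls_pool(hidden_states, weight, bias):
    """BERT-style pooler: tanh(W @ h_cls + b) on the first token's vector."""
    h_cls = hidden_states[0]  # [CLS] is the first token in the sequence
    out = []
    for row, b in zip(weight, bias):
        z = sum(w * h for w, h in zip(row, h_cls)) + b
        out.append(math.tanh(z))
    return out

# Toy 2-d example: two tokens, hidden size 2, identity weights, zero bias.
hidden = [[0.5, -0.25], [1.0, 1.0]]
pooled = cls_pool(hidden, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The pooled vector is what a sequence-level classification head would consume.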


Model-Specific Loaders

bertblocks.integration.load_modernbert.from_modernbert_model(
pretrained_model_name_or_path: str,
load_weights: bool = True,
add_pooling_layer: bool = False,
attn_implementation: Literal['flash_attention_2', 'sdpa', 'eager'] = 'flash_attention_2',
) -> BertBlocksModel

Instantiate an equivalent BertBlocks model from pretrained HuggingFace ModernBERT weights and config.

Converts a HuggingFace ModernBERT model to BertBlocks architecture with optional weight transfer. ModernBERT uses rotary positional embeddings (RoPE), GLU feed-forward layers, and alternating local/global attention, all of which are supported by BertBlocks.
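Local attention restricts each token to a sliding window of neighbors rather than the full sequence. An illustrative window-mask builder, pure Python and not the bertblocks kernel:

```python
def local_attention_mask(seq_len, window):
    """True where token i may attend to token j, i.e. |i - j| <= window // 2."""
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)]
            for i in range(seq_len)]

# Each token sees itself plus one neighbor on each side (window=3).
mask = local_attention_mask(5, window=3)
```

Global layers simply use an all-True mask; alternating the two keeps long-range mixing while bounding per-layer cost.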

Parameters:
  • pretrained_model_name_or_path (str) – HuggingFace model identifier or local path to a pretrained ModernBERT model (e.g., “modernbert-base”, “./path/to/model”).

  • load_weights (bool, optional) – Whether to transfer weights from the pretrained ModernBERT model. If True, copies all embeddings, attention, feed-forward, and normalization layer weights. If False, only loads the configuration and initializes a fresh model. Defaults to True.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.

  • attn_implementation (Literal["flash_attention_2", "sdpa", "eager"], optional) – Attention implementation backend to use: "flash_attention_2" uses FlashAttention-2 for faster inference (requires the flash-attn package); "sdpa" uses PyTorch's scaled dot-product attention (recommended for broad compatibility); "eager" uses a manual attention implementation (slower; useful for debugging). Defaults to "flash_attention_2".

Returns:

A BertBlocks model with architecture matched to ModernBERT, optionally loaded with pretrained weights.

Return type:

BertBlocksModel

Raises:
  • ValueError – If the model config cannot be loaded or if the model type is not ModernBERT.

  • OSError – If the model path does not exist or is not accessible.

Note

  • The weight transfer is exact and lossless; no approximation is used.

  • All layer parameters (embeddings, QKV projections, GLU layers, norms) are copied directly.

  • The pooling layer (if added) is initialized with new random weights.

  • Final normalization layer weights are transferred if included in the model.
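Because the transfer is a direct copy, it can be verified parameter-by-parameter. A schematic check, assuming both models expose a flat name-to-parameter mapping; the parameter names in the renaming map are hypothetical, and plain lists stand in for tensors:

```python
def transfer_weights(src_state, name_map):
    """Copy parameters exactly, renaming keys per a source -> target map."""
    return {tgt: list(src_state[src]) for src, tgt in name_map.items()}

# Simulated HF state dict (list values stand in for weight tensors).
src = {"embeddings.tok_embeddings.weight": [0.1, 0.2],
       "layers.0.attn.Wqkv.weight": [0.3, 0.4]}

# Hypothetical renaming from HF ModernBERT names to BertBlocks names.
name_map = {"embeddings.tok_embeddings.weight": "embed.weight",
            "layers.0.attn.Wqkv.weight": "blocks.0.qkv.weight"}

tgt = transfer_weights(src, name_map)
# Lossless: every copied parameter is identical to its source.
assert all(tgt[t] == src[s] for s, t in name_map.items())
```

The same equality check, run over a real state dict after conversion, confirms that no approximation was applied.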

Example

>>> from bertblocks.integration import from_modernbert_model
>>> # Load and convert a pretrained ModernBERT model with FlashAttention
>>> model = from_modernbert_model("modernbert-base", load_weights=True)
>>> # Load with SDPA backend for broader compatibility
>>> model = from_modernbert_model("modernbert-base", attn_implementation="sdpa")
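When flash-attn is not installed, falling back to "sdpa" is a common pattern. A hedged sketch of such selection logic, using only an import check; this is not necessarily how bertblocks itself behaves:

```python
import importlib.util

def pick_attn_implementation(preferred="flash_attention_2"):
    """Fall back to 'sdpa' if the flash-attn package is not importable."""
    if preferred == "flash_attention_2":
        if importlib.util.find_spec("flash_attn") is None:
            return "sdpa"  # safe default available in stock PyTorch
    return preferred
```

Passing the result as attn_implementation avoids an ImportError on machines without FlashAttention-2.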


bertblocks.integration.load_bert.from_bert_model(
pretrained_model_name_or_path: str,
load_weights: bool = True,
add_pooling_layer: bool = False,
) -> BertBlocksModel

Instantiate an equivalent BertBlocks model from pretrained HuggingFace BERT weights and config.

Converts a HuggingFace BERT model to BertBlocks architecture with optional weight transfer. The BertBlocks model uses post-normalization and standard MLP architecture to match BERT.
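BERT's post-normalization applies LayerNorm after the residual addition, LN(x + sublayer(x)), whereas pre-norm variants compute x + sublayer(LN(x)). A scalar-level sketch of the two orderings, illustrative only (no learned scale/shift):

```python
def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def post_norm_block(x, sublayer):
    # BERT ordering: normalize after the residual add.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_norm_block(x, sublayer):
    # Pre-norm ordering, used by many newer encoders.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]
```

Matching BERT's post-norm ordering (and its standard MLP) is what makes the converted BertBlocks model numerically equivalent to the source.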

Parameters:
  • pretrained_model_name_or_path (str) – HuggingFace model identifier or local path to a pretrained BERT model (e.g., “bert-base-uncased”, “./path/to/model”).

  • load_weights (bool, optional) – Whether to transfer weights from the pretrained BERT model. If True, copies all embeddings, attention, feed-forward, and normalization layer weights. If False, only loads the configuration and initializes a fresh model. Defaults to True.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.

Returns:

A BertBlocks model with architecture matched to BERT, optionally loaded with pretrained weights.

Return type:

BertBlocksModel

Raises:
  • ValueError – If the model config cannot be loaded or if the model type is not BERT.

  • OSError – If the model path does not exist or is not accessible.

Note

  • The weight transfer is exact and lossless; no approximation is used.

  • All layer parameters (embeddings, QKV projections, feed-forward, norms) are copied directly.

  • The pooling layer (if added) is initialized with new random weights.

Example

>>> from bertblocks.integration import from_bert_model
>>> # Load and convert a pretrained BERT model
>>> model = from_bert_model("bert-base-uncased", load_weights=True)
>>> # Or load just the config without weights for a fresh model
>>> model = from_bert_model("bert-base-uncased", load_weights=False)
