Integration

The bertblocks.integration package converts HuggingFace models into BertBlocks equivalents.

Loading Models

bertblocks.integration.from_huggingface(
    pretrained_model_name_or_path: str,
    load_weights: bool = True,
    add_pooling_layer: bool = False,
    attn_implementation: Literal['flash_attention_2', 'sdpa', 'eager'] = 'sdpa',
) -> BertBlocksModel

Instantiate an equivalent BertBlocksModel from a pretrained HuggingFace model.

Automatically detects the model type and routes to the appropriate conversion function. Supports BERT-like encoder models available on the HuggingFace Hub.

Parameters:
  • pretrained_model_name_or_path (str) – HuggingFace model identifier (e.g., “bert-base-uncased”, “modernbert-base”) or local path to a pretrained model directory.

  • load_weights (bool, optional) – Whether to transfer weights from the pretrained HuggingFace model. If True, copies all layer parameters. If False, only loads the configuration and initializes a fresh model with random weights. Defaults to True.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.

  • attn_implementation (str, optional) – Attention backend. One of “flash_attention_2”, “sdpa”, or “eager”. Defaults to “sdpa”.

Returns:

A BertBlocks model with architecture matched to the source HuggingFace model, optionally loaded with pretrained weights.

Return type:

BertBlocksModel

Raises:

ValueError – If the model type is not supported or cannot be detected.
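As a minimal sketch of how the call and its documented ValueError might be handled together, assuming bertblocks is installed: the wrapper name load_encoder and its pretrained argument are hypothetical, introduced only for this illustration. Only the from_huggingface signature shown above is relied upon.

```python
from typing import Any


def load_encoder(name_or_path: str, pretrained: bool = True) -> Any:
    """Hypothetical helper: convert a HuggingFace checkpoint to a BertBlocksModel."""
    # Imported lazily so that this sketch can be imported without bertblocks installed.
    from bertblocks.integration import from_huggingface

    try:
        return from_huggingface(
            name_or_path,
            load_weights=pretrained,      # False gives a randomly initialized model
            add_pooling_layer=False,      # no [CLS] pooling head
            attn_implementation="sdpa",   # the documented default backend
        )
    except ValueError as err:
        # Documented failure mode: unsupported or undetectable model type.
        raise ValueError(
            f"Cannot convert {name_or_path!r} to BertBlocks: {err}"
        ) from err


# Example call (downloads the checkpoint on first use):
# model = load_encoder("bert-base-uncased")
```

Passing pretrained=False would be the natural choice when only the architecture is wanted, for example to train from scratch with a known configuration.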

Model-Specific Loaders

bertblocks.integration.load_modernbert.from_modernbert_model(
    orig_model: ModernBertModel,
    add_pooling_layer: bool = False,
    attn_implementation: Literal['flash_attention_2', 'sdpa', 'eager'] = 'sdpa',
) -> BertBlocksModel

Instantiate an equivalent BertBlocks model from a HuggingFace ModernBERT model instance.

Parameters:
  • orig_model (ModernBertModel) – An instance of a HuggingFace ModernBertModel.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer. Defaults to False.

  • attn_implementation (str, optional) – Attention backend. One of “flash_attention_2”, “sdpa”, or “eager”. Defaults to “sdpa”.

Returns:

A BertBlocks model with architecture matched to ModernBERT, loaded with pretrained weights.

Return type:

BertBlocksModel
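The model-specific loader takes an already instantiated HuggingFace model rather than a name. A minimal sketch of that two-step flow follows, assuming transformers (a release that ships ModernBertModel) and bertblocks are installed. The helper name convert_modernbert and the checkpoint identifier "answerdotai/ModernBERT-base" are assumptions made for illustration and do not come from this page.

```python
from typing import Any


def convert_modernbert(checkpoint: str = "answerdotai/ModernBERT-base") -> Any:
    """Hypothetical helper: load a HuggingFace ModernBERT and convert it."""
    # Lazy imports keep the sketch importable without the heavy dependencies.
    from transformers import ModernBertModel
    from bertblocks.integration.load_modernbert import from_modernbert_model

    # Step 1: load the source model with its pretrained weights.
    hf_model = ModernBertModel.from_pretrained(checkpoint)
    hf_model.eval()  # disable dropout before weights are copied or compared

    # Step 2: hand the instance to the model-specific converter.
    return from_modernbert_model(
        hf_model,
        add_pooling_layer=False,
        attn_implementation="sdpa",
    )
```

This route may be preferable to from_huggingface when the source model is already in memory, for example after fine-tuning with HuggingFace tooling.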

bertblocks.integration.load_bert.from_bert_model(
    orig_model: BertModel,
    add_pooling_layer: bool = False,
    attn_implementation: Literal['flash_attention_2', 'sdpa', 'eager'] = 'sdpa',
) -> BertBlocksModel

Instantiate an equivalent BertBlocks model from a HuggingFace BERT model instance.

Converts a HuggingFace BERT model to the BertBlocks architecture and transfers its weights. The resulting BertBlocks model uses post-normalization and a standard MLP to match BERT.

Parameters:
  • orig_model (BertModel) – An instance of a HuggingFace BertModel that has been loaded with pretrained weights.

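  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Defaults to False.

  • attn_implementation (str, optional) – Attention backend. One of “flash_attention_2”, “sdpa”, or “eager”. Defaults to “sdpa”.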

Returns:

A BertBlocks model with architecture matched to BERT, loaded with pretrained weights.

Return type:

BertBlocksModel
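The sketch below shows one way the attention backend might be chosen before converting a BERT model. It rests on an assumption not stated on this page: that "flash_attention_2" needs both a CUDA device and the flash_attn package, as it does in HuggingFace Transformers. The helper names pick_attn_implementation and convert_bert are hypothetical.

```python
import importlib.util
from typing import Any, Literal

AttnImpl = Literal["flash_attention_2", "sdpa", "eager"]


def pick_attn_implementation(has_cuda: bool, prefer_flash: bool = True) -> AttnImpl:
    """Hypothetical helper: fall back to 'sdpa' unless FlashAttention looks usable."""
    # Assumption: the flash backend requires the flash_attn package and a GPU.
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if prefer_flash and has_cuda and flash_installed:
        return "flash_attention_2"
    return "sdpa"


def convert_bert(checkpoint: str = "bert-base-uncased", has_cuda: bool = False) -> Any:
    """Hypothetical helper: load a pretrained HuggingFace BERT and convert it."""
    # Lazy imports keep the sketch importable without the heavy dependencies.
    from transformers import BertModel
    from bertblocks.integration.load_bert import from_bert_model

    hf_model = BertModel.from_pretrained(checkpoint)
    return from_bert_model(
        hf_model,
        add_pooling_layer=True,  # keep a [CLS] pooling head for classification
        attn_implementation=pick_attn_implementation(has_cuda),
    )
```

In practice has_cuda would typically come from torch.cuda.is_available(). It is passed in explicitly here so that the selection logic stays easy to test.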

References