Benchmarks¶
The bertblocks.benchmarks package provides evaluation suites that finetune a pretrained model on each benchmark task and report per-task scores.
Running Evaluations¶
- bertblocks.benchmarks.run_eval(
- task_modules: list[type[TaskModule]],
- pretrained_model_name_or_path: str,
- pretrained_tokenizer_name_or_path: str | None = None,
- max_seq_length: int = 256,
- max_epochs: int = 3,
- learning_rate: float = 2e-05,
- weight_decay: float = 0.01,
- train_batch_size: int = 32,
- eval_batch_size: int = 64,
- task_config: dict[str, dict[str, Any]] | None = None,
- )
Run evaluation on a list of task modules.
- Parameters:
task_modules – List of TaskModule subclasses to evaluate.
pretrained_model_name_or_path – HuggingFace model name or path.
pretrained_tokenizer_name_or_path – HuggingFace tokenizer name or path. If None, uses pretrained_model_name_or_path.
max_seq_length – Maximum sequence length for tokenization.
max_epochs – Number of training epochs per task.
learning_rate – Learning rate for AdamW optimizer.
weight_decay – Weight decay for AdamW optimizer.
train_batch_size – Batch size for training.
eval_batch_size – Batch size for evaluation.
task_config – Optional per-task hyperparameter overrides. Keys are task class names, values are dicts with optional keys: learning_rate, epochs, weight_decay.
- Returns:
DataFrame with columns Name, Group, Type, Metric, Score.
- Return type:
DataFrame
Task Modules¶
- class bertblocks.benchmarks.base.TaskModule(
- pretrained_model_name_or_path: str,
- pretrained_tokenizer_name_or_path: str,
- max_seq_length: int | None = 512,
- learning_rate: float | None = 1e-05,
- weight_decay: float | None = 1e-06,
- train_batch_size: int | None = 128,
- eval_batch_size: int | None = 128,
- num_workers: int | None = 2,
- max_epochs: int = 3,
- )

Bases:
ABC, LightningModule

Base LightningModule for evaluation tasks.
GLUE¶
- class bertblocks.benchmarks.glue.GLUETaskModule(
- pretrained_model_name_or_path: str,
- pretrained_tokenizer_name_or_path: str,
- max_seq_length: int | None = 512,
- learning_rate: float | None = 1e-05,
- weight_decay: float | None = 1e-06,
- train_batch_size: int | None = 128,
- eval_batch_size: int | None = 128,
- num_workers: int | None = 2,
- max_epochs: int = 3,
- )

Bases:
TaskModule

Base class for GLUE benchmark tasks.
Individual GLUE task modules: CoLA, SST2, MRPC, QQP, STSB, MNLI, QNLI, RTE, WNLI.
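The Metric column in the results typically follows each GLUE task's standard evaluation metric. The mapping below summarizes those standard metrics as defined by the GLUE benchmark itself; the exact metric names this library reports may differ:

```python
# Standard evaluation metric per GLUE task (per the GLUE benchmark;
# the library's reported metric names may differ).
GLUE_METRICS = {
    "CoLA": "Matthews correlation",
    "SST2": "accuracy",
    "MRPC": "F1 / accuracy",
    "QQP": "F1 / accuracy",
    "STSB": "Pearson / Spearman correlation",
    "MNLI": "accuracy",
    "QNLI": "accuracy",
    "RTE": "accuracy",
    "WNLI": "accuracy",
}

print(GLUE_METRICS["CoLA"])  # Matthews correlation
```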
SuperGLEBer¶
- class bertblocks.benchmarks.supergleber.SuperGLEBerTaskModule(
- pretrained_model_name_or_path: str,
- pretrained_tokenizer_name_or_path: str,
- max_seq_length: int | None = 512,
- learning_rate: float | None = 1e-05,
- weight_decay: float | None = 1e-06,
- train_batch_size: int | None = 128,
- eval_batch_size: int | None = 128,
- num_workers: int | None = 2,
- max_epochs: int = 3,
- )

Bases:
TaskModule

Base class for SuperGLEBer tasks.
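Whichever mix of task modules is evaluated, run_eval returns a single DataFrame with the columns Name, Group, Type, Metric, Score, which can be summarized with ordinary pandas operations. The rows below are made-up illustration data, not real benchmark results:

```python
import pandas as pd

# Illustrative rows only; real scores come from run_eval.
results = pd.DataFrame(
    {
        "Name": ["CoLA", "SST2", "RTE"],
        "Group": ["GLUE", "GLUE", "GLUE"],
        "Type": ["classification"] * 3,
        "Metric": ["matthews_corr", "accuracy", "accuracy"],
        "Score": [0.55, 0.92, 0.68],
    }
)

# Mean score per benchmark group.
summary = results.groupby("Group")["Score"].mean()
print(summary)
```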