Training API

Main Training Module

class cryoPARES.train.train.Trainer(symmetry, particles_star_fname, train_save_dir, particles_dir=None, n_epochs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, batch_size=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, num_dataworkers=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, image_size_px_for_nnet=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, sampling_rate_angs_for_nnet=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, mask_radius_angs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, split_halves=True, continue_checkpoint_dir=None, finetune_checkpoint_dir=None, compile_model=False, val_check_interval=None, overfit_batches=None, map_fname_for_simulated_pretraining=None, junk_particles_star_fname=None, junk_particles_dir=None)[source]

Bases: object

Parameters:

symmetry (str)
particles_star_fname (List[str])
train_save_dir (str)
particles_dir (List[str] | None)
n_epochs (int)
batch_size (int)
num_dataworkers (int)
image_size_px_for_nnet (int)
sampling_rate_angs_for_nnet (float)
mask_radius_angs (float | None)
split_halves (bool)
continue_checkpoint_dir (str | None)
finetune_checkpoint_dir (str | None)
compile_model (bool)
val_check_interval (float | None)
overfit_batches (int | None)
map_fname_for_simulated_pretraining (List[str] | None)
junk_particles_star_fname (List[str] | None)
junk_particles_dir (List[str] | None)

__init__(symmetry, particles_star_fname, train_save_dir, particles_dir=None, n_epochs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, batch_size=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, num_dataworkers=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, image_size_px_for_nnet=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, sampling_rate_angs_for_nnet=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, mask_radius_angs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, split_halves=True, continue_checkpoint_dir=None, finetune_checkpoint_dir=None, compile_model=False, val_check_interval=None, overfit_batches=None, map_fname_for_simulated_pretraining=None, junk_particles_star_fname=None, junk_particles_dir=None)[source]

Train a model on particle data.

Parameters:

symmetry (str) – Point group symmetry of the molecule (e.g., C1, D7, I, O, T)
particles_star_fname (List[str]) – Path(s) to RELION 3.1+ format .star file(s) containing pre-aligned particles. Can accept multiple files
train_save_dir (str) – Output directory where model checkpoints, logs, and training artifacts will be saved
particles_dir (Optional[List[str]]) – Root directory for particle image paths. If paths in .star file are relative, this directory is prepended (similar to RELION project directory concept)
n_epochs (int) – Number of training epochs. More epochs allow better convergence, but with diminishing returns beyond a certain point
batch_size (int) – Number of particles per batch. Make it as large as possible without running out of GPU memory. We advise using batch sizes of at least 32 images
num_dataworkers (int) – Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Avoid oversubscribing by requesting more workers than available CPUs
image_size_px_for_nnet (int) – Target image size in pixels for neural network input. After rescaling to the target sampling rate, images are cropped or padded to this size. We recommend tight box sizes
sampling_rate_angs_for_nnet (float) – Target sampling rate in Angstroms/pixel for neural network input. Particle images are first rescaled to this sampling rate before processing
mask_radius_angs (Optional[float]) – Radius of circular mask in Angstroms applied to particle images. If not provided, defaults to half the box size
split_halves (bool) – If True (default), trains two separate models on data half-sets for cross-validation. Use --NOT_split_halves to train a single model on all data
continue_checkpoint_dir (Optional[str]) – Path to checkpoint directory to resume training from a previous run
finetune_checkpoint_dir (Optional[str]) – Path to checkpoint directory to fine-tune a pre-trained model on a new dataset
compile_model (bool) – Enable torch.compile for faster training (experimental)
val_check_interval (Optional[float]) – Fraction of an epoch between validation checks. The default is usually appropriate, but for large datasets smaller values (0.1-0.5) give quicker feedback
overfit_batches (Optional[int]) – Number of batches to use for an overfitting test (debugging feature to verify that the model can memorize a small dataset)
map_fname_for_simulated_pretraining (Optional[List[str]]) – Path(s) to reference map(s) for simulated projection warmup before training on real data. The number of maps must match the number of particle star files
junk_particles_star_fname (Optional[List[str]]) – Optional star file(s) with junk-only particles for estimating confidence z-score thresholds
junk_particles_dir (Optional[List[str]]) – Root directory for junk particle image paths (analogous to particles_dir)
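
The interplay between sampling_rate_angs_for_nnet, image_size_px_for_nnet and mask_radius_angs can be worked through with plain arithmetic. The sketch below is not cryoPARES code: the helper name nnet_input_geometry, the rounding to the nearest pixel, and the reading of "half the box size" as half of the final network box are all assumptions made for illustration.

```python
def nnet_input_geometry(orig_box_px, orig_sampling_angs,
                        sampling_rate_angs_for_nnet, image_size_px_for_nnet,
                        mask_radius_angs=None):
    """Hypothetical helper: estimate the effect of rescaling, then crop/pad."""
    # Rescaling changes the pixel count but not the physical extent of the box.
    box_angs = orig_box_px * orig_sampling_angs
    # Box size in pixels at the target sampling rate
    # (ASSUMPTION: rounded to the nearest pixel).
    rescaled_px = round(box_angs / sampling_rate_angs_for_nnet)
    # Positive: pixels cropped away. Negative: pixels of padding added.
    cropped_px = rescaled_px - image_size_px_for_nnet
    final_box_angs = image_size_px_for_nnet * sampling_rate_angs_for_nnet
    if mask_radius_angs is None:
        # ASSUMPTION: "half the box size" means half the final box, in Angstroms.
        mask_radius_angs = final_box_angs / 2
    return {
        "rescaled_px": rescaled_px,
        "cropped_px": cropped_px,
        "final_box_angs": final_box_angs,
        "mask_radius_angs": mask_radius_angs,
    }


# A 256 px box at 1.0 A/px, rescaled to 1.5 A/px and fitted to 160 px.
geom = nnet_input_geometry(256, 1.0, 1.5, 160)
print(geom)
```

In this example the rescaled box is 171 px wide, so 11 px are cropped to reach 160 px, and the assumed default mask radius is 120 Angstroms. A negative cropped_px would instead indicate padding, which is the case a tight box size is meant to avoid.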

get_continue_checkpoint_fname(partition)[source]

Parameters: partition (Literal['allParticles', 'half1', 'half2'])

get_finetune_checkpoint_fname(partition)[source]

Parameters: partition (Literal['allParticles', 'half1', 'half2'])
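
Both methods above take a partition name. As a minimal sketch of how split_halves might relate to those names, the hypothetical helper below (not part of cryoPARES) assumes that split_halves=True corresponds to the two half-set partitions and split_halves=False to the single allParticles partition, which is an inference from the parameter descriptions rather than a documented guarantee.

```python
from typing import List, Literal

# The three partition names listed in the method signatures.
Partition = Literal["allParticles", "half1", "half2"]


def partitions_to_train(split_halves: bool) -> List[Partition]:
    """Hypothetical helper: which partitions a training run would cover."""
    # ASSUMPTION: half-set training uses half1 and half2; otherwise one
    # model is trained on the allParticles partition.
    return ["half1", "half2"] if split_halves else ["allParticles"]


for partition in partitions_to_train(split_halves=True):
    # With a real Trainer instance, the documented methods would be called
    # per partition, e.g. trainer.get_continue_checkpoint_fname(partition).
    print(partition)
```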

run(config_args)[source]

Parameters: config_args – The command line arguments provided to modify the config
Returns:

cryoPARES.train.train.main()[source]

Training Execution

class cryoPARES.train.runTrainOnePartition.TrainerPartition(symmetry, particles_star_fname, train_save_dir, particles_dir=None, n_epochs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, partition='allParticles', continue_checkpoint_fname=None, finetune_checkpoint_fname=None, find_lr=False, compile_model=False, val_check_interval=None, num_dataworkers=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, mask_radius_angs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, overfit_batches=None)[source]

Bases: object

Parameters:

symmetry (str)
particles_star_fname (List[str])
train_save_dir (str)
particles_dir (List[str] | None)
n_epochs (int)
partition (Literal['allParticles', 'half1', 'half2'])
continue_checkpoint_fname (str | None)
finetune_checkpoint_fname (str | None)
find_lr (bool)
compile_model (bool)
val_check_interval (float | None)
num_dataworkers (int)
mask_radius_angs (float | None)
overfit_batches (int | None)

__init__(symmetry, particles_star_fname, train_save_dir, particles_dir=None, n_epochs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, partition='allParticles', continue_checkpoint_fname=None, finetune_checkpoint_fname=None, find_lr=False, compile_model=False, val_check_interval=None, num_dataworkers=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, mask_radius_angs=<cryoPARES.configManager.inject_defaults.CONFIG_PARAM object>, overfit_batches=None)[source]

Initialize trainer for a single partition.

Parameters:

symmetry (str) – Point group symmetry of the molecule (e.g., C1, D7, I, O, T)
particles_star_fname (List[str]) – Path(s) to RELION 3.1+ format .star file(s) containing pre-aligned particles. Can accept multiple files
train_save_dir (str) – Output directory where model checkpoints, logs, and training artifacts will be saved
particles_dir (Optional[List[str]]) – Root directory for particle image paths. If paths in .star file are relative, this directory is prepended (similar to RELION project directory concept)
n_epochs (int) – Number of training epochs. More epochs allow better convergence, but with diminishing returns beyond a certain point
partition (Literal['allParticles', 'half1', 'half2']) – Data partition to train on: “half1”, “half2”, or “allParticles”. Used for half-set cross-validation
continue_checkpoint_fname (Optional[str]) – Path to a specific checkpoint file to resume training from a previous run
finetune_checkpoint_fname (Optional[str]) – Path to a specific checkpoint file to fine-tune a pre-trained model on new data
find_lr (bool) – Enable automatic learning rate finder to suggest optimal learning rate (single GPU only). Not recommended
compile_model (bool) – Enable torch.compile for faster training (experimental)
val_check_interval (Optional[float]) – Fraction of an epoch between validation checks. The default is usually appropriate, but for large datasets smaller values (0.1-0.5) give quicker feedback
num_dataworkers (int) – Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Avoid oversubscribing by requesting more workers than available CPUs
mask_radius_angs (Optional[float]) – Radius of circular mask in Angstroms applied to particle images. If not provided, defaults to half the box size
overfit_batches (Optional[int]) – Number of batches to use for an overfitting test (debugging feature to verify that the model can memorize a small dataset)
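
The guidance on num_dataworkers and val_check_interval can be turned into a quick pre-flight check before launching a run. The following is a sketch only: check_loader_settings, its n_gpus and cpus arguments, and the interpretation of a "fraction of an epoch" as a value in (0, 1] are assumptions introduced here, not part of the cryoPARES API.

```python
import os


def check_loader_settings(num_dataworkers, n_gpus=1,
                          val_check_interval=None, cpus=None):
    """Hypothetical pre-flight check (not a cryoPARES function).

    Returns a list of human-readable warnings; an empty list means the
    settings look consistent with the guidance in the parameter list.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    warnings = []
    # num_dataworkers is per GPU, so the total process count scales with GPUs.
    total_workers = num_dataworkers * n_gpus
    if total_workers > cpus:
        warnings.append(
            f"{total_workers} workers requested but only {cpus} CPUs available")
    # ASSUMPTION: a fraction of an epoch means a value in the interval (0, 1].
    if val_check_interval is not None and not (0.0 < val_check_interval <= 1.0):
        warnings.append(
            f"val_check_interval={val_check_interval} is not a fraction in (0, 1]")
    return warnings


# 8 workers on each of 2 GPUs oversubscribes an 8-CPU machine.
print(check_loader_settings(num_dataworkers=8, n_gpus=2, cpus=8))
# 4 workers and validation every quarter epoch raise no warnings.
print(check_loader_settings(num_dataworkers=4, val_check_interval=0.25, cpus=8))
```

Returning warnings instead of raising keeps the check advisory, which matches the tone of the recommendations in the parameter descriptions.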

run(config_args)[source]

cryoPARES.train.runTrainOnePartition.get_done_fname(dirname, partition)[source]

Get path to the DONE_TRAINING.txt file for a partition.

Parameters:

dirname (str) – Root directory of the experiment
partition (str) – Partition name (half1, half2, or allParticles)

Return type:

str

cryoPARES.train.runTrainOnePartition.get_reconstructions_dir(dirname, partition)[source]

Parameters:

dirname (str)
partition (str)

cryoPARES.train.runTrainOnePartition.check_if_training_partion_done(dirname, partition)[source]

Check whether training for the given partition is complete by looking for DONE_TRAINING.txt

Parameters:

dirname (str) – Root directory of the experiment (generally xxxx_v1, xxxx_v2…)
partition (str) – Partition name (half1, half2, or allParticles)
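
The completion check relies on a marker-file pattern: a run is considered finished once a DONE_TRAINING.txt file exists for the partition. The sketch below illustrates that pattern with standalone stand-ins. The functions sketch_done_fname and sketch_is_done are hypothetical, and the directory layout (one subdirectory per partition under dirname) is an assumption; the real location is whatever get_done_fname returns.

```python
import os
import tempfile

DONE_BASENAME = "DONE_TRAINING.txt"  # file name taken from the docs above


def sketch_done_fname(dirname, partition):
    """Hypothetical stand-in for get_done_fname."""
    # ASSUMPTION: the marker lives in a per-partition subdirectory.
    return os.path.join(dirname, partition, DONE_BASENAME)


def sketch_is_done(dirname, partition):
    """Hypothetical stand-in for check_if_training_partion_done."""
    return os.path.isfile(sketch_done_fname(dirname, partition))


with tempfile.TemporaryDirectory() as root:
    # No marker yet, so the partition counts as unfinished.
    before = sketch_is_done(root, "half1")
    # Simulate the end of training by creating an empty marker file.
    os.makedirs(os.path.join(root, "half1"))
    with open(sketch_done_fname(root, "half1"), "w"):
        pass
    after = sketch_is_done(root, "half1")

print(before, after)
```

A marker file of this kind lets an interrupted experiment skip partitions that already completed when it is relaunched.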

cryoPARES.train.runTrainOnePartition.execute_trainOnePartition(**kwargs)[source]