Command-Line Interface

This document provides instructions for using the command-line tools included with cryoPARES.

Table of Contents

cryopares_train - Train a new model
cryopares_infer - Run inference on new particles
cryopares_projmatching - Align particles via projection matching
cryopares_reconstruct - Reconstruct 3D volume from aligned particles
compactify_checkpoint - Package checkpoint for distribution
Utility Scripts - Analysis and visualization tools

`cryopares_train`

Train a CryoPARES model on pre-aligned particle data.

Usage

cryopares_train [OPTIONS]

Parameters

Parameter	Type	Default	Description
`--symmetry`	str	Required	Point group symmetry of the molecule (e.g., C1, D7, I, O, T)
`--particles_star_fname`	List[str]	Required	Path(s) to RELION 3.1+ format .star file(s) containing pre-aligned particles. Can accept multiple files
`--train_save_dir`	str	Required	Output directory where model checkpoints, logs, and training artifacts will be saved
`--particles_dir`	Optional[List[str]]	None	Root directory for particle image paths. If paths in .star file are relative, this directory is prepended (similar to RELION project directory concept)
`--n_epochs`	int	`100`	Number of training epochs. More epochs allow better convergence, although it does not help beyond a certain point
`--batch_size`	int	`32`	Number of particles per batch. Try to make it as large as possible before running out of GPU memory. We advice using batch sizes of at least 32 images
`--num_dataworkers`	int	`8`	Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Try not to oversubscribe by asking more workers than CPUs
`--image_size_px_for_nnet`	int	`160`	Target image size in pixels for neural network input. After rescaling to target sampling rate, images are cropped or padded to this size. We recommend tight box-sizes
`--sampling_rate_angs_for_nnet`	float	`1.5`	Target sampling rate in Angstroms/pixel for neural network input. Particle images are first rescaled to this sampling rate before processing
`--mask_radius_angs`	Optional[float]	None	Radius of circular mask in Angstroms applied to particle images. If not provided, defaults to half the box size
`--split_halves`	bool	`True`	If True (default), trains two separate models on data half-sets for cross-validation. Use –NOT_split_halves to train single model on all data
`--continue_checkpoint_dir`	Optional[str]	None	Path to checkpoint directory to resume training from a previous run
`--finetune_checkpoint_dir`	Optional[str]	None	Path to checkpoint directory to fine-tune a pre-trained model on new dataset
`--compile_model`	bool	`False`	Enable torch.compile for faster training (experimental)
`--val_check_interval`	Optional[float]	None	Fraction of epoch between validation checks. You generally don’t want to touch it, but you can set it to smaller values (0.1-0.5) for large datasets to get quicker feedback
`--overfit_batches`	Optional[int]	None	Number of batches to use for overfitting test (debugging feature to verify model can memorize small dataset)
`--map_fname_for_simulated_pretraining`	Optional[List[str]]	None	Path(s) to reference map(s) for simulated projection warmup before training on real data. The number of maps must match number of particle star files
`--junk_particles_star_fname`	Optional[List[str]]	None	Optional star file(s) with junk-only particles for estimating confidence z-score thresholds
`--junk_particles_dir`	Optional[List[str]]	None	Root directory for junk particle image paths (analogous to particles_dir)

--map_fname_for_simulated_pretraining MAP_FNAME Reference map(s) for simulated pre-training Must match length of --particles_star_fname

Configuration Overrides

Use --config as the last argument to override configuration parameters:

--config KEY=VALUE KEY2=VALUE2 ...

Common configuration overrides:

--config \
    train.learning_rate=1e-3 \
    train.weight_decay=1e-5 \
    train.accumulate_grad_batches=16 \
    models.image2sphere.lmax=8 \
    datamanager.particlesDataset.sampling_rate_angs_for_nnet=2.0 \
    datamanager.particlesDataset.image_size_px_for_nnet=128

View All Config Options

python -m cryopares_train --show-config

Examples

Basic training:

cryopares_train \
    --symmetry C1 \
    --particles_star_fname /path/to/aligned_particles.star \
    --train_save_dir /path/to/output \
    --n_epochs 20

Training with custom parameters:

cryopares_train \
    --symmetry D7 \
    --particles_star_fname /path/to/particles.star \
    --particles_dir /path/to/particles \
    --train_save_dir /path/to/output \
    --n_epochs 30 \
    --batch_size 64 \
    --compile_model \
    --config \
        train.learning_rate=5e-3 \
        models.image2sphere.lmax=10 \
        datamanager.particlesDataset.sampling_rate_angs_for_nnet=1.5

Continue training from checkpoint:

cryopares_train \
    --continue_checkpoint_dir /path/to/output/version_0 \
    --n_epochs 50

Fine-tune on new data:

cryopares_train \
    --symmetry C1 \
    --particles_star_fname /path/to/new_particles.star \
    --train_save_dir /path/to/finetuned_output \
    --finetune_checkpoint_dir /path/to/pretrained/version_0 \
    --n_epochs 10 \
    --config train.learning_rate=1e-4

Important Notes

File descriptor limit: Run ulimit -n 65536 before training to avoid “too many open files” errors
GPU memory: Reduce --batch_size or image_size_px_for_nnet if you encounter OOM errors
Monitoring: Use TensorBoard to monitor training: tensorboard --logdir /path/to/output/version_0

`cryopares_infer`

Run inference on new particles using a trained model.

Usage

cryopares_infer [OPTIONS]

Parameters

Parameter	Type	Default	Description
`--particles_star_fname`	str	Required	Path to input RELION particles .star file
`--checkpoint_dir`	str	Required	Path to training directory (or .zip file) containing half-set models with checkpoints and hyperparameters. By default they are called version_0, version_1, etc.
`--results_dir`	str	Required	Output directory for inference results including predicted poses and optional reconstructions
`--data_halfset`	‘half1’, ‘half2’, ‘allParticles’	`allParticles`	Which particle half-set(s) to process: “half1”, “half2”, or “allParticles”
`--model_halfset`	‘half1’, ‘half2’, ‘allCombinations’, ‘matchingHalf’	`matchingHalf`	Model half-set selection policy: “half1”, “half2”, “allCombinations”, or “matchingHalf” (uses matching data/model pairs)
`--particles_dir`	Optional[str]	None	Root directory for particle image paths. If provided, overrides paths in the .star file
`--batch_size`	int	`64`	Number of particles per batch for inference
`--n_jobs`	Optional[int]	None	Number of worker processes. Defaults to number of GPUs if CUDA enabled, otherwise 1
`--num_dataworkers`	int	`8`	Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Try not to oversubscribe by asking more workers than CPUs
`--use_cuda`	bool	`True`	Enable GPU acceleration for inference. If False, runs on CPU only
`--n_cpus_if_no_cuda`	int	`4`	Maximum CPU threads per worker when CUDA is disabled
`--compile_model`	bool	`False`	Compile model with torch.compile for faster inference (experimental, requires PyTorch 2.0+)
`--top_k_poses_nnet`	int	`1`	Number of top pose predictions to retrieve from neural network before local refinement
`--top_k_poses_localref`	int	`1`	Number of best matching poses to keep after local refinement
`--grid_distance_degs`	float	`6.0`	Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose
`--reference_map`	Optional[str]	None	Path to reference map (.mrc) for FSC computation during validation
`--reference_mask`	Optional[str]	None	Path to reference mask (.mrc) for masked FSC calculation
`--directional_zscore_thr`	Optional[float]	None	Confidence z-score threshold for filtering particles. Particles with scores below this are discarded as low-confidence
`--skip_localrefinement`	bool	`False`	Skip local pose refinement step and use only neural network predictions
`--skip_reconstruction`	bool	`False`	Skip 3D reconstruction step and output only predicted poses
`--subset_idxs`	Optional[List[int]]	None	List of particle indices to process (for debugging or partial processing)
`--n_first_particles`	Optional[int]	None	Process only the first N particles from dataset (debug feature)
`--check_interval_secs`	float	`2.0`	Polling interval in seconds for parent loop in distributed processing

Configuration Overrides

--config \
    inference.directional_zscore_thr=2.0 \
    inference.top_k_poses_nnet=10 \
    inference.skip_localrefinement=False \
    inference.skip_reconstruction=False \
    projmatching.grid_distance_degs=8.0 \
    projmatching.grid_step_degs=2.0

Examples

Basic inference:

cryopares_infer \
    --particles_star_fname /path/to/new_particles.star \
    --checkpoint_dir /path/to/training/version_0 \
    --results_dir /path/to/inference_results

Inference with reference map:

cryopares_infer \
    --particles_star_fname /path/to/particles.star \
    --checkpoint_dir /path/to/training/version_0 \
    --results_dir /path/to/results \
    --reference_map /path/to/reference.mrc \
    --config \
        inference.directional_zscore_thr=2.0 \
        projmatching.grid_distance_degs=10.0

Process only half1 with matching model:

cryopares_infer \
    --particles_star_fname /path/to/particles.star \
    --checkpoint_dir /path/to/training/version_0 \
    --results_dir /path/to/results \
    --data_halfset half1 \
    --model_halfset matchingHalf

Fast inference (skip reconstruction):

cryopares_infer \
    --particles_star_fname /path/to/particles.star \
    --checkpoint_dir /path/to/training/version_0 \
    --results_dir /path/to/results \
    --batch_size 2048 \
    --config inference.skip_reconstruction=True

Output Files

The inference process creates:

results_dir/particles_aligned.star - Aligned particles with predicted poses
results_dir/reconstruction.mrc - 3D reconstruction (if not skipped)
results_dir/fsc.txt - FSC curve (if half-sets used)
results_dir/inference.log - Inference log

`cryopares_projmatching`

Align particles to a reference volume using projection matching.

Usage

cryopares_projmatching [OPTIONS]

Parameters

Parameter	Type	Default	Description
`--reference_vol`	str	Required	Path to reference 3D volume (.mrc file) for generating projection templates
`--particles_star_fname`	str	Required	Path to input STAR file with particle metadata
`--out_fname`	str	Required	Path for output STAR file with aligned particle poses
`--particles_dir`	Optional[str]	Required	Root directory for particle image paths. If provided, overrides paths in the .star file
`--mask_radius_angs`	Optional[float]	None	Radius of circular mask in Angstroms applied to particle images
`--grid_distance_degs`	float	`6.0`	Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose
`--grid_step_degs`	float	`2.0`	Angular step size in degrees for grid search during local refinement
`--return_top_k_poses`	int	`1`	Number of top matching poses to save per particle
`--filter_resolution_angst`	Optional[float]	None	Low-pass filter resolution in Angstroms applied to reference volume before matching
`--n_jobs`	int	`1`	Number of parallel worker processes for distributed projection matching
`--num_dataworkers`	int	`1`	Number of CPU workers per PyTorch DataLoader for data loading
`--batch_size`	int	`1024`	Number of particles to process simultaneously per job
`--use_cuda`	bool	`True`	Enable GPU acceleration. If False, runs on CPU only
`--verbose`	bool	`False`	Enable verbose logging output
`--float32_matmul_precision`	‘highest’, ‘high’, ‘medium’	`high`	PyTorch float32 matrix multiplication precision mode (“highest”, “high”, or “medium”)
`--gpu_id`	Optional[int]	None	Specific GPU device ID to use (if multiple GPUs available)
`--n_first_particles`	Optional[int]	None	Process only the first N particles from dataset (for testing or validation)
`--correct_ctf`	bool	`True`	Apply CTF correction during projection matching
`--halfmap_subset`	Optional[‘1’, ‘2’	None	Select half-map subset (1 or 2) for half-map validation

Example

cryopares_projmatching \
    --reference_vol /path/to/your/reference.mrc \
    --particles_star_fname /path/to/your/particles.star \
    --out_fname /path/to/your/aligned_particles.star \
    --grid_distance_degs 10

`cryopares_reconstruct`

Reconstruct a 3D volume from particles with known poses.

Usage

cryopares_reconstruct [OPTIONS]

Parameters

Parameter	Type	Default	Description
`--particles_star_fname`	str	Required	Path to input STAR file with particle metadata and poses to reconstruct
`--symmetry`	str	Required	Point group symmetry of the volume for reconstruction (e.g., C1, D2, I, O, T)
`--output_fname`	str	Required	Path for output reconstructed 3D volume (.mrc file)
`--particles_dir`	Optional[str]	None	Root directory for particle image paths. If provided, overrides paths in the .star file
`--n_jobs`	int	`1`	Number of parallel worker processes for distributed reconstruction
`--num_dataworkers`	int	`1`	Number of CPU workers per PyTorch DataLoader for data loading
`--batch_size`	int	`128`	Number of particles to backproject simultaneously per job
`--use_cuda`	bool	`True`	Enable GPU acceleration for reconstruction. If False, runs on CPU only
`--correct_ctf`	bool	`True`	Apply CTF correction during reconstruction
`--eps`	float	`0.001`	Regularization constant for reconstruction (ideally set to 1/SNR). Prevents division by zero and stabilizes reconstruction
`--min_denominator_value`	Optional[float]	None	Minimum value for denominator to prevent numerical instabilities during reconstruction
`--use_only_n_first_batches`	Optional[int]	None	Reconstruct using only first N batches (for testing or quick validation)
`--float32_matmul_precision`	Optional[str]	`high`	PyTorch float32 matrix multiplication precision mode (“highest”, “high”, or “medium”)
`--weight_with_confidence`	bool	`False`	Apply per-particle confidence weighting during backprojection. If True, particles with higher confidence contribute more to reconstruction. It reads the confidence from the metadata label “rlnParticleFigureOfMerit”
`--halfmap_subset`	Optional[‘1’, ‘2’	None	Select half-map subset (1 or 2) for half-map reconstruction and validation

Example

cryopares_reconstruct \
    --particles_star_fname /path/to/your/particles.star \
    --symmetry C1 \
    --output_fname /path/to/your/reconstruction.mrc

`compactify_checkpoint`

Package a trained checkpoint directory into a compact ZIP file for easy distribution and inference. This tool removes unnecessary files (training logs, metrics, intermediate checkpoints, etc.) and keeps only what’s needed for inference.

Usage

python -m cryoPARES.scripts.compactify_checkpoint [options]

Required Arguments

--checkpoint_dir CHECKPOINT_DIR Path to checkpoint directory (e.g., /path/to/train_output/version_0)

Optional Arguments

--output_path OUTPUT_PATH Output ZIP file path Default: <checkpoint_dir_name>_compact.zip
--no-reconstructions Exclude reconstruction files to reduce size You’ll need to provide --reference_map during inference if you use this option
--no-compression Store files without compression (faster but larger)
--quiet Suppress progress messages

Examples

Basic usage:

python -m cryoPARES.scripts.compactify_checkpoint \
    --checkpoint_dir /path/to/training/version_0

Output: /path/to/training/version_0_compact.zip

Custom output name:

python -m cryoPARES.scripts.compactify_checkpoint \
    --checkpoint_dir /path/to/training/version_0 \
    --output_path my_model_C1.zip

Exclude reconstructions (smaller size):

python -m cryoPARES.scripts.compactify_checkpoint \
    --checkpoint_dir /path/to/training/version_0 \
    --output_path my_model_compact.zip \
    --no-reconstructions

What’s Included

The compactified checkpoint contains only files required for inference:

For each half-set (half1, half2, or allParticles):

checkpoints/best_script.pt (TorchScript model, preferred)
checkpoints/best.ckpt (fallback if best_script.pt doesn’t exist)
checkpoints/best_directional_normalizer.pt (for confidence scoring)
hparams.yaml (model hyperparameters)
reconstructions/0.mrc (reference map, optional)

At root:

configs_*.yml (training configuration)

Size Reduction

Typical size reduction: 40 GB → 10 GB (75% reduction)

Most of the space savings come from removing:

TensorBoard event logs
Intermediate checkpoints (last.ckpt, etc.)
Training metrics and validation data
Code snapshots

Example output:

Original size:  42.15 GB
Compact size:   9.87 GB
Savings:        32.28 GB (76.6%)

Using Compactified Checkpoints

You can use the ZIP file directly for inference:

cryopares_infer \
    --particles_star_fname /path/to/particles.star \
    --checkpoint_dir my_model_compact.zip \
    --results_dir /path/to/results

CryoPARES automatically detects ZIP files and reads models directly from the archive without extraction.

Utility Scripts

CryoPARES includes several utility scripts for analysis, visualization, and model management.

GMM Histogram Analysis

Script: cryoPARES.scripts.gmm_hists Automatic Usage: Yes - Called during training to estimate confidence thresholds Manual Usage: Yes - Can be run standalone for analysis

Analyzes and compares score distributions between “good” (aligned) and “bad” (misaligned) particle populations using Gaussian Mixture Models (GMMs). Automatically estimates optimal thresholds for filtering low-quality particles.

CLI Usage:

python -m cryoPARES.scripts.gmm_hists \
    --fname_good aligned.star \
    --fname_bad misaligned.star \
    --plot_fname results/distributions.png

FSC Computation

Script: cryoPARES.scripts.computeFsc Automatic Usage: Yes - Called during inference when reconstructing volumes Manual Usage: Yes - Can be run standalone for FSC analysis

Computes Fourier Shell Correlation (FSC) between two 3D volumes to assess resolution.

CLI Usage:

python -m cryoPARES.scripts.computeFsc \
    --fname_vol_half1 half1.mrc \
    --fname_vol_half2 half2.mrc \
    --fname_fsc_out fsc_curve.txt \
    --show_plot

Pose Comparison

Script: cryoPARES.scripts.compare_poses Automatic Usage: No - Manual analysis tool only Manual Usage: Yes

Compares predicted particle orientations between two STAR files to evaluate pose accuracy.

CLI Usage:

python -m cryoPARES.scripts.compare_poses \
    --starfile1 predicted_poses.star \
    --starfile2 ground_truth_poses.star \
    --symmetry C1 \
    --output_plot error_histogram.png

Learning Curve Visualization

Script: cryoPARES.scripts.plot_learning_curve Automatic Usage: No - Manual visualization tool only Manual Usage: Yes

Visualizes training metrics from PyTorch Lightning CSV logs to monitor training progress.

CLI Usage:

python -m cryoPARES.scripts.plot_learning_curve \
    --csv_file /path/to/training/version_0/metrics.csv \
    --skip_steps 100 \
    --log_scale

STAR File Histograms

Script: cryoPARES.scripts.hists_from_starfile Automatic Usage: No - Manual analysis tool only Manual Usage: Yes

Generates histograms of any metadata column(s) from RELION STAR files.

CLI Usage:

python -m cryoPARES.scripts.hists_from_starfile \
    --input particles.star \
    --cols rlnDirectionalZScore rlnDefocusU \
    --output histograms.png

Command-Line Interface

Table of Contents

`cryopares_train`

Usage

Parameters

Configuration Overrides

View All Config Options

Examples

Important Notes

See Also

`cryopares_infer`

Usage

Parameters

Configuration Overrides

Examples

Output Files

See Also

`cryopares_projmatching`

Usage

Parameters

Example

`cryopares_reconstruct`

Usage

Parameters

Example

`compactify_checkpoint`

Usage

Required Arguments

Optional Arguments

Examples

What’s Included

Size Reduction

Using Compactified Checkpoints

Utility Scripts

GMM Histogram Analysis

FSC Computation

Pose Comparison

Learning Curve Visualization

STAR File Histograms

See Also

Command-Line Interface

Table of Contents

cryopares_train

Usage

Parameters

Configuration Overrides

View All Config Options

Examples

Important Notes

See Also

cryopares_infer

Usage

Parameters

Configuration Overrides

Examples

Output Files

See Also

cryopares_projmatching

Usage

Parameters

Example

cryopares_reconstruct

Usage

Parameters

Example

compactify_checkpoint

Usage

Required Arguments

Optional Arguments

Examples

What’s Included

Size Reduction

Using Compactified Checkpoints

Utility Scripts

GMM Histogram Analysis

FSC Computation

Pose Comparison

Learning Curve Visualization

STAR File Histograms

See Also

`cryopares_train`

`cryopares_infer`

`cryopares_projmatching`

`cryopares_reconstruct`

`compactify_checkpoint`