Command-Line Interface
This document provides instructions for using the command-line tools included with cryoPARES.
Table of Contents
cryopares_train - Train a new model
cryopares_infer - Run inference on new particles
cryopares_projmatching - Align particles via projection matching
cryopares_reconstruct - Reconstruct 3D volume from aligned particles
compactify_checkpoint - Package checkpoint for distribution
Utility Scripts - Analysis and visualization tools
cryopares_train
Train a CryoPARES model on pre-aligned particle data.
Usage
cryopares_train [OPTIONS]
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --symmetry | str | Required | Point group symmetry of the molecule (e.g., C1, D7, I, O, T) |
| --particles_star_fname | List[str] | Required | Path(s) to RELION 3.1+ format .star file(s) containing pre-aligned particles. Accepts multiple files |
| --train_save_dir | str | Required | Output directory where model checkpoints, logs, and training artifacts will be saved |
| --particles_dir | Optional[List[str]] | None | Root directory for particle image paths. If paths in the .star file are relative, this directory is prepended (similar to the RELION project directory concept) |
| --n_epochs | int |  | Number of training epochs. More epochs allow better convergence, but the benefit plateaus beyond a certain point |
| --batch_size | int |  | Number of particles per batch. Make it as large as possible without running out of GPU memory. We advise using batch sizes of at least 32 images |
|  | int |  | Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Avoid oversubscribing by requesting more workers than CPUs |
|  | int |  | Target image size in pixels for neural network input. After rescaling to the target sampling rate, images are cropped or padded to this size. We recommend tight box sizes |
|  | float |  | Target sampling rate in Angstroms/pixel for neural network input. Particle images are first rescaled to this sampling rate before processing |
|  | Optional[float] | None | Radius of circular mask in Angstroms applied to particle images. If not provided, defaults to half the box size |
|  | bool |  | If True (default), trains two separate models on data half-sets for cross-validation. Use --NOT_split_halves to train a single model on all data |
| --continue_checkpoint_dir | Optional[str] | None | Path to checkpoint directory to resume training from a previous run |
| --finetune_checkpoint_dir | Optional[str] | None | Path to checkpoint directory to fine-tune a pre-trained model on a new dataset |
| --compile_model | bool |  | Enable torch.compile for faster training (experimental) |
|  | Optional[float] | None | Fraction of an epoch between validation checks. You generally don’t need to change it, but you can set it to smaller values (0.1-0.5) for large datasets to get quicker feedback |
|  | Optional[int] | None | Number of batches to use for an overfitting test (debugging feature to verify that the model can memorize a small dataset) |
| --map_fname_for_simulated_pretraining | Optional[List[str]] | None | Path(s) to reference map(s) for simulated projection warmup before training on real data. The number of maps must match the number of particle star files |
|  | Optional[List[str]] | None | Optional star file(s) with junk-only particles for estimating confidence z-score thresholds |
|  | Optional[List[str]] | None | Root directory for junk particle image paths (analogous to particles_dir) |
--map_fname_for_simulated_pretraining MAP_FNAME
Reference map(s) for simulated pre-training. Must match the length of --particles_star_fname
Configuration Overrides
Use --config as the last argument to override configuration parameters:
--config KEY=VALUE KEY2=VALUE2 ...
Common configuration overrides:
--config \
train.learning_rate=1e-3 \
train.weight_decay=1e-5 \
train.accumulate_grad_batches=16 \
models.image2sphere.lmax=8 \
datamanager.particlesDataset.sampling_rate_angs_for_nnet=2.0 \
datamanager.particlesDataset.image_size_px_for_nnet=128
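To make the dotted KEY=VALUE syntax concrete, the following Python sketch expands a list of overrides into a nested dictionary. It is an illustration only: parse_overrides is a hypothetical helper, not the parser cryoPARES uses, and the way values are converted to numbers or booleans is an assumption.

```python
import ast

def parse_overrides(pairs):
    """Expand ['a.b=1', 'a.c=x'] into {'a': {'b': 1, 'c': 'x'}} (hypothetical helper)."""
    config = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        try:
            value = ast.literal_eval(raw)  # numbers, True/False, None
        except (ValueError, SyntaxError):
            value = raw                    # anything else stays a string
        node = config
        *parents, leaf = key.split(".")
        for name in parents:
            node = node.setdefault(name, {})  # create intermediate levels on demand
        node[leaf] = value
    return config

overrides = parse_overrides([
    "train.learning_rate=1e-3",
    "models.image2sphere.lmax=8",
])
print(overrides)
```

Each dot in a key descends one level, so train.learning_rate addresses the learning_rate entry of the train section.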
View All Config Options
python -m cryopares_train --show-config
Examples
Basic training:
cryopares_train \
--symmetry C1 \
--particles_star_fname /path/to/aligned_particles.star \
--train_save_dir /path/to/output \
--n_epochs 20
Training with custom parameters:
cryopares_train \
--symmetry D7 \
--particles_star_fname /path/to/particles.star \
--particles_dir /path/to/particles \
--train_save_dir /path/to/output \
--n_epochs 30 \
--batch_size 64 \
--compile_model \
--config \
train.learning_rate=5e-3 \
models.image2sphere.lmax=10 \
datamanager.particlesDataset.sampling_rate_angs_for_nnet=1.5
Continue training from checkpoint:
cryopares_train \
--continue_checkpoint_dir /path/to/output/version_0 \
--n_epochs 50
Fine-tune on new data:
cryopares_train \
--symmetry C1 \
--particles_star_fname /path/to/new_particles.star \
--train_save_dir /path/to/finetuned_output \
--finetune_checkpoint_dir /path/to/pretrained/version_0 \
--n_epochs 10 \
--config train.learning_rate=1e-4
Important Notes
File descriptor limit: Run ulimit -n 65536 before training to avoid “too many open files” errors
GPU memory: Reduce --batch_size or image_size_px_for_nnet if you encounter OOM errors
Monitoring: Use TensorBoard to monitor training:
tensorboard --logdir /path/to/output/version_0
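For the file descriptor note above, the limit can also be inspected from Python with the standard resource module (Unix only). This is a minimal sketch, not part of cryoPARES: ensure_open_file_limit is a hypothetical helper, and it only affects the current Python process and its children, so it does not replace running ulimit in the shell that launches training.

```python
import resource

def ensure_open_file_limit(target=65536):
    """Raise the soft RLIMIT_NOFILE towards `target`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        # Only the soft limit is changed; the hard limit is left untouched.
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = ensure_open_file_limit()
print(f"open-file limit: soft={soft}, hard={hard}")
```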
See Also
Training Guide - Comprehensive training guide
Configuration Guide - All configuration parameters
cryopares_infer
Run inference on new particles using a trained model.
Usage
cryopares_infer [OPTIONS]
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --particles_star_fname | str | Required | Path to input RELION particles .star file |
| --checkpoint_dir | str | Required | Path to training directory (or .zip file) containing half-set models with checkpoints and hyperparameters. By default they are called version_0, version_1, etc. |
| --results_dir | str | Required | Output directory for inference results including predicted poses and optional reconstructions |
| --data_halfset | ‘half1’, ‘half2’, ‘allParticles’ |  | Which particle half-set(s) to process: “half1”, “half2”, or “allParticles” |
| --model_halfset | ‘half1’, ‘half2’, ‘allCombinations’, ‘matchingHalf’ |  | Model half-set selection policy: “half1”, “half2”, “allCombinations”, or “matchingHalf” (uses matching data/model pairs) |
|  | Optional[str] | None | Root directory for particle image paths. If provided, overrides paths in the .star file |
| --batch_size | int |  | Number of particles per batch for inference |
|  | Optional[int] | None | Number of worker processes. Defaults to number of GPUs if CUDA enabled, otherwise 1 |
|  | int |  | Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Avoid oversubscribing by requesting more workers than CPUs |
|  | bool |  | Enable GPU acceleration for inference. If False, runs on CPU only |
|  | int |  | Maximum CPU threads per worker when CUDA is disabled |
|  | bool |  | Compile model with torch.compile for faster inference (experimental, requires PyTorch 2.0+) |
|  | int |  | Number of top pose predictions to retrieve from neural network before local refinement |
|  | int |  | Number of best matching poses to keep after local refinement |
|  | float |  | Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose |
| --reference_map | Optional[str] | None | Path to reference map (.mrc) for FSC computation during validation |
|  | Optional[str] | None | Path to reference mask (.mrc) for masked FSC calculation |
|  | Optional[float] | None | Confidence z-score threshold for filtering particles. Particles with scores below this are discarded as low-confidence |
|  | bool |  | Skip local pose refinement step and use only neural network predictions |
|  | bool |  | Skip 3D reconstruction step and output only predicted poses |
|  | Optional[List[int]] | None | List of particle indices to process (for debugging or partial processing) |
|  | Optional[int] | None | Process only the first N particles from dataset (debug feature) |
|  | float |  | Polling interval in seconds for parent loop in distributed processing |
Configuration Overrides
--config \
inference.directional_zscore_thr=2.0 \
inference.top_k_poses_nnet=10 \
inference.skip_localrefinement=False \
inference.skip_reconstruction=False \
projmatching.grid_distance_degs=8.0 \
projmatching.grid_step_degs=2.0
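The two projmatching settings above determine how many candidate poses the local refinement has to score. The sketch below enumerates the offsets for one angle, assuming a regular grid from -grid_distance_degs to +grid_distance_degs in steps of grid_step_degs. The grid_offsets helper is hypothetical, and the cubed count assumes the same grid is applied independently to three Euler angles, which this document does not state.

```python
import math

def grid_offsets(grid_distance_degs, grid_step_degs):
    """Symmetric offsets in degrees from -distance to +distance, including 0."""
    if grid_distance_degs < 0 or grid_step_degs <= 0:
        raise ValueError("distance must be >= 0 and step must be > 0")
    # Small tolerance so that e.g. 8.0 / 2.0 is not rounded down by float error.
    n_steps = math.floor(grid_distance_degs / grid_step_degs + 1e-9)
    return [i * grid_step_degs for i in range(-n_steps, n_steps + 1)]

offsets = grid_offsets(8.0, 2.0)
print(offsets)            # 9 offsets per angle
print(len(offsets) ** 3)  # 729 candidates if three angles are gridded independently
```

Under these assumptions, halving the step roughly multiplies the number of candidates by eight, so a wider or finer grid increases the search cost quickly.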
Examples
Basic inference:
cryopares_infer \
--particles_star_fname /path/to/new_particles.star \
--checkpoint_dir /path/to/training/version_0 \
--results_dir /path/to/inference_results
Inference with reference map:
cryopares_infer \
--particles_star_fname /path/to/particles.star \
--checkpoint_dir /path/to/training/version_0 \
--results_dir /path/to/results \
--reference_map /path/to/reference.mrc \
--config \
inference.directional_zscore_thr=2.0 \
projmatching.grid_distance_degs=10.0
Process only half1 with matching model:
cryopares_infer \
--particles_star_fname /path/to/particles.star \
--checkpoint_dir /path/to/training/version_0 \
--results_dir /path/to/results \
--data_halfset half1 \
--model_halfset matchingHalf
Fast inference (skip reconstruction):
cryopares_infer \
--particles_star_fname /path/to/particles.star \
--checkpoint_dir /path/to/training/version_0 \
--results_dir /path/to/results \
--batch_size 2048 \
--config inference.skip_reconstruction=True
Output Files
The inference process creates:
results_dir/particles_aligned.star - Aligned particles with predicted poses
results_dir/reconstruction.mrc - 3D reconstruction (if not skipped)
results_dir/fsc.txt - FSC curve (if half-sets used)
results_dir/inference.log - Inference log
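The confidence z-score threshold discards particles whose score falls below it. As a rough illustration of that rule, the sketch below filters toy records in memory. It assumes the score is stored under the rlnDirectionalZScore label (the column name used in the STAR File Histograms example) and leaves reading the STAR file to a STAR library of your choice. filter_by_zscore is a hypothetical helper, not a cryoPARES function.

```python
def filter_by_zscore(particles, threshold, column="rlnDirectionalZScore"):
    """Keep particles whose confidence z-score is at or above `threshold`."""
    return [p for p in particles if float(p[column]) >= threshold]

# Toy records standing in for rows of particles_aligned.star (values are made up).
particles = [
    {"rlnImageName": "1@stack.mrcs", "rlnDirectionalZScore": "3.1"},
    {"rlnImageName": "2@stack.mrcs", "rlnDirectionalZScore": "0.4"},
    {"rlnImageName": "3@stack.mrcs", "rlnDirectionalZScore": "2.0"},
]
kept = filter_by_zscore(particles, threshold=2.0)
print([p["rlnImageName"] for p in kept])
```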
See Also
API Reference - Detailed API documentation
Troubleshooting Guide - Common issues
cryopares_projmatching
Align particles to a reference volume using projection matching.
Usage
cryopares_projmatching [OPTIONS]
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --reference_vol | str | Required | Path to reference 3D volume (.mrc file) for generating projection templates |
| --particles_star_fname | str | Required | Path to input STAR file with particle metadata |
| --out_fname | str | Required | Path for output STAR file with aligned particle poses |
|  | Optional[str] | Required | Root directory for particle image paths. If provided, overrides paths in the .star file |
|  | Optional[float] | None | Radius of circular mask in Angstroms applied to particle images |
| --grid_distance_degs | float |  | Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose |
|  | float |  | Angular step size in degrees for grid search during local refinement |
|  | int |  | Number of top matching poses to save per particle |
|  | Optional[float] | None | Low-pass filter resolution in Angstroms applied to reference volume before matching |
|  | int |  | Number of parallel worker processes for distributed projection matching |
|  | int |  | Number of CPU workers per PyTorch DataLoader for data loading |
|  | int |  | Number of particles to process simultaneously per job |
|  | bool |  | Enable GPU acceleration. If False, runs on CPU only |
|  | bool |  | Enable verbose logging output |
|  | ‘highest’, ‘high’, ‘medium’ |  | PyTorch float32 matrix multiplication precision mode (“highest”, “high”, or “medium”) |
|  | Optional[int] | None | Specific GPU device ID to use (if multiple GPUs available) |
|  | Optional[int] | None | Process only the first N particles from dataset (for testing or validation) |
|  | bool |  | Apply CTF correction during projection matching |
|  | Optional[‘1’, ‘2’] | None | Select half-map subset (1 or 2) for half-map validation |
Example
cryopares_projmatching \
--reference_vol /path/to/your/reference.mrc \
--particles_star_fname /path/to/your/particles.star \
--out_fname /path/to/your/aligned_particles.star \
--grid_distance_degs 10
cryopares_reconstruct
Reconstruct a 3D volume from particles with known poses.
Usage
cryopares_reconstruct [OPTIONS]
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --particles_star_fname | str | Required | Path to input STAR file with particle metadata and poses to reconstruct |
| --symmetry | str | Required | Point group symmetry of the volume for reconstruction (e.g., C1, D2, I, O, T) |
| --output_fname | str | Required | Path for output reconstructed 3D volume (.mrc file) |
|  | Optional[str] | None | Root directory for particle image paths. If provided, overrides paths in the .star file |
|  | int |  | Number of parallel worker processes for distributed reconstruction |
|  | int |  | Number of CPU workers per PyTorch DataLoader for data loading |
|  | int |  | Number of particles to backproject simultaneously per job |
|  | bool |  | Enable GPU acceleration for reconstruction. If False, runs on CPU only |
|  | bool |  | Apply CTF correction during reconstruction |
|  | float |  | Regularization constant for reconstruction (ideally set to 1/SNR). Prevents division by zero and stabilizes reconstruction |
|  | Optional[float] | None | Minimum value for denominator to prevent numerical instabilities during reconstruction |
|  | Optional[int] | None | Reconstruct using only first N batches (for testing or quick validation) |
|  | Optional[str] |  | PyTorch float32 matrix multiplication precision mode (“highest”, “high”, or “medium”) |
|  | bool |  | Apply per-particle confidence weighting during backprojection. If True, particles with higher confidence contribute more to reconstruction. The confidence is read from the metadata label “rlnParticleFigureOfMerit” |
|  | Optional[‘1’, ‘2’] | None | Select half-map subset (1 or 2) for half-map reconstruction and validation |
Example
cryopares_reconstruct \
--particles_star_fname /path/to/your/particles.star \
--symmetry C1 \
--output_fname /path/to/your/reconstruction.mrc
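The regularization constant and the minimum denominator in the parameter table both protect a division. The sketch below shows the general idea on scalars, as a Wiener-style ratio. It is an assumption about the form of the computation, not the actual cryoPARES reconstruction code, and regularized_ratio, eps and min_denominator are illustrative names.

```python
def regularized_ratio(numerator, denominator, eps=1e-3, min_denominator=None):
    """Divide with a regularization constant and an optional floor on the denominator."""
    denom = denominator + eps          # eps keeps the ratio finite when denominator is 0
    if min_denominator is not None:
        denom = max(denom, min_denominator)  # extra floor against tiny denominators
    return numerator / denom

# A voxel with no accumulated weight stays finite instead of dividing by zero.
print(regularized_ratio(0.5, 0.0))
# A well-sampled voxel is barely affected by the regularization.
print(regularized_ratio(0.5, 2.0))
```

Larger constants damp poorly sampled frequencies more strongly, which is why the table suggests a value on the order of 1/SNR.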
compactify_checkpoint
Package a trained checkpoint directory into a compact ZIP file for easy distribution and inference. This tool removes unnecessary files (training logs, metrics, intermediate checkpoints, etc.) and keeps only what’s needed for inference.
Usage
python -m cryoPARES.scripts.compactify_checkpoint [options]
Required Arguments
--checkpoint_dir CHECKPOINT_DIR
Path to checkpoint directory (e.g., /path/to/train_output/version_0)
Optional Arguments
--output_path OUTPUT_PATH
Output ZIP file path. Default: <checkpoint_dir_name>_compact.zip
--no-reconstructions
Exclude reconstruction files to reduce size. You’ll need to provide --reference_map during inference if you use this option
--no-compression
Store files without compression (faster but larger)
--quiet
Suppress progress messages
Examples
Basic usage:
python -m cryoPARES.scripts.compactify_checkpoint \
--checkpoint_dir /path/to/training/version_0
Output: /path/to/training/version_0_compact.zip
Custom output name:
python -m cryoPARES.scripts.compactify_checkpoint \
--checkpoint_dir /path/to/training/version_0 \
--output_path my_model_C1.zip
Exclude reconstructions (smaller size):
python -m cryoPARES.scripts.compactify_checkpoint \
--checkpoint_dir /path/to/training/version_0 \
--output_path my_model_compact.zip \
--no-reconstructions
What’s Included
The compactified checkpoint contains only files required for inference:
For each half-set (half1, half2, or allParticles):
checkpoints/best_script.pt (TorchScript model, preferred)
checkpoints/best.ckpt (fallback if best_script.pt doesn’t exist)
checkpoints/best_directional_normalizer.pt (for confidence scoring)
hparams.yaml (model hyperparameters)
reconstructions/0.mrc (reference map, optional)
At root:
configs_*.yml (training configuration)
Size Reduction
Typical size reduction: 40 GB → 10 GB (75% reduction)
Most of the space savings come from removing:
TensorBoard event logs
Intermediate checkpoints (last.ckpt, etc.)
Training metrics and validation data
Code snapshots
Example output:
Original size: 42.15 GB
Compact size: 9.87 GB
Savings: 32.28 GB (76.6%)
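The savings line in the example output follows directly from the two sizes. A short sketch of the arithmetic, where size_savings is a hypothetical helper and not part of the tool:

```python
def size_savings(original_gb, compact_gb):
    """Return the absolute saving in GB and the saving as a percentage of the original."""
    saved = original_gb - compact_gb
    return saved, 100.0 * saved / original_gb

saved, percent = size_savings(42.15, 9.87)
print(f"Savings: {saved:.2f} GB ({percent:.1f}%)")  # Savings: 32.28 GB (76.6%)
```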
Using Compactified Checkpoints
You can use the ZIP file directly for inference:
cryopares_infer \
--particles_star_fname /path/to/particles.star \
--checkpoint_dir my_model_compact.zip \
--results_dir /path/to/results
CryoPARES automatically detects ZIP files and reads models directly from the archive without extraction.
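Reading from an archive without extracting it is a standard capability of Python's zipfile module. The sketch below illustrates the mechanism on a toy archive. It does not reproduce how cryoPARES loads models, and the member path half1/hparams.yaml is a made-up layout used only for the demonstration.

```python
import os
import tempfile
import zipfile

def list_members(zip_path, suffix=""):
    """Names of archive members ending with `suffix`."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.endswith(suffix)]

def read_member(zip_path, member):
    """Return the bytes of one archive member without extracting the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            return handle.read()

# Build a tiny stand-in archive, then read one file straight from it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_compact.zip")
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("half1/hparams.yaml", "lmax: 8\n")
    print(list_members(path, ".yaml"))
    print(read_member(path, "half1/hparams.yaml").decode())
```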
Utility Scripts
CryoPARES includes several utility scripts for analysis, visualization, and model management.
GMM Histogram Analysis
Script: cryoPARES.scripts.gmm_hists
Automatic Usage: Yes - Called during training to estimate confidence thresholds
Manual Usage: Yes - Can be run standalone for analysis
Analyzes and compares score distributions between “good” (aligned) and “bad” (misaligned) particle populations using Gaussian Mixture Models (GMMs). Automatically estimates optimal thresholds for filtering low-quality particles.
CLI Usage:
python -m cryoPARES.scripts.gmm_hists \
--fname_good aligned.star \
--fname_bad misaligned.star \
--plot_fname results/distributions.png
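One common way to turn two fitted Gaussian components into a threshold is to take the score at which their weighted densities are equal. The sketch below finds that crossing by bisection. It is a generic illustration under that assumption: the criterion used by gmm_hists is not described here, and crossing_threshold and its (weight, mean, std) tuples are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def crossing_threshold(bad, good, iterations=60):
    """Score between the two means where the weighted densities are equal.

    `bad` and `good` are (weight, mean, std) tuples with bad mean < good mean.
    """
    def diff(x):
        return (good[0] * gaussian_pdf(x, good[1], good[2])
                - bad[0] * gaussian_pdf(x, bad[1], bad[2]))

    lo, hi = bad[1], good[1]
    if lo >= hi or diff(lo) > 0 or diff(hi) < 0:
        raise ValueError("no density crossing between the two means")
    for _ in range(iterations):  # bisection: diff is negative at lo, non-negative at hi
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two equally weighted unit-variance components: the crossing lies midway, near 2.0.
print(crossing_threshold(bad=(0.5, 0.0, 1.0), good=(0.5, 4.0, 1.0)))
```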
FSC Computation
Script: cryoPARES.scripts.computeFsc
Automatic Usage: Yes - Called during inference when reconstructing volumes
Manual Usage: Yes - Can be run standalone for FSC analysis
Computes Fourier Shell Correlation (FSC) between two 3D volumes to assess resolution.
CLI Usage:
python -m cryoPARES.scripts.computeFsc \
--fname_vol_half1 half1.mrc \
--fname_vol_half2 half2.mrc \
--fname_fsc_out fsc_curve.txt \
--show_plot
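An FSC curve is usually summarized by the resolution at which it drops below a threshold, with 0.143 being the conventional value for half-map FSC. The sketch below interpolates that crossing from in-memory lists. It makes no assumption about the format of the file written by computeFsc, the curve values are made up, and resolution_at_threshold is a hypothetical helper.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Resolution (1/frequency) where the FSC curve first drops below `threshold`.

    Frequencies are in 1/Angstrom, positive and increasing. Linear interpolation
    is used between the two shells that bracket the threshold.
    """
    for i in range(1, len(fsc_values)):
        prev, curr = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > curr:
            frac = (prev - threshold) / (prev - curr)
            freq = spatial_freqs[i - 1] + frac * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None  # the curve never crosses the threshold

freqs = [0.05, 0.10, 0.15, 0.20, 0.25]  # toy shells, 1/Angstrom
fsc = [0.99, 0.95, 0.60, 0.20, 0.05]    # toy correlation values
print(resolution_at_threshold(freqs, fsc))  # about 4.57 Angstrom for this toy curve
```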
Pose Comparison
Script: cryoPARES.scripts.compare_poses
Automatic Usage: No - Manual analysis tool only
Manual Usage: Yes
Compares predicted particle orientations between two STAR files to evaluate pose accuracy.
CLI Usage:
python -m cryoPARES.scripts.compare_poses \
--starfile1 predicted_poses.star \
--starfile2 ground_truth_poses.star \
--symmetry C1 \
--output_plot error_histogram.png
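Pose accuracy is typically reported as the angle of the rotation that maps one orientation onto the other. The sketch below computes that geodesic angle for two rotation matrices in the C1 case. It illustrates the metric rather than the compare_poses implementation. For higher symmetries one would take the minimum over all symmetry-equivalent rotations, which is omitted here. rot_z and angular_error_degs are hypothetical helpers.

```python
import math

def rot_z(angle_degs):
    """3x3 rotation matrix about the z axis."""
    a = math.radians(angle_degs)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def angular_error_degs(r1, r2):
    """Geodesic angle in degrees between two 3x3 rotation matrices (no symmetry)."""
    # trace(r1^T r2) equals the sum of element-wise products of r1 and r2.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(cos_angle))

# Two in-plane rotations that differ by 15 degrees.
print(angular_error_degs(rot_z(10.0), rot_z(25.0)))
```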
Learning Curve Visualization
Script: cryoPARES.scripts.plot_learning_curve
Automatic Usage: No - Manual visualization tool only
Manual Usage: Yes
Visualizes training metrics from PyTorch Lightning CSV logs to monitor training progress.
CLI Usage:
python -m cryoPARES.scripts.plot_learning_curve \
--csv_file /path/to/training/version_0/metrics.csv \
--skip_steps 100 \
--log_scale
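PyTorch Lightning CSV logs are sparse: each row carries a step and only the metrics logged at that step. The sketch below shows how such a file can be reduced to (step, value) pairs while skipping the early steps, in the spirit of --skip_steps. It is not the plotting script itself. load_metric is a hypothetical helper and the metric names train_loss and val_loss are assumptions about what is logged.

```python
import csv
import io

def load_metric(csv_text, metric, skip_steps=0):
    """Return (step, value) pairs for `metric`, ignoring rows before `skip_steps`."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row.get(metric)
        if not value or int(row["step"]) < skip_steps:
            continue  # metric not logged on this row, or step is being skipped
        points.append((int(row["step"]), float(value)))
    return points

# Toy log with made-up values; empty cells mean the metric was not logged at that step.
toy_log = "step,epoch,train_loss,val_loss\n50,0,2.5,\n100,0,1.9,\n150,1,,1.7\n200,1,1.4,\n"
print(load_metric(toy_log, "train_loss", skip_steps=100))
```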
STAR File Histograms
Script: cryoPARES.scripts.hists_from_starfile
Automatic Usage: No - Manual analysis tool only
Manual Usage: Yes
Generates histograms of any metadata column(s) from RELION STAR files.
CLI Usage:
python -m cryoPARES.scripts.hists_from_starfile \
--input particles.star \
--cols rlnDirectionalZScore rlnDefocusU \
--output histograms.png
See Also
Training Guide - How to train models
Configuration Guide - All configuration parameters
Troubleshooting Guide - Common issues and solutions
API Reference - Programmatic usage