ASTRO-FOLD: Protein Structure Prediction from First Principles
ASTRO-FOLD, a first-principles method for protein structure prediction, is built on a deterministic global optimization framework (αBB) coupled with a stochastic algorithm, conformational space annealing (CSA).
The first stage of ASTRO-FOLD predicts helical and beta-sheet structures. Secondary structure is assigned with a consensus method developed for this purpose. The prediction of beta-sheet and disulfide bridge topology is based on an ILP (Integer Linear Programming) model in which the hydrophobic contact energy between strands is maximized to derive the optimal topology.
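The idea of a consensus over secondary structure predictions can be illustrated with a simple per-residue majority vote. This is only a minimal sketch: the function name, the H/E/C label alphabet, the equal weighting of predictors, and the tie-breaking rule are all assumptions made for illustration, not the consensus scheme used by the method itself.

```python
from collections import Counter


def consensus_secondary_structure(predictions):
    """Majority vote per residue over equal-length strings of H/E/C labels.

    Hypothetical helper: every predictor gets the same weight.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):  # one column per residue position
        counts = Counter(column)
        best = max(counts.values())
        winners = [label for label in counts if counts[label] == best]
        # Ties fall back to coil ("C"): an arbitrary choice for this sketch.
        consensus.append(winners[0] if len(winners) == 1 else "C")
    return "".join(consensus)


# Three hypothetical predictor outputs for a six-residue fragment.
example = consensus_secondary_structure(["HHHEEC", "HHCEEC", "HCCEEH"])
```

A weighted vote, with each predictor weighted by its known accuracy, would be a natural variation on the same structure.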
The second stage predicts angle and distance restraints through residue contact prediction and loop prediction. The residue contact prediction is based on a novel ILP model that predicts contacts by minimizing the total statistical energy of a protein subject to a set of physically observed constraints. Restraints for the loop residues connecting helical and strand regions are determined through an iterative formulation involving dihedral angle sampling, constrained nonlinear optimization of the ECEPP/3 force field, and a novel clustering approach.
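To make the shape of such a contact model concrete, the sketch below solves a toy 0-1 program by exhaustive enumeration: binary choices select residue pairs, the objective is the summed pairwise energy, and two constraints restrict the solution. Everything specific here is an assumption for illustration: the energy values, the minimum sequence separation, and the cap on contacts per residue are hypothetical stand-ins, and a real ILP would be handed to an integer programming solver rather than enumerated.

```python
from itertools import product


def predict_contacts(energies, max_contacts_per_residue=2, min_separation=3):
    """Exhaustively solve a tiny 0-1 program for residue contacts.

    energies maps (i, j) with i < j to a statistical energy, where lower
    is more favorable. Returns (total_energy, chosen_pairs).
    """
    # Assumed constraint 1: only pairs far enough apart in sequence qualify.
    pairs = [p for p in sorted(energies) if p[1] - p[0] >= min_separation]

    best_energy, best_set = 0.0, ()  # the empty contact set is always feasible
    for bits in product((0, 1), repeat=len(pairs)):
        chosen = [p for p, b in zip(pairs, bits) if b]

        # Assumed constraint 2: cap the number of contacts per residue.
        degree = {}
        for i, j in chosen:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        if any(d > max_contacts_per_residue for d in degree.values()):
            continue

        total = sum(energies[p] for p in chosen)
        if total < best_energy:
            best_energy, best_set = total, tuple(chosen)
    return best_energy, best_set


# Hypothetical pairwise energies for a seven-residue toy chain.
toy_energies = {(0, 4): -1.5, (0, 5): -1.0, (0, 6): -0.5,
                (1, 2): -9.0, (2, 6): 0.8}
energy, contacts = predict_contacts(toy_energies)
```

Enumeration scales as 2^n in the number of candidate pairs, so it is only usable for a handful of pairs; it is shown here because it makes the objective and constraints explicit without depending on a solver library.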
Based on the restraints predicted in the previous stages, a hybrid algorithm that combines the deterministic αBB global optimization algorithm, stochastic global optimization (CSA), and molecular dynamics in torsion-angle space is used to solve the constrained non-convex global optimization problem. αBB provides valid lower bounds and a theoretical guarantee of convergence to the global optimum, while CSA provides upper bounds through extensive sampling of the energy landscape.
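The interplay of lower and upper bounds can be illustrated in one dimension with the standard αBB convex underestimator, L(x) = f(x) + α (xL - x)(xU - x), which lies below f on [xL, xU] and is convex once α is at least max(0, -½ min f''). The sketch below is an illustration only: the test function, the interval, and the dense grid search (standing in both for a convex local solver and for stochastic sampling) are assumptions, and the real problem is a high-dimensional force-field energy rather than a polynomial.

```python
def alpha_bb_underestimator(f, lower, upper, alpha):
    """Return L(x) = f(x) + alpha * (lower - x) * (upper - x)."""
    def underestimator(x):
        # The added term is <= 0 inside [lower, upper], so L(x) <= f(x).
        return f(x) + alpha * (lower - x) * (upper - x)
    return underestimator


def grid_min(func, lower, upper, steps=4000):
    """Approximate the minimum of func on [lower, upper] by dense sampling.

    A stand-in for a proper solver; fine for a 1-D sketch only.
    """
    return min(func(lower + (upper - lower) * k / steps)
               for k in range(steps + 1))


def f(x):
    # Hypothetical nonconvex test function; f''(x) = 12x^2 - 6 >= -6.
    return x**4 - 3 * x**2 + x


ALPHA = 3.0  # alpha >= max(0, -0.5 * min f'') = 3 makes L convex

# Upper bound: best sampled value of the original function.
upper_bound = grid_min(f, -2.0, 2.0)

# Lower bound on the whole interval, then after one bisection at x = 0.
root_bound = grid_min(alpha_bb_underestimator(f, -2.0, 2.0, ALPHA), -2.0, 2.0)
left_bound = grid_min(alpha_bb_underestimator(f, -2.0, 0.0, ALPHA), -2.0, 0.0)
right_bound = grid_min(alpha_bb_underestimator(f, 0.0, 2.0, ALPHA), 0.0, 2.0)
refined_bound = min(left_bound, right_bound)
```

Bisecting the interval shrinks the quadratic correction term, so the refined lower bound moves up toward the sampled upper bound; repeating this branch-and-bound step until the two bounds meet is what underlies the convergence guarantee mentioned above.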
At this stage, ICON, a novel iterative clustering method based on the traveling salesman problem, is used to identify the near-native structures of the protein. The selected structures are then subjected to a chemical-shift-based structure prediction process: chemical shifts are predicted with ShiftX and passed to a locally installed copy of CS23D to re-predict the structures. The resulting structures are used to generate tighter angle and distance restraints for a second iteration of tertiary structure prediction, which produces the final set of predicted structures.