ICDS-Roar-OOD Protein Structure Prediction
Overview
An Open OnDemand Batch Connect app that provides a
web-based interface for running protein structure prediction jobs using AlphaFold 2
and AlphaFold 3 on the ICDS Roar cluster. The app simplifies the process of
submitting and monitoring AlphaFold jobs by providing a user-friendly interface and
automated job management.
This app uses the Batch Connect basic template with Slurm. It executes a two-phase
workflow: a CPU phase for MSA generation and a GPU phase for structure prediction,
with the GPU job submitted as a dependency of the CPU job.
- Upstream project: AlphaFold by DeepMind
- Batch Connect template: basic
- Scheduler: Slurm
- Container runtime: Singularity
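The two-phase workflow described above maps onto Slurm's job dependency mechanism. The following is a minimal sketch of that idea, not the app's actual submission code: the script names `cpu_phase.sh` and `gpu_phase.sh` are hypothetical, and the choice of `afterok` as the dependency type is an assumption. `sbatch --parsable` prints the job ID, which is then passed to the second submission.

```python
import subprocess


def cpu_submit_command(cpu_script):
    """Build the sbatch command for the CPU (MSA) phase.

    --parsable makes sbatch print the job ID (optionally followed by
    ';cluster') instead of a human-readable sentence.
    """
    return ["sbatch", "--parsable", cpu_script]


def gpu_submit_command(gpu_script, cpu_job_id):
    """Build the sbatch command for the GPU (prediction) phase.

    afterok (an assumption in this sketch) lets the GPU job start only
    if the CPU job finished with exit code 0.
    """
    return ["sbatch", "--parsable",
            f"--dependency=afterok:{cpu_job_id}", gpu_script]


def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output."""
    return sbatch_output.strip().split(";")[0]


def submit_two_phase(cpu_script, gpu_script):
    """Submit both phases; needs a working Slurm installation."""
    out = subprocess.run(cpu_submit_command(cpu_script), check=True,
                         capture_output=True, text=True).stdout
    cpu_id = parse_job_id(out)
    out = subprocess.run(gpu_submit_command(gpu_script, cpu_id), check=True,
                         capture_output=True, text=True).stdout
    return cpu_id, parse_job_id(out)
```

With this arrangement the GPU job waits in the queue without holding a GPU while the MSA step runs on CPU nodes.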
How it looks
[Screenshots: model selection, partition and working directory (left); JSON input and terms of service (right)]
Progress after job submission
Supporting Materials
Conference Materials
- GOOD25 Conference Talk Abstract
- GOOD25 Conference Presentation
News Articles
- Leveraging AlphaFold in Graduate Research
- OSC News: Inaugural GOOD Conference Draws Strong Attendance from 10 Countries
> Presented as a talk at the Global Open On Demand Conference 2025, Harvard University
> Date: March 19, 2025, 4:00 PM – 4:25 PM (25 min)
> Title: AlphaFold accessibility: an optimized open-source OOD app for Protein Structure Prediction
> Speakers: Vinay Saji Mathew [Pennsylvania State University], William Lai [Cornell], Matt Hansen [Pennsylvania State University]
>
> Track: Application Track [featuring AI OnDemand]
> Location: Tsai Auditorium (CGIS S010)
Features
Multiple Prediction Engines
- AlphaFold 2:
  - Supports AlphaFold v2.3.2 for protein structure prediction
  - Handles both monomer and multimer predictions
  - Uses full database configuration for maximum accuracy
  - Automated MSA generation and template search
- AlphaFold 3 (New!):
  - Latest version of AlphaFold with improved accuracy
  - Supports protein-protein, protein-DNA/RNA, and protein-ligand complexes
  - Enhanced diffusion-based structure prediction
  - Requires acceptance of Google's terms of service
Job Management
- Two-phase execution:
  - CPU phase for MSA/templates
  - GPU phase for prediction (set as a dependency)
- Real-time job status monitoring
- Detailed progress tracking
- Automatic error handling and recovery
User Interface
- Flexible input formats: FASTA for AlphaFold 2, JSON for AlphaFold 3
- GPU allocation selection
- Working directory customization
- Real-time progress visualization
- Direct access to output files
Output Files
- AlphaFold 2:
  - PDB structure files (ranked by confidence)
  - Multiple Sequence Alignment (MSA) files
  - Detailed prediction metrics and confidence scores
  - Comprehensive log files
- AlphaFold 3:
  - CIF structure files
  - Ranking scores for multiple predictions
  - Detailed model outputs and metrics
  - Complete execution logs
Prerequisites
Open OnDemand
- Slurm scheduler
- Tested with Open OnDemand v3 and v4
Database Setup
Both AlphaFold versions require genetic databases that must be set up before using the app:
- AlphaFold 2: Download using script from AlphaFold 2 repository
- AlphaFold 3: Additional databases required. Setup instructions available here
Singularity Containers
The app uses Singularity containers for execution:
- AlphaFold 2: Download from Sylabs
- AlphaFold 3: Requires official container from Google (subject to terms of use). Weights needed for running AlphaFold 3 have to be requested from Google here
Installation
- Clone this repository into your Open OnDemand apps directory
- Configure paths in template/alphafold_env.sh
- Ensure all required databases are properly set up
- Verify GPU compute capabilities.
Edit form.yml.erb and update these values for your cluster:
| Attribute | ICDS Default | Change to |
| --- | --- | --- |
| cluster | rc | Your cluster name |
| auto_accounts | (dynamic) | GPU account selection for your site |
| auto_queues | (dynamic) | Queue/partition for your site |
| working_directory | /scratch/ | Default scratch path on your site |
In before.sh.erb, the app sources alphafold_env.sh to set environment variables
for database paths, container paths, and working directories. You must configure this
file for your site.
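Before the first job is submitted, it can help to confirm that every path exported by alphafold_env.sh actually exists on the cluster. The sketch below is one way to do that check; it is not part of the app, and the variable names in the usage note are hypothetical placeholders, since the real names are defined in your copy of alphafold_env.sh.

```python
import os


def missing_paths(variable_names, env=None):
    """Return {name: reason} for variables that are unset or point nowhere.

    `env` defaults to the current process environment, so run this in a
    shell that has already sourced alphafold_env.sh.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in variable_names:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.exists(value):
            problems[name] = f"path does not exist: {value}"
    return problems
```

For example, `missing_paths(["AF2_DB_DIR", "AF3_CONTAINER"])` (hypothetical names) returns an empty dictionary when both variables are set and both paths exist.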
Configuration
| Attribute | Widget | Description | Default |
| --- | --- | --- | --- |
| session_type | select | Prediction engine (AlphaFold 2 or AlphaFold 3) | AlphaFold 2 |
| auto_accounts | select | GPU account for job submission | (dynamic) |
| auto_queues | select | Queue/partition for job submission | (dynamic) |
| working_directory | path_selector | Output directory (scratch space recommended) | /scratch/ |
| protein_sequence | text_area | Input sequence (FASTA for AF2, JSON for AF3) | (empty) |
| agree_terms | check_box | Accept Google's Terms of Service (AF3 only) | unchecked |
| bc_email_on_started | check_box | Email notification on job start/completion | unchecked |
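The form attributes interact: the terms checkbox only matters for AlphaFold 3, and the sequence field must not be empty. The sketch below illustrates those rules using the attribute names from the table; the literal values `"AlphaFold 2"` and `"AlphaFold 3"` for `session_type` are assumptions, and the app's own validation may differ.

```python
ENGINES = ("AlphaFold 2", "AlphaFold 3")  # assumed option values


def form_errors(session_type, agree_terms, protein_sequence):
    """Return a list of human-readable problems with the submitted form."""
    errors = []
    if session_type not in ENGINES:
        errors.append(f"unknown session_type: {session_type!r}")
    # The terms of service apply to AlphaFold 3 only.
    if session_type == "AlphaFold 3" and not agree_terms:
        errors.append("AlphaFold 3 requires accepting Google's Terms of Service")
    if not protein_sequence.strip():
        errors.append("protein_sequence is empty")
    return errors
```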
Usage
- Access the Open OnDemand dashboard
- Navigate to "Interactive Apps"
- Select "Protein Structure Prediction"
- Choose prediction engine (AlphaFold 2 or 3)
- Fill out the form:
  - For AlphaFold 2: Enter protein sequence in FASTA format
  - For AlphaFold 3: Provide input in JSON format
  - Select GPU allocation
  - Choose working directory
  - Accept terms of service (required for AlphaFold 3)
- Submit the job
AlphaFold 2 (FASTA)
The app accepts protein sequences in FASTA format.
Example:
>sequence_name
MVKVGVNGFGRIGRLVTRAAFNSGKVDIVAINDPFIDLNYMVYMFQYDSTHGKFHGTVKA
ENGKLVINGNPITIFQERDPSKIKWGDAGAEYVVESTGVFTTMEKAGAHLQGGAKRVIIS
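A quick local check of a FASTA input can catch formatting mistakes before a job is queued. The following is a minimal sketch, not the app's own parser; restricting sequences to the 20 standard amino acid letters is an assumption made here for illustration.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not standard amino acids."""
    return sorted(set(sequence) - VALID_RESIDUES)
```

Sequence lines that wrap, as in the example above, are joined into a single string per record.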
AlphaFold 3 (JSON)
{
  "name": "example_complex",
  "sequences": [
    {
      "protein": {
        "id": "protein_chain_A",
        "sequence": "MVKVGVNG..."
      }
    }
  ],
  "modelSeeds": [1, 2, 3]
}
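Malformed JSON is an easy mistake to make when pasting input into a text area. The sketch below checks only the three top-level keys that appear in the example above (`name`, `sequences`, `modelSeeds`); it is an illustration, not the app's validator, and it does not cover the full AlphaFold 3 input specification.

```python
import json


def check_af3_input(text):
    """Return a list of problems found in an AlphaFold 3 style JSON input."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    name = data.get("name")
    if not isinstance(name, str) or not name:
        problems.append("'name' must be a non-empty string")
    sequences = data.get("sequences")
    if not isinstance(sequences, list) or not sequences:
        problems.append("'sequences' must be a non-empty list")
    seeds = data.get("modelSeeds")
    # bool is a subclass of int in Python, so exclude it explicitly.
    seeds_ok = (isinstance(seeds, list) and bool(seeds) and all(
        isinstance(s, int) and not isinstance(s, bool) for s in seeds))
    if not seeds_ok:
        problems.append("'modelSeeds' must be a non-empty list of integers")
    return problems
```

An empty list from `check_af3_input` means these basic checks passed; the prediction itself may still reject the input for other reasons.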
Output Files
The app generates the following output structure:
working_directory/
└── run_YYYYMMDD_HHMMSS/
    ├── input/
    │   ├── [structure files]   # Predicted structures
    │   ├── [prediction data]   # Detailed predictions
    │   └── msas/               # Multiple sequence alignments
    ├── logs/                   # Job logs
    ├── CPU-SLURM/              # CPU phase files
    └── GPU-SLURM/              # GPU phase files
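The timestamped run directory name follows the pattern `run_YYYYMMDD_HHMMSS`. The sketch below shows how such a layout could be created; it mirrors the top-level subdirectories listed above but is only an illustration, and the app's own scripts may build the tree differently.

```python
from datetime import datetime
from pathlib import Path

# Top-level subdirectories taken from the layout shown above.
SUBDIRS = ("input", "logs", "CPU-SLURM", "GPU-SLURM")


def run_directory_name(now):
    """Format a timestamp as run_YYYYMMDD_HHMMSS."""
    return now.strftime("run_%Y%m%d_%H%M%S")


def create_run_directory(working_directory, now=None):
    """Create the run directory and its subdirectories; return its path."""
    now = datetime.now() if now is None else now
    run_dir = Path(working_directory) / run_directory_name(now)
    for sub in SUBDIRS:
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Because the name has one-second resolution, two runs started in the same second from the same working directory would share a directory.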
Monitoring Jobs
The app provides real-time monitoring of:
- MSA generation progress
- Template search status
- Structure prediction progress
- Model relaxation status
Troubleshooting
Common issues and solutions:
- Job fails in CPU phase:
  - Check available disk space
  - Verify database paths
  - Examine CPU phase logs
- GPU phase errors:
  - Verify GPU allocation
  - Check memory requirements
  - Review GPU phase logs
  - For AlphaFold 3: Ensure GPU compute availability.
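For the disk space check, a short script can report free space on the filesystem that holds the working directory. This is a minimal sketch using Python's standard library; the threshold you pass in is your own estimate, since this document does not state how much space a run needs.

```python
import shutil


def free_gib(path):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path, required_gib):
    """True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib
```

For example, `has_enough_space("/scratch/", 50)` checks for 50 GiB, where 50 is an arbitrary illustrative value. Note that this reports filesystem-level free space and does not account for per-user quotas.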
Contributing
For bugs or feature requests, open an issue.
License
MIT (see LICENSE file)
Acknowledgements
- AlphaFold by DeepMind Technologies Limited
- Singularity container by prehensilecode
- This research project is generously funded by the Cornell University BRC Epigenomics Core Facility (RRID:SCR_021287), the Penn State Institute for Computational and Data Sciences (RRID:SCR_025154), and the Penn State University Center for Applications of Artificial Intelligence and Machine Learning to Industry Core Facility (AIMI) (RRID:SCR_022867), and is supported by a gift to AIMI research from Dell Technologies.
- Computational support was provided by NSF ACCESS to William KM Lai and Gretta Kellogg through BIO230041
For questions or issues, please contact: