AlphaFold 3
Overview
AlphaFold 3 is an AI model developed by Google DeepMind and Isomorphic Labs. It generates 3D structure predictions of biological molecules and reports model confidence for those predictions. Google DeepMind and Isomorphic Labs make the trained model parameters, and output generated using them, available free of charge for certain non-commercial uses, in accordance with these terms of use and the AlphaFold 3 Model Parameters Prohibited Use Policy.
Legal Summary
Key things to know when using the AlphaFold 3 model parameters and output:
- The AlphaFold 3 model parameters and output are only available for non-commercial use by, or on behalf of, non-commercial organizations (i.e., universities, non-profit organizations and research institutes, educational, journalism and government bodies). If you are a researcher affiliated with a non-commercial organization, provided you are not a commercial organisation or acting on behalf of a commercial organisation, this means you can use these for your non-commercial affiliated research.
- You must not use nor allow others to use:
i. AlphaFold 3 model parameters or output in connection with any commercial activities, including research on behalf of commercial organizations; or
ii. AlphaFold 3 output to train machine learning models or related technology for biomolecular structure prediction similar to AlphaFold 3.
- You must not publish or share AlphaFold 3 model parameters, except sharing these within your organization in accordance with these Terms.
- You can publish, share and adapt AlphaFold 3 output in accordance with these Terms, including the requirements to provide clear notice of any modifications you make and that ongoing use of AlphaFold 3 output and derivatives are subject to the AlphaFold 3 Output Terms of Use.
Full TERMS OF USE are available.
IMPORTANT:
Capstone and other projects sponsored by commercial entities are not allowed to use AlphaFold 3 software or the model parameters.
How to Join
To run AlphaFold 3 on the BYU ORC System, you must obtain permission for the model parameters directly from Google by completing this form. It is best to fill out this form as early as possible to account for delays on Google's end.
You will receive two e-mails. The first acknowledges receipt of the request form. The second, which normally arrives within 2-3 business days, is the approval. If the approval takes longer than that, please email alphafold@google.com and let them know that you are still waiting. Unless Google/DeepMind changes the text, the expected subject line is "AlphaFold 3 | Request to access model parameters". Please forward the approval email, along with your name and netid, to rcsupport@byu.edu. We will then grant you access to use AlphaFold 3. You do not need to download the model weights/parameters; they are part of the ORC installation.
How to Run
Type the following commands in a terminal ssh'd to the ORC system:
- Make AlphaFold 3 accessible: module load alphafold3
- Initiate the AlphaFold pipeline: alphafold3_pipeline.sh <json_input_file> <output_directory>
The alphafold3_pipeline.sh script is intended to be run from the Linux command line and starts 2 Slurm jobs, one for each stage of the pipeline. The pipeline script should not normally be run inside a Slurm job script, because doing so launches an extra, unnecessary job. Slurm parameters are pre-configured, so there is no need to specify wall time, CPUs, GPUs, or memory. If you find a circumstance that warrants an adjustment, please contact ORC support.
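For readers who have not yet prepared a .json input file, the following is a minimal sketch of one, written in Python. It assumes the input layout described in the upstream AlphaFold 3 documentation (name, modelSeeds, sequences, dialect, version); the helper names build_job and write_job, the job name, and the sequence are hypothetical placeholders, and the ORC installation may expect additional fields.

```python
import json

def build_job(name, protein_sequence, seed=1):
    """Return a single-protein job dict in the upstream AlphaFold 3 input layout."""
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [{"protein": {"id": "A", "sequence": protein_sequence}}],
        "dialect": "alphafold3",
        "version": 1,
    }

def write_job(job, path):
    """Write the job dict to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(job, handle, indent=2)

# Placeholder name and sequence, for illustration only.
job = build_job("example_job", "MVLSPADKTNVKAAW")
print(json.dumps(job, indent=2))
```

Calling write_job(job, "example_job.json") would produce a file that could then be passed as the <json_input_file> argument of alphafold3_pipeline.sh.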
Performance Insight
In order to maximize GPU availability, AlphaFold 3 runs are performed as a 2-stage pipeline. The first stage is called the data pipeline, and is a genetic and template search of 9 different databases. The first stage is CPU and I/O intensive, does not use a GPU, and takes roughly 95% of a full run's compute time. The second stage uses the lookup information to run an inference model on a single GPU. Splitting an AlphaFold 3 run into 2 stages allows a GPU to be used for other tasks instead of sitting idle waiting for the database lookup phase to complete.
The AlphaFold 3 documentation indicates how many tokens a given GPU can handle. The maximum token count depends on the amount of GPU memory. All the GPUs listed below are manufactured by NVIDIA:
- A100 (80 GB): 5,120 tokens (from AlphaFold 3 documentation)
- H100 (80 GB): 5,120 tokens
- H200 (141 GB): 9,024 tokens (projected, assuming the token limit scales linearly with GPU memory)
Currently a single NVIDIA H200 is used in stage 2 of the pipeline. The H200 significantly outperforms the A100. It is also a fair amount faster than the H100 due to a 1.4x memory speed advantage.
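The projected H200 figure can be reproduced with a short calculation. This sketch assumes, as the list above does, that the token limit scales in direct proportion to GPU memory; the function name projected_max_tokens is hypothetical, and the result is an estimate rather than a measured limit.

```python
# Reference point from the list above: 80 GB of GPU memory handles 5,120 tokens.
REFERENCE_MEMORY_GB = 80
REFERENCE_TOKENS = 5120

def projected_max_tokens(memory_gb):
    """Scale the 80 GB token limit linearly to another GPU memory size."""
    return REFERENCE_TOKENS * memory_gb // REFERENCE_MEMORY_GB

print(projected_max_tokens(141))  # 9024
```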
Additional Notes
- The alphafold3_pipeline.sh script ensures the .json input file syntax is valid. If the .json input file is malformed, an error is reported and no slurm jobs are submitted.
- If you wish to validate a .json input file yourself, the jq utility is available without loading a module.
- The "jq empty" command only produces output when there is an error in the .json file.
- In other words, no output from "jq empty" means the .json file is valid.
- Sample command: jq empty <my_cool.json>
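If jq is not at hand (for example, when preparing input files on a laptop), a similar syntax check can be sketched in Python with the standard json module. This is not part of the ORC tooling; the function names are hypothetical, and Python's parser may differ from jq in edge cases, so treat it as a rough pre-check only.

```python
import json

def check_json_text(text):
    """Return None if text parses as JSON, otherwise a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None

def check_json_file(path):
    """Apply the same check to a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return check_json_text(handle.read())
```

As with "jq empty", a result of None means the file is syntactically valid; it does not confirm that the file contains the fields AlphaFold 3 expects.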
Fasta to Json Conversion
The f2j.py utility is available for converting a .fasta file to an AlphaFold 3 .json input file. The following features may be of interest:
- 2 of 7 known fasta header types are recognized (other headers could be added if needed):
- Plain vanilla fasta
- RCSB PDB fasta
- Supports protein, RNA, and DNA chains/sequences.
- Handles multi-sequence fasta files.
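To illustrate the kind of conversion involved, here is a minimal Python sketch that turns plain FASTA text into an AlphaFold 3 style job dictionary. It is not the f2j.py implementation: the function names are hypothetical, the chain-type guess is deliberately crude (a short peptide made only of A, C, G, and T would be misread as DNA), header text is ignored, and the output layout is assumed from the upstream AlphaFold 3 input documentation.

```python
import json
import string

def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def guess_chain_type(sequence):
    """Crude guess: only ACGT -> dna, only ACGU -> rna, anything else -> protein."""
    letters = set(sequence.upper())
    if letters <= set("ACGT"):
        return "dna"
    if letters <= set("ACGU"):
        return "rna"
    return "protein"

def fasta_to_job(text, name):
    """Build a job dict with one chain per FASTA record (at most 26 chains)."""
    sequences = []
    for index, (_, sequence) in enumerate(read_fasta(text)):
        chain_id = string.ascii_uppercase[index]  # A, B, C, ...
        entity = {"id": chain_id, "sequence": sequence.upper()}
        sequences.append({guess_chain_type(sequence): entity})
    return {
        "name": name,
        "modelSeeds": [1],
        "sequences": sequences,
        "dialect": "alphafold3",
        "version": 1,
    }

example = ">chain one\nMKV\nLLA\n>chain two\nACGU\n"
print(json.dumps(fasta_to_job(example, "demo"), indent=2))
```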
To use the utility, type the following command in an ORC terminal (alphafold3 module should be loaded):
How to View the Output
The ChimeraX viewer is available on the ORC system. Open a browser tab and visit the Open OnDemand web page (ood.rc.byu.edu). Then click the ChimeraX tile and go through the startup process, which can take 2-3 minutes. Once ChimeraX is up and running, browse to your AlphaFold 3 results and open the .cif file.
You could also copy AlphaFold 3 results to your laptop and run the viewer of your choice.
Citation
Any publication that discloses findings arising from using AlphaFold 3 source code, the model parameters or outputs produced by those should cite:
journal = {Nature},
title = {Accurate structure prediction of biomolecular interactions with AlphaFold 3},
year = {2024},
volume = {630},
number = {8016},
pages = {493-500},
doi = {10.1038/s41586-024-07487-w}
Last changed on Tue Oct 7 09:39:28 2025