BYU

Office of Research Computing

AlphaFold 3

Overview

AlphaFold 3 is an AI model developed by Google DeepMind and Isomorphic Labs. It generates 3D structure predictions of biological molecules and provides confidence estimates for those predictions. Google DeepMind and Isomorphic Labs make the trained model parameters, and the output generated with them, available free of charge for certain non-commercial uses, in accordance with these terms of use and the AlphaFold 3 Model Parameters Prohibited Use Policy.

Legal Summary

Key things to know when using the AlphaFold 3 model parameters and output:

  1. The AlphaFold 3 model parameters and output are only available for non-commercial use by, or on behalf of, non-commercial organizations (i.e., universities, non-profit organizations and research institutes, educational, journalism and government bodies). If you are a researcher affiliated with a non-commercial organization, provided you are not a commercial organisation or acting on behalf of a commercial organisation, this means you can use these for your non-commercial affiliated research.
  2. You must not use nor allow others to use:
    i. AlphaFold 3 model parameters or output in connection with any commercial activities, including research on behalf of commercial organizations; or
    ii. AlphaFold 3 output to train machine learning models or related technology for biomolecular structure prediction similar to AlphaFold 3.
  3. You must not publish or share AlphaFold 3 model parameters, except sharing these within your organization in accordance with these Terms.
  4. You can publish, share and adapt AlphaFold 3 output in accordance with these Terms, including the requirements to provide clear notice of any modifications you make and that ongoing use of AlphaFold 3 output and derivatives are subject to the AlphaFold 3 Output Terms of Use.

Full TERMS OF USE are available.

IMPORTANT:
Capstone and other projects sponsored by commercial entities are not allowed to use AlphaFold 3 software or the model parameters.

How to Join

To run AlphaFold 3 on the BYU ORC System, you must obtain permission for the model parameters directly from Google by completing this form. It is best to fill out this form as early as possible to account for delays on Google's end.

You will receive two emails. The first acknowledges receipt of the request form. The second, which normally arrives within 2-3 business days, is the approval. If it takes longer than that, please email alphafold@google.com and let them know that you are still waiting on the approval. Unless Google/DeepMind changes the text, the expected subject line is "AlphaFold 3 | Request to access model parameters". Please forward the approval email, along with your name and NetID, to rcsupport@byu.edu. We will then grant you access to AlphaFold 3. You do not need to download the model weights/parameters; they are part of the ORC installation.

How to Run

Type the following commands in a terminal connected to the ORC system over SSH:

  1. Make AlphaFold 3 accessible: module load alphafold3
  2. Initiate the AlphaFold pipeline: alphafold3_pipeline.sh <json_input_file> <output_directory>

The alphafold3_pipeline.sh script is intended to be run from the Linux command line and starts 2 Slurm jobs, one for each stage of the pipeline. The pipeline script should not normally be run inside a Slurm job script, because doing so adds an extra, unnecessary job. Slurm parameters are pre-configured, so there is no need to specify wall time, CPUs, GPUs, or memory. If you find a circumstance that warrants an adjustment, please contact ORC support.

Performance Insight

In order to maximize GPU availability, AlphaFold 3 runs are performed as a 2-stage pipeline. The first stage is called the data pipeline, and is a genetic and template search of 9 different databases. The first stage is CPU and I/O intensive, does not use a GPU, and takes roughly 95% of a full run's compute time. The second stage uses the lookup information to run an inference model on a single GPU. Splitting an AlphaFold 3 run into 2 stages allows a GPU to be used for other tasks instead of sitting idle waiting for the database lookup phase to complete.

The AlphaFold 3 documentation gives some guidance on how many tokens a given GPU can handle. The maximum token count depends on the amount of GPU memory. All the GPUs noted below are manufactured by NVIDIA:

  • A100 (80 GB): 5,120 tokens (from AlphaFold 3 documentation)
  • H100 (80 GB): 5,120 tokens
  • H200 (141 GB): 9,024 tokens (projected, assuming token capacity scales linearly with GPU memory)
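
The H200 projection above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that assumes token capacity is directly proportional to GPU memory, anchored at the documented 80 GB / 5,120 token data point. The function and constant names are illustrative only, and real limits may differ from this straight-line estimate.

```python
# Linear projection of AlphaFold 3 token capacity from GPU memory.
# Assumption: capacity is directly proportional to memory, anchored at the
# documented data point of 80 GB -> 5,120 tokens.

REFERENCE_MEMORY_GB = 80
REFERENCE_TOKENS = 5120


def projected_max_tokens(gpu_memory_gb: float) -> int:
    """Scale the reference token count by the ratio of GPU memory sizes."""
    return round(REFERENCE_TOKENS * gpu_memory_gb / REFERENCE_MEMORY_GB)


if __name__ == "__main__":
    for name, memory_gb in [("A100", 80), ("H100", 80), ("H200", 141)]:
        tokens = projected_max_tokens(memory_gb)
        print(f"{name} ({memory_gb} GB): {tokens:,} tokens")
```

For 141 GB this gives 5,120 x 141 / 80 = 9,024 tokens, matching the projected figure in the list.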

Currently a single NVIDIA H200 is used in stage 2 of the pipeline. The H200 significantly outperforms the A100, and it is also faster than the H100 thanks to a roughly 1.4x memory speed advantage.

Additional Notes

  1. The alphafold3_pipeline.sh script checks that the .json input file syntax is valid. If the .json input file is malformed, an error is reported and no Slurm jobs are submitted.
  2. If you wish to validate a .json input file yourself, the jq utility is available without loading a module.
    • The "jq empty" command only produces output when there is an error in the .json file.
    • In other words, no output from "jq empty" means the .json file is valid.
    • Sample command: jq empty <my_cool.json>
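
If you would rather run a similar syntax check from Python (for example, inside a script that generates input files), the standard json module can do it. This is a minimal sketch, not the check that alphafold3_pipeline.sh performs, and the function name is hypothetical. It only confirms that the file is syntactically valid JSON; it does not check that the contents form a valid AlphaFold 3 input.

```python
import json
import sys
from typing import Optional


def json_syntax_error(text: str) -> Optional[str]:
    """Return None if text parses as JSON, otherwise a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


if __name__ == "__main__" and len(sys.argv) == 2:
    # Similar in spirit to "jq empty": silent on success, message on failure.
    with open(sys.argv[1], encoding="utf-8") as handle:
        error = json_syntax_error(handle.read())
    if error is not None:
        print(f"{sys.argv[1]}: {error}", file=sys.stderr)
        sys.exit(1)
```

Saved under a name of your choice, the script takes the path of the .json file as its only argument and exits with a non-zero status when the file is malformed.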

FASTA to JSON Conversion

The f2j.py utility converts a .fasta file into an AlphaFold 3 .json input file. The following features may be of interest:

  • 2 of 7 known FASTA header types are recognized (other header types could be added if needed):
    1. Plain vanilla FASTA
    2. RCSB PDB FASTA
  • Supports protein, RNA, and DNA chains/sequences.
  • Handles multi-sequence FASTA files.

To use the utility, type the following command in an ORC terminal (the alphafold3 module must be loaded first):

f2j.py <input_file.fasta> <output_file.json> [--help]
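
To give a feel for what such a conversion involves, here is a minimal sketch of turning FASTA text into an AlphaFold 3 style input dictionary. It is not the f2j.py implementation. The function names and the alphabet-based molecule type guess are hypothetical, and the field names (name, modelSeeds, sequences, dialect, version) follow the AlphaFold 3 input format as commonly documented, so please check them against the official input documentation before relying on them.

```python
import json
import string
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from plain FASTA text."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif records:
            records[-1][1].append(line.upper())
    return [(header, "".join(parts)) for header, parts in records]


def guess_molecule_type(sequence: str) -> str:
    """Hypothetical heuristic: classify a chain by the letters it contains."""
    letters = set(sequence)
    if "U" in letters and letters <= set("ACGU"):
        return "rna"
    if letters <= set("ACGT"):
        return "dna"
    return "protein"


def fasta_to_af3_input(text: str, job_name: str) -> Dict:
    """Build an AlphaFold 3 style input dictionary, one chain per record."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > len(string.ascii_uppercase):
        raise ValueError("this sketch only assigns chain IDs A-Z")
    sequences = [
        {guess_molecule_type(seq): {"id": chain_id, "sequence": seq}}
        for chain_id, (_header, seq) in zip(string.ascii_uppercase, records)
    ]
    return {
        "name": job_name,
        "modelSeeds": [1],  # assumed default: a single seed
        "sequences": sequences,
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    example = ">chain one\nMKTAYIAKQR\n>chain two\nGGAUCC\n"
    print(json.dumps(fasta_to_af3_input(example, "example_job"), indent=2))
```

The alphabet heuristic is deliberately simple: a short peptide made up only of A, C, G, and T would be misread as DNA, which is one reason to prefer f2j.py for real inputs.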

How to View the Output

The ChimeraX viewer is available on the ORC system. Open a browser tab and visit the Open OnDemand web page (ood.rc.byu.edu). Then click the ChimeraX tile and go through the startup process (this can take 2-3 minutes). Once ChimeraX is up and running, browse to your AlphaFold 3 results and open the .cif file.

You could also copy AlphaFold 3 results to your laptop and run the viewer of your choice.

Citation

Any publication that discloses findings arising from using AlphaFold 3 source code, the model parameters or outputs produced by those should cite:

@article{Abramson2024,
author = {Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J. and Bambrick, Joshua and Bodenstein, Sebastian W. and Evans, David A. and Hung, Chia-Chun and O'Neill, Michael and Reiman, David and Tunyasuvunakool, Kathryn and Wu, Zachary and Žemgulytė, Akvilė and Arvaniti, Eirini and Beattie, Charles and Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and Congreve, Miles and Cowen-Rivers, Alexander I. and Cowie, Andrew and Figurnov, Michael and Fuchs, Fabian B. and Gladman, Hannah and Jain, Rishub and Khan, Yousuf A. and Low, Caroline M. R. and Perlin, Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine and Yakneen, Sergei and Zhong, Ellen D. and Zielinski, Michal and Žídek, Augustin and Bapst, Victor and Kohli, Pushmeet and Jaderberg, Max and Hassabis, Demis and Jumper, John M.},
journal = {Nature},
title = {Accurate structure prediction of biomolecular interactions with AlphaFold 3},
year = {2024},
volume = {630},
number = {8016},
pages = {493-500},
doi = {10.1038/s41586-024-07487-w}
}