The genome-editing Cas9 protein uses multiple amino-acid residues to bind the target DNA. Even if only the residues in proximity to the target DNA are considered as potential sites for optimising Cas9's activity, the number of combinatorial variants to screen is too large for a wet-lab experiment. Here we generate and cross-validate ten in silico and experimental datasets of multi-domain combinatorial mutagenesis libraries for Cas9 engineering, and demonstrate that a machine learning-coupled engineering approach reduces the experimental screening burden by up to 95% while enriching top-performing variants by ∼7.5-fold in comparison to the null model. Using this approach, followed by structure-guided engineering, we identify the N888R/A889Q variant, which confers increased editing activity on the protospacer adjacent motif-relaxed KKH variant of Cas9 nuclease from Staphylococcus aureus (KKH-SaCas9) and on its derived base editor in human cells. Our work validates a readily applicable workflow for resource-efficient, high-throughput engineering of genome editors' activity.
Screening combinatorial mutants is too massive a task for wet-lab experiments alone. Here the authors present a machine learning-coupled combinatorial mutagenesis approach that vastly reduces the experimental burden of engineering Cas9 genome-editing enzymes.
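To make the combinatorial burden and the two headline metrics (screening reduction and fold-enrichment over a null model) concrete, here is a toy Python sketch. It is not the authors' pipeline: the three sites, the four-residue alphabet, the synthetic additive activity scores, the additive stand-in "model" and the definition of enrichment as the hit rate among model-picked variants divided by the hit rate of random picks are all hypothetical assumptions chosen for illustration.

```python
import itertools

RESIDUES = ["A", "K", "R", "Q"]  # hypothetical reduced residue alphabet per site
N_SITES = 3                      # hypothetical number of mutated sites


def library_size(n_sites: int, n_choices: int = 20) -> int:
    """Full combinatorial library size when every site can take any of n_choices residues."""
    return n_choices ** n_sites


def true_activity(variant: tuple[str, ...]) -> int:
    """Synthetic, purely additive ground truth (hypothetical); every variant gets a distinct score."""
    return sum(RESIDUES.index(r) * 4 ** i for i, r in enumerate(variant))


def fit_additive(train: dict[tuple[str, ...], float]):
    """Hypothetical stand-in model: per-site, per-residue mean deviation from the global mean."""
    global_mean = sum(train.values()) / len(train)
    effects = {}
    for site in range(N_SITES):
        for res in RESIDUES:
            ys = [y for v, y in train.items() if v[site] == res]
            effects[(site, res)] = sum(ys) / len(ys) - global_mean if ys else 0.0
    return lambda v: global_mean + sum(effects[(s, r)] for s, r in enumerate(v))


def enrichment_fold(selected, top, library) -> float:
    """Hit rate among selected variants divided by the hit rate of random picks (null model)."""
    hit_selected = sum(v in top for v in selected) / len(selected)
    hit_null = sum(v in top for v in library) / len(library)
    return hit_selected / hit_null


library = list(itertools.product(RESIDUES, repeat=N_SITES))  # 4**3 = 64 variants

# "Screen" a balanced 16-variant subset: the residue at site 2 is fixed by sites 0 and 1
# (a Latin-square design), so every residue appears equally often at every site.
idx = {r: i for i, r in enumerate(RESIDUES)}
screened = [v for v in library if idx[v[2]] == (idx[v[0]] + idx[v[1]]) % 4]
model = fit_additive({v: true_activity(v) for v in screened})

k = 4
top = set(sorted(library, key=true_activity, reverse=True)[:k])  # true top-k variants
picked = sorted(library, key=model, reverse=True)[:k]            # model's top-k picks

print(library_size(8))                        # → 25600000000 (8 fully randomised sites)
print(1 - len(screened) / len(library))       # → 0.75 (fraction left unscreened)
print(enrichment_fold(picked, top, library))  # → 16.0 (vs. random picks)
```

Because the synthetic landscape is perfectly additive and the screened subset is balanced, the stand-in model recovers it exactly; real activity landscapes with epistatic interactions would not be recovered this cleanly, which is why the reported figures (up to 95% less screening, ∼7.5-fold enrichment) are the meaningful benchmark rather than this toy output.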
Staphylococcus aureus Cas9 (SaCas9) is an excellent candidate for in vivo gene therapy because its small size allows it to be packaged into adeno-associated viral vectors for therapeutic delivery into human cells. However, its gene-editing activity may be insufficient at specific disease loci. The Cas9 protein contains several domains, including the protospacer-adjacent motif (PAM)-interacting (PI) domain and the Wedge (WED) domain, which facilitate its interaction with the target DNA duplex. A research team therefore coupled machine learning with high-throughput screening platforms to design a SaCas9 protein with enhanced activity by combining mutations in its PI and WED domains, which surround the PAM-carrying DNA duplex. The PAM is essential for Cas9 to modify the target DNA. The idea was to relax the PAM constraint for broader genome targeting while securing the protein structure by strengthening its interaction with the PAM-containing DNA duplex via the WED domain. The team thus designed new SaCas9 variants with improved gene-editing efficiency, showing up to 33% higher activity at specific genomic loci. The results are now published in Nature Communications, and a patent application has been filed based on this work.
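Why relaxing the PAM broadens genome targeting can be seen by counting candidate PAM sites in a sequence. The sketch below assumes the PAM patterns commonly reported in the wider literature, NNGRRT for wild-type SaCas9 and NNNRRT for KKH-SaCas9 (IUPAC codes: N = any base, R = A or G); the example sequence is hypothetical, and only the given strand is scanned, whereas real target-site searches also scan the reverse complement.

```python
import re

# IUPAC codes needed for the two SaCas9 PAM patterns (assumed from the wider literature)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def pam_regex(pam: str) -> re.Pattern:
    """Translate an IUPAC PAM string into a regex; the lookahead lets overlapping sites all match."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in pam) + ")")


def count_pam_sites(seq: str, pam: str) -> int:
    """Count PAM occurrences on the given strand only (reverse complement not scanned)."""
    return sum(1 for _ in pam_regex(pam).finditer(seq.upper()))


seq = "TTGAGTCCAAGT"  # hypothetical example sequence
print(count_pam_sites(seq, "NNGRRT"))  # → 1 (wild-type SaCas9 PAM)
print(count_pam_sites(seq, "NNNRRT"))  # → 2 (relaxed KKH-SaCas9 PAM)
```

Every NNGRRT site is also an NNNRRT site, so the relaxed pattern can only add targetable positions, never remove them.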