Supplementary Data for

Stone and Sidow, 2005. Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Research 15:978-986

The MAPP Java program is available from the Downloads page of the SidowLab Home Page.

Each protein considered in the manuscript has its own folder, named after the protein, that contains its data and analysis. Files common to most folders are tabulated below.

Protein_Alignment.fa       The alignment in FASTA format
Protein_Weights.txt        The sequence-specific weights from Figure 1, Step 2
Protein_Data.txt           The raw data for the protein that MAPP analyzed
Protein_Scores.txt         Table reporting the MAPP score of each substitution at every position
Protein_Predictions.txt    MAPP scores and coarse interpretation of the raw data
Protein_Performance.txt    Summary of the analyses included in Table 1

The _Data.txt files are tab delimited and appear in one of two styles, according to whether each row corresponds to an individual mutation or to one position in the protein. In the first style, each row has three entries: the first gives the position of the mutation, the second gives the lexicographic index of the substituted residue's one-letter code (A = 1, C = 3, etc.), and the third gives the coded phenotype that was reported (see _Predictions.txt for decoding). In the second style, each row has 25 entries, one for each of the letters A through Y. The first entry of a row records the coded phenotype of an (A)lanine substitution at the position that the row defines (if any, see _Data.txt), and the remaining entries follow the same pattern for the other letters. See the manuscript for appropriate references to each dataset.
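The two row styles described above can be told apart by counting tab-separated entries. The following Python sketch is illustrative only and is not part of the MAPP distribution; the function names are hypothetical. It assumes that positions and lexicographic indices are written as integers, and it keeps phenotype codes as raw strings because their encoding (and how a missing entry is written) is not specified here.

```python
import string

# The 25 letters A through Y. The lexicographic index is 1-based,
# so index 1 is A and index 3 is C, as described in the text.
LETTERS = string.ascii_uppercase[:25]


def index_to_letter(index):
    """Convert a 1-based lexicographic index to its letter."""
    if not 1 <= index <= len(LETTERS):
        raise ValueError("index out of range: %d" % index)
    return LETTERS[index - 1]


def parse_data_row(line):
    """Parse one row of a _Data.txt file (hypothetical helper).

    Returns ("mutation", (position, letter, code)) for a 3-entry row,
    or ("position", {letter: code}) for a 25-entry row.
    Phenotype codes are returned as unmodified strings.
    """
    # Strip only the line terminator so that empty tab-separated
    # entries at the end of a row are preserved.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        position, index, code = fields
        # Assumption: position and index are integers.
        return "mutation", (int(position), index_to_letter(int(index)), code)
    if len(fields) == 25:
        return "position", dict(zip(LETTERS, fields))
    raise ValueError("expected 3 or 25 entries, found %d" % len(fields))


def parse_data_file(path):
    """Parse every non-empty row of a _Data.txt file."""
    with open(path) as handle:
        return [parse_data_row(line) for line in handle
                if line.rstrip("\r\n") != ""]
```

For a position-style file, the order of the rows defines the positions, so the index of each parsed row in the returned list has to be mapped to a protein position by the reader; the offset to use is not stated here and would need to be checked against the corresponding alignment.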

Datasets:

Beta Hemoglobin

G6PD

HIV Protease

HIV Reverse Transcriptase

LacI

p53

Pyruvate Kinase

T4 Lysozyme