# Data sets (.txt files) for each allele are provided as the peptide sequence, followed by the IC50 of
# the peptide-MHC complex (in nM) and the concentration of reporter peptide (in nM if reported, or "NA"
# if not reported), all in tab-delimited columns. Two data sets are provided per allele: one as
# downloaded from AntiJen and one after homologous sequences were filtered out by UniqueProt.
#
# The sizes of the data sets (before/after filtering by UniqueProt) are as follows:
# DRB1*0101 (464/303)
# DRB1*0401 (606/414)
# DRB1*0404 ( 81/ 54)
# DRB1*0405 (116/102)
# DRB1*1501 (343/213)
#
# Two PostScript files for each allele are also provided: the length distribution of peptides binding
# each allele and the local regression fit made by R.
#
# ProPred predictions from which the AROC scores in Table 3 were calculated are also included as
# text files. The left-hand column shows experimentally derived pIC50 values, while the right-hand
# column shows predictions made by each algorithm. A separate file is provided for each allele and
# algorithm. The file "DRB1_0101_ProPred_combiRule.txt" gives the predictions made on the sequences
# in AntiJen for DRB1*0101 (after filtering by UniqueProt) when the combination rule of Doytchinova and
# Flower (2003) was used on the ProPred 9mer predictions.
#
# Finally, alternative versions of Tables 2 and 3 in the manuscript are provided. These versions use the
# Pearson correlation coefficient to score algorithm performance (as opposed to the area under the
# receiver operating characteristic curve, or AROC, which is used in the manuscript).
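#
# The file layouts described above could be read and scored roughly as in the Python sketch below.
# It is an illustration under stated assumptions, not the code used for the manuscript. All function
# names are hypothetical. The prediction files are assumed to be whitespace-separated, and the pIC50
# cutoff that separates binders from non-binders for the AROC is not given in this README, so it is
# left as a required argument. The conversion pIC50 = 9 - log10(IC50 in nM) follows from the
# definition pIC50 = -log10(IC50 in M).

```python
import math


def parse_dataset(text):
    """Parse tab-delimited rows: peptide, IC50 (nM), reporter concentration (nM or "NA")."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        peptide, ic50, reporter = line.strip().split("\t")
        records.append({
            "peptide": peptide,
            "ic50_nM": float(ic50),
            # "NA" marks an unreported reporter-peptide concentration
            "reporter_nM": None if reporter == "NA" else float(reporter),
        })
    return records


def pic50_from_nM(ic50_nM):
    """Convert an IC50 in nM to pIC50 (-log10 of the molar IC50)."""
    return 9.0 - math.log10(ic50_nM)


def parse_predictions(text):
    """Parse two-column rows: experimental pIC50 (left), predicted score (right).

    Assumption: columns are separated by any whitespace (tab or spaces).
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def aroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for is_binder, s in zip(labels, scores) if is_binder]
    neg = [s for is_binder, s in zip(labels, scores) if not is_binder]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def score_predictions(pairs, binder_pic50_cutoff):
    """Return (Pearson r, AROC) for (experimental pIC50, prediction) pairs.

    binder_pic50_cutoff is an assumption supplied by the caller: peptides with
    experimental pIC50 at or above it are treated as binders.
    """
    observed = [o for o, _ in pairs]
    predicted = [p for _, p in pairs]
    labels = [o >= binder_pic50_cutoff for o in observed]
    return pearson(observed, predicted), aroc(labels, predicted)
```

# For example, score_predictions(parse_predictions(open(path).read()), cutoff) would return both
# scores for one allele/algorithm file, where path and cutoff are chosen by the user. The pairwise
# AROC is quadratic in the number of peptides, which is acceptable for data sets of a few hundred
# entries such as those listed above.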