D AMPs Predictor

…local alignments [17]. This kind of approach is commonly applied to cysteine-stabilized antimicrobial peptides, since each of these classes has a typical cysteine pattern. Indeed, most plant AMPs are cysteine-rich [27,28], with only a few examples of disulphide-free plant AMPs [29–33]. Compared with peptide purification, database searching has the advantages of fast sequence identification and low cost. This approach can therefore be applied either generally, searching plant genomes for any small cysteine-rich peptides [27], or specifically, searching the whole database for a particular AMP class [4,34]. However, since most cysteine-stabilized AMPs are multifunctional, how can the sequences with antimicrobial activity be identified? A definitive answer can only come from in vitro and/or in vivo assays; nevertheless, prediction methods can indicate likely activity and thus improve the search. With this in mind, we present CS-AMPPred (Cysteine-Stabilized Antimicrobial Peptides Predictor), an updated version of the support vector machine (SVM) model previously proposed by our group [20] for predicting antimicrobial activity in cysteine-stabilized peptides.

Materials and Methods

Data Sets

The positive data set (PS) was constructed by selecting sequences w…

…retrieved from the search with the term “NOT antimicrobial” were selected, and the sequences ranging from 16 to 90 residues were then retained. Redundant sequences were removed with CD-HIT [36] at a 40% identity cutoff, leaving 1749 sequences, from which 385 were randomly selected to compose the NS. The blind data set (BS1) comprised 75 sequences (approximately 20%) randomly selected from each set, PS and NS, totaling 150 sequences, while the training data set (TS) comprised the remaining 620 sequences (310 from each set). Similar negative data sets were used by Thomas et al. [23], Torrent et al. [24] and Fernandes et al. [25].

Sequence Descriptors and Statistical Analysis

Nine structural/physicochemical properties were initially chosen: (i) average charge, (ii) average hydrophobicity, (iii) hydrophobic moment, (iv) amphipathicity, (v) α-helix propensity, (vi) flexibility, and the indexes of (vii) α-helix, (viii) β-sheet and (ix) loop formation. From our previous work [20], only three properties were kept (average hydrophobicity, hydrophobic moment and amphipathicity), with average charge chosen instead of total charge. The secondary structure indexes were calculated as the average of the weighted amino acid frequencies of Levitt (1977) [37]; flexibility was calculated as the average amino acid flexibility, using the scale from Bhaskaran and Ponnuswamy (1988) [38]; α-helix propensity was measured as the average per-residue energy required for α-helix formation [39]; amphipathicity was calculated as the ratio of hydrophobic to charged residues [3]; average hydrophobicity and hydrophobic moment were calculated using Eisenberg’s scale [40], the hydrophobic moment being given by Eisenberg’s equation [40]; and average charge was calculated as the net charge at physiological pH normalized by the number of residues. The final set of sequence descriptors was defined through principal component analysis (PCA): the nine descriptors were measured for the positive data set, and then PCA was applied, …
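The cysteine-pattern database search discussed at the start of this section can be illustrated with a minimal sketch. The motif (four cysteines separated by 3 to 10 residues) and the 90-residue size limit are hypothetical choices made for illustration only; they are not the signature of any real AMP class, nor the criteria used in [27] or [4,34]. The helper name `find_cysteine_rich` is likewise hypothetical.

```python
import re

# Hypothetical motif for illustration only: four cysteines, each separated
# by 3 to 10 residues of any kind. Real AMP classes have their own patterns.
HYPOTHETICAL_MOTIF = re.compile(r"C.{3,10}C.{3,10}C.{3,10}C")


def find_cysteine_rich(records, motif=HYPOTHETICAL_MOTIF, max_len=90):
    """Return the IDs of sequences of at most max_len residues that contain the motif.

    records: mapping of sequence ID -> one-letter amino acid sequence.
    max_len is an assumed size limit for "small" peptides (hypothetical).
    """
    hits = []
    for seq_id, seq in records.items():
        if len(seq) <= max_len and motif.search(seq.upper()):
            hits.append(seq_id)
    return hits


# Toy records (not real peptides): only "toy1" is short and matches the motif.
toy = {
    "toy1": "AACGGGCAAAACGGGGCAA",
    "toy2": "MKLLVAVAGLLLA",
    "toy3": "A" * 100 + "CGGGCGGGCGGGC",
}
print(find_cysteine_rich(toy))  # ['toy1']
```

In practice, class-specific searches usually rely on alignment or profile tools rather than a single regular expression; the regex only shows the idea of filtering by a cysteine spacing pattern.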
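The sampling and partitioning described under Data Sets can be sketched as follows, assuming the CD-HIT redundancy-removal step [36] has already been run externally (it is not reimplemented here). The helper names `length_filter`, `split_class` and `build_sets`, the fixed seed, and the use of Python's `random` module are assumptions for illustration. With 385 sequences per class and 75 held out per class, the split gives a 150-sequence blind set and a 620-sequence training set, matching the numbers in the text.

```python
import random


def length_filter(seqs, min_len=16, max_len=90):
    """Keep sequences whose length lies within [min_len, max_len] residues."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


def split_class(seqs, n_blind, rng):
    """Shuffle one class and split it into (blind, training) lists."""
    shuffled = list(seqs)
    rng.shuffle(shuffled)
    return shuffled[:n_blind], shuffled[n_blind:]


def build_sets(positive, negative_pool, n_neg=385, n_blind=75, seed=0):
    """Sample the NS from a non-redundant pool, then build balanced BS1 and TS.

    Labels: 1 = positive (PS), 0 = negative (NS).
    Assumes negative_pool was already length-filtered and reduced with CD-HIT.
    """
    rng = random.Random(seed)
    negative = rng.sample(negative_pool, n_neg)
    pos_blind, pos_train = split_class(positive, n_blind, rng)
    neg_blind, neg_train = split_class(negative, n_blind, rng)
    blind = [(s, 1) for s in pos_blind] + [(s, 0) for s in neg_blind]
    train = [(s, 1) for s in pos_train] + [(s, 0) for s in neg_train]
    return blind, train


# Placeholder identifiers stand in for real sequences.
positive = [f"POS{i}" for i in range(385)]
negative_pool = [f"NEG{i}" for i in range(1749)]
blind, train = build_sets(positive, negative_pool)
print(len(blind), len(train))  # 150 620
```

Splitting each class separately keeps both BS1 and TS balanced, with equal numbers of positive and negative sequences.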
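Two of the composition-based descriptors, average charge and amphipathicity, can be sketched directly from their definitions in Sequence Descriptors and Statistical Analysis. This is a minimal sketch under stated assumptions: net charge at physiological pH is approximated by counting Lys and Arg as +1 and Asp and Glu as -1, with His treated as neutral and the termini ignored; the hydrophobic residue set `HYDROPHOBIC` is a hypothetical choice, since the exact set used in [3] is not given here.

```python
# Simplified side-chain charges near pH 7.4 (assumption: His counted as neutral,
# N- and C-termini ignored). A pKa-based calculation would be more precise.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# Hypothetical hydrophobic residue set; the set used in [3] may differ.
HYDROPHOBIC = set("AILMFVW")


def average_charge(seq):
    """Net charge divided by the number of residues."""
    seq = seq.upper()
    return sum(CHARGE.get(aa, 0) for aa in seq) / len(seq)


def amphipathicity(seq):
    """Ratio of hydrophobic to charged residues; None when no residue is charged."""
    seq = seq.upper()
    n_hydrophobic = sum(aa in HYDROPHOBIC for aa in seq)
    n_charged = sum(aa in CHARGE for aa in seq)
    return n_hydrophobic / n_charged if n_charged else None


print(average_charge("KKDA"), amphipathicity("AAKE"))  # 0.25 1.0
```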
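Average hydrophobicity and Eisenberg's hydrophobic moment can be sketched as below. The moment is the length of the vector sum of per-residue hydrophobicities placed at successive angles, μH = sqrt[(Σ H_n sin nδ)² + (Σ H_n cos nδ)²], where H_n is the hydrophobicity of residue n and δ = 100° for an ideal α-helix. Assumptions: the scale values are the commonly tabulated Eisenberg consensus values (check them against [40]); the moment is computed over the whole sequence rather than in sliding windows; and it is divided by sequence length, which the text does not specify. The generic `scale_average` helper (a hypothetical name) would also serve for the flexibility and Levitt indexes once the published scales [37,38] are supplied.

```python
import math

# Eisenberg consensus hydrophobicity scale as commonly tabulated
# (assumption: verify the values against reference [40] before use).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def scale_average(seq, scale=EISENBERG):
    """Mean per-residue value of any scale (hydrophobicity, flexibility, ...)."""
    values = [scale[aa] for aa in seq.upper()]
    return sum(values) / len(values)


def hydrophobic_moment(seq, scale=EISENBERG, angle_deg=100.0, per_residue=True):
    """Eisenberg hydrophobic moment over the whole sequence.

    angle_deg=100 models an ideal alpha-helix. Dividing by length
    (per_residue=True) is an assumption; the text does not state it.
    Starting the residue index at 0 rather than 1 does not change the magnitude.
    """
    delta = math.radians(angle_deg)
    values = [scale[aa] for aa in seq.upper()]
    sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    moment = math.hypot(sin_sum, cos_sum)
    return moment / len(values) if per_residue else moment


print(round(scale_average("LKLKLK"), 2))                        # -0.22
print(round(hydrophobic_moment("LKLKLK", angle_deg=180.0), 2))  # 1.28
```

A homopolymer of 18 residues spans exactly five helical turns at 100° per residue, so its contributions cancel and its moment is essentially zero, which is a quick sanity check for the implementation.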
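The PCA step can be sketched with NumPy. The section is cut off before stating how components or descriptors were selected, so this sketch stops at computing explained-variance ratios and loadings. It assumes z-score standardization of each descriptor (equivalent to a PCA of the correlation matrix); the function name `pca` and the random placeholder matrix are assumptions, not the paper's data.

```python
import numpy as np


def pca(X):
    """PCA via SVD of the z-scored descriptor matrix.

    X: array of shape (n_sequences, n_descriptors).
    Returns (explained_variance_ratio, loadings); loadings[i, j] is the weight
    of descriptor j on principal component i, components sorted by variance.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each descriptor so scales with different units are comparable
    # (assumption: the original scaling is not stated). Constant columns would divide by zero.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; squared singular values give component variances.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    variance = s**2 / (Z.shape[0] - 1)
    return variance / variance.sum(), Vt


# Placeholder data: 385 sequences x 9 descriptors (random, not the paper's values).
rng = np.random.default_rng(0)
X = rng.normal(size=(385, 9))
ratios, loadings = pca(X)
print(ratios.round(3), loadings.shape)
```

Inspecting the largest absolute loadings on the leading components is one common way to judge which descriptors carry most of the variance; whether the authors used this criterion is not stated in this section.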