SEQUENCE DIVERSITY ANALYSIS USING DATAMONKEY

The diversity of He185/333 sequences was determined from their alignments using the HyPhy suite of algorithms, accessed via Datamonkey. Unique, full-length He185/333 cDNA sequences were first processed to remove the 5’ and 3’ untranslated regions (UTRs). Because our dataset contained 112 He185/333 sequences and Datamonkey processes a maximum of 100 sequences at a time, we customised our analytical approach to work within this limit. We developed a custom script to randomly select 100 He185/333 sequences from our dataset for the analysis. These sequences were aligned using ClustalW and uploaded to Datamonkey for diversity analysis. Each alignment was subjected to automatic nucleotide substitution model detection, generation of neighbour-joining (NJ) trees, and then SLAC (Single Likelihood Ancestor Counting), FEL (Fixed Effects Likelihood) and IFEL (Internal Fixed Effects Likelihood) analyses. Diversity scores were considered significant at a significance level of p ≤ 0.1. The diversity score for He185/333 sequences was taken as the consensus of the output from all three analytical algorithms (SLAC, FEL and IFEL). This procedure was repeated a further nine times (i.e. a total of ten sets of sequences, each containing 100 sequences, were analysed), and a consensus diversity score was generated from the ten analyses.
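
The random selection step described above can be sketched as follows. This is a minimal illustration in Python, not the original custom script, which is not reproduced here. The function name `draw_subsamples`, the placeholder sequence identifiers and the seed value are all assumptions made for the example. It also assumes that sequences were drawn without replacement within each set and that the ten sets were drawn independently of one another.

```python
import random


def draw_subsamples(seq_ids, subset_size=100, n_replicates=10, seed=None):
    """Draw independent random subsets of sequence identifiers.

    Each subset is sampled without replacement, so no sequence appears
    twice in the same subset; different subsets may overlap.
    (Hypothetical helper, not the authors' script.)
    """
    if subset_size > len(seq_ids):
        raise ValueError("subset_size cannot exceed the number of sequences")
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    return [rng.sample(seq_ids, subset_size) for _ in range(n_replicates)]


# Placeholder identifiers standing in for the 112 He185/333 sequences.
all_ids = [f"He185_333_{i:03d}" for i in range(1, 113)]

# Ten sets of 100 sequences each, sized to fit the 100-sequence limit.
subsets = draw_subsamples(all_ids, subset_size=100, n_replicates=10, seed=2014)
```

Each list in `subsets` would then be used to pull the corresponding sequences from the full dataset before alignment with ClustalW and upload to Datamonkey.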

Sham Nair 2014