1. Who is the target audience of this database and what can they expect to find in TcSNP?
TcSNP is meant for anyone with an interest in Trypanosoma cruzi and/or Chagas disease, particularly in the areas of genetic variation, evolution, population genetics and ecoepidemiology. As a resource, it aims to collect and integrate information on genetic polymorphisms between different stocks, strains and clones of T. cruzi. It also lets you query these data in biologically meaningful ways, for example to look for genetic markers that would allow typing of a strain or clone against other strains or clones.
2. How do I cite TcSNP?
TcSNP: a database of genetic variation in Trypanosoma cruzi. Ackermann AA, Carmona SJ and Agüero F. Nucleic Acids Research: advance access published online on October 30, 2008. DOI: 10.1093/nar/gkn874

Alignments and SNPs.

1. My sequence of interest is not included in any alignment.
This may happen for a number of reasons. Many coding sequences in the genome of the CL-Brener clone belong to haploid regions of the genome. If there are no other sequences from public databases mapped to this region (ESTs, mRNAs, etc.) then this sequence appears in TcSNP as a singleton because it can't be aligned to any other sequence. In other cases, a single gene from a large gene family may be left as a singleton because it appears to be more divergent in sequence than the rest of the members of the family, and does not fit perfectly into another alignment. Yet another possibility is that your sequence of interest has a significant number of bases that match an entry in our database of repetitive sequences. If this is the case, then your sequence has been masked (repetitive bases have been replaced by N) before computing alignments. This might explain why it has not been aligned to other sequences.
2. Alignment X has many SNPs but they are not marked as synonymous or nonsynonymous.
To determine how a particular SNP affects the encoded protein sequence, we have to analyze the SNP in the context of the corresponding translation reading frame. Sometimes, however, this analysis has not been done. This may be the case if the alignment has many coding sequences that are not perfectly aligned and that may even differ in the translation reading frame. In these cases we have chosen not to perform this analysis, because these alignments most likely contain paralogs and not true allelic variants. We have manually inspected and curated many of these alignments, but many others remain uncurated at this time. Yet another possibility is that the SNP is a noncoding SNP located outside any mapped coding sequence.
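As an illustration of why the reading frame matters, the following minimal Python sketch classifies a single-base change as synonymous or nonsynonymous. It is not the TcSNP pipeline: the function name `classify_snp`, the 0-based position convention and the assumption that the coding sequence starts in frame at its first base are all hypothetical choices made for this example. It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, pos, alt):
    """Classify a SNP at 0-based position `pos` of an in-frame CDS (hypothetical helper).

    Assumes the CDS starts at codon position 1 and contains only A, C, G, T.
    """
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


# GCT -> GCC keeps alanine; GCT -> ACT changes alanine to threonine.
print(classify_snp("ATGGCTAAA", 5, "C"))
print(classify_snp("ATGGCTAAA", 3, "A"))
```

If the sequences in an alignment disagree on the reading frame, the same nucleotide change would be read in different codons, which is why such alignments cannot be classified reliably.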
3. When I search for my gene of interest I get more than one alignment.
When you search for your gene of interest using keywords or BLAST, you may get more than one alignment as a result. This might be the case if your gene of interest is not a single copy gene and has other divergent paralogs. If you are sure that this is not the case, and all the sequences in TcSNP are real allelic variants, then all sequences should have been placed into the same alignment. Please contact us with the details and we will merge the alignments as soon as possible.
4. My gene of interest does not have a dN/dS value but other genes do. Why is that?
To estimate the ratio of synonymous SNPs to the number of synonymous sites (and likewise for nonsynonymous SNPs and sites), we require that the alignment meet several criteria: all coding sequences must be of similar length, and the number of indels must be a multiple of 3. Not all alignments in TcSNP meet these stringent criteria. In most cases, failure to meet them is due to different annotation of the translation start codon in allelic variants. Sometimes, however, it is caused by the inclusion of pseudogenes (genes with premature stop codon mutations) or paralogs in these alignments. In these cases, even though the SNPs might be found to be synonymous or nonsynonymous at the amino acid level, the number of synonymous (or nonsynonymous) sites will be different for each gene, and thus an estimate of dN or dS makes less sense.
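The two criteria above can be sketched as a simple check on an alignment. This is only an illustration under stated assumptions, not the test TcSNP actually applies: the function name `eligible_for_dnds` is hypothetical, the 20% length tolerance is an arbitrary value chosen for the example, and the indel criterion is interpreted here as "every run of gap characters has a length that is a multiple of 3".

```python
import re


def gap_run_lengths(aligned_seq):
    """Return the length of every run of '-' characters in an aligned sequence."""
    return [len(match.group()) for match in re.finditer(r"-+", aligned_seq)]


def eligible_for_dnds(aligned_seqs, max_length_diff=0.2):
    """Hypothetical check of the two criteria described in the text.

    max_length_diff is an assumed tolerance: the largest allowed relative
    difference between the longest and shortest ungapped sequence.
    """
    if not aligned_seqs:
        return False
    lengths = [len(seq.replace("-", "")) for seq in aligned_seqs]
    if min(lengths) == 0:
        return False
    similar_length = (max(lengths) - min(lengths)) / max(lengths) <= max_length_diff
    # Assumed reading of the indel criterion: gaps must not shift the reading frame.
    in_frame = all(
        run % 3 == 0 for seq in aligned_seqs for run in gap_run_lengths(seq)
    )
    return similar_length and in_frame


reference = "ATGGCTAAAGGGCCCTTTGATCATTGA"
codon_deletion = "ATGGCT---GGGCCCTTTGATCATTGA"  # one whole codon missing
frameshift = "ATGGCT-AAGGGCCCTTTGATCATTGA"      # single-base gap

print(eligible_for_dnds([reference, codon_deletion]))
print(eligible_for_dnds([reference, frameshift]))
```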
5. I see that many polymorphic sites are marked as heterozygous. What is the basis for this call?
In the case of the genome of the CL-Brener clone, we have marked as heterozygous the sites that are polymorphic between allelic variants of CL-Brener.
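The idea can be illustrated with a short Python sketch that compares two aligned allelic variants and reports the positions where they differ. The function name `heterozygous_sites` and the choice to ignore gaps and ambiguous bases are assumptions made for this example; they do not describe the TcSNP implementation.

```python
def heterozygous_sites(allele_a, allele_b):
    """Return 0-based alignment columns where two aligned alleles differ.

    Hypothetical helper: columns containing gaps ('-') or ambiguous bases
    (e.g. 'N') are skipped, since they are not simple base substitutions.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("aligned sequences must have the same length")
    valid = set("ACGT")
    return [
        column
        for column, (a, b) in enumerate(zip(allele_a.upper(), allele_b.upper()))
        if a != b and a in valid and b in valid
    ]


# Two allelic copies of the same gene from one clone, already aligned.
print(heterozygous_sites("ATGGCTAAA", "ATGGCCAAG"))
```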
6. In many alignments sequences extend out of the START and/or STOP codons of the reference genes. Why is that?
To build the alignments we have extracted all coding sequences (annotated CDS features) in the genome of CL-Brener and aligned them to other sequences obtained from public databases. These sequences sometimes include UTRs and/or intergenic regions, which may extend out of the START and/or STOP codons of the reference genes.
7. SNP X is marked as being located in a region of high SNP density. What does that mean?
This might mean either that the region is truly polymorphic or that the alignment contains, at this location, one or more single-pass, unedited sequences (typically ESTs), which contribute many sequencing errors. Our scoring scheme works well to discriminate these sequencing errors from true polymorphisms, so you may find that many of these SNPs have a low score. Yet for some applications it is desirable to avoid these regions altogether, even if the SNPs are not sequencing errors. We have identified and marked these regions so that you can filter them in your queries. To identify such high SNP density regions, we scanned the alignments looking for 3 or more potential SNPs in a window of 10 bp. SNPs that fall within these windows were labeled as being located in a region of high SNP density.
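The window scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build TcSNP: the function name `high_density_snps` is hypothetical, and it assumes that a 10 bp window covers alignment positions p to p+9, so that 3 SNPs share a window when the first and last are at most 9 positions apart.

```python
def high_density_snps(positions, window=10, min_snps=3):
    """Return the SNP positions that lie in a window holding at least min_snps SNPs.

    positions: alignment coordinates of candidate SNPs (any order).
    A group of min_snps consecutive SNPs fits in one window when the distance
    between its first and last member is smaller than the window size.
    """
    ordered = sorted(set(positions))
    flagged = set()
    for i in range(len(ordered) - min_snps + 1):
        group = ordered[i:i + min_snps]
        if group[-1] - group[0] < window:
            flagged.update(group)
    return sorted(flagged)


snp_positions = [5, 9, 14, 40, 60, 61, 62]
print(high_density_snps(snp_positions))  # → [5, 9, 14, 60, 61, 62]
```

Checking every run of 3 consecutive SNPs is enough, because any window that contains 3 or more SNPs also contains each run of 3 neighbours among them.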