Various groups around the world are carrying out SNP and variant analyses on large numbers of human genomes.
A few notable such projects are
HapMap, 1000 Genomes,
and the Celera study "Natural selection on protein-coding genes in the human genome"
(PubMed 16237444).
As a general rule, SNP/variation projects consider an observed SNP/variant to be "called" or identified as a SNP/variant
only if seen in more than 1% of the population sampled.
The NIH Entrez SNP database (dbSNP)
receives the "called SNPs" and variants resulting from projects such as these,
in addition to submissions by genetic variation researchers working on a smaller scale.
It seems to contain both SNPs/variants occurring in more than 1% of the population,
and rare SNPs/variants (occurring in less than 1% of the population)
that have been identified by individual researchers as being linked to human disease.
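The 1% rule above can be sketched in a few lines of Python. This is only an illustration, not the method used by any of the projects named here: it assumes that "seen in more than 1% of the population sampled" is measured as the minor allele frequency across the sampled chromosomes, and the function names and allele counts are hypothetical.

```python
def minor_allele_frequency(allele_counts):
    """Return the frequency of the second most common allele.

    allele_counts maps each allele (e.g. "A", "G") to the number of
    sampled chromosomes carrying it. Hypothetical helper for illustration.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no chromosomes observed")
    if len(allele_counts) < 2:
        # Only one allele was seen, so there is no minor allele.
        return 0.0
    counts = sorted(allele_counts.values(), reverse=True)
    return counts[1] / total


def is_common(allele_counts, threshold=0.01):
    """True if the minor allele is seen in more than `threshold` of chromosomes."""
    # The text says "more than 1%", so the comparison is strict.
    return minor_allele_frequency(allele_counts) > threshold


# Made-up counts for 270 diploid individuals (540 chromosomes).
common_example = {"A": 534, "G": 6}
rare_example = {"A": 537, "G": 3}
print(is_common(common_example), is_common(rare_example))
```

Under this assumption a variant would be treated as rare whenever `is_common` returns `False`, whatever its clinical significance.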
The purpose of the present web page is to make it easy to find out,
for a given SNP/variant/mutation in the SDH gene in the human genome,
whether it is a known variation already logged in dbSNP,
by converting the residue, exon or intron location to the chromosome location,
and interrogating dbSNP with the chromosome location.
As at June 8 2010, both this present web page and dbSNP refer to chromosome locations
as the location on Genome Reference Consortium Human Build 37 (GRCh37).
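The conversion from a residue number to a chromosome location can be sketched as follows. This is a minimal illustration and not the code behind this web page: it assumes 1-based inclusive coordinates, that the coding part of each exon is known, and that the residue is numbered from the start codon. The function names and the exon coordinates in the example are hypothetical, not real SDH gene coordinates.

```python
def cds_to_chromosome(cds_pos, exons, strand):
    """Map a 1-based coding-sequence position to a 1-based chromosome position.

    exons:  list of (start, end) chromosome coordinates of the coding part
            of each exon, 1-based inclusive, with start <= end.
    strand: "+" or "-".
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if cds_pos < 1:
        raise ValueError("coding positions start from 1")
    # Walk the exons in transcript order: ascending on the plus strand,
    # descending on the minus strand.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = cds_pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the end of the coding sequence")


def residue_to_chromosome(residue, exons, strand):
    """Return the chromosome positions of the three codon bases of a residue."""
    first_base = 3 * (residue - 1) + 1
    return [cds_to_chromosome(first_base + i, exons, strand) for i in range(3)]


# Hypothetical two-exon gene; the coordinates are invented for illustration.
example_exons = [(1000, 1008), (2000, 2011)]
print(residue_to_chromosome(4, example_exons, "+"))
print(residue_to_chromosome(4, example_exons, "-"))
```

A codon that spans an exon boundary yields positions that are not contiguous on the chromosome, which is why the sketch returns all three positions rather than a single range. Each position could then be used to interrogate dbSNP.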
As at June 8 2010, high-volume human variation projects have examined the following numbers of individual genomes:
- The 1000 Genomes project plans to carry out
next generation DNA sequencing of over 2000 individual human genomes from populations around the world.
When this DNA data becomes available, it could be used to identify rare SNPs and variants,
even though the 1000 Genomes project does not itself plan to identify them.
However, as at June 8 2010, the 1000 Genomes project has sequenced 180 individuals with low coverage (pilot 1 project)
and has sequenced 6 individuals with deep coverage (pilot 2 project).
A further 1000 genes were sequenced in 900 individuals with deep coverage (pilot 3 project),
but SDH genes were not in the pilot 3 list of regions covered,
as explained on their About page.
The 1000 Genomes project plans to sequence 2000 genomes with low coverage during 2010.
Samples are sent to Coriell to become cell lines. Called SNPs and variants are deposited in dbSNP.
- The HapMap project has carried out extensive microarray genotyping studies.
Their Populations Sampled page
explains that samples come from 270 individuals from various populations around the world.
Samples then go to Coriell to become cell lines.
The What is the HapMap? page
explains that SNPs are chosen to efficiently track haplotypes in populations.
Entering SDH genes (such as SDHB) in the HapMap browsers shows quite a few SNPs in SDH genes,
and they seem to be deposited in dbSNP. The HapMap project uses microarrays, not DNA sequencing.
Thus, the design of the HapMap project is such that rare SNPs and variants will not be identified by the project.
- The Celera study
(Bustamante et al. (2005) "Natural selection on protein-coding genes in the human genome" Nature 437, 1153-1157. doi:10.1038/nature04240)
carried out specific exon sequencing of the majority of human genes in 39 individuals
(20 European Americans and 19 African Americans).
Table II in the Supplementary Methods shows that cell lines from Coriell Cell Repositories were used for all 39 human samples.
Supplementary Data 2 shows that genes SDHB and SDHC made it into the final list of genes examined,
and SDHB appears on page 1155 of the paper in the list of genes found to be under negative selection.
- Not all the variants identified in UniProt
appear in dbSNP. These UniProt variants seem to be well curated from published studies,
whose researchers presumably have not uploaded the variations to dbSNP,
and dbSNP presumably does not manually curate published studies to populate its database.
- Presumably not all the variants identified in the Leiden Open Variation Database (LOVD)
will ever appear in dbSNP, because researchers are invited to submit to LOVD and dbSNP separately,
and there does not seem to be an update mechanism between these two resources.
- The Human Polymorphism Study Center has produced a
HGDP-CEPH Diversity Panel Database
from their Human Genome Diversity Cell Line Panel,
which consists of 1063 lymphoblastoid cell lines (LCLs) representing some 1050 individuals sampled from 51 populations throughout the world.
Genetic marker genotyping of genomes in this DNA database has been carried out using microarrays
(Illumina microarrays were used for data as of June 8 2010, and Affymetrix microarrays were used for data available as at August 2010),
and has produced some genetic markers that appear as SNPs in dbSNP.
Whole genome resequencing is planned for the future.
- The Coriell Institute for Medical Research
possesses hundreds of human cell lines from dozens of populations,
as can be seen in their human populations catalog.
The SNP Search
page states that although Coriell does not call SNPs, numerous investigators have called SNPs in the Coriell cell lines
and have deposited those SNPs in dbSNP.
Coriell also state on their genotype catalog page
that they are undertaking genetic marker genotyping of the genomes of selected cell lines, using the Affymetrix SNP 6.0 GeneChip.
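The sample sizes listed above affect how likely a sequencing project is to encounter a rare variant at all. As a rough sketch, and under strong simplifying assumptions that are not taken from any of these projects (unrelated diploid individuals, independent chromosomes, and perfect detection of every allele present), the chance of seeing an allele at least once can be estimated as below. The function name and the 0.5% allele frequency are hypothetical. The sketch does not apply to microarray genotyping, where only preselected SNPs are assayed.

```python
def probability_observed(allele_frequency, individuals, ploidy=2):
    """Chance that an allele appears at least once among the sampled chromosomes.

    Assumes independent chromosomes and perfect detection (a simplification).
    """
    chromosomes = individuals * ploidy
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


# Sample sizes quoted in the list above; 0.005 is an illustrative frequency.
for individuals in (6, 39, 180, 270, 1050):
    chance = probability_observed(0.005, individuals)
    print(f"{individuals:5d} individuals: {chance:.2f}")
```

Seeing an allele once is not the same as calling it, so these figures are an upper bound on what a real pipeline would report.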
Please note that this web page considers locations to start from 1, not 0.
Most data fields in dbSNP consider a location to start from 1,
with the exception of aa_position which seems to start from 0.
The genomic ranges appearing at Entrez Gene start from 0.
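Mixing the two numbering conventions produces off-by-one errors, so it can help to convert explicitly. The helpers below are a minimal sketch with hypothetical names. The range helper assumes that both ends of a zero-based range are inclusive, which this page does not confirm for Entrez Gene.

```python
def zero_to_one_based(position):
    """Convert a location counted from 0 to a location counted from 1."""
    if position < 0:
        raise ValueError("zero-based locations cannot be negative")
    return position + 1


def one_to_zero_based(position):
    """Convert a location counted from 1 to a location counted from 0."""
    if position < 1:
        raise ValueError("one-based locations start from 1")
    return position - 1


def zero_based_range_to_one_based(start, end):
    """Convert a zero-based range, assuming both ends are inclusive."""
    return zero_to_one_based(start), zero_to_one_based(end)


# A zero-based value such as aa_position 0 corresponds to residue 1 here.
print(zero_to_one_based(0))
```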
As at June 8 2010, it has been noticed that a few SDHA SNPs in dbSNP (such as rs1126687)
seem to be logged in dbSNP under an alternative genome assembly (HuRef)
and cannot be found with the corresponding GRCh37 chromosome location.
A query has been sent to NIH data administrators about this.
Copyright © 2010 Emma Rath. All rights reserved.
Soon this program will be released under an open source software license such as the GNU General Public License or
the Creative Commons license for the Free Software Foundation's GNU General Public License at creativecommons.org.