brownbag-science

Terminology

The number of base pairs (bp) is frequently used as a synonym for the number of nucleotides in a single-strand sequence.
● This sequence has 5 nucleotides: ACGGT
● We can also say that it has 5 base pairs
● Metric prefixes (kilo, mega, giga, etc.) are used for sequence lengths
● kb → kilo-bases (1,000 bases)
● Mb → mega-bases (1,000,000 bases)
● Gb → giga-bases (1,000,000,000 bases)
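As a minimal sketch of the length units above, the snippet below counts the nucleotides in the example sequence and prints lengths with the kb/Mb/Gb prefixes. The function name `format_length` and the two-decimal rounding are assumptions made for illustration, not a standard convention.

```python
def format_length(n_bases: int) -> str:
    """Format a sequence length in bases using the kb / Mb / Gb prefixes."""
    units = [(1_000_000_000, "Gb"), (1_000_000, "Mb"), (1_000, "kb")]
    for factor, suffix in units:
        if n_bases >= factor:
            # Two decimals is an arbitrary choice for readability.
            return f"{n_bases / factor:.2f} {suffix}"
    return f"{n_bases} bp"


sequence = "ACGGT"
print(len(sequence))                  # 5 nucleotides
print(format_length(len(sequence)))   # 5 bp
print(format_length(2_500))           # 2.50 kb
print(format_length(3_100_000_000))   # 3.10 Gb
```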

genome: the full genetic information of an organism
● Contains all chromosomes
● Comprises the coding & non-coding sequence data of the organism
● Coding sequence data → part of the genome that encodes proteins
● Non-coding (in earlier days: junk) DNA → part of the genome that does not encode proteins, although some of it still has a function
– The function of non-coding DNA is only partially known
– Some non-coding DNA regulates when and how much protein is produced

Single-Strand DNA:

Coding versus non-coding DNA

depth of coverage: the number of times that a specific genomic site is sequenced during a sequencing run. Higher coverage generally gives more confidence in the base called at that site.
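To make the definition concrete, here is a small sketch that counts how many reads cover each site of a toy reference. The read layout (0-based start plus read length), the function names, and the 10-base reference are all made up for illustration; real pipelines compute depth from alignment files rather than from tuples like these.

```python
def depth_per_site(reads, genome_length):
    """Return a list with the number of reads covering each position.

    reads: list of (start, read_length) tuples, start is 0-based.
    """
    depth = [0] * genome_length
    for start, read_length in reads:
        # Clip reads that run past either end of the reference.
        end = min(start + read_length, genome_length)
        for pos in range(max(start, 0), end):
            depth[pos] += 1
    return depth


def mean_depth(reads, genome_length):
    """Average depth of coverage over all positions."""
    return sum(depth_per_site(reads, genome_length)) / genome_length


# Three hypothetical 5 bp reads on a 10 bp reference.
toy_reads = [(0, 5), (3, 5), (4, 5)]
print(depth_per_site(toy_reads, 10))  # [1, 1, 1, 2, 3, 2, 2, 2, 1, 0]
print(mean_depth(toy_reads, 10))      # 1.5
```

Position 4 is covered by all three reads (depth 3), while the last position is not covered at all (depth 0).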

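nucleotide: the building block of DNA; in a DNA sequence, each letter (A, C, G, or T) stands for one nucleotide.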

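base: used here interchangeably with nucleotide (strictly, the base is the A, C, G, or T part of a nucleotide)

bp: base pairs; when giving a sequence length, used the same way as bases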

sequence lengths: given in bases (or bp) with metric prefixes: kb, Mb, Gb.

read length: the number of base pairs that a sequencer “reads”. A read length could be anywhere from 50 bp to thousands of bp.

read: a single sequence produced by a sequencer. Think: a sequencing machine read a molecule, and the read is what the machine thinks that molecule's sequence is.

library: a collection of DNA fragments that have been prepared for sequencing. The term generally refers to the prepared DNA of an individual sample.

run: an entire sequencing reaction from start to finish.

NGS: next-generation sequencing. High-throughput (DNA) sequencing technologies developed after ~2000.

hg19: The UCSC assembly of the human genome, version 19; equivalent to GRCh37.

hg38: The UCSC assembly of the human genome, version 38; equivalent to GRCh38.

GRCh37: Genome Reference Consortium Human Build 37; matches with UCSC assembly hg19. Released in February 2009.

GRCh38: Genome Reference Consortium Human Build 38; matches with UCSC assembly hg38. Released in December 2013.
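Since the four names above refer to only two assemblies, a small lookup can help when files from different sources use different names. This is a sketch only: the dictionary and the helper `same_assembly` are hypothetical names introduced here, and the check compares assembly names, nothing else about the files.

```python
# UCSC name -> Genome Reference Consortium name, as listed in the glossary.
UCSC_TO_GRC = {"hg19": "GRCh37", "hg38": "GRCh38"}


def canonical_name(name: str) -> str:
    """Map a UCSC name to its GRC equivalent; leave other names unchanged."""
    return UCSC_TO_GRC.get(name, name)


def same_assembly(name_a: str, name_b: str) -> bool:
    """True if both names refer to the same human genome build."""
    return canonical_name(name_a) == canonical_name(name_b)


print(same_assembly("hg19", "GRCh37"))  # True
print(same_assembly("hg19", "hg38"))    # False
```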