Useful terminology when talking about natural selection and genomic data

Adaptation: the evolution of heritable traits that increase fitness
Alignment: a comparison between two or more sequences by matching identical and/or similar residues and assigning a score to the match
Ancestral: an allele that was pre-existing in a population and from which a derived allele may arise
Annotation: adding pertinent information such as gene coded for, amino acid sequence, or other commentary to the database entry of raw sequence of DNA bases.
Assembly: putting sequenced fragments of DNA into their correct chromosomal positions.
Background selection: reduction of genetic diversity due to selection against deleterious mutations at linked sites
Balancing selection: a selection regime that results in the maintenance of two or more alleles at a single locus in a population
Candidate genes: a set of genes known to be involved in a pathway affecting a phenotype; sequencing these genes in individuals with divergent phenotypes can identify mutations that may be associated with adaptive variation
Coalescent simulator: simulation tool that reconstructs the genealogical history of a sample backwards in time
Codon usage bias: the tendency of an organism's genome to more commonly have a certain codon for a given amino acid than any of its synonymous counterparts
Contig: the result of joining an overlapping collection of sequences or clones
Coverage (or depth): the average number of times that a nucleotide is represented by a high-quality base in a collection of random raw sequence.
Demographic history: the population history of a sample of individuals, which can include events such as population divergence, secondary contacts, and extinctions, as well as the value of demographic parameters such as effective population size, migration rate, and their variation over time
Derived: an allele that arises via a novel mutation and does not achieve fixation in a population (as contrasted with an ancestral allele)
dN/dS ratio: the rate ratio of nonsynonymous to synonymous substitutions
dN: number of nonsynonymous mutations per nonsynonymous site
dS: number of synonymous mutations per synonymous site
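Example (dN, dS and their ratio): a minimal sketch of how the three quantities above relate, assuming the numbers of changes and of sites have already been counted by some other tool. The function name and the counts are hypothetical, and the simple proportions used here apply no correction for multiple substitutions at the same site.

```python
def dn_ds(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Return (dN, dS, dN/dS) from raw counts.

    Assumption: plain proportions, with no multiple-hit correction.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_changes / nonsyn_sites   # nonsynonymous changes per nonsynonymous site
    ds = syn_changes / syn_sites         # synonymous changes per synonymous site
    ratio = dn / ds if ds > 0 else float("nan")  # undefined when dS is zero
    return dn, ds, ratio


# Hypothetical counts for a single gene (illustrative only).
dn, ds, ratio = dn_ds(nonsyn_changes=6, nonsyn_sites=600.0,
                      syn_changes=10, syn_sites=200.0)
print(f"dN={dn:.3f} dS={ds:.3f} dN/dS={ratio:.2f}")
```

With these made-up counts the ratio is below 1, which is the pattern usually read as purifying selection.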
Effective population size: the size of a randomly mating population of constant size that would effectively recapitulate patterns and levels of variation observed in a real population
Enrichment of genomic regions: selective recovery and subsequent sequencing of genomic loci of interest
Epigenetic modifications: DNA modifications that do not change the DNA sequence but can affect gene expression; the epigenome is the set of such modifications in the entire genome
Epistasis: interactions among genes; the interaction of mutational effects, resulting in a dependence of the effect of a mutation on the background it appears on
Equilibrium population: a panmictic population of continuously constant size
Fitness: a measure of the capacity of an organism to survive and reproduce
Fixation: describes the situation in which a mutation has achieved a frequency of 100% in a natural population
Forward simulator: simulation tool that models the evolution of populations forward in time; this allows for implementation of complex models, but also usually results in much longer computation times because all individuals/haplotypes must be tracked
Frequency-dependent selection: a trend in which the fitness of a given genotype is correlated with its prevalence in the population (e.g., if an allele is advantageous when it is rare)
Genetic drift: change in allele frequencies due to the random sampling process that is inherent in reproduction
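Example (genetic drift): the random sampling described above can be sketched with a Wright-Fisher style simulation, assuming a diploid population of constant size, non-overlapping generations, and no selection, mutation, or migration. The function name and parameter values are hypothetical.

```python
import random


def wright_fisher(p0, pop_size, generations, seed=None):
    """Simulate the frequency of one allele under drift alone.

    Each generation, 2 * pop_size gene copies are drawn at random
    from the previous generation (binomial sampling).
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Every new copy carries the allele with probability p.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


# Illustrative run: 50 diploid individuals followed for 100 generations.
trajectory = wright_fisher(p0=0.5, pop_size=50, generations=100, seed=1)
print(trajectory[-1])
```

Once the frequency reaches 0 or 1 it stays there, which corresponds to loss or fixation of the allele.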
Genetic hitchhiking: also known as genetic draft; the process by which an allele may increase in frequency by virtue of being linked to a gene that is positively selected. More generally, genetic hitchhiking can refer to changes in an allele's frequency due to any form of selection operating upon linked genes, including background selection against deleterious mutations
Genetic load: the loss of mean fitness relative to some ideal fitness; it includes mutation load, substitution load, recombination load, and segregation load.
Genome-wide association studies (also known as association mapping or LD mapping): statistical associations between genotype and phenotype are identified in unrelated individuals and only arise when the marker and the trait are in strong linkage disequilibrium (LD)
Haplotype block: a chromosomal region in which groups of alleles at different genetic loci are inherited together more often than expected by chance
Haplotype: allelic composition over a contiguous chromosome stretch
Heterozygote advantage: also called overdominance; the situation in which the heterozygous genotype at a specific site has the highest fitness
Identity-by-descent (IBD) segments: chromosomal segments carried by two or more individuals that are identical because they have been inherited from a common ancestor, without recombination
Introgression: new alleles entering the population by hybridization with members of a differentiated population or even a different species
Linkage disequilibrium (LD): the non-random association of alleles at different loci, often but not always due to physical linkage on the same chromosome
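Example (linkage disequilibrium): a minimal sketch of two common LD summaries for a pair of biallelic loci, the coefficient D = p_AB - p_A * p_B and the squared correlation r^2. It assumes phased haplotypes coded as 0/1 at each locus; the function name and the toy data are hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (a, b) pairs, where a and b are 0 or 1 and give
    the allele carried at locus A and locus B on the same chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # departure from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is fixed
    return d, r2


# Toy data: alleles perfectly associated vs. fully independent.
linked = [(1, 1)] * 4 + [(0, 0)] * 4
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_stats(linked), ld_stats(unlinked))
```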
Negative selection: selection acting upon new deleterious mutations
Neutral mutation: a mutation that does not affect the fitness of individuals who carry it in either heterozygous or homozygous condition
Neutral theory: a theory that the vast majority of DNA substitutions between species is the result of neutral mutations and random drift rather than selectively driven substitutions; the neutral theory does not assert that all mutations are neutral or that there is no adaptive evolution, but instead that deleterious mutations are eliminated from a population and that positively selected mutations make only a small contribution to divergence between species
Neutrality test: a statistical test of a null model which assumes that a population is at equilibrium and that mutations are neutral
Nonsynonymous mutation: in a coding sequence, a nucleotide change that alters the amino acid encoded for by a codon
Nonparametric bootstrap (or simply bootstrap): a statistical method for measuring consistency in datasets, in which new simulated datasets are generated by sampling with replacement from the real data
Omega (ω): the ratio between the rate of nonsynonymous and synonymous substitution; ω < 1 is evidence of purifying selection, ω > 1 of positive selection, and ω ~ 1 of neutrality
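Example (nonparametric bootstrap): a minimal sketch of resampling with replacement to obtain a percentile confidence interval for a statistic, here the mean. The function name, the default number of replicates, and the data values are hypothetical and serve only as an illustration.

```python
import random
import statistics


def bootstrap_ci(data, statistic=statistics.mean, n_replicates=1000,
                 alpha=0.05, seed=None):
    """Percentile bootstrap interval for `statistic` computed on `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is a dataset of the same size drawn with replacement.
    estimates = sorted(
        statistic([rng.choice(data) for _ in range(n)])
        for _ in range(n_replicates)
    )
    lower = estimates[int((alpha / 2) * n_replicates)]
    upper = estimates[int((1 - alpha / 2) * n_replicates) - 1]
    return lower, upper


# Made-up per-window diversity values (illustrative only).
values = [0.012, 0.009, 0.015, 0.011, 0.008, 0.013, 0.010, 0.014]
low, high = bootstrap_ci(values, seed=42)
print(low, high)
```

A wide interval indicates that the estimate depends strongly on which observations happen to be in the sample.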
Parametric bootstrap: a parametric model is fit to the data, and then samples are drawn from this model in order to quantify confidence in a particular estimated parameter
Pleiotropy: a single gene controlling or influencing multiple (and possibly unrelated) phenotypic traits
Positive selection: selection acting upon advantageous mutations
Rare alleles: polymorphisms that occupy the lower bounds of the site frequency spectrum, often as singletons
Scaffold: supercontigs or scaffolds are sets of ordered, oriented contigs; they are longer sequences than contigs, but shorter than full chromosomes
Segregating variant: often termed a polymorphism; a mutation that is not fixed in the population
Selection coefficient: a measure of the strength of selection on a selected genotype; usually, the selection coefficient is measured as the relative difference between the reproductive success of the selected and the ancestral genotypes.
Selective sweep: the process of a beneficial mutation (and its closely linked chromosomal vicinity) being driven (‘swept’) to high frequency (incomplete sweep) or fixation (complete sweep) by natural selection; selective sweeps result in a genomic signature including a local reduction in genetic variation, and skews in the SFS; in a hard sweep, the beneficial allele corresponds to a single, new mutation appearing after an environmental change and producing a marked loss of diversity at linked sites; in a soft sweep the beneficial allele exists before an environmental change (thus representing standing variation) and is initially neutral or even slightly deleterious, or it appears several times independently; diversity at linked sites may be maintained in soft sweeps, which are therefore more difficult to detect using genomic scans; in a recurrent sweeps model, sweeps occur at a given rate, rather than at a fixed time in the past, and are responsible for most of the divergence between species
Site Frequency Spectrum (SFS): the distribution of the frequency of variants across biallelic loci in a population sample; if the ancestral or derived state of each site is known, the distribution of the frequency of the derived allele is called the unfolded site frequency spectrum and corresponds to the count of the number of mutations that exist at a frequency of x_i = i/n for i = 1, 2, ..., n−1, in a sample of size n; alternatively, the allele with the (global) minor frequency can be considered as derived and a minor allele frequency spectrum can be estimated; when sites with frequencies i/n and (n−i)/n cannot be distinguished, a folded SFS can be computed by merging these frequencies
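Example (site frequency spectrum): a minimal sketch of building the unfolded SFS from per-site derived allele counts and then folding it by merging the classes i and n − i. It assumes biallelic sites and a sample of n chromosomes with no missing data; the function names and the counts are hypothetical.

```python
def unfolded_sfs(derived_counts, n):
    """sfs[i - 1] = number of sites whose derived allele occurs i times,
    for i = 1 .. n - 1 in a sample of n chromosomes."""
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:            # monomorphic sites are not part of the SFS
            sfs[c - 1] += 1
    return sfs


def fold_sfs(sfs):
    """Merge classes i and n - i into minor allele count classes."""
    n = len(sfs) + 1
    folded = [0] * (n // 2)
    for i, count in enumerate(sfs, start=1):
        minor = min(i, n - i)    # the minor allele count is at most n // 2
        folded[minor - 1] += count
    return folded


# Illustrative derived allele counts at nine sites, sample of n = 6.
counts = [1, 1, 2, 5, 3, 0, 6, 4, 1]
sfs = unfolded_sfs(counts, n=6)
print(sfs)            # [3, 1, 1, 1, 1]
print(fold_sfs(sfs))  # [4, 2, 1]
```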
Standing variation: the variation in a population; in general refers to the polymorphisms already present before a change in environmental conditions, as opposed to de novo mutations
Structural variation: polymorphism in the genome structure, such as deletions, insertions, duplications, translocations and inversions, typically in the size range of kilobases–megabases; polymorphisms in the number of copies of a gene or genomic region are also referred to as copy number variants (CNVs)
Synonymous mutation: a change in the protein-coding region of a gene that does not change the amino acid encoded