Calendar

Week 1, Feb 1: Introduction and Setup. Course mechanics. Lecture 0.
Strings of life I. Historical perspective on definition of life. Pre-genomic period. Lecture 1.1.
READ Modern definition of life.
Feb 3: Strings of life II. Genomic period. Genome assembly problem. Lecture 1.2.
Post-genomic period. Some speculations. Lecture 1.3.
READ Molecular Biology.
TASK Reading Assignment. Paper to read: link. Due: Feb 8.
TASK Quiz 1: Biosequences.
Week 2, Feb 8: Pattern Matching I. The exact pattern matching problem. The Knuth-Morris-Pratt (KMP) algorithm and its complexity. Lecture 2.1.
DEMO Shifting Heuristics: LINK.
READ G*: Chapter 2.3.
Feb 10: Pattern Matching II. Computing the overlap (failure) function in linear time. Lecture 2.2.
READ G*: Chapter 2.3.2.
TASK Assignment 1: implementing KMP. Due: Feb 15.
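A minimal Python sketch of KMP, assuming the "overlap function" is the standard KMP failure (border) table: sp[i] is the length of the longest proper prefix of the pattern that is also a suffix of its first i+1 characters. The names overlap_function and kmp_search are hypothetical and not taken from the assignment.

```python
def overlap_function(p: str) -> list[int]:
    """sp[i] = length of the longest proper prefix of p that is a suffix of p[:i+1]."""
    sp = [0] * len(p)
    k = 0  # length of the border currently being extended
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = sp[k - 1]  # fall back to the next shorter border
        if p[i] == p[k]:
            k += 1
        sp[i] = k
    return sp


def kmp_search(text: str, p: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) occurrence of p."""
    if not p:
        return list(range(len(text) + 1))
    sp = overlap_function(p)
    hits, k = [], 0  # k = number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != p[k]:
            k = sp[k - 1]  # shift the pattern using the overlap function
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = sp[k - 1]  # keep going to find overlapping matches
    return hits
```

Each text character is compared a bounded number of times on average because k can only decrease as often as it has increased, which gives the O(n + m) bound.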
Week 3, Feb 15: Pattern Matching review. Quiz and activities.
TASK Assignment 2: pattern search. Due: Feb 22.
Feb 17: Suffix Trees I. Introduction to suffix trees. Pattern search. Lecture 3.1.
READ G*: Chapter 5.
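To illustrate pattern search with a suffix tree, here is a deliberately naive sketch: an uncompressed suffix trie built in O(n^2) time, not the linear-time construction covered in lecture. Searching walks the pattern from the root and then collects every suffix start stored below the reached node. The LEAF marker and function names are hypothetical.

```python
LEAF = None  # hypothetical dict key under which suffix start positions are stored


def build_suffix_trie(text: str) -> dict:
    """Naive O(n^2) suffix trie: nested dicts keyed by character."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for c in text[start:]:
            node = node.setdefault(c, {})
        node.setdefault(LEAF, []).append(start)  # suffix text[start:] ends here
    return root


def find_occurrences(trie: dict, pattern: str) -> list[int]:
    """Follow pattern from the root, then report all suffixes in that subtree."""
    node = trie
    for c in pattern:
        if c not in node:
            return []  # pattern is not a prefix of any suffix
        node = node[c]
    hits, stack = [], [node]
    while stack:
        n = stack.pop()
        for key, child in n.items():
            if key is LEAF:
                hits.extend(child)
            else:
                stack.append(child)
    return sorted(hits)
```

The search cost is O(m) for the walk plus the size of the subtree visited; a compressed suffix tree keeps the same idea while using only O(n) nodes.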
Week 4, Feb 23: Suffix Trees II. Applications of suffix trees: finding repeats. Lecture 3.2. The longest common substring in linear time. Lecture 3.3.
READ G*: Chapters 7.1-7.6 and 7.11-7.12.
TASK Assignment 3: suffix trees. Due: March 1.
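The longest common substring idea can be sketched with a generalized suffix trie: insert every suffix of both strings, mark each node with the strings that pass through it, and report the deepest node marked by both. This sketch is quadratic, not the linear-time generalized suffix tree from Lecture 3.3, and the function name is hypothetical.

```python
def longest_common_substring(s1: str, s2: str) -> str:
    """Deepest node of a naive generalized suffix trie shared by both strings."""
    root = {"ids": set(), "kids": {}}
    for sid, s in enumerate((s1, s2)):
        for start in range(len(s)):
            node = root
            for c in s[start:]:
                node = node["kids"].setdefault(c, {"ids": set(), "kids": {}})
                node["ids"].add(sid)  # this path label occurs in string sid
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        if len(label) > len(best):
            best = label
        for c, child in node["kids"].items():
            if child["ids"] == {0, 1}:  # only descend into nodes shared by both
                stack.append((child, label + c))
    return best
```

Because a shared node's ancestors are also shared, it is enough to descend only through nodes marked by both strings.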
Feb 25: Suffix Arrays. Introduction to suffix arrays, a space-efficient alternative to suffix trees. Lecture 4.1.
READ The chapter on suffix arrays from this book.
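A short sketch of what a suffix array stores and how it answers pattern queries: the array lists suffix start positions in lexicographic order, so all suffixes beginning with a pattern form one contiguous block that two binary searches can locate. The construction here is the naive sort (quadratic comparisons in the worst case), used only for illustration; function names are hypothetical.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes, sorted lexicographically (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_search(text: str, sa: list[int], pattern: str) -> list[int]:
    """All occurrences of pattern via two binary searches over the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

Each query costs O(m log n) character comparisons while storing only n integers, which is the space advantage over a suffix tree.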
Week 5, Mar 1: Suffix Array construction. Building suffix arrays in O(n log n) time: the algorithm by Larsson and Sadakane. Lecture 4.2.
READ The original paper.
TASK Assignment 4: suffix arrays. Due: March 8.
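The key idea behind Larsson and Sadakane is prefix doubling: once suffixes are ranked by their first k characters, they can be ranked by their first 2k characters by sorting on pairs of existing ranks. The sketch below is the simpler O(n log^2 n) variant that re-sorts everything in each round; it is not the Larsson-Sadakane algorithm itself, which avoids re-sorting groups that are already resolved. The function name is hypothetical.

```python
def suffix_array_doubling(text: str) -> list[int]:
    """Prefix-doubling suffix array construction (simple O(n log^2 n) variant)."""
    n = len(text)
    if n == 0:
        return []
    rank = [ord(c) for c in text]  # rank by the first character
    sa = list(range(n))
    k = 1
    while True:
        def key(i: int) -> tuple[int, int]:
            # rank of text[i:i+k], then rank of text[i+k:i+2k] (-1 if past the end)
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            return sa
        k *= 2
```

At most about log2(n) rounds are needed because the compared prefix length doubles each time.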
Mar 3: FM-indexes. The Burrows-Wheeler transform. Compressed self-indexes. The FM-index. Lecture 4.3.
READ blog.
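A compact sketch of the Burrows-Wheeler transform and FM-index backward search, assuming '$' is a sentinel that does not occur in the text and sorts before every other character. For clarity, occ(c, i) is computed by scanning the BWT string; a real FM-index answers it in constant time from sampled rank tables. Function names are hypothetical.

```python
def bwt(text: str) -> str:
    """BWT via a naive suffix array of text + '$'."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0


def fm_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern using backward search over the BWT."""
    counts: dict[str, int] = {}
    for c in bwt_str:
        counts[c] = counts.get(c, 0) + 1
    C, total = {}, 0  # C[c] = number of characters smaller than c
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    def occ(c: str, i: int) -> int:
        return bwt_str.count(c, 0, i)  # occurrences of c in bwt_str[:i]

    lo, hi = 0, len(bwt_str)  # current range of sorted rotations
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

The search processes the pattern right to left and never consults the original text, which is what makes the index self-contained.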
Week 6, Mar 8: Dynamic Programming. Shortest paths in a grid graph. Recursion vs. dynamic programming. Edit distance between two strings. Lecture 5.1.
READ G*: Chapters 11.1 - 11.5.
TASK Assignment 5: Dynamic Programming. Due: March 15.
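To contrast recursion with dynamic programming, here is the unit-cost edit distance recurrence written both ways. The plain recursion takes exponential time; memoizing it (here with functools.lru_cache) or filling the table bottom-up gives O(mn) time. Function names are hypothetical.

```python
from functools import lru_cache


def edit_distance_recursive(s: str, t: str) -> int:
    """Top-down: the recurrence with memoization."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        return min(d(i - 1, j) + 1,                        # deletion
                   d(i, j - 1) + 1,                        # insertion
                   d(i - 1, j - 1) + (s[i - 1] != t[j - 1]))  # match / substitution
    return d(len(s), len(t))


def edit_distance(s: str, t: str) -> int:
    """Bottom-up: fill the (m+1) x (n+1) table row by row."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                          D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    return D[m][n]
```

The table D is exactly the grid graph from the lecture: each cell is a node, and D[m][n] is a shortest-path length from the top-left corner.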
Mar 10: String similarity. Edit graph. Edit distance vs. Longest Common Subsequence. Global and local alignment. Lecture 5.2.
READ G*: Chapters 11.6 - 11.9.
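A sketch of two of the similarity measures, each keeping only two DP rows. LCS length relates to edit distance when only insertions and deletions are allowed: that distance equals len(s) + len(t) - 2 * LCS. The local alignment score follows the Smith-Waterman recurrence, in which every cell is clamped at 0 so an alignment may start anywhere. The scoring values (match +2, mismatch -1, gap -1) are illustrative assumptions, not the course's parameters.

```python
def lcs_length(s: str, t: str) -> int:
    """Length of a longest common subsequence (two-row DP)."""
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def local_alignment_score(s: str, t: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score (assumed scoring scheme)."""
    best = 0
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, 1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))  # 0 = restart
        best = max(best, max(cur))
        prev = cur
    return best
```

Global alignment uses the same recurrence without the 0 option and reads the answer from the last cell instead of the table maximum.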
Week 7, Mar 15: Edit distance in linear space. Hirschberg's algorithm. Lecture 5.3.
READ G*: Chapter 12.1.
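A sketch of Hirschberg's divide-and-conquer idea for unit-cost edit distance. It splits s in half, computes the last DP row of the left half forward and of the right half backward (each in linear space), picks the split point of t that minimizes their sum, and recurses. It returns an alignment as two gapped strings, assuming '-' does not occur in the inputs. Function names are hypothetical.

```python
def last_row(s: str, t: str) -> list[int]:
    """Edit distances between s and every prefix t[:j], using O(len(t)) space."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev


def hirschberg(s: str, t: str) -> tuple[str, str]:
    """Optimal global alignment of s and t under unit-cost edit distance."""
    if not s:
        return "-" * len(t), t
    if not t:
        return s, "-" * len(s)
    if len(s) == 1:
        j = max(t.find(s), 0)  # align with a matching character if one exists
        return "-" * j + s + "-" * (len(t) - j - 1), t
    mid, n = len(s) // 2, len(t)
    left = last_row(s[:mid], t)                # cost(s[:mid], t[:j])
    right = last_row(s[mid:][::-1], t[::-1])   # right[n-j] = cost(s[mid:], t[j:])
    split = min(range(n + 1), key=lambda j: left[j] + right[n - j])
    a1, b1 = hirschberg(s[:mid], t[:split])
    a2, b2 = hirschberg(s[mid:], t[split:])
    return a1 + a2, b1 + b2


def alignment_cost(a: str, b: str) -> int:
    """Unit cost of a gapped alignment: each mismatch or gap column costs 1."""
    return sum(x != y for x, y in zip(a, b))
```

Time stays O(mn) (roughly doubled), while the space drops from O(mn) to O(n) plus the recursion stack.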
Mar 17: Faster edit distance. The algorithm by Miller and Myers. Lecture 5.4. Original paper: link.
READ G*: Chapter 12.2.
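A sketch of the greedy "furthest-reaching path on each diagonal" strategy behind Miller-Myers-style algorithms, assuming only insertions and deletions are allowed, as in file comparison. For each number of edits d, it extends every diagonal k = x - y as far as free matches allow, so the running time is O((N + M) D) for an answer of D. This follows the general scheme of Myers' O(ND) greedy algorithm and is not a transcription of the original paper.

```python
def myers_indel_distance(a: str, b: str) -> int:
    """Minimum insertions + deletions turning a into b (greedy diagonal search)."""
    n, m = len(a), len(b)
    v = {1: 0}  # v[k] = furthest x reached so far on diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # come down from diagonal k+1 (insertion)
            else:
                x = v[k - 1] + 1    # come right from diagonal k-1 (deletion)
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1  # follow matching characters for free
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

When the strings are similar, D is small and the search touches only a narrow band of diagonals instead of the full O(NM) table.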
Week 8, Mar 22 and Mar 24: Spring break, no classes.
Week 9, Mar 29: Applications of string searching algorithms. Biosequence databases and their uses. String searching algorithms: summary. Lecture 5.5.
READ G*: Chapter 15.
TASK Quiz 8. Due: Mar 31.
TASK Assignment 6: Miller-Myers in linear space. Due: April 12.
Mar 31: Multiple Sequence Alignment. Motivation for comparing multiple strings: molecular evolution and discovering common biological functions. The multiple sequence alignment problem. The dynamic programming solution and its intractability. Approximation algorithm: SP-star. Lecture 6.1.
READ G*: Chapter 14, D*: Chapters 6.1 - 6.4.
TASK Quiz 9. Due: Apr 4.
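A partial sketch of the SP-star (center star) approximation, assuming unit-cost edit distance. It shows the two ingredients that are easiest to isolate: choosing the center sequence that minimizes the sum of distances to all others, and computing the sum-of-pairs (SP) cost of a finished alignment. The step that merges pairwise alignments into one multiple alignment is omitted. Function names are hypothetical.

```python
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance with two DP rows."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def center_string(seqs: list[str]) -> int:
    """Index of the sequence with the smallest total distance to all others."""
    return min(range(len(seqs)),
               key=lambda c: sum(edit_distance(seqs[c], s) for s in seqs))


def sp_score(msa: list[str]) -> int:
    """Sum-of-pairs cost: 1 per mismatching or char/gap column pair;
    gap/gap pairs cost 0. Rows must have equal length, '-' marks a gap."""
    return sum(sum(x != y for x, y in zip(r1, r2))
               for r1, r2 in combinations(msa, 2))
```

Aligning every sequence optimally to this center and merging the pairwise alignments gives an SP cost within a factor of 2 - 2/k of optimal for k sequences under a metric cost.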
Week 10, Apr 5: Parsimony and perfect phylogeny. Change through evolution. Phylogenetic trees. The parsimony principle. Algorithm for building perfect phylogenies (Gusfield, Chapter 17.3). Lecture 6.2.
READ G*: Chapter 17.
TASK Quiz 10. Due: Apr 6.
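A sketch of the existence test for a perfect phylogeny on binary characters, assuming the ancestral state is 0 for every character. For each character, take the set of taxa that have state 1; a perfect phylogeny exists exactly when every pair of these sets is either disjoint or nested. This is the simple O(n m^2) pairwise check, not Gusfield's faster construction algorithm, and the function name is hypothetical.

```python
def has_perfect_phylogeny(M: list[list[int]]) -> bool:
    """M: rows = taxa, columns = binary characters (0 = ancestral state)."""
    n_chars = len(M[0]) if M else 0
    # O[c] = set of taxa that have state 1 for character c
    O = [frozenset(r for r, row in enumerate(M) if row[c] == 1)
         for c in range(n_chars)]
    for i in range(n_chars):
        for j in range(i + 1, n_chars):
            a, b = O[i], O[j]
            if a & b and not (a <= b or b <= a):
                return False  # overlapping but not nested: incompatible pair
    return True
```

Nested sets correspond to one mutation's subtree lying inside another's, and disjoint sets to separate subtrees; a partial overlap would require a character to change state twice.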
Apr 7: Character-based Phylogenies. Parsimony of mutational events. The small and large parsimony problems. The Fitch algorithm for the Small Parsimony Problem. Branch-and-bound optimization for the Large Parsimony Problem. Lecture 6.3.
READ D*: Chapter 7.
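A sketch of the bottom-up pass of the Fitch algorithm for the Small Parsimony Problem on a rooted binary tree. The tree is represented, as an assumption made for this example, by nested 2-tuples whose leaves are sequences of equal length. At each internal node the candidate set is the intersection of the children's sets when that intersection is nonempty; otherwise it is their union and one change is counted.

```python
def fitch(tree):
    """Return (candidate state set, minimum changes) for one site.
    A leaf is a single character; an internal node is a (left, right) tuple."""
    if isinstance(tree, str):
        return {tree}, 0
    (ls, lc), (rs, rc) = fitch(tree[0]), fitch(tree[1])
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # union: one extra mutation is needed


def site_tree(tree, i: int):
    """Project a tree whose leaves are sequences onto site i."""
    if isinstance(tree, str):
        return tree[i]
    return (site_tree(tree[0], i), site_tree(tree[1], i))


def parsimony_score(tree, length: int) -> int:
    """Sites are independent, so the total score is the sum over sites."""
    return sum(fitch(site_tree(tree, i))[1] for i in range(length))
```

The Large Parsimony Problem searches over tree topologies and calls a scorer like this one for each candidate, which is where branch-and-bound prunes the search.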
Week 11, Apr 12: Distance-based Phylogenies. Hierarchical clustering. UPGMA. Additivity. Ultrametric trees. Molecular clock. Lecture 6.4.
READ D*: Chapter 7.
TASK Assignment 7: Phylogenetic trees. Due: Apr 21.
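A sketch of UPGMA, assuming a symmetric distance matrix and a molecular clock. It repeatedly merges the closest pair of clusters, places the new node at height d/2, and sets the distance from the merged cluster to every other cluster to the size-weighted average. The output format, nested tuples (left, right, height), is an assumption chosen for this example.

```python
def upgma(labels: list[str], D: list[list[float]]):
    """Return the UPGMA tree as nested (left, right, height) tuples."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair (i < j)
        d = dist[(i, j)]
        ti, ni = clusters.pop(i)
        tj, nj = clusters.pop(j)
        for k in clusters:  # size-weighted average linkage
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        clusters[next_id] = ((ti, tj, d / 2), ni + nj)  # clock: height = d / 2
        next_id += 1
    return next(iter(clusters.values()))[0]
```

When the input distances are ultrametric, UPGMA recovers the true tree; on non-ultrametric (for example, merely additive) data, it can produce the wrong topology.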
Apr 14: Statistics primer. Conditional probabilities. Bayesian reasoning. Lecture 7.1.
Markov models. Markov chains. Lecture 7.2.
DEMO Markov models and equilibrium: markov.py.
DEMO Casino sequences: casino.py.
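A small sketch of a Markov chain approaching equilibrium, independent of the markov.py demo. Starting from the uniform distribution, repeatedly multiply by the transition matrix P until the distribution stops changing; the fixed point pi satisfies pi = pi P. The matrix used in the usage example is a hypothetical two-state chain.

```python
def step(dist: list[float], P: list[list[float]]) -> list[float]:
    """One transition: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def equilibrium(P: list[list[float]], tol: float = 1e-12,
                max_iter: int = 10_000) -> list[float]:
    """Power iteration from the uniform distribution (assumes convergence)."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    return dist
```

For P = [[0.9, 0.1], [0.5, 0.5]], balance requires 0.1 * pi0 = 0.5 * pi1, so the iteration converges to about (5/6, 1/6).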
Week 12, Apr 19: Hidden Markov Models (HMMs). The honest and dishonest casino. Bayesian discrimination between two model states. The occasionally dishonest casino. The Viterbi algorithm for computing the most probable path through the states. HMM parameter estimation. Lecture 7.3.
READ D*: Chapter 3.
TASK Quiz 11: HMM. Due: Apr 20.
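A log-space Viterbi sketch for the occasionally dishonest casino, separate from the casino.py demo. All parameters are illustrative assumptions modeled on the usual textbook setup: a fair die (F) with uniform faces, a loaded die (L) that rolls a 6 half the time, switching probabilities 0.05 (F to L) and 0.10 (L to F), and a uniform start. The input is assumed to be a non-empty string of rolls.

```python
import math

STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}  # assumed uniform start
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {"F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}


def viterbi(rolls: str) -> str:
    """Most probable state path for a non-empty string of die rolls."""
    V = [{s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t+1
    for r in rolls[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: V[-1][p] + math.log(TRANS[p][s]))
            col[s] = V[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][r])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):  # trace back from the best final state
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

Working with log probabilities avoids numerical underflow on long roll sequences, since products become sums.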
Apr 21: Applications of HMMs. Biosequence applications: gene hunting (CpG islands) and profile alignments. Lecture 7.4.
READ D*: Chapter 3.
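A sketch of the two-Markov-chain discrimination used for CpG islands. One transition model is estimated from sequences known to be inside islands ("+"), another from sequences outside them ("-"), and a query is scored by the log-odds of its transitions. Positive scores favor the island model. The training strings, the pseudocount of 1, and the natural-log base are assumptions for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train(seqs: list[str], pseudo: float = 1.0) -> dict:
    """Estimate first-order transition probabilities with pseudocounts."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(zip(s, s[1:]))  # count dinucleotide transitions
    model = {}
    for a in ALPHABET:
        total = sum(counts[(a, b)] for b in ALPHABET) + pseudo * len(ALPHABET)
        model[a] = {b: (counts[(a, b)] + pseudo) / total for b in ALPHABET}
    return model


def log_odds(seq: str, plus: dict, minus: dict) -> float:
    """Sum of log(a+_xy / a-_xy) over consecutive pairs xy in seq."""
    return sum(math.log(plus[a][b] / minus[a][b]) for a, b in zip(seq, seq[1:]))
```

With hypothetical training data plus = train(["CGCGCGCG"]) and minus = train(["ATATATAT"]), "CGCG" scores positive and "ATAT" scores negative. The full HMM replaces this fixed-window scoring with hidden island and non-island states.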
Week 13, Apr 26: Artificial Neural Networks (ANNs). ANN primer. The multi-layer perceptron. The importance of non-linearity. Lecture 8.1.
DEMO Perceptron, Multi-layer Perceptron, and a sample application: Link.
READ The 100 Page ML book: Chapter 6.
Grokking Deep Learning: Chapters 1-6.
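A minimal sketch of why non-linearity matters. A single perceptron trained with the classic perceptron rule learns AND, which is linearly separable, but it cannot represent XOR. A two-layer network with a step non-linearity computes XOR as "OR and not AND". The hidden-layer weights are hand-set for illustration rather than learned.

```python
def step(z: float) -> int:
    """Threshold activation."""
    return 1 if z > 0 else 0


def train_perceptron(data, epochs: int = 20, lr: float = 1.0):
    """Classic perceptron learning rule for two inputs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            err = y - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b


def accuracy(w, b, data) -> float:
    return sum(step(w[0] * x1 + w[1] * x2 + b) == y for (x1, x2), y in data) / len(data)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def xor_mlp(x1: int, x2: int) -> int:
    h1 = step(x1 + x2 - 0.5)     # hidden unit 1: OR
    h2 = step(x1 + x2 - 1.5)     # hidden unit 2: AND
    return step(h1 - h2 - 0.5)   # output: OR and not AND
```

Without the step function between layers, the two layers would collapse into a single linear map and could not separate XOR either.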
Apr 28: Applications of neural networks to sequential data. Main ideas behind Convolutional Neural Networks (CNNs). Sample applications: image recognition and sequence classification. Basics of Recurrent Neural Networks (RNNs). Lecture 8.2.
DEMO Sample applications of CNNs: Link.
READ The 100 Page ML book: Chapter 6.2.1.
Recurrent neural networks: Chapter 6.2.2.
Explanation of recurrent neural networks on a real-life example plus demo: Link.
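A sketch of the core CNN operation for sequence classification: one-hot encode DNA, slide a filter along the sequence (a "valid" cross-correlation, as deep-learning libraries do), apply ReLU, then global max pooling. The filter is hand-set to detect the hypothetical motif "GATA" with a bias of -3, so it outputs 1 only on an exact match. In a trained CNN, these weights would be learned.

```python
ALPHABET = "ACGT"


def one_hot(seq: str) -> list[list[float]]:
    """L x 4 matrix: one row per position, one column per nucleotide."""
    return [[1.0 if c == a else 0.0 for a in ALPHABET] for c in seq]


def conv1d_relu(x, kernel, bias: float = 0.0) -> list[float]:
    """Valid 1-D cross-correlation of an L x 4 input with a k x 4 kernel, then ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        z = bias + sum(x[i + r][c] * kernel[r][c] for r in range(k) for c in range(4))
        out.append(max(0.0, z))
    return out


KERNEL = one_hot("GATA")  # hypothetical motif detector: +1 per matching base


def motif_score(seq: str) -> float:
    """Global max pooling: the strongest filter response anywhere in seq."""
    return max(conv1d_relu(one_hot(seq), KERNEL, bias=-3.0), default=0.0)
```

Because the same filter is applied at every position, the detector is translation-invariant: the motif is found wherever it occurs, which is the property that makes CNNs useful for both images and sequences.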
G* refers to the Gusfield book.
D* refers to the Durbin book.