Most predictive models assume the features are arranged into rows and columns, like a spreadsheet, but many kinds of data do not conform to this structure. Sequences are one example of a different kind of data, which is why this data is usually stored in a text document, not a spreadsheet. To build predictive models for sequences and other non-conforming features, we have developed what we call dynamic kernel matching (DKM).
We can think of DKM as directly analogous to a convolutional neural network (CNN), but for non-conforming features. DKM finds the arrangement of features that exhibit the maximal response, like how max pooling identifies the image patch that exhibit the maximal response in a CNN. To find arrangement of features that exhibit the maximal response, we use alignment algorithms.
We apply DKM to two datasets of T-cell receptors to diagnose disease from the T-cell receptor sequences, showing that DKM works on some really hard problems!