Dynamic Time Warping kNearest Neighbors Classifier (KNNClassifier
)¶
The \(k\)nearest neighbors (\(k\)NN) classification algorithm is a very commonly used algorithm, and perhaps one of the most intuitive ones too.
Before we discuss \(k\)NN with dynamic time warping for sequence classification, let us recap \(k\)NN in the usual case of individual points \(\mathbf{x}\in\mathbb{R}^D\) in \(D\)dimensional Euclidean space.
Recap of general kNN¶
Suppose we have a training dataset \(\mathcal{D}_\text{train}=\big\{(\mathbf{x}^{(n)},c^{(n)})\big\}_{n=1}^N\), where \(N\) is the number of training examplelabel pairs, \(\mathbf{x}^{(n)}\in\mathbb{R}^D\) is a training example and \(c^{(n)}\in\{1,\ldots,C\}\) is its corresponding label.
When classifying a new test example \(\mathbf{x}^{(m)}\in\mathbb{R}^D\), a \(k\)NN classifier does the following:
Computes the distance from \(\mathbf{x}^{(m)}\) to every training example in \(\mathcal{D}_\text{train}\), using a distance measure such as Euclidean distance, \(d(\mathbf{x}^{(m)},\mathbf{x}^{(n)})=\Vert\mathbf{x}^{(m)}\mathbf{x}^{(n)}\Vert\).
Assigns \(c^{(m)}\) as the most common label of the \(k\) nearest training examples.
The most immediate observation one can make is that for every prediction, you need to look through the entire training dataset. As you can imagine, the inability to summarize the model with simpler parameters (e.g. weights of a neural network, or transition/emission/initial probabilities for a HMM), limits the practical use of \(k\)NN classifiers – especially on large datasets.
While this classifier is conceptually simple, it can very often outperform more sophisticated machine learning algorithms in various classification tasks, even when looking only at the nearest neighbor (\(k=1\)).
The algorithm works on the intuition that if \(\mathbf{x}^{(m)}\) has similar features to \(\mathbf{x}^{(n)}\), then they should physically be close together (in \(D\)dimensional Euclidean space).
Extending kNN to sequences¶
We can try to extend this intuition to work with sequences. In general, Sequentia supports multivariate observation sequences. These can be represented as an ordered sequence \(O=\mathbf{o}^{(1)},\ldots,\mathbf{o}^{(T)}\) of observations \(\mathbf{o}^{(t)}\in\mathbb{R}^D\). Indeed, the durations of any two observation sequences \(O^{(n)}\) and \(O^{(m)}\) may differ.
When trying to apply the \(k\)NN intuition to observation sequences, we can say that two sequences \(O^{(n)}\) and \(O^{(m)}\) which are similar to each other should have a small ‘distance’, and if they are different, they should have a large ‘distance’.
But what sort of ‘distance’ could this be? We need a measure that can compare any two sequences of different length, and is small when the sequences are similar, and large if they are different. One such distance measure that allows us to compare sequences of different length is Dynamic Time Warping (DTW).
Given sequencelabel pairs \(\mathcal{D}_\text{train}=\big\{(O^{(n)},c^{(n)})\big\}_{n=1}^N\), apart from the fact that we now compute DTW distances between sequences rather than Euclidean distances between points, the rest of the \(k\)NN algorithm remains unchanged, and indeed \(k\)NN and DTW coupled together creates a powerful sequence classifier.
This \(k\)NN classifier with the DTW distance measure is implemented by the KNNClassifier
class.
Example¶
1 2 3 4 5 6 7 8 9 10 11 12 13  import numpy as np
from sequentia.classifiers import KNNClassifier
# Create some sample data
X = [np.random.random((10 * i, 3)) for i in range(1, 4)]
y = ['class0', 'class1', 'class1']
# Create and fit the classifier
clf = KNNClassifier(k=1, classes=list(set(y)))
clf.fit(X, y)
# Predict labels for the training data (just as an example)
clf.predict(X)

For more elaborate examples, please have a look at the example notebooks.
API reference¶

class
sequentia.classifiers.knn.
KNNClassifier
(k, classes, weighting='uniform', window=1.0, use_c=False, independent=False, random_state=None)[source]¶ A kNearest Neighbor classifier that uses dynamic time warping as a distance measure for comparing observation sequences of different length.
 Parameters
 k: int > 0
Number of neighbors.
 classes: arraylike of str/numeric
The complete set of possible classes/labels.
 weighting: ‘uniform’ or callable
A callable that specifies how distance weighting should be performed. The callable should accept a
numpy.ndarray
of DTW distances, apply an elementwise weighting transformation, then return an equallysizednumpy.ndarray
of weighted distances.If a ‘uniform’ weighting is chosen, then the function
lambda x: np.ones(x.size)
is used, which weights all of the distances equally.If the callable is simple enough, it should be specified as a
lambda
, but a function will also work. Examples of weighting functions are:\(e^{\alpha x}\), specified by
lambda x: np.exp(alpha * x)
for some positive \(\alpha\)/alpha
,\(\frac{1}{x}\), specified by
lambda x: 1 / x
.
A good weighting function should ideally be defined at \(x=0\) in the rare event that two observations are perfectly aligned and therefore have zero DTW distance.
Tip
It may be desirable to restrict DTW distances to a small range if you intend to use a weighting function.
Using the
MinMaxScale
orStandardize
preprocessing transformations to scale your features helps to ensure that DTW distances remain small. window: 0 ≤ float ≤ 1
The width of the SakoeChiba band global constraint as a fraction of the length of the longest of the two sequences. A larger constraint will speed up the DTW alignment by restricting the maximum temporal deviation from the diagonal of the DTW matrix, but too much constraint may lead to poor alignment.
The default value of 1 corresponds to full DTW computation with no global constraint applied.
 use_c: bool
Whether or not to use fast pure C compiled functions (from the dtaidistance package) to perform the DTW computations.
Tip
If you set
use_c = True
and are receiving an error about a C library not being available, try reinstallingdtaidistance
and disabling the cache:pip install vvv upgrade nocachedir forcereinstall dtaidistance
 independent: bool
Whether or not to allow features to be warped independently from each other. See here for a good overview of both approaches.
 random_state: numpy.random.RandomState, int, optional
A random state object or seed for reproducible randomness.
 Attributes
 k (property): int > 0
The number of neighbors.
 weighting (property): callable
The distance weighting function.
 window (property): 0 ≤ float ≤ 1
The width of the SakoeChiba band global constraint as a fraction of the length of the longest of the two sequences.
 use_c (property): bool
Whether or not to use fast pure C compiled functions to perform the DTW computations.
 encoder_ (property): sklearn.preprocessing.LabelEncoder
The label encoder fitted on the set of
classes
provided during instantiation. classes_ (property): numpy.ndarray (str/numeric)
The complete set of possible classes/labels.
 X_ (property): list of numpy.ndarray (float)
A list of multiple observation sequences used to fit the classifier.
 y_ (property): numpy.ndarray (int)
The encoded labels for the observation sequences used to fit the classifier.

fit
(X, y)[source]¶ Fits the classifier by adding labeled training observation sequences.
 Parameters
 X: list of numpy.ndarray (float)
A list of multiple observation sequences.
 y: arraylike of str/numeric
An iterable of labels for the observation sequences.

predict
(X, original_labels=True, verbose=True, n_jobs=1)[source]¶ Predicts the label for an observation sequence (or multiple sequences).
 Parameters
 X: numpy.ndarray (float) or list of numpy.ndarray (float)
An individual observation sequence or a list of multiple observation sequences.
 original_labels: bool
Whether to inversetransform the labels to their original encoding.
 verbose: bool
Whether to display a progress bar or not.
Note
If both
verbose=True
andn_jobs > 1
, then the progress bars for each process are always displayed in the console, regardless of where you are running this function from (e.g. a Jupyter notebook). n_jobs: int > 0 or 1
 The number of jobs to run in parallel.Setting this to 1 will use all available CPU cores.
 Returns
 prediction(s): str/numeric or
numpy.ndarray
(str/numeric) The predicted label(s) for the observation sequence(s).
If
original_labels
is true, then the returned labels are inversetransformed into their original encoding.
 prediction(s): str/numeric or

evaluate
(X, y, verbose=True, n_jobs=1)[source]¶ Evaluates the performance of the classifier on a batch of observation sequences and their labels.
 Parameters
 X: list of numpy.ndarray (float)
A list of multiple observation sequences.
 y: arraylike of str/numeric
An iterable of labels for the observation sequences.
 verbose: bool
Whether to display a progress bar for predictions or not.
 n_jobs: int > 0 or 1
 The number of jobs to run in parallel.Setting this to 1 will use all available CPU cores.
 Returns
 accuracy: float
The categorical accuracy of the classifier on the observation sequences.
 confusion:
numpy.ndarray
(int) The confusion matrix representing the discrepancy between predicted and actual labels.

save
(path)[source]¶ Serializes the
KNNClassifier
object by pickling its hyperparameters, variables and training data.Warning
As \(k\)NN classifier must look through each training example during prediction, saving the classifier simply copies all of the training observation sequences and labels.
 Parameters
 path: str
File path (usually with .pkl extension) to store the serialized
KNNClassifier
object.

classmethod
load
(path)[source]¶ Deserializes a
KNNClassifier
object which was serialized with thesave()
function. Parameters
 path: str
File path of the serialized data generated by the
save()
method.
 Returns
 deserialized:
KNNClassifier
The deserialized DTW \(k\)NN classifier object.
 deserialized: