.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_algorithm_comparison.py:


Comparison of Dimension Reduction Techniques
--------------------------------------------

A comparison of several dimension reduction techniques on a variety of toy
datasets. Although small, the datasets should give a representative picture
of the strengths and weaknesses of the different algorithms.

The time taken to perform the dimension reduction with each algorithm on each
dataset is shown in the lower right of each plot.

Things to note about the datasets:

- Blobs: a set of five Gaussian blobs in 10-dimensional space. This is a
  prototypical example of data that should separate clearly even in a
  reduced-dimension space.
- Iris: a classic small dataset with one distinct class and two classes that
  are not clearly separated.
- Digits: handwritten digits. Ideally, different digit classes should form
  distinct groups. Due to the nature of handwriting, a digit may take several
  forms (crossed or uncrossed sevens, capped or straight-line ones, etc.).
- Wine: wine characteristics, ideally used for a toy classification task.
  Ultimately the data is essentially one-dimensional in nature.
- Swiss Roll: the data is essentially a rectangle that has been "rolled up"
  like a swiss roll in three-dimensional space. Ideally a dimension reduction
  technique should be able to "unroll" it. The data has been coloured
  according to one dimension of the rectangle, so a good embedding should
  show a rectangle with smooth colour variation.
- Sphere: the two-dimensional surface of a three-dimensional sphere. This
  cannot be represented accurately in two dimensions without tearing. The
  sphere has been coloured with hue around the equator and black to white
  from the south pole to the north pole.


.. code-block:: python

    import time
    from colorsys import hsv_to_rgb

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    from sklearn import datasets, decomposition, manifold, preprocessing

    import umap

    sns.set(context="paper", style="white")

    # Toy datasets
    blobs, blob_labels = datasets.make_blobs(
        n_samples=500, n_features=10, centers=5, random_state=42
    )
    iris = datasets.load_iris()
    digits = datasets.load_digits(n_class=10)
    wine = datasets.load_wine()
    swissroll, swissroll_labels = datasets.make_swiss_roll(
        n_samples=1000, noise=0.1, random_state=42
    )

    # Random points on the unit sphere: normalise Gaussian samples to length 1
    sphere = np.random.normal(size=(600, 3))
    sphere = preprocessing.normalize(sphere)
    # Hue follows the angle around the equator; value runs from dark at the
    # south pole to light at the north pole
    sphere_hsv = np.array(
        [
            (
                (np.arctan2(c[1], c[0]) + np.pi) / (2 * np.pi),
                np.abs(c[2]),
                min((c[2] + 1.1), 1.0),
            )
            for c in sphere
        ]
    )
    sphere_colors = np.array([hsv_to_rgb(*c) for c in sphere_hsv])

    # (estimator class, keyword arguments) pairs to compare
    reducers = [
        (manifold.TSNE, {"perplexity": 50}),
        # (manifold.LocallyLinearEmbedding, {'n_neighbors':10, 'method':'hessian'}),
        (manifold.Isomap, {"n_neighbors": 30}),
        (manifold.MDS, {}),
        (decomposition.PCA, {}),
        (umap.UMAP, {"n_neighbors": 30, "min_dist": 0.3}),
    ]

    test_data = [
        (blobs, blob_labels),
        (iris.data, iris.target),
        (digits.data, digits.target),
        (wine.data, wine.target),
        (swissroll, swissroll_labels),
        (sphere, sphere_colors),
    ]
    dataset_names = ["Blobs", "Iris", "Digits", "Wine", "Swiss Roll", "Sphere"]

    n_rows = len(test_data)
    n_cols = len(reducers)
    ax_index = 1
    ax_list = []

    # plt.figure(figsize=(9 * 2 + 3, 12.5))
    plt.figure(figsize=(10, 8))
    plt.subplots_adjust(
        left=.02, right=.98, bottom=.001, top=.96, wspace=.05, hspace=.01
    )
    for data, labels in test_data:
        for reducer, args in reducers:
            # Time only the fit and transform step
            start_time = time.time()
            embedding = reducer(n_components=2, **args).fit_transform(data)
            elapsed_time = time.time() - start_time
            ax = plt.subplot(n_rows, n_cols, ax_index)
            if labels.ndim == 2:
                # Labels are explicit RGB colours (the sphere), so no colormap
                ax.scatter(*embedding.T, s=10, c=labels, alpha=0.5)
            else:
                ax.scatter(
                    *embedding.T, s=10, c=labels, cmap="Spectral", alpha=0.5
                )
            # Elapsed time in the lower right corner of the plot
            ax.text(
                0.99,
                0.01,
                "{:.2f} s".format(elapsed_time),
                transform=ax.transAxes,
                size=14,
                horizontalalignment="right",
            )
            ax_list.append(ax)
            ax_index += 1
    plt.setp(ax_list, xticks=[], yticks=[])

    # Label each row with its dataset and each column with its algorithm
    for i in np.arange(n_rows) * n_cols:
        ax_list[i].set_ylabel(dataset_names[i // n_cols], size=16)
    for i in range(n_cols):
        ax_list[i].set_xlabel(reducers[i][0].__name__, size=16)
        ax_list[i].xaxis.set_label_position("top")

    plt.tight_layout()
    plt.show()

**Total running time of the script:** ( 0 minutes 0.000 seconds)


.. _sphx_glr_download_auto_examples_plot_algorithm_comparison.py:


.. only :: html

    .. container:: sphx-glr-footer
        :class: sphx-glr-footer-example

        .. container:: sphx-glr-download

            :download:`Download Python source code: plot_algorithm_comparison.py `

        .. container:: sphx-glr-download

            :download:`Download Jupyter notebook: plot_algorithm_comparison.ipynb `


.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_