SOMbrero

Types of self-organizing maps

Different types of data require different types of maps to analyze them. SOMbrero offers three types of algorithms, all based on the on-line (as opposed to batch) SOM:

Numeric is the standard self-organizing map, which uses numeric variables only. The data are expected to contain variables in columns and observations in rows.
It can be applied, for instance, to the first four variables of the iris dataset.
Korresp applies the self-organizing algorithm to contingency tables between two factors.
For instance, in the supplied dataset 'presidentielles 2002', which contains the results of the first round of the 2002 French presidential election, columns represent presidential candidates and rows represent the French districts called 'departements', so that each cell contains the number of votes for a specific candidate in a specific 'departement'.
Relational is used for dissimilarity matrices, in which each cell contains a measure of distance between two objects. For this method the data must be a square numeric matrix, in which rows and columns represent the same observations. The matrix must be symmetric, contain only non-negative entries, and have a zero diagonal.
For instance, the supplied dataset 'Les Miserables' contains the shortest path lengths between characters of Victor Hugo's novel Les Misérables in the co-appearance network provided here.
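
The requirements on a dissimilarity matrix listed above (square, symmetric, non-negative entries, zero diagonal) can be checked before training. The following is a minimal sketch in Python, not part of SOMbrero; the function name `check_dissimilarity` and the tolerance `tol` are hypothetical and chosen for illustration only.

```python
def check_dissimilarity(matrix, tol=1e-9):
    """Return a list of problems found; an empty list means the matrix looks valid.

    Hypothetical helper: `matrix` is a list of rows (lists of numbers).
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return ["matrix is not square"]
    problems = []
    for i in range(n):
        # The dissimilarity of an object with itself must be zero.
        if abs(matrix[i][i]) > tol:
            problems.append(f"diagonal entry ({i}, {i}) is not zero")
        for j in range(i + 1, n):
            if matrix[i][j] < 0 or matrix[j][i] < 0:
                problems.append(f"negative entry at ({i}, {j})")
            # Entry (i, j) and entry (j, i) describe the same pair of objects.
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                problems.append(f"asymmetric pair ({i}, {j})")
    return problems


valid = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
broken = [[0, 1, 2], [4, 0, 3], [2, 3, 1]]
print(check_dissimilarity(valid))
print(check_dissimilarity(broken))
```

An empty list for `valid` means the matrix satisfies all four conditions; for `broken` the helper reports the asymmetric pair and the non-zero diagonal entry.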

Data loading

You can choose the data from the datasets in your current environment (of class data.frame or matrix). The three example datasets are loaded automatically so you can try the methods.

Data can also be imported as a table, in text or CSV format, with columns separated by a specific symbol such as a space or a semicolon (option 'Separator'). Row names can be included in the dataset for better post-analysis of the data (some of the plots will use these names). Check at the bottom of the 'Import Data' panel to see whether the data have been properly imported. If not, change the file import options.
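
To illustrate why the separator matters, here is a small Python sketch of parsing a delimited table with row names. It is not the interface's own import code; the function `read_table`, its parameters, and the sample data are hypothetical, and it assumes a header line whose first cell labels the row-name column.

```python
import csv
import io


def read_table(text, separator=";", has_row_names=True):
    """Parse delimited text into (row_names, column_names, numeric rows).

    Hypothetical helper; assumes the first line is a header.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    header = next(reader)
    column_names = header[1:] if has_row_names else header
    row_names, rows = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        if has_row_names:
            row_names.append(record[0])
            record = record[1:]
        rows.append([float(value) for value in record])
    return row_names, column_names, rows


sample = "name;height;weight\nanna;1.70;62\nbob;1.82;80\n"
row_names, column_names, rows = read_table(sample, separator=";")
print(row_names)
print(column_names)
print(rows)
```

With the wrong separator (for example `read_table(sample, separator=",")`) every line ends up in a single field and no column names are found, which is the kind of symptom to look for when checking the imported data.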

Training options

The default options in the 'Self-Organize' panel are set according to the type of map and the size of the dataset, but you can modify the options at will:

Types of plots

SOMbrero offers many different plots to analyze the topology of your data using the self-organizing map.
There are two main choices of what to plot (in the plot and superclass panels). Plotting prototypes uses the values of the map's neuron prototypes, which are the representative vectors of the clusters. Plotting observations uses the values of the observations within each cluster.

These are the standard types of plots:

Plots in the SuperClasses panel: the plot options are mostly the same as the ones listed above, but some are specific:

grid: plots the grid of the neurons, grouped by superclasses (color).
dendrogram: plots the dendrogram of the hierarchical clustering applied to the prototypes, along with the scree plot which shows the proportion of unexplained variance for incremental numbers of superclasses. These are helpful in determining the optimal number of superclasses.
dendro3d: similar to 'dendrogram', but in three dimensions and without the scree plot.
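
As a rough illustration of what a scree plot of unexplained variance measures, the sketch below computes, for a given grouping of prototypes, the within-group sum of squares divided by the total sum of squares. This definition is an assumption made for illustration; the exact quantity computed by SOMbrero may differ, and the function name and toy prototypes are hypothetical.

```python
def unexplained_variance(points, labels):
    """Within-group sum of squares divided by the total sum of squares.

    Assumed definition, for illustration only.
    """
    def mean(vectors):
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    center = mean(points)
    total = sum(sq_dist(p, center) for p in points)

    # Collect the points of each superclass.
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)

    within = 0.0
    for members in groups.values():
        group_center = mean(members)
        within += sum(sq_dist(p, group_center) for p in members)
    return within / total


# Toy prototypes forming two well-separated pairs.
prototypes = [[0, 0], [0, 2], [10, 0], [10, 2]]
for labels in ([0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 2, 3]):
    print(len(set(labels)), unexplained_variance(prototypes, labels))
```

In this toy case the ratio drops sharply when going from one to two superclasses and only slightly afterwards, which is the kind of "elbow" one looks for when choosing the number of superclasses.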

Plots in the 'Combine with external information' panel: the plot options are mostly the same as the ones listed above, but some are specific:

pie: requires the selected variable to be a categorical variable, and plots one pie for each neuron, corresponding to the values of this variable.
words: needs the external data to be a contingency table or numerical values: names of the columns will be used as words and printed on the map with sizes proportional to the sum of values in the neuron.
graph: needs the external data to be the adjacency matrix of a graph. According to the existing edges in the graph and to the clustering obtained with the SOM algorithm, a clustered graph is built in which vertices represent neurons and edges are weighted by the number of edges in the given graph between the vertices assigned to the corresponding neurons. This plot can be tested with the supplied dataset Les Miserables, which corresponds to the graph whose adjacency table is provided at this link.
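
The construction of the clustered graph described for the 'graph' plot can be sketched as follows in Python. This is an illustrative reimplementation under stated assumptions, not SOMbrero's code: the function `clustered_graph`, the toy adjacency matrix, and the neuron assignment are hypothetical, and the graph is assumed to be undirected and unweighted.

```python
from collections import Counter


def clustered_graph(adjacency, neuron_of):
    """Count edges of the original graph between (and within) neurons.

    adjacency: symmetric 0/1 matrix (list of lists) of the original graph.
    neuron_of: neuron_of[i] is the neuron to which vertex i is assigned.
    Returns a Counter mapping a sorted (neuron_a, neuron_b) pair to an edge count.
    """
    weights = Counter()
    n = len(adjacency)
    for i in range(n):
        for j in range(i + 1, n):  # visit each undirected edge once
            if adjacency[i][j]:
                pair = tuple(sorted((neuron_of[i], neuron_of[j])))
                weights[pair] += 1
    return weights


# Toy graph with edges 0-1, 0-2, 1-2 and 2-3.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
neuron_of = [0, 0, 1, 1]  # vertices 0 and 1 in neuron 0, vertices 2 and 3 in neuron 1
print(dict(clustered_graph(adjacency, neuron_of)))
```

Edges whose two endpoints fall in the same neuron are counted under a key of the form (a, a) in this sketch; whether and how the actual plot displays them is not specified here.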

The show cluster names option in the 'Plot map' panel can be selected to show the names of the neurons on the map.

The energy option in the 'Plot map' panel is used to plot the energy levels of the intermediate backups recorded during training. This is helpful in determining whether the algorithm has converged. (This option only works if a 'Number of intermediate backups' larger than 2 is chosen in the 'Self-Organize' panel.)

Grouping prototypes into Superclasses

Use the options on the dedicated panel to group the prototypes of a trained map into a given number of superclasses, using hierarchical clustering. The 'dendrogram' plot can help you choose a relevant number of superclasses (or, equivalently, a relevant cutting height in the dendrogram).

Combine with external information

Plot external data on the trained map on the dedicated panel. If you have unused variables in your dataset (not used to train the map), you can select them as external data. Otherwise, or if you want to use other data, the external data import process is similar to the one described in the 'Data loading' section, and the available plots are described in the 'Types of plots' section.
Note that this is the only panel in which factors can be plotted on the self-organizing map. For instance, if the map is trained on the first four (numeric) variables of the supplied iris dataset, you can select the species variable and plot the iris species on the map.
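
Plotting a factor on the map amounts to tallying, for each neuron, how often each category occurs among the observations assigned to it. A minimal Python sketch of that tally is given below; the function name and the toy assignment of observations to neurons are hypothetical and are not taken from a trained map.

```python
from collections import Counter, defaultdict


def category_counts_per_neuron(neuron_of, categories):
    """Tally the categories of the observations assigned to each neuron."""
    counts = defaultdict(Counter)
    for neuron, category in zip(neuron_of, categories):
        counts[neuron][category] += 1
    return {neuron: dict(tally) for neuron, tally in counts.items()}


# Hypothetical assignment of five observations to two neurons.
neuron_of = [0, 0, 1, 1, 1]
species = ["setosa", "setosa", "versicolor", "virginica", "versicolor"]
print(category_counts_per_neuron(neuron_of, species))
```

Each inner dictionary holds the counts from which one pie (one per neuron) could be drawn.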

SOMbrero Web User Interface (v1.2)

Welcome to SOMbrero, the open-source on-line interface for self-organizing maps (SOM).
