SOMbrero Web User Interface (v1.2)
Welcome to SOMbrero, the open-source on-line interface for self-organizing maps (SOM). This interface trains SOM for numerical data, contingency tables and dissimilarity data using the R package SOMbrero (v1.2-3). Train a map on your data and visualize their topology in three simple steps using the panels on the right.
It is kindly provided by the SAMM team and the MIA-T team under the GPL-2.0 license, and was developed by Julien Boelaert, Madalina Olteanu and Nathalie Vialaneix, using Shiny. It is also included in the R package SOMbrero. Its source code is freely available on GitHub:
git clone https://github.com/tuxette/sombrero.git
The interface can be tested using example data files for the numeric, korresp and relational algorithms (download these files to your computer and proceed).
Once your dataset is loaded, you can train a self-organizing map (SOM) and explore it. You can then download the resulting SOM in .rda format (you will need R and the package SOMbrero to open this file and use the SOM; its class is the 'somRes' class, handled by SOMbrero). You can also explore it using the next panels to visualize the results, compute super-classes or combine it with additional variables.
Consult the "Help" panel for information on how to choose adequate parameter values.
(1) SOMbrero is based on a stochastic (on-line) version of the SOM algorithm and thus uses randomness. Setting a seed fixes the random procedure so that results are reproducible (running the process several times with the same random seed gives the same map). More information on pseudo-random generators is available at this link.
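To illustrate why a seed makes a stochastic procedure reproducible, here is a minimal Python sketch. It is not SOMbrero's code: the function `presentation_order` is hypothetical and only mimics the random choice of the observation presented at each step of an on-line algorithm.

```python
import random

def presentation_order(n_obs, n_iter, seed=None):
    """Draw the index of the observation presented at each iteration."""
    rng = random.Random(seed)  # dedicated generator, fixed by the seed
    return [rng.randrange(n_obs) for _ in range(n_iter)]

# Two runs with the same (hypothetical) seed draw the same observations.
same_a = presentation_order(150, 10, seed=42)
same_b = presentation_order(150, 10, seed=42)
print(same_a == same_b)  # True
```

Without a seed, each run would in general draw a different sequence and could therefore produce a different map.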
In this panel and the next ones you can visualize the computed self-organizing map. This panel contains the standard plots used to analyze the map.
In this panel you can group the clusters into 'superclasses' (using a hierarchical clustering on the neurons' prototypes), download the resulting clustering in csv format and visualize it on charts. The 'dendrogram' plot can help you determine the adequate number of superclasses.
In this panel you can combine the self-organizing map with variables not used for the training. To do so, you must first import an additional file using the form below. The file must contain either the same number of rows as the file used for training (in the same order), or a (square) adjacency matrix for 'graph' plots (the adjacency matrix has a dimension equal to the number of rows of the training data).
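The shape requirement above can be sketched as a small check. This is an illustrative Python sketch only; the function `check_additional_data` and its arguments are hypothetical and are not part of SOMbrero.

```python
def check_additional_data(n_training_rows, table, graph=False):
    """Return True if `table` (a list of rows) has a usable shape."""
    # One row per training observation, in the same order.
    if len(table) != n_training_rows:
        return False
    if graph:
        # An adjacency matrix must also be square.
        return all(len(row) == n_training_rows for row in table)
    return True

extra_variables = [["a", 1.0], ["b", 2.5], ["a", 0.3]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(check_additional_data(3, extra_variables))        # True
print(check_additional_data(3, adjacency, graph=True))  # True
print(check_additional_data(4, adjacency, graph=True))  # False
```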
Option not available for 'Korresp' type of SOM
Different types of data require different types of maps to analyze them. SOMbrero offers three types of algorithms, all based on the on-line (as opposed to batch) SOM:
You can choose the data among the datasets in your current environment (of class data.frame or matrix). The three example datasets are automatically loaded so you can try the methods.
Data can also be imported as a table, in text or csv format (columns are separated by specific symbols such as spaces or semicolons; see the option 'Separator'). Row names can be included in the data set for better post-analyses of the data (some of the plots will use these names). Check at the bottom of the 'Import Data' panel to see if the data have been properly imported. If not, change the file import options.
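The effect of the separator option can be illustrated with a short Python sketch. It is not the interface's import code; the function `read_table` is hypothetical and assumes that the header line contains a cell above the row-name column.

```python
import csv
import io

def read_table(text, separator=";", header=True, row_names=True):
    """Parse delimited text into (column names, row names, numeric values)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    columns = rows.pop(0) if header else None
    names = [row.pop(0) for row in rows] if row_names else None
    if header and row_names:
        columns = columns[1:]  # drop the header cell above the row names
    values = [[float(cell) for cell in row] for row in rows]
    return columns, names, values

text = "name;x;y\nobs1;1.0;2.0\nobs2;3.5;4.5\n"
columns, names, values = read_table(text, separator=";")
print(columns)  # ['x', 'y']
print(names)    # ['obs1', 'obs2']
print(values)   # [[1.0, 2.0], [3.5, 4.5]]

# With the wrong separator each line is read as a single cell,
# so no data column is found: a sign that the options need changing.
wrong_columns, _, _ = read_table(text, separator=",")
print(wrong_columns)  # []
```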
The default options in the 'Self-Organize' panel are set according to the type of map and the size of the dataset, but you can modify the options at will:
SOMbrero offers many different plots to analyze your data's topology using the self-organizing map.
There are two main choices of what to plot (in the plot and superclass panels): plotting prototypes uses the values of the neurons' prototypes of the map, which are the representative vectors of the clusters. Plotting observations uses the values of the observations within each cluster.
These are the standard types of plots:
number_of_neurons * (number_of_neurons-1) / 2
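The expression above is the number of unordered pairs of distinct neurons. A quick check in Python, assuming a hypothetical 5x5 grid:

```python
from itertools import combinations

number_of_neurons = 5 * 5  # hypothetical 5x5 grid

# Enumerate every unordered pair of distinct neurons and count them.
pairs = list(combinations(range(number_of_neurons), 2))
print(len(pairs))                                        # 300
print(number_of_neurons * (number_of_neurons - 1) // 2)  # 300
```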
Plots in the SuperClasses panel: the plot options are mostly the same as the ones listed above, but some are specific:
Plots in the 'Combine with external information' panel: the plot options are mostly the same as the ones listed above, but some are specific:
The show cluster names option in the 'Plot map' panel can be selected to show the names of the neurons on the map.
The energy option in the 'Plot map' panel is used to plot the energy levels of the intermediate backups recorded during training. This is helpful in determining whether the algorithm has converged. (This option only works if a 'Number of intermediate backups' larger than 2 is chosen in the 'Self-Organize' panel.)
Use the options on the dedicated panel to group the prototypes of a trained map into a determined number of superclasses, using hierarchical clustering. The 'dendrogram' plot can help you to choose a relevant number of superclasses (or equivalently a relevant cutting height in the dendrogram).
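The idea of grouping prototypes into superclasses can be sketched in a few lines of Python. This is a toy illustration, not SOMbrero's implementation: the function `superclasses` is hypothetical, and the use of complete linkage with Euclidean distance is an assumption made for this sketch (the package may use a different linkage).

```python
import math

def superclasses(prototypes, k):
    """Group prototypes into k superclasses by agglomerative clustering.

    Complete linkage with Euclidean distance is assumed for this sketch.
    Returns one superclass label per prototype.
    """
    clusters = [[i] for i in range(len(prototypes))]

    def linkage(a, b):
        # Complete linkage: largest distance between members of a and b.
        return max(math.dist(prototypes[i], prototypes[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the two closest clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters.pop(j)

    labels = [0] * len(prototypes)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Five hypothetical two-dimensional prototypes.
prototypes = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.0)]
print(superclasses(prototypes, 3))  # [0, 0, 1, 1, 2]
```

Stopping the merges at `k` clusters corresponds to cutting the dendrogram at the height where exactly `k` branches remain.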
Plot external data on the trained map on the dedicated panel. If you have unused variables in your dataset (not used to train the map), you can select them as external data. Otherwise, or if you want to use other data, the external data importation process is similar to the one described in the 'Data importation' section, and the available plots are described in the 'types of plots' section.
Note that this is the only panel in which factors can be plotted on the self-organizing map. For instance, if the map is trained on the first four (numeric) variables of the supplied iris dataset, you can select the species variable and plot the iris species on the map.
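What such a plot summarizes is, for each neuron, how many observations of each factor level it contains. A minimal Python sketch, with a hypothetical function `factor_by_cluster` and made-up cluster assignments (not actual results on the iris data):

```python
from collections import Counter, defaultdict

def factor_by_cluster(clusters, factor):
    """Count the levels of a factor among the observations of each neuron."""
    if len(clusters) != len(factor):
        raise ValueError("the factor needs one value per training observation")
    counts = defaultdict(Counter)
    for neuron, level in zip(clusters, factor):
        counts[neuron][level] += 1
    return {neuron: dict(c) for neuron, c in sorted(counts.items())}

# Made-up neuron assignments and species labels for six observations.
clusters = [1, 1, 2, 2, 2, 3]
species = ["setosa", "setosa", "versicolor", "virginica", "versicolor", "virginica"]
print(factor_by_cluster(clusters, species))
# {1: {'setosa': 2}, 2: {'versicolor': 2, 'virginica': 1}, 3: {'virginica': 1}}
```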