Illustrative example

Distance metrics for MCDA methods

This manual explains how to use the distance_metrics_mcda library, which provides metrics for measuring the distance of alternatives from reference solutions in multi-criteria decision analysis (MCDA). The library's distance_metrics module contains the following distance metrics:

  • Euclidean distance euclidean

  • Manhattan (Taxicab) distance manhattan

  • Hausdorff distance hausdorff

  • Correlation distance correlation

  • Chebyshev distance chebyshev

  • Standardized Euclidean distance std_euclidean

  • Cosine distance cosine

  • Cosine similarity measure csm

  • Squared Euclidean distance squared_euclidean

  • Sorensen or Bray-Curtis distance bray_curtis

  • Canberra distance canberra

  • Lorentzian distance lorentzian

  • Jaccard distance jaccard

  • Dice distance dice

  • Bhattacharyya distance bhattacharyya

  • Hellinger distance hellinger

  • Matusita distance matusita

  • Squared-chord distance squared_chord

  • Pearson chi-square distance pearson_chi_square

  • Squared chi-square distance squared_chi_square
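
To make the idea of a distance metric concrete, the sketch below re-implements four of the listed metrics with plain NumPy, using their textbook definitions. The function names ending in `_sketch` are hypothetical and are not part of distance_metrics_mcda; the library's own implementations may differ in details such as scaling or input validation.

```python
import numpy as np


def _as_float_arrays(a, b):
    """Convert both inputs to float arrays so plain lists work too."""
    return np.asarray(a, dtype=float), np.asarray(b, dtype=float)


def euclidean_sketch(a, b):
    """Square root of the sum of squared coordinate differences."""
    a, b = _as_float_arrays(a, b)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan_sketch(a, b):
    """Sum of absolute coordinate differences."""
    a, b = _as_float_arrays(a, b)
    return float(np.sum(np.abs(a - b)))


def chebyshev_sketch(a, b):
    """Largest absolute coordinate difference."""
    a, b = _as_float_arrays(a, b)
    return float(np.max(np.abs(a - b)))


def bray_curtis_sketch(a, b):
    """Sum of absolute differences divided by the sum of all coordinates
    (textbook form; assumes non-negative inputs with a non-zero total)."""
    a, b = _as_float_arrays(a, b)
    return float(np.sum(np.abs(a - b)) / np.sum(a + b))


# Illustrative vectors (made up for this sketch, not taken from the dataset)
alternative = [0.0, 0.0]
reference = [3.0, 4.0]
print(euclidean_sketch(alternative, reference))  # 5.0
print(manhattan_sketch(alternative, reference))  # 7.0
print(chebyshev_sketch(alternative, reference))  # 4.0
```

In TOPSIS, a function of this shape is applied to each alternative's weighted, normalized row and to a reference solution, so any two-vector distance can be swapped in.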

The library also provides other methods needed for multi-criteria decision analysis, listed below. The TOPSIS method is available as the TOPSIS class in the mcda_methods module. TOPSIS measures the distance of each alternative from the Positive Ideal Solution and the Negative Ideal Solution using one of the distance metrics listed above.

Normalization techniques:

  • Linear linear_normalization

  • Minimum-Maximum minmax_normalization

  • Maximum max_normalization

  • Sum sum_normalization

  • Vector vector_normalization
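
As an illustration of what a normalization technique does with criteria types, here is a minimal sketch of Minimum-Maximum normalization. It assumes the common convention that `1` marks a profit criterion and `-1` a cost criterion (the convention used for the types in this manual) and that no column is constant. The name `minmax_normalization_sketch` is hypothetical; this is not the library's code.

```python
import numpy as np


def minmax_normalization_sketch(matrix, types):
    """Scale every column to [0, 1]; cost criteria are inverted so that
    1 is always the best value and 0 the worst."""
    matrix = np.asarray(matrix, dtype=float)
    types = np.asarray(types)
    col_min = matrix.min(axis=0)
    col_max = matrix.max(axis=0)
    span = col_max - col_min  # assumed non-zero for every column
    profit = (matrix - col_min) / span
    cost = (col_max - matrix) / span
    # types has one entry per column, so it broadcasts across rows
    return np.where(types == 1, profit, cost)


# Illustrative data: first column is a profit criterion, second a cost criterion
example_matrix = [[1, 10], [3, 30], [5, 20]]
example_types = [1, -1]
print(minmax_normalization_sketch(example_matrix, example_types))
```

After this step the lowest price and the highest performance value both map to 1, which is what lets later steps treat all criteria uniformly.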

Correlation coefficients:

  • Spearman rank correlation coefficient rs spearman

  • Weighted Spearman rank correlation coefficient rw weighted_spearman

  • Pearson coefficient pearson_coeff
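
The two rank correlation coefficients can be sketched directly from their standard formulas. The code below assumes rankings without ties, given as integers 1..N; the function names are hypothetical stand-ins rather than the library's functions. The weighted variant gives more importance to differences near the top of the rankings.

```python
import numpy as np


def spearman_sketch(x, y):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return float(1 - 6 * np.sum((x - y) ** 2) / (n * (n ** 2 - 1)))


def weighted_spearman_sketch(x, y):
    """Weighted Spearman rank correlation: each squared rank difference is
    weighted by (N - x_i + 1) + (N - y_i + 1), so top positions count more."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    numerator = 6 * np.sum((x - y) ** 2 * ((n - x + 1) + (n - y + 1)))
    denominator = n ** 4 + n ** 3 - n ** 2 - n
    return float(1 - numerator / denominator)


# Euclidean and Manhattan rankings as reported in the rankings table of this manual
euclidean_rank = [5, 2, 12, 13, 11, 9, 4, 3, 1, 14, 15, 7, 10, 6, 8]
manhattan_rank = [6, 3, 14, 13, 10, 9, 4, 1, 2, 12, 15, 8, 11, 7, 5]
print(spearman_sketch(euclidean_rank, manhattan_rank))
print(weighted_spearman_sketch(euclidean_rank, manhattan_rank))
```

Both coefficients equal 1 for identical rankings and -1 for fully reversed ones.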

Objective weighting methods:

  • Entropy weighting method entropy_weighting

  • CRITIC weighting method critic_weighting
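
For intuition about objective weighting, the following is a minimal sketch of the textbook entropy weighting procedure: columns are normalized by their sums, the Shannon entropy of each column is computed, and criteria whose values vary more across alternatives (lower entropy) receive larger weights. It assumes non-negative performance values; `entropy_weighting_sketch` is a hypothetical name, and the library's entropy_weighting may differ in details.

```python
import numpy as np


def entropy_weighting_sketch(matrix):
    """Return one weight per criterion (column); weights sum to 1."""
    matrix = np.asarray(matrix, dtype=float)
    m = matrix.shape[0]  # number of alternatives
    p = matrix / matrix.sum(axis=0)  # sum normalization per column
    # Treat 0 * log(0) as 0 to avoid NaN for zero entries
    safe_p = np.where(p > 0, p, 1.0)
    plogp = np.where(p > 0, p * np.log(safe_p), 0.0)
    entropy = -plogp.sum(axis=0) / np.log(m)  # scaled to [0, 1]
    divergence = 1 - entropy  # degree of diversification
    return divergence / divergence.sum()


# Illustrative data: the first criterion is identical for all alternatives,
# so it carries no information and receives (numerically) zero weight
example = [[1, 1], [1, 2], [1, 7]]
print(entropy_weighting_sketch(example))
```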

Import the necessary Python modules.

[1]:
import copy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Import the necessary modules and methods from package distance_metrics_mcda.

[2]:
from distance_metrics_mcda.mcda_methods import TOPSIS
from distance_metrics_mcda.additions import rank_preferences
from distance_metrics_mcda import correlations as corrs
from distance_metrics_mcda import normalizations as norms
from distance_metrics_mcda import distance_metrics as dists
from distance_metrics_mcda import weighting_methods as mcda_weights

Functions for results visualization.

[3]:
# Functions for visualization
def plot_barplot(df_plot, x_name, y_name, title):
    """
    Display a stacked column chart of criteria weights when `x_name == 'Weighting methods'`,
    or a grouped column chart of alternatives' ranks when `x_name == 'Alternatives'`

    Parameters
    ----------
        df_plot : dataframe
            dataframe with criteria weights calculated using different weighting methods
            or with alternatives' rankings obtained using different methods
        x_name : str
            name of x axis, Alternatives or Weighting methods
        y_name : str
            name of y axis, Ranks or Weight values
        title : str
            name of chart title, Weighting methods or Criteria
    """
    list_rank = np.arange(1, len(df_plot) + 1, 1)
    stacked = True
    width = 0.5
    if x_name == 'Alternatives':
        stacked = False
        width = 0.8
    else:
        df_plot = df_plot.T
    ax = df_plot.plot(kind='bar', width = width, stacked=stacked, edgecolor = 'black', figsize = (9,4))
    ax.set_xlabel(x_name, fontsize = 12)
    ax.set_ylabel(y_name, fontsize = 12)

    if x_name == 'Alternatives':
        ax.set_yticks(list_rank)

    ax.set_xticklabels(df_plot.index, rotation = 'horizontal')
    ax.tick_params(axis = 'both', labelsize = 12)

    plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left',
    ncol=5, mode="expand", borderaxespad=0., edgecolor = 'black', title = title, fontsize = 11)

    ax.grid(True, linestyle = '--')
    ax.set_axisbelow(True)
    plt.tight_layout()
    plt.show()


def draw_heatmap(data, title):
    """
    Display heatmap with correlations of compared rankings generated using different methods

    Parameters
    ----------
    data : dataframe
        dataframe with correlation values between compared rankings
    title : str
        title of chart containing name of used correlation coefficient
    """
    plt.figure(figsize = (6, 4))
    sns.set(font_scale=0.8)
    heatmap = sns.heatmap(data, annot=True, fmt=".2f", cmap="YlGn",
                          linewidth=0.5, linecolor='w')
    plt.yticks(va="center")
    plt.xlabel('Weighting methods')
    plt.title('Correlation coefficient: ' + title)
    plt.tight_layout()
    plt.show()


def plot_boxplot(data):
    """
    Display boxplot showing the distribution of TOPSIS preference values of each alternative
    obtained with different methods.

    Parameters
    ----------
    data : dataframe
        dataframe with preference values, with one column per alternative
    """

    plt.figure(figsize = (7, 4))

    ax = data.boxplot()
    ax.grid(True, linestyle = '--')
    ax.set_axisbelow(True)
    ax.set_xlabel('Alternatives', fontsize = 12)
    ax.set_ylabel('TOPSIS preference distribution', fontsize = 12)
    plt.tight_layout()
    plt.show()

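# Dictionary subclass with a convenience method for adding key:value pairs
class Create_dictionary(dict):

    # Initialize an empty dictionary (assigning to `self` would have no effect)
    def __init__(self):
        super().__init__()

    # Function to add key:value
    def add(self, key, value):
        self[key] = value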

The dataset of mobile phones was acquired from the paper: Guo, M., Liao, X., Liu, J., & Zhang, Q. (2020). Consumer preference analysis: A data-driven multiple criteria approach integrating online information. Omega, 96, 102074. The dataset describes 25 mobile phone models evaluated against 11 criteria. For this example, we selected the first 15 alternatives. The second-to-last row of the CSV file contains the criteria types, and the last row contains the expert criteria weights. In this example, however, the weights are calculated with the objective CRITIC weighting method instead of the expert weights.

[4]:
criteria_presentation = pd.read_csv('smartphones_criteria.csv', index_col = 'G')
criteria_presentation
[4]:
    Criteria group            Cj   Explanation                           Type
G
G1  Hardware and performance  C1   Front camera resolution (megapixels)     1
                              C2   Rear camera resolution (megapixels)      1
                              C3   Battery capacity (mAh)                   1
                              C4   RAM (GB)                                 1
                              C5   Screen size (inch)                       1
                              C6   CPU rating                               1
G2  Appearance                C7   Appearance rating                        1
G3  Brand                     C8   Market share (%)                         1
                              C9   Brand favorable rate (%)                 1
G4  Accessory                 C10  Accessory rating                         1
G5  Price                     C11  Price (RMB)                             -1
[5]:
data_presentation = pd.read_csv('dataset_smartphones.csv', index_col = 'Ai')
data_presentation = data_presentation.iloc[:len(data_presentation) - 12, :]
data_presentation
[5]:
Name C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11
Ai
A1 Huawei Honor V10 13.0 2.0 3750.0 6.0 6.0 6701.0 3.2 9.8 0.72 2.9 2999.0
A2 Samsung Galaxy Note8 8.0 12.0 3300.0 6.0 6.3 6806.0 4.3 12.7 0.82 3.7 6988.0
A3 iPhone8 Plus 7.0 12.0 2675.0 3.0 5.5 10304.0 3.4 7.8 0.86 3.0 6688.0
A4 Xiaomi Note3 8.0 12.0 3350.0 4.0 5.2 6805.0 3.6 7.3 0.65 3.0 2099.0
A5 iPhone X 7.0 12.0 2700.0 3.0 5.8 10304.0 4.1 7.8 0.86 3.2 8388.0
A6 Xiaomi Mix2 5.0 12.0 3400.0 6.0 6.0 6806.0 3.4 7.3 0.65 2.9 2999.0
A7 One Plus 5t 16.0 20.0 3300.0 6.0 6.0 6805.0 3.1 2.0 0.89 2.5 2999.0
A8 Oppo R11s 20.0 20.0 3205.0 4.0 6.0 5888.0 4.6 13.3 0.83 4.2 2999.0
A9 Huawei Mate10 Pro- 8.0 20.0 4000.0 6.0 6.0 6701.0 4.1 12.3 0.74 3.5 4899.0
A10 Samsung Galaxy S8 8.0 12.0 3000.0 4.0 5.6 6806.0 3.4 12.7 0.74 2.6 4999.0
A11 Xiaomi 5x 5.0 12.0 3080.0 4.0 5.5 6805.0 3.7 7.3 0.65 2.6 1399.0
A12 Xiaomi 6 16.0 12.0 3500.0 6.0 5.5 5888.0 3.5 7.3 0.65 2.9 2299.0
A13 Nokia 7 5.0 16.0 3000.0 6.0 5.2 4212.0 3.7 1.8 0.66 3.0 2199.0
A14 360 N6 Pro- 8.0 16.0 4050.0 6.0 6.0 5888.0 3.4 1.4 0.68 2.8 1899.0
A15 Vivo x20 12.0 12.0 3245.0 4.0 6.0 5888.0 3.5 17.4 0.88 2.7 2798.0

Load the decision matrix, which contains only the performance values of the alternatives against the criteria, and read the criteria types from the second-to-last row of the CSV file, as shown below. Then convert the decision matrix and the criteria types from dataframes to NumPy arrays.

[6]:
# Load data from CSV
filename = 'dataset_mobile_phones.csv'
data = pd.read_csv(filename, index_col = 'Ai')
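# Select the decision matrix: the first 15 alternatives
# (drops the last 10 alternatives and the 2 rows with criteria types and expert weights)
df_data = data.iloc[:len(data) - 12, :]
# Criteria types are in the second-to-last row of CSV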
types = data.iloc[len(data) - 2, :].to_numpy()

# Convert decision matrix from dataframe to numpy ndarray type for faster calculations.
matrix = df_data.to_numpy()

# Symbols for alternatives Ai
list_alt_names = [r'$A_{' + str(i) + '}$' for i in range(1, df_data.shape[0] + 1)]
# Symbols for columns Cj
cols = [r'$C_{' + str(j) + '}$' for j in range(1, data.shape[1] + 1)]
print('Decision matrix')
df_data
Decision matrix
[6]:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11
Ai
A1 13.0 2.0 3750.0 6.0 6.0 6701.0 3.2 9.8 0.72 2.9 2999.0
A2 8.0 12.0 3300.0 6.0 6.3 6806.0 4.3 12.7 0.82 3.7 6988.0
A3 7.0 12.0 2675.0 3.0 5.5 10304.0 3.4 7.8 0.86 3.0 6688.0
A4 8.0 12.0 3350.0 4.0 5.2 6805.0 3.6 7.3 0.65 3.0 2099.0
A5 7.0 12.0 2700.0 3.0 5.8 10304.0 4.1 7.8 0.86 3.2 8388.0
A6 5.0 12.0 3400.0 6.0 6.0 6806.0 3.4 7.3 0.65 2.9 2999.0
A7 16.0 20.0 3300.0 6.0 6.0 6805.0 3.1 2.0 0.89 2.5 2999.0
A8 20.0 20.0 3205.0 4.0 6.0 5888.0 4.6 13.3 0.83 4.2 2999.0
A9 8.0 20.0 4000.0 6.0 6.0 6701.0 4.1 12.3 0.74 3.5 4899.0
A10 8.0 12.0 3000.0 4.0 5.6 6806.0 3.4 12.7 0.74 2.6 4999.0
A11 5.0 12.0 3080.0 4.0 5.5 6805.0 3.7 7.3 0.65 2.6 1399.0
A12 16.0 12.0 3500.0 6.0 5.5 5888.0 3.5 7.3 0.65 2.9 2299.0
A13 5.0 16.0 3000.0 6.0 5.2 4212.0 3.7 1.8 0.66 3.0 2199.0
A14 8.0 16.0 4050.0 6.0 6.0 5888.0 3.4 1.4 0.68 2.8 1899.0
A15 12.0 12.0 3245.0 4.0 6.0 5888.0 3.5 17.4 0.88 2.7 2798.0
[7]:
print('Criteria types')
types
Criteria types
[7]:
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1., -1.])

Calculate the weights with the selected weighting method. In this case, the CRITIC weighting method (critic_weighting) is selected.

[8]:
weights = mcda_weights.critic_weighting(matrix)
df_weights = pd.DataFrame(weights.reshape(1, -1), index = ['Weights'], columns = cols)
df_weights
[8]:
$C_{1}$ $C_{2}$ $C_{3}$ $C_{4}$ $C_{5}$ $C_{6}$ $C_{7}$ $C_{8}$ $C_{9}$ $C_{10}$ $C_{11}$
Weights 0.090631 0.078639 0.100805 0.148487 0.074187 0.089689 0.074635 0.083033 0.106157 0.066252 0.087486
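
To show what an objective method such as CRITIC computes, here is a minimal sketch of the textbook procedure: min-max normalize the columns, measure each criterion's contrast by its standard deviation, measure its conflict with the other criteria through pairwise Pearson correlations, and normalize the product. The name `critic_weighting_sketch`, the use of the sample standard deviation (`ddof=1`), and treating every column as a profit criterion during normalization are assumptions of this sketch; it is not claimed to reproduce the weights shown above.

```python
import numpy as np


def critic_weighting_sketch(matrix):
    """Return one CRITIC-style weight per criterion (column); weights sum to 1."""
    matrix = np.asarray(matrix, dtype=float)
    col_min = matrix.min(axis=0)
    col_max = matrix.max(axis=0)
    # Min-max normalization (assumes no constant column)
    norm = (matrix - col_min) / (col_max - col_min)
    # Contrast intensity of each criterion
    std = norm.std(axis=0, ddof=1)
    # Pairwise Pearson correlations between criteria (columns)
    corr = np.corrcoef(norm, rowvar=False)
    # Information carried by criterion j: std_j * sum_k (1 - r_jk)
    information = std * (1 - corr).sum(axis=0)
    return information / information.sum()


# Illustrative random data: 6 alternatives, 3 criteria
rng = np.random.default_rng(0)
example_matrix = rng.uniform(1, 10, size=(6, 3))
example_weights = critic_weighting_sketch(example_matrix)
print(example_weights, example_weights.sum())
```

A criterion gets a large weight when its values are spread out and weakly or negatively correlated with the other criteria.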

Use the TOPSIS method to determine the preference function values (pref) and the ranking of alternatives (rank). TOPSIS ranks alternatives in descending order of preference function values, so the reverse parameter of the rank_preferences method is set to True.

[9]:
# Create the TOPSIS method object
topsis = TOPSIS(normalization_method = norms.minmax_normalization, distance_metric = dists.euclidean)

# Calculate alternatives preference function values with TOPSIS method
pref = topsis(matrix, weights, types)

# rank alternatives according to preference values
rank = rank_preferences(pref, reverse = True)

# save results in dataframe
df_results = pd.DataFrame(index = list_alt_names)
df_results['Pref'] = pref
df_results['Rank'] = rank
df_results
[9]:
Pref Rank
$A_{1}$ 0.557878 5
$A_{2}$ 0.613773 2
$A_{3}$ 0.382911 12
$A_{4}$ 0.376623 13
$A_{5}$ 0.406067 11
$A_{6}$ 0.506174 9
$A_{7}$ 0.605020 4
$A_{8}$ 0.605045 3
$A_{9}$ 0.639116 1
$A_{10}$ 0.371339 14
$A_{11}$ 0.366135 15
$A_{12}$ 0.536113 7
$A_{13}$ 0.452698 10
$A_{14}$ 0.546563 6
$A_{15}$ 0.523334 8
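
The steps that TOPSIS performs can be sketched in a few lines of NumPy. The version below assumes min-max normalization, the Euclidean distance, and criteria types encoded as `1` (profit) and `-1` (cost); `topsis_sketch` is a hypothetical function written for illustration, not the library's TOPSIS class.

```python
import numpy as np


def topsis_sketch(matrix, weights, types):
    """Return TOPSIS preference values in [0, 1]; higher is better."""
    matrix = np.asarray(matrix, dtype=float)
    weights = np.asarray(weights, dtype=float)
    types = np.asarray(types)

    # Step 1: min-max normalization; cost criteria are inverted
    col_min = matrix.min(axis=0)
    col_max = matrix.max(axis=0)
    span = col_max - col_min  # assumed non-zero for every column
    norm = np.where(types == 1, (matrix - col_min) / span, (col_max - matrix) / span)

    # Step 2: weighted normalized decision matrix
    weighted = norm * weights

    # Step 3: Positive and Negative Ideal Solutions (after normalization
    # every criterion behaves like a profit criterion)
    pis = weighted.max(axis=0)
    nis = weighted.min(axis=0)

    # Step 4: Euclidean distance of each alternative from PIS and NIS
    d_plus = np.sqrt(((weighted - pis) ** 2).sum(axis=1))
    d_minus = np.sqrt(((weighted - nis) ** 2).sum(axis=1))

    # Step 5: relative closeness to the ideal solution
    return d_minus / (d_plus + d_minus)


# Illustrative data: criterion 1 is profit, criterion 2 is cost
example_matrix = [[10, 100], [5, 200], [1, 300]]
example_pref = topsis_sketch(example_matrix, weights=[0.5, 0.5], types=[1, -1])
print(example_pref)
```

Replacing the two lines in step 4 with another two-vector distance is what the `distance_metric` argument of the library's TOPSIS object makes configurable.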

The second part of the manual contains code for comparing the results obtained with several different distance metrics. First, list all the distance metrics you wish to explore.

[10]:
# part 2 - study with several distance metrics
# Create a list with distance metrics that you want to explore
distance_metrics = [
    dists.euclidean,
    dists.manhattan,
    # dists.hausdorff,
    # dists.correlation,
    # dists.chebyshev,
    # dists.cosine,
    # dists.squared_euclidean,
    dists.bray_curtis,
    dists.canberra,
    dists.lorentzian,
    # dists.jaccard,
    # dists.dice,
    dists.hellinger,
    dists.matusita,
    dists.squared_chord,
    dists.pearson_chi_square,
    dists.squared_chi_square
]

The loop below collects the results for each distance metric. The preference function values and the rankings are then displayed.

[11]:
# Create dataframes for preference function values and rankings determined using distance metrics
df_preferences = pd.DataFrame(index = list_alt_names)
df_rankings = pd.DataFrame(index = list_alt_names)

for distance_metric in distance_metrics:
    # Create the TOPSIS method object
    topsis = TOPSIS(normalization_method = norms.minmax_normalization, distance_metric = distance_metric)
    pref = topsis(matrix, weights, types)
    rank = rank_preferences(pref, reverse = True)
    df_preferences[distance_metric.__name__.capitalize().replace('_', ' ')] = pref
    df_rankings[distance_metric.__name__.capitalize().replace('_', ' ')] = rank
[12]:
df_preferences
[12]:
Euclidean Manhattan Bray curtis Canberra Lorentzian Hellinger Matusita Squared chord Pearson chi square Squared chi square
$A_{1}$ 0.557878 0.528809 0.764405 0.690239 0.526128 0.635184 0.635184 0.737152 0.117176 0.663579
$A_{2}$ 0.613773 0.626334 0.813167 0.784440 0.623530 0.675393 0.675393 0.876411 0.184789 0.795719
$A_{3}$ 0.382911 0.347487 0.673744 0.602494 0.349972 0.582057 0.582057 0.478531 0.038309 0.420936
$A_{4}$ 0.376623 0.352678 0.676339 0.609918 0.355209 0.592364 0.592364 0.540537 0.033966 0.467656
$A_{5}$ 0.406067 0.390897 0.695448 0.624102 0.393471 0.587017 0.587017 0.506540 0.047524 0.462203
$A_{6}$ 0.506174 0.466063 0.733031 0.641391 0.464597 0.611055 0.611055 0.627756 0.086047 0.575777
$A_{7}$ 0.605020 0.608266 0.804133 0.684588 0.604907 0.643671 0.643671 0.733333 0.157494 0.695822
$A_{8}$ 0.605045 0.685968 0.842984 0.840399 0.685448 0.695958 0.695958 0.910199 0.211622 0.844161
$A_{9}$ 0.639116 0.661767 0.830884 0.802897 0.658651 0.687982 0.687982 0.900479 0.221185 0.829805
$A_{10}$ 0.371339 0.360001 0.680001 0.667939 0.362713 0.607423 0.607423 0.648179 0.032175 0.511833
$A_{11}$ 0.366135 0.333139 0.666570 0.596674 0.335765 0.585178 0.585178 0.496258 0.031179 0.427904
$A_{12}$ 0.536113 0.506358 0.753179 0.686932 0.504284 0.628743 0.628743 0.713145 0.105171 0.639270
$A_{13}$ 0.452698 0.366788 0.683394 0.546127 0.365399 0.578036 0.578036 0.458851 0.057207 0.413545
$A_{14}$ 0.546563 0.528326 0.764163 0.680291 0.526059 0.629141 0.629141 0.702935 0.112033 0.619730
$A_{15}$ 0.523334 0.538333 0.769166 0.743719 0.538096 0.649168 0.649168 0.814663 0.104997 0.708472
[13]:
df_rankings
[13]:
Euclidean Manhattan Bray curtis Canberra Lorentzian Hellinger Matusita Squared chord Pearson chi square Squared chi square
$A_{1}$ 5 6 6 5 6 6 6 5 5 6
$A_{2}$ 2 3 3 3 3 3 3 3 3 3
$A_{3}$ 12 14 14 13 14 14 14 14 12 14
$A_{4}$ 13 13 13 12 13 11 11 11 13 11
$A_{5}$ 11 10 10 11 10 12 12 12 11 12
$A_{6}$ 9 9 9 10 9 9 9 10 9 9
$A_{7}$ 4 4 4 7 4 5 5 6 4 5
$A_{8}$ 3 1 1 1 1 1 1 1 2 1
$A_{9}$ 1 2 2 2 2 2 2 2 1 2
$A_{10}$ 14 12 12 9 12 10 10 9 14 10
$A_{11}$ 15 15 15 14 15 13 13 13 15 13
$A_{12}$ 7 8 8 6 8 8 8 7 7 7
$A_{13}$ 10 11 11 15 11 15 15 15 10 15
$A_{14}$ 6 7 7 8 7 7 7 8 6 8
$A_{15}$ 8 5 5 4 5 4 4 4 8 4
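
Each ranking column above follows from sorting the corresponding preference values in descending order. A minimal NumPy sketch of that step is given below; `rank_descending_sketch` is a hypothetical helper, and it breaks ties by original order, which may differ from how the library's rank_preferences handles equal values.

```python
import numpy as np


def rank_descending_sketch(pref):
    """Assign rank 1 to the highest preference value, rank 2 to the next, and so on."""
    pref = np.asarray(pref, dtype=float)
    # Indices of alternatives from best to worst
    order = np.argsort(-pref, kind="stable")
    ranks = np.empty(len(pref), dtype=int)
    # The alternative at position k in `order` receives rank k + 1
    ranks[order] = np.arange(1, len(pref) + 1)
    return ranks


# Euclidean preference values A1..A15 as reported in this manual
euclidean_pref = [0.557878, 0.613773, 0.382911, 0.376623, 0.406067,
                  0.506174, 0.605020, 0.605045, 0.639116, 0.371339,
                  0.366135, 0.536113, 0.452698, 0.546563, 0.523334]
print(rank_descending_sketch(euclidean_pref))
```

For these values the result matches the Euclidean column of the rankings table (A9 first, A11 last).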

Visualize the results: a box plot of the TOPSIS preference function values, a column chart of the alternatives' rankings, and a heatmap of the correlations between rankings.

[14]:
# plot box plot of alternatives preference values
plot_boxplot(df_preferences.T)
_images/example_25_0.png
[15]:
# plot column chart of alternatives rankings
plot_barplot(df_rankings, 'Alternatives', 'Rank', 'Distance metric')
_images/example_26_0.png
[16]:
# Plot heatmaps of rankings correlation coefficient
# Create dataframe with rankings correlation values
results = copy.deepcopy(df_rankings)
method_types = list(results.columns)
dict_new_heatmap_p = Create_dictionary()

for el in method_types:
    dict_new_heatmap_p.add(el, [])

for i, j in [(i, j) for i in method_types[::-1] for j in method_types]:
    dict_new_heatmap_p[j].append(corrs.pearson_coeff(results[i], results[j]))

df_new_heatmap_p = pd.DataFrame(dict_new_heatmap_p, index = method_types[::-1])
df_new_heatmap_p.columns = method_types
[17]:
# Plot heatmap with rankings correlation
draw_heatmap(df_new_heatmap_p, r'$Pearson$')
_images/example_28_0.png
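
The pairwise correlation matrix built by the nested loop above can also be computed in one call. The sketch below uses `numpy.corrcoef` on three of the ranking columns reported in this manual; it is an alternative formulation for illustration, not the code used to produce the heatmap, and it keeps the rows in the same order as the columns instead of reversing them.

```python
import numpy as np

# Ranking columns (A1..A15) copied from the rankings table in this manual
rankings = {
    "Euclidean": [5, 2, 12, 13, 11, 9, 4, 3, 1, 14, 15, 7, 10, 6, 8],
    "Manhattan": [6, 3, 14, 13, 10, 9, 4, 1, 2, 12, 15, 8, 11, 7, 5],
    "Canberra": [5, 3, 13, 12, 11, 10, 7, 1, 2, 9, 14, 6, 15, 8, 4],
}

names = list(rankings)
# Each row of the stacked array is one ranking; corrcoef treats rows as variables
rank_matrix = np.array([rankings[name] for name in names], dtype=float)
pearson_matrix = np.corrcoef(rank_matrix)

# Print the matrix with method names as row labels
for name, row in zip(names, pearson_matrix):
    print(f"{name:>10}", np.round(row, 2))
```

Because these rankings contain no ties, the Pearson coefficient computed on ranks coincides with the Spearman rank correlation coefficient.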