Science Behind Sizes in Fashion: Clustering Size Related Brand Data

Fitrrati’s aim is to get to the root cause of size related issues in online fashion retail and to provide an intelligent fit technology platform to fashion brands and e-tailers. In one such attempt, we explored the science behind size related brand data by clustering the product measurements of various Size Labels across different genders, brands, product types and fit types.

The data used in the analysis involved 18,213 size label records across 224 Indian fashion brands in 17 product categories in both men’s & women’s clothing. The clustering approach used in this study is explained below.

K-means Clustering Approach

The clustering technique deployed is K-means. It is one of the simplest unsupervised learning algorithms: it partitions a set of data points into a small number of clusters such that each data point belongs to the cluster with the nearest mean.
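As a rough illustration of the idea, here is a minimal pure-Python sketch of the standard K-means loop (Lloyd's algorithm). It is not the code used in the study: the function name `kmeans`, the fixed random seed and the six (chest, shoulder) pairs are all assumptions made up for this example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return centers, labels


# Hypothetical (chest, shoulder) measurements in inches, invented for this sketch.
points = [
    (38.0, 17.0), (38.5, 17.2), (39.0, 17.5),
    (44.0, 19.0), (44.5, 19.2), (45.0, 19.5),
]
centers, labels = kmeans(points, k=2)
```

With these made-up points, the three smaller garments end up in one cluster and the three larger ones in the other. A production analysis would more likely rely on an established library implementation than on a hand-rolled loop like this one.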

To determine the optimum number of clusters, a plot of within-cluster sum of squares (WCSS) v/s number of clusters (N) is generated. WCSS keeps falling as N increases, so the optimum is taken at the ‘elbow’ of the curve: the point after the steep drop where adding more clusters reduces WCSS only marginally.
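The sketch below shows how WCSS can be computed for a given clustering, and one possible way to locate the bend (often called the elbow) of the WCSS curve automatically. The study reads the optimum off the plot; the "farthest below the chord" rule used here is a swapped-in heuristic, and the function names and WCSS values are invented for illustration.

```python
def wcss(points, labels, centers):
    """Within-cluster sum of squares for 2-D points given their cluster labels."""
    return sum(
        (x - centers[lab][0]) ** 2 + (y - centers[lab][1]) ** 2
        for (x, y), lab in zip(points, labels)
    )


def elbow(wcss_by_n):
    """Return the N that lies farthest below the line joining the curve's end points."""
    ns = sorted(wcss_by_n)
    first, last = ns[0], ns[-1]
    n_span = last - first
    w_span = wcss_by_n[first] - wcss_by_n[last]
    if n_span == 0 or w_span <= 0:
        raise ValueError("need a decreasing WCSS curve over at least two values of N")

    def gap(n):
        # Rescale both axes to [0, 1]; the end points then sit at (0, 1) and (1, 0).
        x = (n - first) / n_span
        y = (wcss_by_n[n] - wcss_by_n[last]) / w_span
        return 1 - x - y  # proportional to the distance below the chord x + y = 1

    return max(ns, key=gap)


# Made-up WCSS values for illustration only (not the study's data).
curve = {1: 1000.0, 2: 400.0, 3: 150.0, 4: 120.0, 5: 100.0, 6: 90.0}
best_n = elbow(curve)
```

For this invented curve the heuristic picks N = 3, the last value before the curve flattens out. On real data it is worth checking the suggestion against the plot, since a smooth curve may not have a single clear bend.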

Here’s a plot of WCSS v/s N for Men’s Regular Fit Shirt (Chest & Shoulder Measurements). The optimum number of clusters in this case is 8.

The cluster plot of ‘Men’s Regular Fit Shirts’ is shown below. The points in this plot are product measurements (Shoulder v/s Chest) of different sizes across various brands offering Men’s Regular Fit Shirts in India. Each cluster (shown by lines connecting its points to one center point) represents an ideal Size Label (XS, S, M, L, XL, XXL, 3XL, 4-6XL) meant for users of particular body measurements. The center point of each cluster is the mean of all points in that cluster.
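K-means itself returns unnamed clusters, so some rule is needed to attach Size Labels to them. The source does not say how this was done; one simple possibility, sketched here, is to sort the cluster centers by chest measurement and hand out the labels in order. The function name `label_clusters` and the three example centers are assumptions made for this illustration.

```python
SIZE_LABELS = ["XS", "S", "M", "L", "XL", "XXL", "3XL", "4-6XL"]


def label_clusters(centers, size_labels=SIZE_LABELS):
    """Map size labels to (chest, shoulder) cluster centers, smallest chest first."""
    if len(centers) != len(size_labels):
        raise ValueError("need exactly one size label per cluster")
    ordered = sorted(centers)  # tuples sort by chest first, then by shoulder
    return dict(zip(size_labels, ordered))


# Hypothetical cluster centers in inches, invented for this sketch.
example_centers = [(42.0, 18.0), (38.0, 17.0), (40.0, 17.5)]
labelled = label_clusters(example_centers, size_labels=["S", "M", "L"])
```

Sorting on chest alone assumes chest and shoulder grow together from one size to the next, which seems reasonable for shirts but is still an assumption.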

The plot below shows the same size from two brands offering ‘Men’s Regular Fit Shirts’, United Colors of Benetton (UCB) and Nautica, highlighted on the cluster plot with a + symbol. Size ‘S’ of UCB lies in the cluster ‘S’ while Size ‘S’ of Nautica lies in cluster ‘M’. The comparison shows that although the two shirts have the same size tag (Size ‘S’), they are meant for customers of different body measurements. If Size ‘S’ of UCB fits a customer perfectly, chances are Size ‘S’ of Nautica will be loose for the same customer.
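This kind of comparison can be sketched in code as a nearest-center lookup: find the cluster whose center is closest to each brand's measurements, then count how many size steps apart the two clusters are. Everything below is hypothetical, including the function names, the cluster centers and the two brands' measurements; it is not data from the study.

```python
def nearest_size(measurement, labelled_centers):
    """Return the size label whose cluster center is closest to the measurement."""
    return min(
        labelled_centers,
        key=lambda label: (measurement[0] - labelled_centers[label][0]) ** 2
        + (measurement[1] - labelled_centers[label][1]) ** 2,
    )


def size_offset(measurement_a, measurement_b, labelled_centers, order):
    """Number of size steps by which garment B runs bigger than garment A."""
    index_a = order.index(nearest_size(measurement_a, labelled_centers))
    index_b = order.index(nearest_size(measurement_b, labelled_centers))
    return index_b - index_a


# Hypothetical (chest, shoulder) cluster centers and garments, in inches.
order = ["S", "M", "L"]
labelled_centers = {"S": (38.0, 17.0), "M": (40.0, 17.5), "L": (42.0, 18.0)}
brand_a_size_s = (38.2, 17.1)  # tagged 'S' by hypothetical Brand A
brand_b_size_s = (40.3, 17.6)  # tagged 'S' by hypothetical Brand B

cluster_a = nearest_size(brand_a_size_s, labelled_centers)
cluster_b = nearest_size(brand_b_size_s, labelled_centers)
offset = size_offset(brand_a_size_s, brand_b_size_s, labelled_centers, order)
```

Here Brand A's ‘S’ falls in cluster ‘S’ and Brand B's ‘S’ falls in cluster ‘M’, so the offset is 1: Brand B's ‘S’ runs one size bigger. A negative offset would mean it runs smaller.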

Similarly, cluster plots of other product categories (Men’s Regular Fit T-shirts, Women’s Regular Fit Dresses, Women’s Regular Fit Kurtas & Women’s Regular Fit Tees & Tops) are generated using size related data, and the resulting clusters are analysed.

Below is the cluster plot of Men’s Regular Fit T-shirts.

Again, the same size from two brands offering ‘Men’s Regular Fit T-shirts’ is analysed in the cluster plot below (depicted by + symbol). Size ‘S’ of Nautica (lies in Cluster ‘XL’) is observed to run two sizes bigger than Size ‘S’ of Puma (lies in Cluster ‘M’).

Below is the cluster plot of Women’s Regular Fit Dresses.

Size S of Nautica and Anouk Regular Fit Dresses is analysed and compared in the cluster plot below (depicted by + symbol). Anouk’s Size S lies in Cluster XS while Nautica’s Size S lies in Cluster M, so Nautica’s Size S is observed to run two sizes bigger than Anouk’s.

Below is the cluster plot of Women’s Regular Fit Kurtas.

Similarly, Size XS of AND & Soch Regular Fit Kurtas is analysed on the cluster plot below (depicted by + symbol). The comparison shows that Size XS of the two brands lies in different clusters, again indicating that they are meant for customers of different body measurements.

Below is the cluster plot of Women’s Regular Fit Tees and Tops.

Again, a comparison of the same size (Size XS) from two brands (Levi’s & Wills Lifestyle) in Regular Fit Tees & Tops is carried out on the cluster plot below (depicted by + symbol). Levi’s Size XS lies in Cluster XS while Size XS of Wills Lifestyle lies in Cluster S, so Wills Lifestyle’s Size XS is observed to run one size bigger than Levi’s.


The clustering of size related brand data brings out an important insight. In all product categories for men’s and women’s clothing, the same size in two brands is not necessarily meant for customers with similar body measurements. A possible reason for this is the difference in brand identity and target customers across brands. One brand may be targeting customers with a petite body structure and hence keep its product measurements smaller while retaining the standard size labels, while another brand may be targeting customers with fuller bodies, so that its product measurements run larger under the same standard size labels.

This phenomenon results in non-standardized size charts in the fashion industry, which creates a lot of confusion for the average shopper. It is also one of the major reasons why order return rates of online fashion retailers remain as high as 15-30%. Clearly, there’s a need for a simple technology solution which can understand the science behind sizes in fashion and offer a confident experience to online shoppers by eliminating guesswork.