Sales of Summer Clothes on E-Commerce Platform — Wish

Source:
Wish E-commerce Platform


Data Processing:

  1. Product Color
     1. Created a list of unique categories based on general business knowledge of the clothing industry. This is our “Reference List”.
     2. Created a list of all unique product descriptions from the ‘title_orig’ column. This is our “Input List”.
     3. Ran a fuzzy string-to-string matching algorithm between the “Reference List” and the “Input List”, using the fuzz module of the fuzzywuzzy package, to extract the top categories of interest.
     4. Basically, this algorithm matches each string in the “Input List” against every string in the “Reference List” and generates a similarity score for each pair.
     5. Finally, I kept only the pairs with a high similarity score (>= 90), which indicates a strong match to a value (color) in the “Reference List”.
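
The matching steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: it swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in for a fuzzywuzzy scorer, and it assumes each word of a title is scored separately against each reference color. The function names `similarity` and `match_colors`, as well as the sample titles and colors, are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stand-in for a fuzzywuzzy scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def match_colors(input_list, reference_list, threshold=90):
    """Score every (title word, reference color) pair and keep strong matches."""
    matches = []
    for title in input_list:
        # Assumption: titles are compared word by word, not as whole strings.
        for word in title.split():
            for color in reference_list:
                score = similarity(word, color)
                if score >= threshold:
                    matches.append((title, color, score))
    return matches


# Hypothetical data: a small "Reference List" and "Input List".
reference_list = ["black", "white", "yellow"]
input_list = [
    "Women Summer Dress Yelow Sleeveless",  # misspelled color on purpose
    "Casual White Tank Top",
]

matches = match_colors(input_list, reference_list)
for title, color, score in matches:
    print(f"{title!r} -> {color} ({score})")
```

Scoring every pair is quadratic in the size of the two lists, which is fine for a short reference list of colors but may need a faster approach for much larger inputs.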

1. What are the top-selling categories, colors, and sizes in summer?

Modeling Methodology:

  1. Created two arrays, X and Y, containing the independent variables and the target variable (units sold), respectively.
  2. Split the data into training and validation datasets.
  3. Standardized all features in the training dataset with a standard scaler (subtract the mean and divide by the standard deviation), then applied the same fitted transformation to all variables in the validation dataset.
  4. Used the feature importance attribute of the Random Forest algorithm to identify the top 15 variables for predicting the number of units sold. Removing redundant features matters because it limits multicollinearity (highly correlated features) in the model. Multicollinearity reduces the precision of the estimated coefficients, which weakens the statistical power of a regression model.
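
Steps 2 and 3 above can be sketched as follows. This is a simplified, standard-library-only illustration rather than the code used for the analysis, which presumably relied on a library scaler. The helper names (`train_validation_split`, `fit_scaler`, `transform`), the 80/20 split, and the sample feature values are all assumptions. The point it shows is that the mean and standard deviation are computed on the training rows only and then reused for the validation rows.

```python
import random
from statistics import mean, pstdev


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle row indices and split rows into training and validation sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(rows) * validation_fraction)
    val_idx = set(indices[:n_val])
    train = [r for i, r in enumerate(rows) if i not in val_idx]
    val = [r for i, r in enumerate(rows) if i in val_idx]
    return train, val


def fit_scaler(train_rows):
    """Compute per-column mean and standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(c) for c in columns]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows using previously fitted means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical feature rows: [price, rating]
X = [[5.0, 3.5], [8.0, 4.0], [11.0, 4.5], [6.0, 3.0], [9.0, 4.2],
     [7.0, 3.8], [12.0, 4.9], [10.0, 4.1], [4.0, 2.9], [13.0, 4.7]]

X_train, X_val = train_validation_split(X)
means, stds = fit_scaler(X_train)            # fitted on training data only
X_train_scaled = transform(X_train, means, stds)
X_val_scaled = transform(X_val, means, stds)  # same statistics reused
```

Reusing the training statistics keeps information from the validation set out of the preprocessing step, so the validation score remains a fair estimate of performance on unseen data.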

2. Which key variables/features help us successfully predict the number of units sold?

Can you build a machine learning model to predict the number of units sold?


GitHub Repo:




Akshay Jain