I wrote this post to meet the requirements of my Udacity “Data Scientist Nanodegree” capstone project. In this project, I tested several prediction models using Spark to identify users who are more likely to churn from a fictitious streaming music service.


Sparkify is a music streaming service (like Spotify, Pandora, etc.). On Sparkify, users can either listen to music for free or buy a subscription. Free users have to listen to ads, while subscribers (paid users) listen to songs ad-free. Users need to log in to listen to a song on the service.

This blog post is part of the Udacity Data Scientist Nanodegree program. Detailed analysis with all required code is posted in

Source: Wish E-commerce Platform


This dataset, available on Kaggle, was originally scraped from the Wish E-commerce Platform. It contains the product listings, product ratings, sales performance, and merchant/supplier information that appear when you type “Summer” into the platform's search field.

Key Business Objectives:

1. What are the top-selling categories, product sizes, and colors for Summer?
2. Which key variables/factors successfully predict the number of units sold?
3. Can you build a machine learning model to predict the number of units sold?

Source Data:
The data…

Akshay Jain
