final project for "how to win a data science competition" coursera course question

by Zetta Price 4 min read

How to win Data Science competitions Online course

For the past 3 weeks I took the new Coursera online course “How to win Data Science competitions”. I wanted to take this course because I thought it would have a good ROI for my Master Tier Journey, and it exceeded my expectations!

Topics Covered

The course covers the different stages of the data pipeline. Some of these topics are specific to data competitions; however, I think a big part is transferable to real-world data applications. Topics included in the course are the following:

Workload

Each week in the course consists of 30 to 90 minutes of video lectures, plus quizzes and programming assignments. The lectures are split into videos of 10 to 20 minutes each, which is good for a short attention span. The quizzes are available only to students who pay for the course. The programming assignments are related to the course's Kaggle competition.

Final competition

The highlight of this course is the final competition hosted on Kaggle. The goal is to predict the future sales volume of different items based on their sales history. The data, provided by 1C, consists of daily sales from January 2013 to October 2015, along with item descriptions and item categories.
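To make the task more concrete, here is a minimal sketch of one common way to turn a daily sales history into a feature for forecasting: sum the daily records per item and month, then use the previous month's total as a "lag" feature. The record layout, the names `daily_sales`, `monthly_totals`, `previous_month` and `lag_one_feature`, and the choice of monthly aggregation are all assumptions for illustration; they are not taken from the competition files or the course materials.

```python
from collections import defaultdict

# Hypothetical daily records: (date "YYYY-MM-DD", item_id, units_sold).
# The real competition files have their own columns and layout.
daily_sales = [
    ("2013-01-05", 1, 2),
    ("2013-01-20", 1, 1),
    ("2013-02-03", 1, 4),
    ("2013-01-11", 2, 5),
    ("2013-03-09", 2, 1),
]


def monthly_totals(rows):
    """Sum daily units into (item_id, "YYYY-MM") buckets."""
    totals = defaultdict(int)
    for date, item_id, units in rows:
        totals[(item_id, date[:7])] += units
    return dict(totals)


def previous_month(month):
    """Return the "YYYY-MM" string for the month before `month`."""
    year, mon = int(month[:4]), int(month[5:7])
    if mon == 1:
        return f"{year - 1}-12"
    return f"{year}-{mon - 1:02d}"


def lag_one_feature(totals, item_id, month):
    """Units sold by `item_id` in the month before `month` (0 if none recorded)."""
    return totals.get((item_id, previous_month(month)), 0)


totals = monthly_totals(daily_sales)
# Feature for predicting item 1 in February 2013: its January 2013 total.
item_1_feb_lag = lag_one_feature(totals, 1, "2013-02")
```

A model trained on features like this only ever sees information from months before the one it predicts, which is what a forecasting setup requires.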

Final thought

I highly recommend this course to anyone who wants to get better at data science competitions and gain the edge needed to take their skills to the next level.

Learn from Top Kagglers

This course is a part of Advanced Machine Learning, a 7-course Specialization series from Coursera.

About this course

If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience and improve and harness your data modelling skills in various domains such as credit, insurance, marketing, natural language processing, sales forecasting and computer vision, to name a few.

Introduction & Recap

This week we will introduce you to competitive data science. You will learn about the mechanics of competitions, the difference between competitions and real-life data science, and the hardware and software that people usually use in competitions. We will also briefly recap the major ML models frequently used in competitions.

Feature Preprocessing and Generation with Respect to Models

In this module we will summarize approaches to working with features: preprocessing, generation and extraction. We will see that the choice of machine learning model affects both the preprocessing we apply to the features and our approach to generating new ones.
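As a rough illustration of why the model matters for preprocessing, the sketch below applies min-max scaling to a toy dataset. A distance-based method (nearest neighbor) picks a different neighbor after scaling, while the ordering of values within a feature, which is all a threshold split in a tree depends on, is unchanged. The data and the helper names (`fit_min_max`, `apply_min_max`, `nearest_neighbor`, `rank_order`) are made up for this example and are not taken from the course.

```python
def fit_min_max(rows):
    """Compute (min, max) per feature column from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(row, bounds):
    """Rescale one row so each feature's training range maps to [0, 1]."""
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(row, bounds))


def nearest_neighbor(points, query):
    """Index of the point closest to `query` by squared Euclidean distance."""
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    return min(range(len(points)), key=lambda i: dist2(points[i]))


def rank_order(values):
    """Indices of `values` sorted from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical toy data: feature 0 spans 0..1, feature 1 spans 1000..1100.
train = [(0.0, 1000.0), (1.0, 1100.0)]
query = (0.9, 1040.0)

bounds = fit_min_max(train)
train_scaled = [apply_min_max(r, bounds) for r in train]
query_scaled = apply_min_max(query, bounds)

# On raw data the large-scale feature dominates the distance.
nn_raw = nearest_neighbor(train, query)
# After scaling, both features contribute comparably.
nn_scaled = nearest_neighbor(train_scaled, query_scaled)

# The ordering of feature 1 is the same before and after scaling,
# so a threshold split on it separates the rows identically.
feature_1_raw = [r[1] for r in train + [query]]
feature_1_scaled = [r[1] for r in train_scaled + [query_scaled]]
same_order = rank_order(feature_1_raw) == rank_order(feature_1_scaled)
```

In this toy case the nearest neighbor changes from the first training row to the second once the features are on a comparable scale, while `same_order` stays true.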

Final Project Description

This is just a reminder that it is better to start the final project in this course soon! The final project is in fact a competition, and in this module you can find information about it.

Exploratory Data Analysis

We will start this week with Exploratory Data Analysis (EDA). It is a very broad and exciting topic and an essential component of the solving process. Besides the regular videos, you will find a walkthrough of the EDA process for the Springleaf competition data and an example of prolific EDA for the NumerAI competition with extraordinary findings.

Validation

In this module we will discuss various validation strategies. We will see that the strategy we choose depends on the competition setup and that a correct validation scheme is one of the building blocks of any winning solution.
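One example of matching the validation strategy to the setup: when the task is to predict a future period, as in a sales forecasting competition, a time-based holdout is a natural candidate. The sketch below keeps the most recent periods for validation and trains on everything earlier. The function name `time_based_split`, its parameter, and the row layout are assumptions for illustration, not the course's code.

```python
def time_based_split(rows, n_validation_periods=1):
    """Split (period, value) rows so validation holds the latest periods.

    Periods must sort chronologically, e.g. "YYYY-MM" strings.
    """
    periods = sorted({period for period, _ in rows})
    validation_periods = set(periods[-n_validation_periods:])
    train = [r for r in rows if r[0] not in validation_periods]
    valid = [r for r in rows if r[0] in validation_periods]
    return train, valid


# Hypothetical monthly rows: (month, units_sold).
rows = [
    ("2015-07", 10),
    ("2015-08", 12),
    ("2015-09", 9),
    ("2015-10", 11),
    ("2015-10", 4),
]

train_rows, valid_rows = time_based_split(rows, n_validation_periods=1)
```

A random split would mix future rows into the training set here, so the validation score could look better than what the model achieves on genuinely unseen future data.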

Data Leakages

Finally, in this module we will cover something quite specific to data science competitions: we will see examples of how it is sometimes possible to get a top position in a competition with very little machine learning, just by exploiting a data leakage.