

Natural Language Processing From First Principles

A Beginner-Friendly Introduction to Deep Learning and Artificial Intelligence with Python: Word Embeddings from Scratch

4.88 (4 reviews)


3 hours


Jun 2021


What you will learn

How to code word embeddings from scratch

How to perform stochastic gradient descent

How to augment data for natural language processing

How to perform negative sampling

How to perform sub-sampling


In this course, motivated beginners will learn the fundamentals of natural language processing and deep learning. Students will code their own word embedding vectors from scratch, using just NumPy and a little bit of calculus. For students who lack the background, a crash course in the required mathematics is included. We'll cover the fundamentals of differential calculus and linear algebra in a succinct overview, so students can easily follow all mathematical derivations.

Rather than simply presenting results, the course includes each step of the mathematical derivations. This helps students build a deeper understanding of natural language processing and artificial intelligence in general.

Far from being a course where students are simply spoon-fed the instructor's interpretation, this course teaches students to gather information directly from the source. I will show you a repeatable, easy-to-remember framework for reading, understanding, and implementing deep learning research papers. You will get insight into how the verbiage in research papers maps to real-world code. This is an essential skill set for all practitioners of artificial intelligence and data science, and will help you stand out from the crowd.

Throughout the course, good coding practices will be stressed. Students will learn the fundamentals of writing Pythonic and extensible code from the very beginning, so that they can easily transition into writing more complex code for production.

By the end of the course, students will be able to answer the following questions:

  • What is the difference between the skip-gram and continuous bag of words models?

  • What is distributional semantics?

  • How can we use vectors to teach computers about language?

  • How do we derive the word2vec gradients?

  • Why is the softmax function so slow in natural language processing?

  • How can we deal with small datasets for natural language processing?

  • How can we improve word embeddings using negative sampling?

  • What is the best way to deal with proper nouns in natural language processing?

  • What were some of the historical approaches to natural language processing?

  • What can word plots teach us about how computers understand language?
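As a taste of two of these questions (why the softmax is slow, and how negative sampling helps), here is a minimal NumPy sketch. It is not the course's code: the function names, the vocabulary size, the embedding dimension, and the random vectors are all assumptions made for illustration, and the negatives are drawn uniformly for simplicity, whereas the word2vec paper draws them from a smoothed unigram distribution.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax_loss(center_vec, context_idx, output_vectors):
    """Full-softmax skip-gram loss for one (center, context) pair.

    The normalizer sums over every row of output_vectors, so the cost of
    a single training pair grows with the vocabulary size.
    """
    scores = output_vectors @ center_vec           # shape: (vocab_size,)
    scores = scores - scores.max()                 # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[context_idx]


def negative_sampling_loss(center_vec, context_idx, negative_idx, output_vectors):
    """Negative-sampling loss for the same pair.

    Only 1 + len(negative_idx) rows of output_vectors are touched,
    regardless of how large the vocabulary is.
    """
    pos_score = output_vectors[context_idx] @ center_vec
    neg_scores = output_vectors[negative_idx] @ center_vec
    return -(np.log(sigmoid(pos_score)) + np.log(sigmoid(-neg_scores)).sum())


# Hypothetical toy setup: random vectors stand in for trained embeddings.
rng = np.random.default_rng(0)
vocab_size, dim = 10_000, 50
output_vectors = rng.normal(scale=0.1, size=(vocab_size, dim))
center_vec = rng.normal(scale=0.1, size=dim)
context_idx = 42
# Uniform sampling of 5 negatives: a simplification, see the note above.
negative_idx = rng.integers(0, vocab_size, size=5)

full = softmax_loss(center_vec, context_idx, output_vectors)
sampled = negative_sampling_loss(center_vec, context_idx, negative_idx, output_vectors)
print(f"full softmax loss:      {full:.3f}  (uses {vocab_size} output vectors)")
print(f"negative sampling loss: {sampled:.3f}  (uses {1 + len(negative_idx)} output vectors)")
```

The two losses are different objectives, so their values are not directly comparable. The point is the amount of work per training pair: every output vector for the softmax, versus a handful for negative sampling.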

There is zero fluff in this course. It is taught at a brisk pace, and is intended for motivated beginners who want deeper insights into natural language processing. Those who complete this course will learn how to implement research papers on their own; you'll never have to rely on Medium blog posts again.





What You Will Learn in this Course

Required Background, Software, and Hardware

How to Succeed in this Course

Overview of Natural Language Processing

What is the Purpose of Language?

How do we Represent Words with Computers?

Representing Words and Meaning with Vectors

Teaching Computers to Understand Language with the Word2Vec Algorithm

Intuition of the Word2Vec Algorithm

Coding up Our Data Parser

Augmenting our Dataset and Implementing Sub-sampling

It's all about the Context

Overview of the Mathematics We Will Need

Essential Mathematical Functions

Coming to Grips with Optimization: A Crash Course in Calculus

Gradients of More Complicated Functions

Deriving the Word2Vec Gradients

The Skipgram Algorithm

Stochastic Gradient Descent

Turbocharging Skipgram with Negative Sampling

The Negative Sampling Loss Function and Gradients

Coding the Main Loop and Visualizing Our Word Vectors

Reading the Word2Vec Paper

A Primer on Reading Deep Learning Research Papers

Reading the Abstract and Introduction

Introducing the Skipgram Model

Empirical Results and Learning Phrases

Additive Compositionality, Comparisons, and Conclusion

