4.88 (4 reviews)
☑ How to code word embeddings from scratch
☑ How to perform stochastic gradient descent
☑ How to augment data for natural language processing
☑ How to perform negative sampling
☑ How to perform sub-sampling
In this course, motivated beginners will learn the fundamentals of natural language processing and deep learning. Students will code their own word embedding vectors from scratch, using just NumPy and a little bit of calculus. For students who don't have the necessary background, a crash course in the required mathematics is included. We'll cover the fundamentals of differential calculus and linear algebra in a succinct overview, so students can easily follow all mathematical derivations.
Rather than simply presenting results, the course works through each step of the mathematical derivations. This helps students build a deeper understanding of natural language processing and artificial intelligence in general.
Far from being a course where students are simply spoon-fed the instructor's interpretation, this course teaches students to gather information directly from the source. I will show you a repeatable and easy-to-remember framework to read, understand, and implement deep learning research papers. You will get insight into how the verbiage in research papers maps to real-world code. This is an essential skill set for all practitioners of artificial intelligence and data science, and will help you stand out from the crowd.
Throughout the course, good coding practices will be stressed. Students will learn the fundamentals of writing Pythonic and extensible code from the very beginning, so that they can easily transition into writing more complex code for production.
By the end of the course, students will be able to answer the following questions:
What is the difference between the skip-gram and continuous bag of words models?
What is distributional semantics?
How can we use vectors to teach computers about language?
How do we derive the word2vec gradients?
Why is the softmax function so slow in natural language processing?
How can we deal with small datasets for natural language processing?
How can we improve word embeddings using negative sampling?
What is the best way to deal with proper nouns in natural language processing?
What were some of the historical approaches to natural language processing?
What can word plots teach us about how computers understand language?
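As a taste of two of the questions above (why the softmax is slow, and how negative sampling helps), here is a minimal NumPy sketch. It is not the course's own code: the function names, array shapes, and random vectors are illustrative assumptions. Negatives are drawn uniformly here for brevity, whereas the word2vec paper draws them from a smoothed unigram distribution.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax_loss(center_vec, outside_vectors, target_idx):
    """Full-softmax skip-gram loss for one (center, outside) pair.

    The normalizer sums over every row of outside_vectors, so the
    cost of a single training pair grows with the vocabulary size.
    """
    scores = outside_vectors @ center_vec      # shape: (vocab_size,)
    scores = scores - scores.max()             # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[target_idx]


def negative_sampling_loss(center_vec, outside_vectors, target_idx, negative_idx):
    """Negative-sampling loss for the same pair.

    Only the true outside word and the sampled negatives are touched,
    so the cost depends on len(negative_idx), not on the vocabulary size.
    """
    pos = np.log(sigmoid(outside_vectors[target_idx] @ center_vec))
    neg = np.log(sigmoid(-(outside_vectors[negative_idx] @ center_vec))).sum()
    return -(pos + neg)


# Hypothetical sizes and randomly initialized vectors, for illustration only.
rng = np.random.default_rng(0)
vocab_size, dim, num_negatives = 10_000, 50, 5
outside_vectors = rng.normal(scale=0.1, size=(vocab_size, dim))
center_vec = rng.normal(scale=0.1, size=dim)
target_idx = 42

# Assumption: uniform sampling over all words except the true outside word.
candidates = np.delete(np.arange(vocab_size), target_idx)
negative_idx = rng.choice(candidates, size=num_negatives, replace=False)

full_loss = softmax_loss(center_vec, outside_vectors, target_idx)
ns_loss = negative_sampling_loss(center_vec, outside_vectors, target_idx, negative_idx)
```

With these sizes, the softmax version computes 10,000 dot products for one training pair, while the negative-sampling version computes 6 (one positive plus five negatives).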
There is zero fluff in this course. It is taught at a brisk pace, and is intended for motivated beginners who want deeper insights into natural language processing. Those who complete this course will learn how to implement research papers on their own; you'll never have to rely on Medium blog posts again.
What You Will Learn in this Course
Required Background, Software, and Hardware
How to Succeed in this Course
Overview of Natural Language Processing
What is the Purpose of Language?
How do we Represent Words with Computers?
Representing Words and Meaning with Vectors
Teaching Computers to Understand Language with the Word2Vec Algorithm
Intuition of the Word2Vec Algorithm
Coding up Our Data Parser
Augmenting our Dataset and Implementing Sub-sampling
It's all about the Context
Overview of the Mathematics We Will Need
Essential Mathematical Functions
Coming to Grips with Optimization: A Crash Course in Calculus
Gradients of More Complicated Functions
Deriving the Word2Vec Gradients
The Skipgram Algorithm
Stochastic Gradient Descent
Turbocharging Skipgram with Negative Sampling
The Negative Sampling Loss Function and Gradients
Coding the Main Loop and Visualizing Our Word Vectors
Reading the Word2Vec Paper
A Primer on Reading Deep Learning Research Papers
Reading the Abstract and Introduction
Introducing the Skipgram Model
Empirical Results and Learning Phrases
Additive Compositionality, Comparisons, and Conclusion