Scrapy: Powerful Web Scraping & Crawling with Python

Python Scrapy Tutorial - Learn how to scrape websites and build a powerful web crawler using Scrapy, Splash and Python

Rating: 4.05 (2760 reviews)

Platform: Udemy

Language: English

Programming Languages

What you will learn

Creating a web crawler in Scrapy

Crawling single or multiple pages and scraping data

Deploying & Scheduling Spiders to ScrapingHub

Logging into Websites with Scrapy

Running Scrapy as a Standalone Script

Integrating Splash with Scrapy to scrape JavaScript rendered websites

Using Scrapy with Selenium in Special Cases, e.g. to Scrape JavaScript Driven Web Pages

Building an Advanced Scrapy Spider

More functions that Scrapy offers after a Spider is done scraping

Editing and Using Scrapy Parameters

Exporting data extracted by Scrapy into CSV, Excel, XML, or JSON files

Storing data extracted by Scrapy into MySQL and MongoDB databases

Several real-life web scraping projects, including Craigslist, LinkedIn and many others

Python source code for all exercises in this Scrapy tutorial can be downloaded

Q&A board to send your questions and get them answered quickly

Why take this course?

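It is great that you are interested in Scrapy, the powerful Python web scraping framework, and want to learn how to use it through this tutorial. Scrapy is an open-source web crawling framework designed for complex crawling tasks, with high performance and scalability in mind. It is not just a tool for extracting content from websites; it also handles crawling. The main topics of this Python Scrapy tutorial are outlined below:

1. **Introduction to Scrapy** - How Scrapy compares with other Python web scraping libraries (such as BeautifulSoup, LXML, Requests, and Selenium), and when to choose Scrapy.
2. **Installing Scrapy and creating a project** - Learn how to install Scrapy in your development environment and create a new Scrapy project.
3. **Basic Spider** - Create a basic Spider to extract data from a website.
4. **XPath expressions** - Learn how to use XPath expressions with Scrapy to extract exactly the data you need.
5. **Advanced Spider** - Build a more complex Spider that traverses multiple pages and extracts data from each one.
6. **Scrapy architecture** - Explore the overall architecture of a Scrapy project, and understand what each part does and how to use it in your Spider code.
7. **Best practices** - Learn best practices that help you avoid being banned by websites for improper behavior.
8. **Deploying to Scrapy Cloud** - Learn how to deploy your Scrapy spiders to Scrapy Cloud, a platform that does not require you to set up your own server.
9. **Authenticated sessions** - Explore how Scrapy is used to crawl websites where data is only accessible after logging in.
10. **Combining with Splash/Selenium** - Learn how to combine Scrapy with Splash or Selenium when you need to scrape dynamic pages that rely on many AJAX calls and JavaScript execution.
11. **Data output** - Learn how to write the extracted data to CSV, JSON, and XML files.
12. **Data storage** - Explore how to store the data extracted by Scrapy in MySQL and MongoDB databases.

This tutorial aims to take you from the basics to advanced topics, so that you fully understand Scrapy and can use it for web scraping. Through hands-on practice and case studies, you will be able to build, debug, and optimize your own web crawler projects.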
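The data storage topic in the outline above is usually handled in Scrapy by an item pipeline: a plain Python class whose `process_item` method receives every scraped item. The following is a minimal sketch of that idea, not the course's own code. It deliberately swaps in the standard-library `sqlite3` module for MySQL or MongoDB so that it runs without a database server; the class name, the `quotes` table, and the `text`/`author` fields are assumptions chosen for illustration.

```python
import sqlite3


class SQLitePipeline:
    """Item pipeline sketch: stores each scraped item as one database row.

    SQLite is a stand-in here; a MySQL or MongoDB pipeline would have the
    same three methods but use that database's own client library.
    """

    def __init__(self, db_path=":memory:"):
        # Hypothetical default: an in-memory database, handy for experiments.
        self.db_path = db_path
        self.connection = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts: connect and create the table.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS quotes (text TEXT, author TEXT)"
        )

    def process_item(self, item, spider):
        # Called by Scrapy once per scraped item.
        self.connection.execute(
            "INSERT INTO quotes (text, author) VALUES (?, ?)",
            (item["text"], item["author"]),
        )
        self.connection.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        self.connection.close()


def demo():
    """Drive the pipeline by hand with made-up items, the way Scrapy would."""
    pipeline = SQLitePipeline()
    pipeline.open_spider(spider=None)
    sample_items = [
        {"text": "First quote", "author": "Author A"},
        {"text": "Second quote", "author": "Author B"},
    ]
    for item in sample_items:
        pipeline.process_item(item, spider=None)
    rows = pipeline.connection.execute(
        "SELECT text, author FROM quotes ORDER BY rowid"
    ).fetchall()
    pipeline.close_spider(spider=None)
    return rows


if __name__ == "__main__":
    print(demo())
```

In a real project, a pipeline like this is switched on through the `ITEM_PIPELINES` setting in `settings.py`, for example `ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300}`, where `myproject` is a placeholder for your own project name.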

Our review

📚 **Course Overview:** The course is designed to teach students web scraping using Scrapy, a powerful Python framework for extracting data from websites. The global rating stands at 4.05, with recent reviews providing a mix of feedback on various aspects of the course.

**Pros:**
- **Comprehensive Learning**: Many students found the course to be a good introduction to Scrapy and the topic of web scraping in general. It covers necessary tools and concepts effectively.
- **Clear Explanations**: Several reviewers highlighted that the instructor provides very clear and detailed explanations, leaving no stone unturned when it comes to teaching the subject matter.
- **Responsive Instructor**: Lazar, the instructor, is noted for being responsive to questions, making the learning experience more interactive and supportive.
- **Real-World Application**: The course is praised for its practical approach, with some students successfully applying what they learned in real-world scenarios.
- **Best Practices**: Some students appreciated the coverage of best practices and the depth of content provided, even if some code examples were outdated or required updates to align with current standards (e.g., Python 2 to Python 3).
- **Well-Structured Syllabus**: The syllabus is commended for being well-structured, preparing students to scrape a wide range of websites.
- **Positive Impact**: A notable number of students reported that they came away from the course with a new understanding of Scrapy and web scraping, which they found invaluable.

**Cons:**
- **Pacing Issues**: Some students felt the course progressed too quickly or was disorganized, making it difficult to follow along and fully grasp certain key points.
- **Updating Content**: A common concern was that the course content needed updates to reflect the latest version of Scrapy and other technologies discussed.
- **Video Quality**: Several students pointed out issues with video quality, including low volume and hard-to-read screen text on various devices.
- **Platform Limitations**: The course primarily focuses on Mac users, potentially leaving out Windows or similar OS users.
- **Assumptions of Prior Knowledge**: A few reviewers mentioned that the course occasionally assumes prior knowledge that isn't always provided within the lesson, leading to confusion.
- **Code Practicality**: Some students reported that the code examples in later sections of the course did not work as expected, requiring additional time and effort to correct or find alternatives.
- **Visual Clarity**: At least one student commented on the need for better visualization of HTML code for easier reading on larger screens.

**Additional Notes:**
- **Accessibility Concerns**: There are concerns that the course may not be fully accessible to users who are not using a Mac, as some content specifically references macOS and may lack equivalent instructions for Windows or Linux users.
- **Evolving Technologies**: The rapidly evolving nature of web scraping tools and websites necessitates regular updates to course materials to ensure they remain relevant and effective.

**Recommendations for Improvement:**
- **Update Course Material**: Ensure that all Scrapy framework examples are up-to-date with the latest version and compatible across different operating systems.
- **Improve Video Quality**: Address issues related to video clarity and volume to enhance the learning experience.
- **Slow Down Pacing**: Consider revising the pacing of the course to allow students more time to absorb complex information and follow along effectively.
- **Bridge Knowledge Gaps**: Provide a clearer introduction to any necessary prerequisite knowledge or explain more clearly what each file in Scrapy architecture does.
- **Enhance Visual Aids**: Improve the visual representation of code, especially when it comes to HTML, for better readability on various devices.

In conclusion, while the course has received positive feedback for its comprehensive teaching approach and detailed explanations, there are areas that could be improved to enhance the learning experience for a broader range of students. Updating content regularly and ensuring accessibility across different platforms will likely address many of the current concerns.

Charts

Price, Rating, and Enrollment distribution charts for Scrapy: Powerful Web Scraping & Crawling with Python