MediaCrawler — Multi-Platform Social Media Crawler

MediaCrawler is an open-source project, intended for learning and research, that crawls public data from several social media platforms: Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, and Zhihu. It demonstrates practical browser automation with Playwright: by preserving the browser context of a logged-in session, it obtains the data it needs without reverse-engineering each platform's complex JavaScript encryption.
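
The session-preserving idea can be sketched with Playwright's persistent browser context. This is a minimal illustration and not MediaCrawler's actual code: the function names, the browser_data directory layout, and the choice to read navigator.userAgent are all assumptions made for the example.

```python
from pathlib import Path


def user_data_dir_for(platform: str, root: str = "browser_data") -> Path:
    """Return a per-platform browser profile directory (hypothetical layout)."""
    return Path(root) / f"{platform}_user_data_dir"


def open_logged_in_page(platform: str, url: str, headless: bool = False):
    """Open `url` in a browser profile whose cookies persist between runs."""
    # Imported lazily so the path helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    profile = user_data_dir_for(platform)
    profile.mkdir(parents=True, exist_ok=True)

    with sync_playwright() as p:
        # A persistent context stores cookies and local storage in `profile`,
        # so a login completed by hand on the first run is reused afterwards.
        context = p.chromium.launch_persistent_context(str(profile), headless=headless)
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(url)

        # JavaScript can be evaluated inside the logged-in page, which is how
        # values computed by a site's own scripts can be read without
        # reimplementing them. Reading the user agent is only a placeholder.
        user_agent = page.evaluate("() => navigator.userAgent")
        cookies = context.cookies()
        context.close()

    return user_agent, cookies
```

Calling open_logged_in_page("xhs", "https://example.com") would open a visible Chromium window on the first run; the platform key and URL here are placeholders.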

Key functionalities include:

  • Keyword-based search and specific post ID crawling
  • Crawling second-level comments (replies) and creator homepages
  • Login state caching and IP proxy support
  • Data export to SQLite, MySQL, CSV, or JSON
  • Generating comment word clouds for analytics
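
As an illustration of the export options listed above, the sketch below writes the same records to JSON, CSV, and SQLite using only the Python standard library. The field names, table name, and function names are hypothetical and are not taken from MediaCrawler's own storage layer.

```python
import csv
import json
import sqlite3
from pathlib import Path

# Hypothetical record schema used only for this example.
FIELDS = ["comment_id", "note_id", "content"]


def save_json(rows, path):
    """Write records as a JSON array, keeping non-ASCII text readable."""
    text = json.dumps(rows, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def save_csv(rows, path):
    """Write records as CSV; utf-8-sig helps spreadsheet tools detect UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def save_sqlite(rows, path):
    """Upsert records into a SQLite table keyed by comment_id."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS comments ("
                "comment_id TEXT PRIMARY KEY, note_id TEXT, content TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO comments "
                "VALUES (:comment_id, :note_id, :content)",
                rows,
            )
    finally:
        conn.close()
```

Keying the SQLite table on comment_id means that re-running a crawl overwrites existing rows instead of duplicating them, which is one simple way to keep repeated runs idempotent.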

The project emphasizes learning modern web scraping architecture. A separate MediaCrawlerPro version adds advanced features such as multi-account support, resumable crawling, Linux compatibility, and decoupled JS signature logic, with a focus on enterprise-level code quality.

My Contribution
I contributed to the project by reviewing, reorganizing, and enhancing the documentation so that the installation guides, usage instructions, and configuration explanations are clear and easy to follow. These changes make the project more accessible to learners and developers and help new users get started quickly with multi-platform social media crawling.

Usage Notice: This project is strictly for learning and research purposes. Commercial or illegal use is prohibited, and the developer assumes no legal responsibility for misuse.