Scrapy middleware to handle JavaScript pages using Puppeteer.

# ⚠ IN ACTIVE DEVELOPMENT - READ BEFORE USING ⚠

This is an attempt to make Scrapy and Puppeteer work together to handle JavaScript-rendered pages. The design is strongly inspired by the Scrapy ().

The main issue when running Scrapy and Puppeteer together is that Scrapy uses Twisted, while pyppeteer (the Python port of Puppeteer we are using) uses asyncio for its asynchronous operations.

Luckily, we can use Twisted's asyncio reactor to make the two talk to each other.

That's why you **cannot** use the built-in `scrapy` command line (it installs the default reactor); you will have to use the `scrapyp` one, provided by this module.

If you are running your spiders from a script, you will have to make sure you install the asyncio reactor before importing scrapy or doing anything else:

```python
import asyncio

from twisted.internet import asyncioreactor

# Must run before scrapy is imported, so that Twisted uses the asyncio loop.
asyncioreactor.install(asyncio.get_event_loop())
```

Add the `PuppeteerMiddleware` to the downloader middlewares:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_puppeteer.PuppeteerMiddleware': 800
}
```
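For a spider run from a script, the reactor installation and the middleware setting described above could be combined roughly as follows. This is only a minimal sketch under a few assumptions: the spider is started with Scrapy's `CrawlerProcess` (this README does not say which runner to use), and the helper names `build_settings`, `load_object`, and `run`, as well as the spider path `myproject.spiders:ExampleSpider`, are hypothetical and not part of this module. The spider class is loaded by dotted path so that nothing imports scrapy before the asyncio reactor is installed.

```python
import asyncio
import importlib


def build_settings():
    """Return settings that enable the Puppeteer downloader middleware."""
    return {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_puppeteer.PuppeteerMiddleware": 800,
        },
    }


def load_object(path):
    """Import 'package.module:attribute' and return the attribute (hypothetical helper)."""
    module_name, _, attribute = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def run(spider_path):
    """Install the asyncio reactor first, then crawl with the spider at spider_path."""
    # Step 1: install the asyncio reactor before scrapy is imported anywhere.
    from twisted.internet import asyncioreactor
    asyncioreactor.install(asyncio.get_event_loop())

    # Step 2: only now import scrapy and the spider module (which imports scrapy).
    # Using CrawlerProcess here is an assumption, not something this module prescribes.
    from scrapy.crawler import CrawlerProcess
    spider_cls = load_object(spider_path)

    # Step 3: start the crawl with the Puppeteer middleware enabled.
    process = CrawlerProcess(settings=build_settings())
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl is finished
```

A script would then call something like `run("myproject.spiders:ExampleSpider")` as its first action. Keeping the twisted and scrapy imports inside `run` is a deliberate choice in this sketch: it makes the required ordering explicit instead of relying on the order of imports at the top of the module.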