In this next part, we will dive deep into some of the advanced concepts.

If you have to download multiple large files, however, things start to get complicated. You see, Node.js at its core is a single-threaded system. It can only execute one task at a time. Therefore, if we have to download 10 files, each 1 gigabyte in size and each requiring about 3 minutes to download, then with a single process we will have to wait 10 x 3 = 30 minutes for the task to finish.

💡 Learn more about the single threaded architecture of node here

Our CPU cores can run multiple processes at the same time. We can fork multiple child processes in Node. Child processes are how Node.js handles parallel programming. We can combine the child_process module with our Puppeteer script and download files in parallel.

💡 If you are not familiar with how child processes work in Node, I highly encourage you to give this article a read.

💡 The puppeteer-core package is a version of Puppeteer that doesn't download any browser by default, so not everyone will need it.

💡 How to use Pyppeteer: install it with pip install pyppeteer, then start by importing the required packages. When you launch Pyppeteer for the first time, it downloads a recent version of Chromium (about 150MB) if it isn't already installed, so the first run takes longer to execute.

The code snippet below is a simple example of running parallel downloads with Puppeteer.

const downloadPath = path. ...
log("CHILD: url received from parent process", url)
const browser = await puppeteer. ...
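To see where the 30-minute figure comes from, and what parallel processes could buy us, here is a rough back-of-the-envelope helper. The function name `estimateMinutes` and its `workers` parameter are made up for this illustration. It assumes every file takes the same time and that bandwidth is not the bottleneck, which will rarely be exactly true in practice.

```javascript
// Rough wall-clock estimate: files are processed in "waves" of `workers` at a time.
// Assumption: every file takes the same time and the network is not saturated.
function estimateMinutes(fileCount, minutesPerFile, workers) {
  if (workers < 1) {
    throw new RangeError("workers must be at least 1");
  }
  const waves = Math.ceil(fileCount / workers);
  return waves * minutesPerFile;
}

console.log(estimateMinutes(10, 3, 1));  // 30 (one process, one file after another)
console.log(estimateMinutes(10, 3, 10)); // 3  (one process per file)
```

With 4 workers the same job would need 3 waves, so roughly 9 minutes under these assumptions.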
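Here is a minimal sketch of how forking child processes and Puppeteer could fit together. It is not the original snippet, only one possible shape of it. Several things are assumptions: the `--child` flag, the helper names `downloadPathFor`, `runChild` and `runParent`, the `a.download` selector, and the fixed 5 second wait are all hypothetical, and each URL is assumed to point at a page containing a download link rather than at the file itself. It also relies on the Chrome DevTools Protocol method `Page.setDownloadBehavior`, which is marked as deprecated in the protocol, and on `page.createCDPSession()`, which needs a reasonably recent Puppeteer.

```javascript
const { fork } = require("child_process");
const path = require("path");

// Hypothetical layout: every job gets its own folder so downloads don't collide.
function downloadPathFor(index) {
  return path.resolve(process.cwd(), "downloads", `job-${index}`);
}

// CHILD: wait for one job from the parent, download it, report back.
function runChild() {
  process.once("message", async ({ url, downloadPath }) => {
    console.log("CHILD: url received from parent process", url);
    let browser;
    try {
      const puppeteer = require("puppeteer"); // assumed to be installed
      browser = await puppeteer.launch();
      const page = await browser.newPage();
      // Tell Chromium where to save files (deprecated CDP method, see above).
      const client = await page.createCDPSession();
      await client.send("Page.setDownloadBehavior", {
        behavior: "allow",
        downloadPath,
      });
      await page.goto(url);
      await page.click("a.download"); // hypothetical selector
      // Placeholder: real code should detect when the download has finished.
      await new Promise((resolve) => setTimeout(resolve, 5000));
      process.send({ url, ok: true });
    } catch (err) {
      process.send({ url, ok: false, error: err.message });
    } finally {
      if (browser) await browser.close();
      process.disconnect(); // close the IPC channel so the child can exit
    }
  });
}

// PARENT: fork one child per URL and hand each child its job.
function runParent(urls) {
  urls.forEach((url, index) => {
    const child = fork(__filename, ["--child"]);
    child.on("message", (result) => {
      console.log("PARENT: result from child", result);
    });
    child.send({ url, downloadPath: downloadPathFor(index) });
  });
}

if (process.argv[2] === "--child") {
  runChild();
} else if (process.argv.length > 2) {
  runParent(process.argv.slice(2));
}
```

Saved as, say, parallel.js, it could be started with node parallel.js followed by one or more page URLs. Forking one child per URL keeps the sketch short. For a long list of URLs you would probably want to cap the number of children, since each one launches its own browser.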