Playwright on Elastic Beanstalk

I’ve spent a lot of time with web scrapers over the years. BeautifulSoup was my first love. And then it was Puppeteer. But the modern approach is to use Playwright. It’s an incredible tool for all kinds of browser automation and I recommend starting with it.

I appreciate how intuitive the API is and it typically just works for all of my JavaScript needs.

python-3.7 on Amazon Linux 2

Today, I tried setting up Playwright in AWS Elastic Beanstalk. On the python-3.7 platform, playwright install failed when I ran it over SSH:

$ playwright install
ERROR: cannot install on amzn distribution - only Ubuntu is supported
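
The installer is refusing based on the Linux distribution it detects. As a rough sketch (not Playwright's own check), you can read the distribution ID yourself from the standard /etc/os-release file before attempting an install. The function names here are hypothetical, and I'm assuming the "amzn" in the error corresponds to the ID field of that file.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def distro_id(path="/etc/os-release"):
    """Return the distribution ID, e.g. "amzn" or "ubuntu" (hypothetical helper)."""
    return parse_os_release(Path(path).read_text()).get("ID", "unknown")
```

Calling distro_id() on the instance tells you up front whether you're on a distribution the installer will accept.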

Lucky for me, I had come across a similar issue at work. We wanted Playwright set up on our self-hosted GitHub Actions runners without having to run playwright install each time a new runner is set up.

The solution was to use a Docker image with playwright pre-installed. We used mcr.microsoft.com/playwright:v1.27.0-focal as the container for each of the steps. See a full example here.

Ubuntu Dockerfile

A simple solution is to recreate the EB environment using the docker platform. We’ll no longer use the python-3.7 platform. That also means the Procfile is no longer needed.

$ eb init --platform docker --region us-east-1 my-app-name
$ eb create \
    --platform docker \
    --elb-type application \
    --region us-east-1 \
    -k my-app-keys

From there, I created a Dockerfile with the following contents. You may need to customize this container for your project. I’m using Python + FastAPI.

  • Note: You may need to replace main:app with your own entrypoint.

FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive
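# note: python3-pip targets Ubuntu 20.04's default python3 (3.8),
# so the pip installs below land there rather than in python3.9
RUN apt-get update && apt-get install -y python3.9 python3.9-dev python3-pip
RUN pip install gunicorn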

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

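# install the Playwright browsers and their system dependencies
# (the playwright package itself must be listed in requirements.txt)
RUN playwright install --with-deps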

COPY . .

EXPOSE 8000

# the -b 0.0.0.0:8000 is required for the load balancer
# to communicate with the container;
# uvicorn must be in requirements.txt for the worker class below
CMD ["gunicorn", "main:app", "-b", "0.0.0.0:8000", "--worker-class", "uvicorn.workers.UvicornWorker", "--workers", "1"]

That’s all you really need to get playwright configured in your EB environment.
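
One thing the Dockerfile quietly depends on is requirements.txt: the playwright install step needs the playwright package, and the UvicornWorker class needs uvicorn. Here is a small sketch of a pre-deploy check for that. The helper names and the NEEDED tuple are my own assumptions, so adjust them to your project.

```python
import re

# Packages the Dockerfile relies on via requirements.txt
# (gunicorn is installed separately in the Dockerfile).
NEEDED = ("playwright", "uvicorn")


def listed_packages(requirements_text):
    """Return the lowercased package names found in requirements.txt content."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def missing_packages(requirements_text, needed=NEEDED):
    """Return the needed packages that requirements.txt does not list."""
    present = listed_packages(requirements_text)
    return [name for name in needed if name not in present]
```

Passing the contents of requirements.txt to missing_packages before eb deploy gives you a list of anything the image build would trip over.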

Deploying

Deploying your local files works as you’d expect:

$ eb deploy
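
After a deploy, it can be worth confirming that the browser binaries actually made it into the image. This is a minimal smoke test, assuming the playwright Python package is installed and that you run it inside the container; the function name is hypothetical.

```python
def chromium_launches():
    """Return True if Playwright can start headless Chromium and load a blank page."""
    try:
        # Imported lazily so the script reports a failure instead of
        # crashing when the playwright package is absent.
        from playwright.sync_api import sync_playwright

        with sync_playwright() as p:
            browser = p.chromium.launch()  # headless by default
            page = browser.new_page()
            page.goto("about:blank")
            browser.close()
        return True
    except Exception as exc:  # missing package, browser binary, or system library
        print(f"Playwright smoke test failed: {exc}")
        return False
```

A False result usually means the playwright install step did not run in the image you deployed.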

Closing Notes

  • Docker images are cleaned up after each deployment.
  • You’ll probably want to update the SSL certificate in the Listeners section of your load balancer after it’s created.
    • To use SSL, update the newly created security group to allow inbound/outbound HTTPS traffic.
  • Use the eb ssh command to connect to your EB environment via SSH (assuming you specified -k in eb create).
  • Use the eb logs command to view the logs for your EB environment.