Kto-Blog

Playwright Installation Acceleration Guide for China

Author: Kto

Playwright Installation Acceleration Guide for China: Switch Sources and Test Installation

Introduction

Playwright is a powerful browser automation and testing tool that can drive Chromium, Firefox, and WebKit.

For users in mainland China, however, installing Playwright is often slow, especially the step that downloads the browsers (such as Chromium).

Switching to the Tsinghua PyPI mirror speeds up the installation of the Python package, but the browser download uses a different source and has to be redirected separately.

This article shows how to switch to a domestic mirror with a script and provides a test script to verify that Playwright was installed successfully.

1. Reasons for Slow Playwright Installation

During installation Playwright has to download browser binaries. These files are usually hosted on servers outside China, so downloads are slow for domestic users.

Switching the PyPI source speeds up the installation of the Playwright package itself, but it has no effect on where the browser binaries are downloaded from.

2. Switch to Domestic Sources for Acceleration

To speed up the browser download, we need to point Playwright at a domestic mirror. This can be done by setting an environment variable or, as the script in this article does, by rewriting the CDN mirror list inside the installed package.
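As a minimal sketch of the environment-variable route: Playwright's documentation describes a PLAYWRIGHT_DOWNLOAD_HOST variable that overrides the host the browsers are downloaded from. The helper names below (build_mirror_env, install_browser) are hypothetical, and the mirror URL is the npmmirror address used in this article. Whether that mirror carries the build your Playwright version needs is an assumption you should verify.

```python
import os
import subprocess
import sys

# Mirror URL taken from this article; assumed to host the required browser builds.
NPMMIRROR_PLAYWRIGHT = "https://registry.npmmirror.com/-/binary/playwright"


def build_mirror_env(mirror=NPMMIRROR_PLAYWRIGHT, base_env=None):
    """Return a copy of the environment with PLAYWRIGHT_DOWNLOAD_HOST set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PLAYWRIGHT_DOWNLOAD_HOST"] = mirror
    return env


def install_browser(browser="chromium", mirror=NPMMIRROR_PLAYWRIGHT):
    """Run `python -m playwright install <browser>` with the mirror applied."""
    subprocess.check_call(
        [sys.executable, "-m", "playwright", "install", browser],
        env=build_mirror_env(mirror),
    )
```

Calling install_browser("chromium") after the pip installation would download Chromium through the mirror without modifying any file of the installed package, so nothing needs to be patched again after an upgrade.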

The following installation script switches both downloads to domestic sources.

Installation Script

See "Installation Script Code" in the appendix.

Installation Script Description

  1. Install Playwright: install the Playwright package from Tsinghua University's PyPI mirror so that the Python package downloads quickly.

  2. Modify the CDN mirror configuration: use a regular expression to find Playwright's CDN mirror list and replace it with a domestic mirror (such as https://registry.npmmirror.com/-/binary/playwright), which speeds up the browser binary download.

  3. Install Chromium: run playwright install chromium to install the Chromium browser.
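The replacement in step 2 can be tried on an in-memory string before touching any file. The sketch below uses the same pattern as the appendix script and a sample line in the format of the original mirror list backed up there; replace_mirrors is a hypothetical helper name. Other Playwright releases may format this line differently, in which case the pattern does not match and the function returns None.

```python
import re

# Same pattern and replacement as in the appendix installation script.
PATTERN = r"const PLAYWRIGHT_CDN_MIRRORS = \['https://[^']+?(?:', 'https://[^']+?)*'\];"
DOMESTIC = "const PLAYWRIGHT_CDN_MIRRORS = ['https://registry.npmmirror.com/-/binary/playwright'];"


def replace_mirrors(source):
    """Return source with the mirror list swapped, or None if the pattern is absent."""
    if re.search(PATTERN, source) is None:
        return None
    # A function replacement avoids any escape processing of the new text.
    return re.sub(PATTERN, lambda _match: DOMESTIC, source)


# Sample line in the format of the original mirror list (see the appendix backup).
sample = (
    "const PLAYWRIGHT_CDN_MIRRORS = ['https://playwright.azureedge.net', "
    "'https://playwright-akamai.azureedge.net', "
    "'https://playwright-verizon.azureedge.net'];"
)
print(replace_mirrors(sample))
```

Returning None instead of the unchanged text makes it easy to tell "nothing to patch" apart from a successful replacement.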

Script Function Details

  • install_playwright: installs the Playwright library from the Tsinghua mirror.
  • read_and_modify_file: recursively searches for the registry directory of the installed Playwright package and replaces the CDN mirror list in its index.js with a domestic mirror.
  • install_chromium: installs the Chromium browser that Playwright needs.
  • Error handling: each step catches exceptions and prints the error, so problems during installation or modification are reported immediately. A failed step does not abort the steps after it.
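Because handle_error in the appendix script only prints the message, a failed step does not stop the ones after it. If you would rather abort at the first failure, one possible arrangement is sketched below. run_steps and INSTALL_STEPS are hypothetical names, not part of the original script.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order.

    Returns the name of the first failing step, or None if all succeeded.
    """
    for name, command in steps:
        try:
            subprocess.check_call(command)
        except (subprocess.CalledProcessError, OSError) as exc:
            print(f"Step '{name}' failed: {exc}", file=sys.stderr)
            return name
    return None


# The two subprocess steps of the installation script, expressed as data.
# This list is only defined here, not executed.
INSTALL_STEPS = [
    ("install playwright",
     [sys.executable, "-m", "pip", "install", "playwright",
      "-i", "https://pypi.tuna.tsinghua.edu.cn/simple"]),
    ("install chromium",
     [sys.executable, "-m", "playwright", "install", "chromium"]),
]
```

run_steps(INSTALL_STEPS) would stop before the Chromium download if pip fails. The file-patching step is not a subprocess, so it would have to run between the two commands or be wrapped separately.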

3. Test Playwright Installation

To verify that Playwright was installed successfully, we can write a simple test script.

Test Script

See "Test Script Code" in the appendix.

Test Script Description

  1. Import the module: import sync_playwright from playwright.sync_api.
  2. Launch the browser: call p.chromium.launch() to start Chromium (the script passes headless=False, so a browser window opens).
  3. Open Baidu and search: visit the Baidu homepage, enter the search keyword, and submit the search.
  4. Wait and verify: wait for the search results to load and check whether any results were found.
  5. Close the browser: call browser.close().

If the script runs without errors and prints "调试成功:已找到搜索结果。" ("Debug successful: search results found."), Playwright is installed correctly.
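The Baidu check depends on network access and on Baidu's page markup. As a complementary sketch that needs neither, the functions below first check that the package is importable and then render a local HTML snippet in headless Chromium. Both function names are hypothetical; the Playwright calls used (launch, new_page, set_content, inner_text) belong to its sync API.

```python
import importlib.util


def playwright_package_installed():
    """Return True if this interpreter can import the playwright package."""
    return importlib.util.find_spec("playwright") is not None


def offline_smoke_test():
    """Render a local HTML snippet in headless Chromium; no network required."""
    # Imported lazily so this file still loads when playwright is missing.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.set_content("<h1 id='ok'>Playwright works</h1>")
        text = page.inner_text("#ok")
        browser.close()
    return text == "Playwright works"
```

Call playwright_package_installed() first. If it returns True, offline_smoke_test() will either return True or raise an error when the Chromium binary was not downloaded, which separates a package problem from a browser problem.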

4. Summary

Switching to domestic sources significantly speeds up the installation of Playwright and its browsers.

The installation and test scripts in this article help users in China install and verify Playwright with less effort.

I hope this post helps. If you have any questions or suggestions, please leave a comment.

Note: the scripts and methods in this article work in most situations, but network environments vary and you may still run into problems. If you do, refer to the official Playwright documentation or community support.

Appendix

Installation Script Code

# -*- coding: utf-8 -*-
import os
import re
import subprocess
import sys

'''
Backup of the original mirror list:
const PLAYWRIGHT_CDN_MIRRORS = ['https://playwright.azureedge.net', 'https://playwright-akamai.azureedge.net', 'https://playwright-verizon.azureedge.net'];
'''

def install_playwright():
    """
    Install the Playwright library.
    Uses Tsinghua University's PyPI mirror.
    """
    try:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "playwright", "-i", "https://pypi.tuna.tsinghua.edu.cn/simple"])
    except subprocess.CalledProcessError as e:
        handle_error(f"Error during installation: {e}")
    except Exception as e:
        handle_error(f"An unexpected error occurred: {e}")

def install_chromium():
    """
    Install the Chromium browser for Playwright.
    """
    try:
        subprocess.check_call([sys.executable, "-m", "playwright", "install", "chromium"])
    except subprocess.CalledProcessError as e:
        handle_error(f"Error during installation: {e}")
    except Exception as e:
        handle_error(f"An unexpected error occurred: {e}")

def handle_error(message):
    """
    Handle an error by printing its message.
    """
    print(message)

def find_specific_directory(start_path, target_directory):
    """
    Recursively search from start_path for directories whose path ends with target_directory.
    """
    matched_directories = []

    for root, dirs, files in os.walk(start_path):
        for dir_name in dirs:
            full_path = os.path.join(root, dir_name)
            if full_path.endswith(target_directory):
                matched_directories.append(full_path)

    return matched_directories

def read_and_modify_file():
    """
    Read and modify the file that holds Playwright's CDN mirror list.
    """
    # Directory that contains the current Python executable
    directory = os.path.dirname(sys.executable)
    # Its parent directory, used as the root of the search
    parent_directory = os.path.dirname(directory)

    # Build the path with os.path.join so the separator matches the platform
    # (the hard-coded backslashes only worked on Windows)
    target_directory = os.path.join('playwright', 'driver', 'package', 'lib', 'server', 'registry')
    matched_dirs = find_specific_directory(parent_directory, target_directory)

    if not matched_dirs:
        print(f"Directory not found: {target_directory}")
        return

    if len(matched_dirs) > 1:
        print(f"Found more than one matching directory: {matched_dirs}")
        return

    directory = matched_dirs[0]
    # Path of the index.js file
    index_js_path = os.path.join(directory, 'index.js')

    # Check that the file exists
    if not os.path.exists(index_js_path):
        print(f"File not found: {index_js_path}")
        return

    try:
        # Open the file and read its content
        with open(index_js_path, 'r', encoding='utf-8') as file:
            content = file.read()

        # Pattern that matches the mirror list
        pattern = r"const PLAYWRIGHT_CDN_MIRRORS = \['https://[^']+?(?:', 'https://[^']+?)*'\];"
        # Search for the mirror list with the regular expression
        match = re.search(pattern, content)

        # Return early if nothing matches
        if not match:
            print("No matching mirror list found")
            return

        # Print the original mirrors
        print("Original mirrors found:")
        print(match.group())

        # Replace the matched text
        new_content = re.sub(pattern,
                             "const PLAYWRIGHT_CDN_MIRRORS = ['https://registry.npmmirror.com/-/binary/playwright'];",
                             content)
        # Print the mirrors after replacement
        print("Mirrors after replacement:")
        match2 = re.search(pattern, new_content)
        print(match2.group())

        # Write the modified content back to the file
        with open(index_js_path, 'w', encoding='utf-8') as file:
            file.write(new_content)

    except Exception as e:
        handle_error(f"Error while reading or modifying the file: {e}")

if __name__ == "__main__":
    # Entry point: install Playwright, patch the mirror list,
    # then install Chromium, in that order.
    # Install Playwright
    install_playwright()
    # Read and modify Playwright's CDN mirror configuration file
    read_and_modify_file()
    # Install Chromium
    install_chromium()
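Walking the whole environment directory can be slow, and it misses packages installed outside the interpreter's parent directory (for example a per-user site-packages). A possible alternative, sketched under the assumption that the registry file sits at the same relative location the script above searches for, is to derive the path from the installed package itself. The function names and REGISTRY_PARTS are hypothetical.

```python
import importlib.util
import os

# Relative location of the registry file inside the playwright package,
# as assumed by the installation script; it may differ between versions.
REGISTRY_PARTS = ("driver", "package", "lib", "server", "registry", "index.js")


def registry_index_path(package_dir):
    """Build the expected path of index.js below a playwright package directory."""
    return os.path.join(package_dir, *REGISTRY_PARTS)


def find_registry_index():
    """Return the path of index.js for the installed playwright package, or None."""
    spec = importlib.util.find_spec("playwright")
    if spec is None or not spec.submodule_search_locations:
        return None
    for package_dir in spec.submodule_search_locations:
        candidate = registry_index_path(package_dir)
        if os.path.isfile(candidate):
            return candidate
    return None
```

find_registry_index() returns None both when Playwright is not installed and when the file is not where this sketch expects it, so the caller can fall back to the directory walk.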

Test Script Code

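from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
from playwright.sync_api import sync_playwright

def list_baidu_search_results(search_keyword):
    with sync_playwright() as p:
        # headless=False opens a visible browser window
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.baidu.com/")
        # Enter the search keyword and submit it
        page.fill('input[name="wd"]', search_keyword)
        page.press('input[name="wd"]', "Enter")

        # Wait for the search results to load; wait_for_selector raises on
        # timeout, so a fixed sleep before it is not needed
        try:
            page.wait_for_selector(".result")
            results = page.query_selector_all(".result")
        except PlaywrightTimeoutError:
            results = []

        # Report whether any search results were found
        if results:
            print("调试成功:已找到搜索结果。")  # "Debug successful: search results found."
        else:
            print("调试失败:未找到搜索结果。")  # "Debug failed: no search results found."

        browser.close()

# Example call with a search keyword
search_keyword = "www.ktovoz.com"
list_baidu_search_results(search_keyword)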
