Bypassing Cloudflare when crawling


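In the previous post I shared code that used zenrows to bypass Cloudflare when crawling.

It worked well in testing, but once I went over the free usage quota the requests were blocked, so I looked for another approach.

The method below uses selenium, so it is free.

Normally, when you drive a browser with selenium, Cloudflare recognizes it as a bot and blocks the crawl.

This method avoids that by starting Chrome as a normal process with a remote debugging port and then attaching selenium to that running browser.

I tested it for a few days and it ran without problems.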

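import subprocess

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Path to chrome.exe; find it on your machine and adjust as needed
CHROME_PATH = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"

# Placeholder: replace with the address of the page you want to crawl
url = "https://example.com"

# Start a regular Chrome with remote debugging enabled and its own profile directory
subprocess.Popen([CHROME_PATH, "--remote-debugging-port=9222", r"--user-data-dir=C:\chromeCookie"])

# Attach selenium to the Chrome that is already running instead of launching a new one
option = Options()
option.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)

driver.maximize_window()
driver.get(url)

# Get the tag whose id is "content" and print its text
content = driver.find_element(By.ID, "content")
print(content.text)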
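
One thing the script above leaves to chance is timing: Chrome is started in the background, and selenium tries to attach to port 9222 right away. If Chrome has not opened the debugging port yet, the attach may fail. The helper below is a minimal sketch of one way to wait for the port first. The name `wait_for_port` and its `timeout` and `interval` parameters are my own and are not part of selenium or of the original script.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not a selenium API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused); wait a little and retry.
            time.sleep(interval)
    return False


# Intended use, between subprocess.Popen(...) and webdriver.Chrome(...):
#   if not wait_for_port("127.0.0.1", 9222):
#       raise RuntimeError("Chrome did not open the debugging port in time")
```

It only checks that something accepts connections on the port, not that it is actually Chrome, so treat it as a simple readiness check and not as a guarantee.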
