[Python] bs4 and selenium


개인용 복습공간 (Personal Review Space)


taehwanis 2021. 6. 5. 14:23

In this post I'm going to try crawling with bs4 and Selenium together.

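Before any of this works, run pip install bs4 and pip install pandas in a cmd window. The script below also imports selenium, so pip install selenium is needed as well if it is not already installed.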
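As a quick sanity check after installing, something like the following can confirm that the three packages are importable. This is only a minimal sketch: `REQUIRED` and `missing_packages` are names made up for illustration, not part of any of the libraries.

```python
import importlib.util

# Import names used by the script in this post
# (bs4 is the import name of Beautiful Soup).
REQUIRED = ['bs4', 'pandas', 'selenium']


def missing_packages(names):
    """Return the top-level module names that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_packages(REQUIRED)
    if missing:
        print('Install first: pip install ' + ' '.join(missing))
    else:
        print('All packages are available.')
```

If anything is listed as missing, install it with pip and run the check again.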


from selenium import webdriver
from bs4 import BeautifulSoup as bs
import pandas as pd

keyword = '롤'  # search term ("LoL", i.e. League of Legends)
url = 'https://www.youtube.com/results?search_query={}'.format(keyword)

# Selenium 3 style call with the driver path;
# on Selenium 4.6+ use webdriver.Chrome() instead.
driver = webdriver.Chrome('chromedriver.exe')
driver.get(url)
# Hand the rendered HTML to bs4, then shut the browser down.
soup = bs(driver.page_source, 'html.parser')
driver.quit()

# Each result's title link carries the title text, the href and the aria-label.
videos = soup.select('a#video-title')

name_list = []
url_list = []
view_list = []

for video in videos:
    label = video.get('aria-label')
    href = video.get('href')
    if not label or not href:
        continue  # skip incomplete entries so the three lists stay the same length
    name_list.append(video.text.strip())
    url_list.append('{}{}'.format('https://www.youtube.com', href))
    # The script treats the last word of aria-label as the view count.
    view_list.append(label.split()[-1])

youtubeDic = {
    'Title': name_list,
    'URL': url_list,
    'Views': view_list
}

youtubeDf = pd.DataFrame(youtubeDic)

print(youtubeDf)

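I first tried to crawl YouTube with bs4 alone, but it didn't work, so I used Selenium and bs4 together. The likely reason is that YouTube builds its search results with JavaScript, so the page has to be rendered in a browser before the a#video-title links exist in the HTML.
The crawling steps themselves are much the same as with Selenium alone. The data collected in the lists is then turned into a DataFrame with pandas.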
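The parsing half of this can be tried without opening a browser at all. The sketch below runs the same `a#video-title` selection on a small hand-written HTML string. `SAMPLE_HTML`, `parse_views` and `extract_rows` are hypothetical names for illustration, the sample markup is far simpler than a real YouTube page, and the view-count parsing assumes the Korean-locale label format that ends in something like "1,234회".

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source.
SAMPLE_HTML = """
<a id="video-title" href="/watch?v=abc123" aria-label="First video 조회수 1,234회">First video</a>
<a id="video-title" href="/watch?v=def456" aria-label="Second video 조회수 56회"> Second video </a>
<a id="video-title">Entry without href or aria-label</a>
"""


def parse_views(label):
    """Return the digits in the last word of the label as an int, or None."""
    parts = label.split() if label else []
    if not parts:
        return None
    digits = re.sub(r'\D', '', parts[-1])  # drop commas and the trailing unit
    return int(digits) if digits else None


def extract_rows(html):
    """Collect one dict per title link that has an href."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for a in soup.select('a#video-title'):
        href = a.get('href')
        if href is None:
            continue  # nothing to link to, so skip the entry
        rows.append({
            'title': a.get_text(strip=True),
            'url': 'https://www.youtube.com' + href,
            'views': parse_views(a.get('aria-label')),
        })
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` gives the same kind of table as the script above, with the view count stored as a number instead of text.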

Using bs4 together with Selenium is said to be faster, but compared with the Selenium-only work I have been doing, I don't really notice a difference.
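One way to check the speed claim is to time the two approaches separately. The usual argument is that bs4 parses one downloaded string locally, while Selenium sends a separate WebDriver command for every element it reads. The following is a rough sketch, not a benchmark: `titles_with_bs4`, `titles_with_selenium`, `timed` and `fake_page` are hypothetical names, and only the bs4 side is run here, on synthetic HTML.

```python
import time

from bs4 import BeautifulSoup


def titles_with_bs4(page_source):
    """Parse an already-downloaded HTML string locally, with no browser calls."""
    soup = BeautifulSoup(page_source, 'html.parser')
    return [a.get_text(strip=True) for a in soup.select('a#video-title')]


def titles_with_selenium(driver):
    """Read each title through the browser; every .text is a WebDriver call."""
    # Imported here so the bs4 part still runs where Selenium is not installed.
    from selenium.webdriver.common.by import By
    elements = driver.find_elements(By.CSS_SELECTOR, 'a#video-title')
    return [element.text.strip() for element in elements]


def timed(func, arg):
    """Return func(arg) together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


# Synthetic page with 200 title links, standing in for driver.page_source.
fake_page = ''.join(
    '<a id="video-title" href="/watch?v={0}">Video {0}</a>'.format(i)
    for i in range(200)
)
titles, seconds = timed(titles_with_bs4, fake_page)
print(len(titles), 'titles parsed in', round(seconds, 4), 'seconds')
```

With a live session, `timed(titles_with_bs4, driver.page_source)` and `timed(titles_with_selenium, driver)` can be compared on the same page. On a single results page the gap may well be too small to notice, which would match the impression above.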

License: Attribution, NonCommercial, NoDerivatives
