Web pages written in HTML use CSS to style their content. Because CSS classes are attached to HTML tags, we can search for a tag together with its CSS class to extract exactly the content we want.
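This is a minimal offline sketch of the idea. It assumes a small, hypothetical HTML snippet in which the same tag name (li) carries different content and only the class tells the items apart. It uses Python's built-in "html.parser" so that lxml is not required:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: every item is an <li>, only the class differs.
html = """
<ul>
  <li class="month">January</li>
  <li class="note">A note, not a month</li>
  <li class="month">March</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the <li> tags whose class attribute contains "month".
months = [li.get_text() for li in soup.find_all("li", class_="month")]
print(months)  # ['January', 'March']
```

Passing class_="month" works the same way as the dictionary form {'class': 'month'}. The trailing underscore is needed because "class" is a reserved word in Python.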
Demo
```python
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Download the page and decode it as UTF-8 so non-ASCII text (e.g. Chinese) is read correctly
html = urlopen("https://morvanzhou.github.io/static/scraping/list.html").read().decode('utf-8')

# Parse the HTML with BeautifulSoup using the 'lxml' parser (requires the lxml package);
# BeautifulSoup also supports other parsers such as the built-in 'html.parser'
soup = BeautifulSoup(html, features='lxml')

# Print the text of every <li> tag whose CSS class is 'month'
all_month = soup.find_all('li', {'class': 'month'})
for month in all_month:
    print(month.get_text())

# Print the text of every <ul> tag whose CSS class is 'jan'
all_jan = soup.find_all('ul', {'class': 'jan'})
for jan in all_jan:
    print(jan.get_text())
```
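In the demo, jan.get_text() prints the whole text of each <ul class="jan">, so all its items come out together. To get each item separately, you can search inside the matched <ul>, or express the same query as a CSS selector with select(). Here is a sketch on hypothetical markup (the real page's structure may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the lists on the page.
html = """
<ul class="jan">
  <li>one</li>
  <li>two</li>
</ul>
<ul class="feb">
  <li>three</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: find each <ul> with class "jan", then search only inside it.
items = []
for jan in soup.find_all("ul", {"class": "jan"}):
    for li in jan.find_all("li"):
        items.append(li.get_text())
print(items)  # ['one', 'two']

# Option 2: the same query as one CSS selector ("<li> inside ul.jan").
selected = [li.get_text() for li in soup.select("ul.jan li")]
print(selected)  # ['one', 'two']
```

Option 1 keeps each step explicit. Option 2 is shorter when the query is easy to write as CSS.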
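If you view the page source, you can see that the CSS class names used for styling also appear in the HTML as class attributes on the tags. These attributes are what find_all matches against.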
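In the HTML source, a single tag can carry several space-separated classes. BeautifulSoup treats class as a multi-valued attribute: tag['class'] returns a list, and a class filter matches if any one of the listed classes matches. Here is a small sketch with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <li> has two classes.
html = '<li class="month current">April</li><li class="day">1</li>'
soup = BeautifulSoup(html, "html.parser")

# The class attribute comes back as a list of class names.
classes = soup.find("li")["class"]
print(classes)  # ['month', 'current']

# "month" is one of the first tag's classes, so it matches.
matched = [li.get_text() for li in soup.find_all("li", class_="month")]
print(matched)  # ['April']
```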