python


When I use urllib2 to crawl a website, the result contains no tags such as html or body:


import urllib2
url = 'http://www.bilibili.com/video/av1669338'
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
headers={"User-Agent":user_agent}
request=urllib2.Request(url,headers=headers)
response=urllib2.urlopen(request)
text = response.read()
text[:100]
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xcd}ys\x1bG\xb2\xe7\xdfV\xc4|\x87\x1exhRk\x81\xb8\x08\x10\x90E\xfa\x89\xb2f\x9f\xe3\xd9\xcf\x9e\x1dyb7\xec\tD\x03h\x90\x90p\t\x07)yf"D\xf9I&EI\xd4}\x91\xb6.\xeb\xb0e\x93\x94%Y\xbc$E\xccW\x194\x00\xfe\xe5\xaf\xf0~Y\xd5\xd5\xa8\xeeF\x83\xa7'
Try this:
import bs4, requests
res = requests.get("http://www.bilibili.com/video/av1669338")
soup = bs4.BeautifulSoup(res.content, "lxml")
result = soup.find("meta", attrs = {"name":"keywords"}).get("content")
print result
import requests
from bs4 import BeautifulSoup

def data():
    url = 'http://www.bilibili.com/video/av1669338'
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    data = response.content
    _html = BeautifulSoup(data)
    _meta = _html.head.select('meta[name=keywords]')
    print _meta[0]['content']
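
Some background on why the output in the question looks like binary: it starts with '\x1f\x8b', which is the gzip magic number, so the server most likely sent a gzip-compressed body. urllib2 does not decompress it for you, while requests does so automatically, which is probably why the requests-based answers return readable HTML. If you would rather stay with urllib2, a minimal sketch of manual decompression could look like the following. The helper name maybe_gunzip and the sample HTML are made up for illustration, and the sample is compressed in memory so the snippet runs without a network connection.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def maybe_gunzip(body):
    """Return body decompressed if it starts with the gzip magic number, else unchanged."""
    if body[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    return body


# Hypothetical stand-in for response.read(): a small page compressed in memory
sample_html = b'<html><head><meta name="keywords" content="demo"></head><body></body></html>'
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(sample_html)
compressed = buf.getvalue()

text = maybe_gunzip(compressed)
print(text[:6])
```

With the code from the question, this would be used as text = maybe_gunzip(response.read()). Checking the Content-Encoding response header instead of the magic bytes is another reasonable option.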
