python


When I use urllib2 to crawl a website, the response has no tags such as <html> or <body>. It looks like binary data instead:


import urllib2

url = 'http://www.bilibili.com/video/av1669338'
# Send a desktop Chrome User-Agent string
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
headers = {"User-Agent": user_agent}
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
text = response.read()
# First 100 bytes of the body, as echoed by the interactive interpreter:
text[:100]
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xcd}ys\x1bG\xb2\xe7\xdfV\xc4|\x87\x1exhRk\x81\xb8\x08\x10\x90E\xfa\x89\xb2f\x9f\xe3\xd9\xcf\x9e\x1dyb7\xec\tD\x03h\x90\x90p\t\x07)yf"D\xf9I&EI\xd4}\x91\xb6.\xeb\xb0e\x93\x94%Y\xbc$E\xccW\x194\x00\xfe\xe5\xaf\xf0~Y\xd5\xd5\xa8\xeeF\x83\xa7'
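The first two bytes of that output, '\x1f\x8b', are the gzip magic number, which indicates the body is gzip-compressed rather than plain HTML. Below is a minimal sketch of checking for that signature and decompressing with the standard library. It is written for Python 3, and the helper name maybe_gunzip is hypothetical:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def maybe_gunzip(body: bytes) -> bytes:
    """Return the decompressed body if it looks like gzip, else return it unchanged.

    Hypothetical helper for illustration: it only sniffs the magic number and
    does not look at the Content-Encoding response header.
    """
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


if __name__ == "__main__":
    # Simulate a server that gzips its HTML response
    page = b"<html><head><title>demo</title></head><body></body></html>"
    compressed = gzip.compress(page)
    print(compressed[:2])                # b'\x1f\x8b'
    print(maybe_gunzip(compressed)[:6])  # b'<html>'
```

In the question's Python 2 code gzip.decompress does not exist. There, the equivalent is gzip.GzipFile(fileobj=StringIO.StringIO(text)).read().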
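The body starts with '\x1f\x8b', the gzip magic number, so the server sent gzip-compressed HTML. urllib2 hands back those raw bytes without decompressing them. requests decompresses gzip responses automatically, so try this: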
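import bs4, requests

# requests transparently decompresses gzip-encoded responses
res = requests.get("http://www.bilibili.com/video/av1669338")
# Parse the HTML with the lxml parser (requires the lxml package)
soup = bs4.BeautifulSoup(res.content, "lxml")
# Read the content attribute of <meta name="keywords">
result = soup.find("meta", attrs={"name": "keywords"}).get("content")
print(result)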
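If installing bs4 and lxml is not an option, the same meta keywords lookup can be approximated with the standard library's html.parser. This is a sketch, not a drop-in replacement for BeautifulSoup. The class name KeywordsParser, the function meta_keywords, and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser
from typing import Optional


class KeywordsParser(HTMLParser):
    """Collect the content attribute of the first <meta name="keywords"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; also called for <meta ... />
        if tag == "meta" and self.keywords is None:
            attr_map = dict(attrs)
            if attr_map.get("name") == "keywords":
                self.keywords = attr_map.get("content")


def meta_keywords(html: str) -> Optional[str]:
    parser = KeywordsParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


if __name__ == "__main__":
    sample = '<html><head><meta name="keywords" content="a,b,c"></head></html>'
    print(meta_keywords(sample))  # a,b,c
```

HTMLParser.feed expects str, so the downloaded bytes need decoding first, for example res.content.decode("utf-8") if the page is UTF-8 (an assumption about this site).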
Alternatively, keeping the User-Agent header from the question:

import requests
from bs4 import BeautifulSoup

def data():
    url = 'http://www.bilibili.com/video/av1669338'
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    # response.content is already decompressed
    html = response.content
    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.head.select('meta[name=keywords]')
    print(meta[0]['content'])

data()
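requests hides the decompression step. To stay with the standard library instead (urllib.request is the Python 3 successor of urllib2), one option is to read the Content-Encoding response header and decompress accordingly. Below is a minimal sketch, assuming only gzip and deflate need handling. The names decode_body and fetch are hypothetical:

```python
import gzip
import urllib.request
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the Content-Encoding the server applied (gzip, deflate, or none)."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # "deflate" is supposed to be zlib-wrapped data...
            return zlib.decompress(body)
        except zlib.error:
            # ...but some servers send a raw deflate stream without the zlib header
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body  # identity or unknown encoding: leave untouched


def fetch(url: str, user_agent: str) -> bytes:
    """Download url and return the decoded body (performs a network request)."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept-Encoding": "gzip, deflate"},
    )
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

For the question's page this would be called as fetch('http://www.bilibili.com/video/av1669338', user_agent), followed by .decode("utf-8") if the page is UTF-8 (again an assumption).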