Boto error in Scrapy: “The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.”


I'm trying to crawl the following spider:
import scrapy
from tutorial.items import QuoteItem


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        'FEED_URI': 's3://apkmirror/quotes.json',
        'AWS_ACCESS_KEY_ID': 'foo',
        'AWS_SECRET_ACCESS_KEY': 'bar',
    }

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for quote in response.css('div.quote'):
            item = QuoteItem()
            item['text'] = quote.css('span.text::text').extract_first()
            item['author'] = quote.css('small.author::text').extract_first()
            item['tags'] = quote.css('div.tags a.tag::text').extract()
            yield item
where 'foo' and 'bar' are the AWS access key ID and secret key, respectively, of an Amazon S3 bucket in Frankfurt, and items.py is simply
import scrapy


class QuoteItem(scrapy.Item):
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()
However, when I run scrapy crawl quotes, the logs contain the following error messages:
2017-05-15 18:33:56 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7fd56fd3b488>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7fd56fd38b18>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7fd56fd3ba28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7fd56fd38aa0>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7fd56fd38a28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7fd56fd38ed8>
2017-05-15 18:33:56 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'body': <open file '<fdopen>', mode 'w+b' at 0x7fd56ef29810>, 'url': u'https://s3.amazonaws.com/apkmirror/quotes.json', 'headers': {'Content-MD5': u'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue', 'User-Agent': 'Botocore/1.4.67 Python/2.7.12 Linux/4.4.0-75-generic'}, 'context': {'client_region': u'us-east-1', 'signing': {'bucket': 'apkmirror'}, 'has_streaming_input': True, 'client_config': <botocore.config.Config object at 0x7fd56ec7b610>}, 'query_string': {}, 'url_path': u'/apkmirror/quotes.json', 'method': u'PUT'}
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7fd56ec7b510>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7fd56fe285f0>
2017-05-15 18:33:56 [botocore.utils] DEBUG: Checking for DNS compatible bucket for: https://s3.amazonaws.com/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.utils] DEBUG: URI updated to: https://apkmirror.s3.amazonaws.com/quotes.json
2017-05-15 18:33:56 [botocore.auth] DEBUG: Calculating signature using hmacv1 auth.
2017-05-15 18:33:56 [botocore.auth] DEBUG: HTTP request method: PUT
2017-05-15 18:33:56 [botocore.auth] DEBUG: StringToSign:
PUT
U+PeT0soEYWoCF4DMQXEzA==
Mon, 15 May 2017 16:33:56 GMT
/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): apkmirror.s3.amazonaws.com
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Received a non 100 Continue response from the server, NOT sending request body.
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /quotes.json HTTP/1.1" 400 None
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response headers: {'x-amz-region': 'eu-central-1', 'x-amz-id-2': 'ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'close', 'x-amz-request-id': '276FC0F60406C7C5', 'date': 'Mon, 15 May 2017 16:33:55 GMT', 'content-type': 'application/xml'}
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response body:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>276FC0F60406C7C5</RequestId><HostId>ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=</HostId></Error>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7fd56ece8290>
2017-05-15 18:33:56 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [scrapy.extensions.feedexport] ERROR: Error storing jsonlines feed (20 items) in: s3://apkmirror/quotes.json
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/extensions/feedexport.py", line 118, in _store_in_thread
    Bucket=self.bucketname, Key=self.keyname, Body=file)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 537, in _make_api_call
    raise ClientError(parsed_response, operation_name)
ClientError: An error occurred (InvalidRequest) when calling the PutObject operation: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.
It seems from the questions "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256" and "Using boto for AWS S3 Buckets for Signature V4" that the problem is germane (no pun intended) to S3 buckets based in Frankfurt. One solution involves changing the host argument in boto's connect_to_region.
In my case, however, the use of boto is handled by the scrapy source code, which I'd rather not touch. How might I go about fixing this problem?
Regarding "One solution involves changing the host argument in boto's connect_to_region":
The storage backend for exporting to S3 is scrapy.extensions.feedexport.S3FeedStorage.
You can subclass S3FeedStorage and provide your own implementation that signs requests with an authorization mechanism the bucket accepts.
You'll also need to add
{
    "s3": "myproject.extensions.MyS3FeedStorage",
}
to the FEED_STORAGES setting so that Scrapy uses it.
See also the Scrapy documentation on feed storages.
This is an open issue in Scrapy. You can work around it by using the AWS shared configuration file to set the S3 signature version to s3v4. The AWS documentation on S3 configuration lists the other available options.
To set only the signature version, create the file ~/.aws/config with the following contents:
[default]
s3 =
    signature_version = s3v4
Or, if you already have the AWS CLI installed, you can just run:
aws configure set default.s3.signature_version s3v4
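If you would rather not touch ~/.aws/config, the same setting can live in a project-local file. Here is a minimal sketch, assuming botocore reads the location of the shared config file from the AWS_CONFIG_FILE environment variable; the helper write_sigv4_config and the file name aws_config.ini are made up for illustration.

```python
import configparser
import os
import tempfile

# Same contents as the ~/.aws/config example; the nested key must be indented.
SIGV4_CONFIG = """\
[default]
s3 =
    signature_version = s3v4
"""


def write_sigv4_config(path):
    """Write a minimal AWS shared config file that selects SigV4 for S3."""
    with open(path, "w") as handle:
        handle.write(SIGV4_CONFIG)
    # Must be set before Scrapy (and therefore botocore) creates its S3 client.
    os.environ["AWS_CONFIG_FILE"] = path
    return path


# Usage: write the file to a scratch directory and read it back to check it.
config_path = write_sigv4_config(
    os.path.join(tempfile.mkdtemp(), "aws_config.ini")
)
parser = configparser.RawConfigParser()
parser.read(config_path)
nested_value = parser.get("default", "s3").strip()
print(nested_value)
```

In a Scrapy project, one place to call such a helper would be the top of settings.py, so that the variable is set before the feed exporter runs.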
