TS3WebFile.cxx
1// @(#)root/net:$Id$
2// Author: Fabio Hernandez 22/01/2013
3// extending an initial version by Marcelo Sousa (class TAS3File)
4
5/*************************************************************************
6 * Copyright (C) 1995-2011, Rene Brun and Fons Rademakers. *
7 * All rights reserved. *
8 * *
9 * For the licensing terms see $ROOTSYS/LICENSE. *
10 * For the list of contributors see $ROOTSYS/README/CREDITS. *
11 *************************************************************************/
12
13//////////////////////////////////////////////////////////////////////////
14// //
15// TS3WebFile //
16// //
17// A TS3WebFile is a TWebFile which retrieves the file contents from a //
18// web server implementing the REST API of the Amazon S3 protocol. This //
19// class is meant to be as generic as possible to be used with files //
20// hosted not only by Amazon S3 servers but also by other providers //
21// implementing the core of the S3 protocol. //
22// //
23// The S3 protocol works on top of HTTPS (and HTTP) and imposes that //
24// each HTTP request be signed using a specific convention: the request //
25// must include an 'Authorization' header which contains the signature //
26// of a concatenation of selected request fields. For signing the //
27// request, an 'Access Key Id' and a 'Secret Access Key' need to be //
28// known. These keys are used by the S3 servers to identify the client //
29// and to authenticate the request as genuine. //
30// //
31// As an end user, you must know the Access Key and Secret Access Key //
32// in order to access each S3 file. They are provided to you by your S3 //
33// service provider. Those two keys can be provided to ROOT when //
34// initializing an object of this class by two means: //
35// a) by using the environmental variables S3_ACCESS_KEY and //
36// S3_SECRET_KEY, or //
37// b) by specifying them when opening each file. //
38// //
39// You can use AWS temporary security credentials (temporary access key //
40// and secret access key), but you must also give the associated //
41// session token. The token may be set in the S3_SESSION_TOKEN //
42// environmental variable, or on open in the TOKEN option. //
43// //
44// The first method is convenient if all the S3 files you want to //
45// access are hosted by a single provider. The second one is more //
46// flexible as it allows you to specify which credentials to use //
47// on a per-file basis. See the documentation of the constructor of //
48// this class for details on the syntax. //
49// //
50// For generating and signing the HTTP request, this class uses //
51// TS3HTTPRequest. //
52// //
53// For more information on the details of S3 protocol please refer to: //
54// "Amazon Simple Storage Service Developer Guide": //
55// http://docs.amazonwebservices.com/AmazonS3/latest/dev/Welcome.html //
56// //
57// "Amazon Simple Storage Service REST API Reference" //
58// http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html //
59//////////////////////////////////////////////////////////////////////////
60
61#include "TS3WebFile.h"
62#include "TROOT.h"
63#include "TError.h"
64#include "TSystem.h"
65#include "TPRegexp.h"
66#include "TEnv.h"
67
68
69ClassImp(TS3WebFile);
70
71////////////////////////////////////////////////////////////////////////////////
72/// Construct a TS3WebFile object. The path argument is a URL of one of the
73/// following forms:
74///
75/// ```
76/// s3://host.example.com/bucket/path/to/my/file
77/// s3http://host.example.com/bucket/path/to/my/file
78/// s3https://host.example.com/bucket/path/to/my/file
79/// as3://host.example.com/bucket/path/to/my/file
80/// ```
81///
82/// For files hosted by Google Storage, use the following forms:
83///
84/// ```
85/// gs://storage.googleapis.com/bucket/path/to/my/file
86/// gshttp://storage.googleapis.com/bucket/path/to/my/file
87/// gshttps://storage.googleapis.com/bucket/path/to/my/file
88/// ```
89///
90/// The 'as3' scheme is accepted for backwards compatibility but its usage is
91/// deprecated.
92///
93/// The recommended way to create an instance of this class is through
94/// TFile::Open, for instance:
95///
96/// ```c++
97/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file");
98/// TFile* f2 = TFile::Open("gs://storage.googleapis.com/bucket/path/to/my/file");
99/// ```
100///
101/// The specified scheme (i.e. s3, s3http, s3https, ...) determines the underlying
102/// transport protocol to use for downloading the file contents, namely HTTP or HTTPS.
103/// The 's3', 's3https', 'gs' and 'gshttps' schemes imply using HTTPS as the transport
104/// protocol. The 's3http', 'as3' and 'gshttp' schemes imply using HTTP as the transport
105/// protocol.
106///
107/// The 'options' argument can contain 'NOPROXY' if you want to bypass
108/// the HTTP proxy when retrieving this file's contents. As for any TWebFile-derived
109/// object, the URL of the web proxy can be specified by setting an environmental
110/// variable 'http_proxy'. If this variable is set, we ask that proxy to route our
111/// HTTP(S) requests to the file server.
112///
113/// In addition, you can also use the 'options' argument to provide the access key
114/// and secret key to be used for authentication purposes for this file by using a
115/// string of the form "AUTH=myAccessKey:mySecretKey". This may be useful to
116/// open several files hosted by different providers in the same program/macro,
117/// where the environmental variables solution is not convenient (see below).
118///
119/// To use AWS temporary security credentials you need to specify the session
120/// token. This can be added to the options argument with a string of the form
121/// TOKEN=mySessionToken. The temporary access and secret keys must also be
122/// available, either via the AUTH option or by environmental variable.
123///
124/// If you need to specify more than one option separate them by ' '
125/// (blank), for instance:
126/// "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+"
127///
128/// Examples:
129/// ```
130/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
131/// "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
132/// TFile* f2 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
133/// "AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
134/// TFile* f3 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
135/// "TOKEN=AQoDYXdzEM///////////wEa8AHEYmCinjD+TsGEjtgKSMAT6wnY");
136/// ```
137///
138/// If there is no authentication information in the 'options' argument
139/// (i.e. no "AUTH=...") the values of the environmental variables
140/// S3_ACCESS_KEY and S3_SECRET_KEY (if set) are expected to contain
141/// the access key id and the secret access key, respectively. You have
142/// been provided with these credentials by your S3 service provider.
143///
144/// If neither the AUTH information is provided in the 'options' argument
145/// nor the environmental variables are set, we try to open the file
146/// without providing any authentication information to the server. This
147/// is useful when the file's access control allows any unauthenticated
148/// user to read it.
149
150TS3WebFile::TS3WebFile(const char* path, Option_t* options)
151 : TWebFile(path, "IO")
152{
153 // Make sure this is a valid S3 path. We accept 'as3' as a scheme, for
154 // backwards compatibility
155 Bool_t doMakeZombie = kFALSE;
156 TString errorMsg;
157 TString accessKey;
158 TString secretKey;
159 TString token;
160 TPMERegexp rex("^([a]?s3|s3http[s]?|gs|gshttp[s]?){1}://([^/]+)/([^/]+)/([^/].*)", "i");
161 if (rex.Match(TString(path)) != 5) {
162 errorMsg = TString::Format("invalid S3 path '%s'", path);
163 doMakeZombie = kTRUE;
164 }
165 else if (!ParseOptions(options, accessKey, secretKey, token)) {
166 errorMsg = TString::Format("could not parse options '%s'", options);
167 doMakeZombie = kTRUE;
168 }
169
170 // Should we stop initializing this object?
171 if (doMakeZombie) {
172 Error("TS3WebFile", "%s", (const char*)errorMsg);
173 MakeZombie();
174 gDirectory = gROOT;
175 return;
176 }
177
178 // Set this S3 object's URL, the bucket name this file is located in
179 // and the object key
180 fS3Request.SetBucket(rex[3]);
181 fS3Request.SetObjectKey(TString::Format("/%s", (const char*)rex[4]));
182
183 // Initialize super-classes data members (fUrl is a data member of
184 // super-super class TFile)
185 TString protocol = "https";
186 if (rex[1].EndsWith("http", TString::kIgnoreCase) ||
187 rex[1].EqualTo("as3", TString::kIgnoreCase))
188 protocol = "http";
189 fUrl.SetUrl(TString::Format("%s://%s/%s/%s", (const char*)protocol,
190 (const char*)rex[2], (const char*)rex[3], (const char*)rex[4]));
191
192 // Set S3-specific data members. If the access and secret keys are not
193 // provided in the 'options' argument we look in the environmental
194 // variables.
195 const char* kAccessKeyEnv = "S3_ACCESS_KEY";
196 const char* kSecretKeyEnv = "S3_SECRET_KEY";
197 const char* kSessionToken = "S3_SESSION_TOKEN";
198 if (accessKey.IsNull())
199 GetCredentialsFromEnv(kAccessKeyEnv, kSecretKeyEnv, kSessionToken,
200 accessKey, secretKey, token);
201
202 // Initialize the S3 HTTP request
203 fS3Request.SetHost(fUrl.GetHost());
204 if (accessKey.IsNull() || secretKey.IsNull()) {
205 // We have no authentication information, neither in the options
206 // nor in the environmental variables. Maybe this is a
207 // world-readable file, so let's continue and see if
208 // we can open it.
209 fS3Request.SetAuthType(TS3HTTPRequest::kNoAuth);
210 } else {
211 // Set the authentication information we need to use
212 // for this file
213 fS3Request.SetAuthKeys(accessKey, secretKey);
214 if (!token.IsNull())
215 fS3Request.SetSessionToken(token);
216 if (rex[1].BeginsWith("gs"))
217 fS3Request.SetAuthType(TS3HTTPRequest::kGoogle);
218 else
219 fS3Request.SetAuthType(TS3HTTPRequest::kAmazon);
220 }
221
222 // Assume this server does not serve multi-range HTTP GET requests. We
223 // will detect this when the HTTP headers of this file are retrieved
224 // later in the initialization process
225 fUseMultiRange = kFALSE;
226
227 // Call super-class initializer
228 Init(kFALSE);
229
230 // Were there some errors opening this file?
231 if (IsZombie() && (accessKey.IsNull() || secretKey.IsNull())) {
232 // We could not open the file and we have no authentication information
233 // so inform the user so that they can check.
234 Error("TS3WebFile", "could not find authentication info in "\
235 "'options' argument and at least one of the environment variables '%s' or '%s' is not set",
236 kAccessKeyEnv, kSecretKeyEnv);
237 }
238}
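
The path handling in the constructor above can be tried outside ROOT with a small standard-library sketch. It mirrors the same pattern (scheme, host, bucket, object key) and the same rule that schemes ending in "http", plus the legacy "as3", select plain HTTP. The names `S3Location` and `ParseS3Path` are hypothetical and are not part of ROOT; `std::regex` stands in for `TPMERegexp` as an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical holder for the pieces extracted from an S3 path.
struct S3Location {
   std::string protocol;   // "http" or "https"
   std::string host;
   std::string bucket;
   std::string objectKey;  // stored with a leading '/'
};

// Hypothetical helper (not part of ROOT): splits a path such as
// "s3://host/bucket/key" and applies the scheme-to-transport rule.
// Returns false when the path does not have the expected shape.
bool ParseS3Path(const std::string& path, S3Location& out)
{
   static const std::regex rex(
      "(a?s3|s3https?|gs|gshttps?)://([^/]+)/([^/]+)/([^/].*)",
      std::regex::icase);
   std::smatch m;
   if (!std::regex_match(path, m, rex))
      return false;
   assert(m.size() == 5);  // whole match plus four capture groups

   // Lower-case the scheme so the checks below are case-insensitive.
   std::string scheme = m[1].str();
   for (char& c : scheme)
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

   const bool endsWithHttp =
      scheme.size() >= 4 && scheme.compare(scheme.size() - 4, 4, "http") == 0;

   out.protocol  = (endsWithHttp || scheme == "as3") ? "http" : "https";
   out.host      = m[2].str();
   out.bucket    = m[3].str();
   out.objectKey = "/" + m[4].str();
   return true;
}
```

Calling `ParseS3Path("s3://host.example.com/bucket/path/to/my/file", loc)` fills `loc` with the HTTPS protocol, the host, the bucket and the object key `/path/to/my/file`; a path with fewer than three components after the scheme is rejected.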
239
240
241////////////////////////////////////////////////////////////////////////////////
242/// Extracts the S3 authentication key pair (access key and secret key)
243/// from the options. The authentication credentials can be specified in
244/// the options provided to the constructor of this class as a string
245/// containing: "AUTH=<access key>:<secret key>" and can include other
246/// options, for instance "NOPROXY" for not using the HTTP proxy for
247/// accessing this file's contents.
248/// For instance:
249/// "NOPROXY AUTH=F38XYZABCDeFgHiJkLm:V+frt4re7J1euSNFnmaf8wwmI401234E7kzxZ/TTM+"
250/// A security token may be given by the TOKEN option, in order to allow the
251/// use of a temporary key pair.
252
253Bool_t TS3WebFile::ParseOptions(Option_t* options, TString& accessKey, TString& secretKey, TString& token)
254{
255 TString optStr = (const char*)options;
256 if (optStr.IsNull())
257 return kTRUE;
258
259 fNoProxy = kFALSE;
260 if (optStr.Contains("NOPROXY", TString::kIgnoreCase))
261 fNoProxy = kTRUE;
262 CheckProxy();
263
264 // Look in the options string for the authentication information.
265 TPMERegexp rex_token("(^TOKEN=|^.* TOKEN=)([\\S]+)[\\s]*.*$", "i");
266 if (rex_token.Match(optStr) == 3) {
267 token = rex_token[2];
268 }
269 TPMERegexp rex("(^AUTH=|^.* AUTH=)([a-z0-9]+):([a-z0-9+/]+)[\\s]*.*$", "i");
270 if (rex.Match(optStr) == 4) {
271 accessKey = rex[2];
272 secretKey = rex[3];
273 }
274 if (gDebug > 0)
275 Info("ParseOptions", "using authentication information from 'options' argument");
276 return kTRUE;
277}
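
The option parsing above can be approximated with the standard library to see which strings are accepted. This is a sketch under assumptions: `S3Options` and `ParseS3Options` are hypothetical names, `std::regex` replaces `TPMERegexp`, and the patterns are simplified equivalents of the ones in `ParseOptions` rather than exact copies.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical result type; ROOT stores these values in separate members.
struct S3Options {
   bool noProxy = false;
   std::string accessKey;
   std::string secretKey;
   std::string token;
};

// Hypothetical helper (not part of ROOT) that approximates ParseOptions.
S3Options ParseS3Options(const std::string& options)
{
   S3Options out;

   // NOPROXY is detected with a case-insensitive substring search.
   std::string upper = options;
   std::transform(upper.begin(), upper.end(), upper.begin(),
                  [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
   out.noProxy = upper.find("NOPROXY") != std::string::npos;

   // TOKEN=<token>, at the start of the string or preceded by a blank.
   static const std::regex tokenRex("^(?:.* )?TOKEN=(\\S+)", std::regex::icase);
   // AUTH=<access key>:<secret key>; the secret may contain '+' and '/'.
   static const std::regex authRex(
      "^(?:.* )?AUTH=([A-Za-z0-9]+):([A-Za-z0-9+/]+)", std::regex::icase);

   std::smatch m;
   if (std::regex_search(options, m, tokenRex))
      out.token = m[1].str();
   if (std::regex_search(options, m, authRex)) {
      out.accessKey = m[1].str();
      out.secretKey = m[2].str();
   }
   // The key pair is either set together or not at all.
   assert(out.accessKey.empty() == out.secretKey.empty());
   return out;
}
```

Note that the access key pattern accepts only letters and digits, so an access key containing other characters would not be picked up from the options string.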
278
279
280////////////////////////////////////////////////////////////////////////////////
281/// Overrides TWebFile::GetHead() for retrieving the HTTP headers of this
282/// file. Uses TS3HTTPRequest to generate an HTTP HEAD request which includes
283/// the authorization header expected by the S3 server.
284
285Int_t TS3WebFile::GetHead()
286{
287 fMsgGetHead = fS3Request.GetRequest(TS3HTTPRequest::kHEAD);
288 return TWebFile::GetHead();
289}
290
291
292////////////////////////////////////////////////////////////////////////////////
293/// Overrides TWebFile::SetMsgReadBuffer10() for setting the HTTP GET
294/// request compliant to the authentication mechanism used by the S3
295/// protocol. The GET request must contain an "Authorization" header with
296/// the signature of the request, generated using the user's secret access
297/// key.
298
299void TS3WebFile::SetMsgReadBuffer10(const char* redirectLocation, Bool_t tempRedirect)
300{
301 TWebFile::SetMsgReadBuffer10(redirectLocation, tempRedirect);
302 fMsgReadBuffer10 = fS3Request.GetRequest(TS3HTTPRequest::kGET, kFALSE) + "Range: bytes=";
303 return;
304}
305
306
307////////////////////////////////////////////////////////////////////////////////
308
309Bool_t TS3WebFile::ReadBuffers(char* buf, Long64_t* pos, Int_t* len, Int_t nbuf)
310{
311 // Overrides TWebFile::ReadBuffers() for reading specified byte ranges.
312 // According to the kind of server this file is hosted by, we use a
313 // single HTTP request with a multi-range header or we generate multiple
314 // requests with a single range each.
315
316 // Does this server support multi-range GET requests?
317 if (fUseMultiRange)
318 return TWebFile::ReadBuffers(buf, pos, len, nbuf);
319
320 // Send multiple GET requests with a single range of bytes
321 // Adapted from original version by Wang Lu
322 for (Int_t i=0, offset=0; i < nbuf; i++) {
323 TString rangeHeader = TString::Format("Range: bytes=%lld-%lld\r\n\r\n",
324 pos[i], pos[i] + len[i] - 1);
325 TString s3Request = fS3Request.GetRequest(TS3HTTPRequest::kGET, kFALSE) + rangeHeader;
326 if (GetFromWeb10(&buf[offset], len[i], s3Request) == -1)
327 return kTRUE;
328 offset += len[i];
329 }
330 return kFALSE;
331}
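
The single-range fallback above boils down to two pieces of bookkeeping per requested range: an inclusive `Range` header and the offset at which the received bytes land in the output buffer. The sketch below isolates that bookkeeping with the standard library only. `RangeRequest` and `BuildRangeRequests` are hypothetical names, and the signed request line that ROOT prepends via `TS3HTTPRequest` is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record describing one single-range GET.
struct RangeRequest {
   std::string header;   // "Range: bytes=<first>-<last>\r\n\r\n"
   std::size_t offset;   // where this range lands in the output buffer
};

// Hypothetical helper (not part of ROOT): one entry per (pos, len) pair.
std::vector<RangeRequest> BuildRangeRequests(const std::vector<long long>& pos,
                                             const std::vector<int>& len)
{
   assert(pos.size() == len.size());
   std::vector<RangeRequest> requests;
   std::size_t offset = 0;
   for (std::size_t i = 0; i < pos.size(); ++i) {
      // HTTP byte ranges are inclusive, hence the "- 1" on the last byte.
      const long long last = pos[i] + len[i] - 1;
      RangeRequest r;
      r.header = "Range: bytes=" + std::to_string(pos[i]) + "-" +
                 std::to_string(last) + "\r\n\r\n";
      r.offset = offset;
      requests.push_back(r);
      // Ranges are packed back to back in the caller's buffer.
      offset += static_cast<std::size_t>(len[i]);
   }
   return requests;
}
```

For example, requesting 10 bytes at position 0 and 50 bytes at position 100 yields the headers `bytes=0-9` and `bytes=100-149`, written at buffer offsets 0 and 10.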
332
333
334////////////////////////////////////////////////////////////////////////////////
335/// This method is called by the super-class TWebFile when a HTTP header
336/// for this file is retrieved. We scan the 'Server' header to detect the
337/// type of S3 server this file is hosted on and to determine if it is
338/// known to support multi-range HTTP GET requests. Some S3 servers (for
339/// instance Amazon's) do not support that feature and when they
340/// receive a multi-range request they send back the whole file contents.
341/// For this class, if the server does not support multi-range requests
342/// we issue multiple single-range requests instead.
343
344void TS3WebFile::ProcessHttpHeader(const TString& headerLine)
345{
346 TPMERegexp rex("^Server: (.+)", "i");
347 if (rex.Match(headerLine) != 2)
348 return;
349
350 // Extract the identity of this server and compare it to the
351 // identity of the servers known to support multi-range requests.
352 // The list of server identities is expected to be found in ROOT
353 // configuration.
354 TString serverId = rex[1].ReplaceAll("\r", "").ReplaceAll("\n", "");
355 TString multirangeServers(gEnv->GetValue("TS3WebFile.Root.MultiRangeServer", ""));
356 fUseMultiRange = multirangeServers.Contains(serverId, TString::kIgnoreCase) ? kTRUE : kFALSE;
357}
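
The server check above is a case-insensitive substring test of the `Server` header value against a configured list. A minimal standard-library sketch of that test follows. `ServerSupportsMultiRange` and `ToLower` are hypothetical helpers, the server names used with it are made up, and the configured list is passed in as a plain string instead of being read from `gEnv`. Unlike the original, the sketch treats an empty server identity as unsupported.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: ASCII lower-casing for case-insensitive comparisons.
static std::string ToLower(std::string s)
{
   std::transform(s.begin(), s.end(), s.begin(),
                  [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
   return s;
}

// Hypothetical helper (not part of ROOT): returns true when the header line
// is a 'Server' header whose value appears in the configured list.
bool ServerSupportsMultiRange(const std::string& headerLine,
                              const std::string& multirangeServers)
{
   const std::string prefix = "server: ";
   const std::string line = ToLower(headerLine);
   if (line.compare(0, prefix.size(), prefix) != 0)
      return false;  // not a 'Server' header

   // Strip the trailing CR/LF before comparing.
   std::string serverId = line.substr(prefix.size());
   serverId.erase(std::remove(serverId.begin(), serverId.end(), '\r'), serverId.end());
   serverId.erase(std::remove(serverId.begin(), serverId.end(), '\n'), serverId.end());
   assert(serverId.find('\n') == std::string::npos);
   if (serverId.empty())
      return false;  // assumption of this sketch: no identity, no multi-range

   return ToLower(multirangeServers).find(serverId) != std::string::npos;
}
```

Because the test is a substring match, a short server identity can match inside a longer configured name, which is worth keeping in mind when filling in the configuration value.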
358
359
360////////////////////////////////////////////////////////////////////////////////
361/// Sets the access and secret keys from the environmental variables, if
362/// they are both set. Sets the security session token if it is given.
363
364Bool_t TS3WebFile::GetCredentialsFromEnv(const char* accessKeyEnv, const char* secretKeyEnv,
365 const char* tokenEnv, TString& outAccessKey,
366 TString& outSecretKey, TString& outToken)
367{
368 // Look first in the recommended environmental variables. Both variables
369 // must be set.
370 TString accKey = gSystem->Getenv(accessKeyEnv);
371 TString secKey = gSystem->Getenv(secretKeyEnv);
372 TString token = gSystem->Getenv(tokenEnv);
373 if (!token.IsNull()) {
374 outToken = token;
375 }
376 if (!accKey.IsNull() && !secKey.IsNull()) {
377 outAccessKey = accKey;
378 outSecretKey = secKey;
379 if (gDebug > 0)
380 Info("GetCredentialsFromEnv", "using authentication information from environmental variables '%s' and '%s'",
381 accessKeyEnv, secretKeyEnv);
382 return kTRUE;
383 }
384
385 // Look now in the legacy environmental variables, for keeping backwards
386 // compatibility.
387 accKey = gSystem->Getenv("S3_ACCESS_ID"); // Legacy access key
388 secKey = gSystem->Getenv("S3_ACCESS_KEY"); // Legacy secret key
389 if (!accKey.IsNull() && !secKey.IsNull()) {
390 Warning("SetAuthKeys", "usage of S3_ACCESS_ID and S3_ACCESS_KEY environmental variables is deprecated.");
391 Warning("SetAuthKeys", "please use S3_ACCESS_KEY and S3_SECRET_KEY environmental variables.");
392 outAccessKey = accKey;
393 outSecretKey = secKey;
394 return kTRUE;
395 }
396
397 return kFALSE;
398}
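
The lookup order above (session token, then the recommended key pair, then the legacy pair) can be exercised without touching the real process environment. In the sketch below the environment is passed in as a map, which is an assumption made for testability; `Env`, `Credentials` and `CredentialsFromEnv` are hypothetical names, and real code would read the values with `std::getenv` or `gSystem->Getenv`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the process environment.
typedef std::map<std::string, std::string> Env;

// Hypothetical result type.
struct Credentials {
   std::string accessKey;
   std::string secretKey;
   std::string token;
   bool legacy = false;  // true when the deprecated variable names were used
};

// Returns the value of 'name', or an empty string when it is not set.
static std::string Lookup(const Env& env, const std::string& name)
{
   Env::const_iterator it = env.find(name);
   return it == env.end() ? std::string() : it->second;
}

// Hypothetical helper (not part of ROOT) following the same lookup order.
bool CredentialsFromEnv(const Env& env, Credentials& out)
{
   // The session token is optional and independent of the key pair.
   const std::string token = Lookup(env, "S3_SESSION_TOKEN");
   if (!token.empty())
      out.token = token;

   // Recommended variables: both must be set.
   std::string acc = Lookup(env, "S3_ACCESS_KEY");
   std::string sec = Lookup(env, "S3_SECRET_KEY");
   if (acc.empty() || sec.empty()) {
      // Legacy variables: here S3_ACCESS_KEY holds the secret key.
      acc = Lookup(env, "S3_ACCESS_ID");
      sec = Lookup(env, "S3_ACCESS_KEY");
      if (acc.empty() || sec.empty())
         return false;
      out.legacy = true;
   }
   assert(!acc.empty() && !sec.empty());
   out.accessKey = acc;
   out.secretKey = sec;
   return true;
}
```

The legacy branch shows why the old naming was retired: `S3_ACCESS_KEY` means the access key in the recommended scheme but the secret key in the legacy one.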
399