TS3WebFile.cxx
// @(#)root/net:$Id$
// Author: Fabio Hernandez 22/01/2013
// extending an initial version by Marcelo Sousa (class TAS3File)

/*************************************************************************
 * Copyright (C) 1995-2011, Rene Brun and Fons Rademakers.               *
 * All rights reserved.                                                  *
 *                                                                       *
 * For the licensing terms see $ROOTSYS/LICENSE.                         *
 * For the list of contributors see $ROOTSYS/README/CREDITS.             *
 *************************************************************************/

/**
\file TS3WebFile.cxx
\class TS3WebFile
\ingroup IO

A TS3WebFile is a TWebFile which retrieves the file contents from a
web server implementing the REST API of the Amazon S3 protocol. This
class is meant to be as generic as possible, so that it can be used with
files hosted not only by Amazon S3 servers but also by other providers
implementing the core of the S3 protocol.

The S3 protocol works on top of HTTPS (and HTTP) and requires that
each HTTP request be signed using a specific convention: the request
must include an 'Authorization' header which contains the signature
of a concatenation of selected request fields. For signing the
request, an 'Access Key Id' and a 'Secret Access Key' need to be
known. These keys are used by the S3 servers to identify the client
and to authenticate the request as genuine.

As an end user, you must know the Access Key Id and Secret Access Key
in order to access each S3 file. They are provided to you by your S3
service provider. These two keys can be provided to ROOT when
initializing an object of this class in two ways:
a. by using the environmental variables S3_ACCESS_KEY and
   S3_SECRET_KEY, or
b. by specifying them when opening each file.

You can use AWS temporary security credentials (temporary access key
and secret access key), but you must also give the associated
session token. The token may be set in the S3_SESSION_TOKEN
environmental variable, or passed in the TOKEN option when opening
the file.

The first method is convenient if all the S3 files you want to
access are hosted by a single provider. The second one is more
flexible as it allows you to specify which credentials to use
on a per-file basis. See the documentation of the constructor of
this class for details on the syntax.

For generating and signing the HTTP request, this class uses
TS3HTTPRequest.

For more information on the details of the S3 protocol please refer to:
"Amazon Simple Storage Service Developer Guide":
http://docs.amazonwebservices.com/AmazonS3/latest/dev/Welcome.html

"Amazon Simple Storage Service REST API Reference":
http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html

**/

#include "TS3WebFile.h"
#include "TROOT.h"
#include "TError.h"
#include "TSystem.h"
#include "TPRegexp.h"
#include "TEnv.h"


////////////////////////////////////////////////////////////////////////////////
/// Construct a TS3WebFile object. The path argument is a URL of one of the
/// following forms:
///
/// ```
/// s3://host.example.com/bucket/path/to/my/file
/// s3http://host.example.com/bucket/path/to/my/file
/// s3https://host.example.com/bucket/path/to/my/file
/// as3://host.example.com/bucket/path/to/my/file
/// ```
///
/// For files hosted by Google Storage, use the following forms:
///
/// ```
/// gs://storage.googleapis.com/bucket/path/to/my/file
/// gshttp://storage.googleapis.com/bucket/path/to/my/file
/// gshttps://storage.googleapis.com/bucket/path/to/my/file
/// ```
///
/// The 'as3' scheme is accepted for backwards compatibility but its usage is
/// deprecated.
///
/// The recommended way to create an instance of this class is through
/// TFile::Open, for instance:
///
/// ```c++
/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file");
/// TFile* f2 = TFile::Open("gs://storage.googleapis.com/bucket/path/to/my/file");
/// ```
///
/// The specified scheme (i.e. s3, s3http, s3https, ...) determines the underlying
/// transport protocol to use for downloading the file contents, namely HTTP or HTTPS.
/// The 's3', 's3https', 'gs' and 'gshttps' schemes imply using HTTPS as the transport
/// protocol. The 's3http', 'as3' and 'gshttp' schemes imply using HTTP as the transport
/// protocol.
///
/// The 'options' argument can contain 'NOPROXY' if you want to bypass
/// the HTTP proxy when retrieving this file's contents. As for any TWebFile-derived
/// object, the URL of the web proxy can be specified by setting the environmental
/// variable 'http_proxy'. If this variable is set, we ask that proxy to route our
/// HTTP(S) requests to the file server.
///
/// In addition, you can use the 'options' argument to provide the access key
/// and secret key to be used for authentication purposes for this file by using a
/// string of the form "AUTH=myAccessKey:mySecretKey". This may be useful to
/// open several files hosted by different providers in the same program/macro,
/// where the environmental variables solution is not convenient (see below).
///
/// To use AWS temporary security credentials you need to specify the session
/// token. This can be added to the options argument with a string of the form
/// TOKEN=mySessionToken. The temporary access and secret keys must also be
/// available, either via the AUTH option or via the environmental variables.
///
/// If you need to specify more than one option, separate them by ' '
/// (blank), for instance:
/// "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+"
///
/// Examples:
/// ```
/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
/// TFile* f2 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
/// TFile* f3 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "TOKEN=AQoDYXdzEM///////////wEa8AHEYmCinjD+TsGEjtgKSMAT6wnY");
/// ```
///
/// If there is no authentication information in the 'options' argument
/// (i.e. no AUTH=... entry), the values of the environmental variables
/// S3_ACCESS_KEY and S3_SECRET_KEY (if set) are expected to contain
/// the access key id and the secret access key, respectively. These
/// credentials are provided to you by your S3 service provider.
///
/// If the AUTH information is not provided in the 'options' argument
/// and the environmental variables are not set either, we try to open
/// the file without providing any authentication information to the
/// server. This is useful when the file has an access control policy
/// that allows any unidentified user to read it.

TS3WebFile::TS3WebFile(const char* path, Option_t* options)
           : TWebFile(path, "IO")
{
   // Make sure this is a valid S3 path. We accept 'as3' as a scheme, for
   // backwards compatibility
   Bool_t doMakeZombie = kFALSE;
   TString errorMsg;
   TString accessKey;
   TString secretKey;
   TString token;
   TPMERegexp rex("^([a]?s3|s3http[s]?|gs|gshttp[s]?){1}://([^/]+)/([^/]+)/([^/].*)", "i");
   if (rex.Match(TString(path)) != 5) {
      errorMsg = TString::Format("invalid S3 path '%s'", path);
      doMakeZombie = kTRUE;
   }
   else if (!ParseOptions(options, accessKey, secretKey, token)) {
      errorMsg = TString::Format("could not parse options '%s'", options);
      doMakeZombie = kTRUE;
   }

   // Should we stop initializing this object?
   if (doMakeZombie) {
      Error("TS3WebFile", "%s", (const char*)errorMsg);
      MakeZombie();
      gDirectory = gROOT;
      return;
   }

   // Set this S3 object's URL, the bucket name this file is located in
   // and the object key
   fS3Request.SetBucket(rex[3]);
   fS3Request.SetObjectKey(TString::Format("/%s", (const char*)rex[4]));

   // Initialize super-classes data members (fUrl is a data member of
   // super-super class TFile)
   TString protocol = "https";
   if (rex[1].EndsWith("http", TString::kIgnoreCase) ||
       rex[1].EqualTo("as3", TString::kIgnoreCase))
      protocol = "http";
   fUrl.SetUrl(TString::Format("%s://%s/%s/%s", (const char*)protocol,
      (const char*)rex[2], (const char*)rex[3], (const char*)rex[4]));

   // Set S3-specific data members. If the access and secret keys are not
   // provided in the 'options' argument we look in the environmental
   // variables.
   const char* kAccessKeyEnv = "S3_ACCESS_KEY";
   const char* kSecretKeyEnv = "S3_SECRET_KEY";
   const char* kSessionToken = "S3_SESSION_TOKEN";
   if (accessKey.IsNull())
      GetCredentialsFromEnv(kAccessKeyEnv, kSecretKeyEnv, kSessionToken,
                            accessKey, secretKey, token);

   // Initialize the S3 HTTP request
   fS3Request.SetHost(fUrl.GetHost());
   if (accessKey.IsNull() || secretKey.IsNull()) {
      // We have no authentication information, neither in the options
      // nor in the environmental variables. Maybe this is a
      // world-readable file, so let's continue and see if
      // we can open it.
   } else {
      // Set the authentication information we need to use
      // for this file
      fS3Request.SetAuthKeys(accessKey, secretKey);
      if (!token.IsNull())
         fS3Request.SetSessionToken(token);
      if (rex[1].BeginsWith("gs"))
         fS3Request.SetAuthType(TS3HTTPRequest::kGoogle);
      else
         fS3Request.SetAuthType(TS3HTTPRequest::kAmazon);
   }

   // Assume this server does not serve multi-range HTTP GET requests. We
   // will detect this when the HTTP headers of this file are retrieved
   // later in the initialization process
   fUseMultiRange = kFALSE;

   // Call super-class initializer
   Init(kFALSE);

   // Were there some errors opening this file?
   if (IsZombie() && (accessKey.IsNull() || secretKey.IsNull())) {
      // We could not open the file and we have no authentication information
      // so inform the user so that they can check.
      Error("TS3WebFile", "could not find authentication info in "\
         "'options' argument and at least one of the environment variables '%s' or '%s' is not set",
         kAccessKeyEnv, kSecretKeyEnv);
   }
}
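
The path handling in the constructor can be tried outside ROOT with a small stand-alone sketch. The `S3Location` struct and the `ParseS3Path` function below are hypothetical names introduced for illustration only; they are not part of ROOT, and `std::regex` stands in for `TPMERegexp`. The sketch assumes the same pattern and the same rule as above: schemes ending in "http" and the legacy "as3" scheme select HTTP, everything else selects HTTPS.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Hypothetical holder for the pieces the constructor extracts from the path.
struct S3Location {
   std::string protocol;   // "http" or "https"
   std::string host;
   std::string bucket;
   std::string objectKey;  // always starts with '/'
};

// Hypothetical helper, not part of ROOT. Returns false for paths that do
// not look like <scheme>://<host>/<bucket>/<key>.
bool ParseS3Path(const std::string& path, S3Location& out)
{
   static const std::regex rex("^(a?s3|s3https?|gs|gshttps?)://([^/]+)/([^/]+)/([^/].*)",
                               std::regex::icase);
   std::smatch m;
   if (!std::regex_search(path, m, rex))
      return false;

   // Lower-case the scheme so the comparisons below ignore case.
   std::string scheme = m[1].str();
   for (char& c : scheme)
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

   // Schemes ending in "http", and the legacy "as3", select plain HTTP.
   const bool endsWithHttp = scheme.size() >= 4 &&
                             scheme.compare(scheme.size() - 4, 4, "http") == 0;
   out.protocol  = (endsWithHttp || scheme == "as3") ? "http" : "https";
   out.host      = m[2].str();
   out.bucket    = m[3].str();
   out.objectKey = "/" + m[4].str();
   return true;
}
```

With this sketch, "s3://host.example.com/bucket/path/to/my/file" yields the protocol "https", the bucket "bucket" and the object key "/path/to/my/file", while a path without an object key is rejected.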


////////////////////////////////////////////////////////////////////////////////
/// Extracts the S3 authentication key pair (access key and secret key)
/// from the options. The authentication credentials can be specified in
/// the options provided to the constructor of this class as a string
/// containing: "AUTH=<access key>:<secret key>" and can include other
/// options, for instance "NOPROXY" for not using the HTTP proxy for
/// accessing this file's contents.
/// For instance:
/// "NOPROXY AUTH=F38XYZABCDeFgHiJkLm:V+frt4re7J1euSNFnmaf8wwmI401234E7kzxZ/TTM+"
/// A security token may be given by the TOKEN option, in order to allow the
/// use of a temporary key pair.

Bool_t TS3WebFile::ParseOptions(Option_t* options, TString& accessKey, TString& secretKey, TString& token)
{
   TString optStr = (const char*)options;
   if (optStr.IsNull())
      return kTRUE;

   fNoProxy = kFALSE;
   if (optStr.Contains("NOPROXY", TString::kIgnoreCase))
      fNoProxy = kTRUE;
   CheckProxy();

   // Look in the options string for the authentication information.
   TPMERegexp rex_token("(^TOKEN=|^.* TOKEN=)([\\S]+)[\\s]*.*$", "i");
   if (rex_token.Match(optStr) == 3) {
      token = rex_token[2];
   }
   TPMERegexp rex("(^AUTH=|^.* AUTH=)([a-z0-9]+):([a-z0-9+/]+)[\\s]*.*$", "i");
   if (rex.Match(optStr) == 4) {
      accessKey = rex[2];
      secretKey = rex[3];
   }
   if (gDebug > 0)
      Info("ParseOptions", "using authentication information from 'options' argument");
   return kTRUE;
}
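
The option syntax parsed above can be illustrated with a stand-alone sketch. `S3Options` and `ParseS3Options` are hypothetical names, not ROOT APIs, and `std::regex` replaces `TPMERegexp`. The patterns are simplified: they take the first AUTH= or TOKEN= entry found at the start of the string or after a blank, whereas the greedy `^.* ` prefix in the original prefers the last one. The character classes are spelled out with both cases rather than relying on case-insensitive ranges.

```cpp
#include <regex>
#include <string>

// Hypothetical result type, not part of ROOT.
struct S3Options {
   bool noProxy = false;
   std::string accessKey;
   std::string secretKey;
   std::string token;
};

// Hypothetical stand-alone counterpart of the option parsing shown above.
S3Options ParseS3Options(const std::string& options)
{
   S3Options out;
   const auto icase = std::regex::icase;

   // NOPROXY is a plain case-insensitive substring test.
   out.noProxy = std::regex_search(options, std::regex("NOPROXY", icase));

   // TOKEN=<token>, at the start of the string or preceded by a blank.
   std::smatch m;
   if (std::regex_search(options, m, std::regex("(?:^| )TOKEN=(\\S+)", icase)))
      out.token = m[1].str();

   // AUTH=<access key>:<secret key>, same placement rule as TOKEN.
   const std::regex auth("(?:^| )AUTH=([A-Za-z0-9]+):([A-Za-z0-9+/]+)", icase);
   if (std::regex_search(options, m, auth)) {
      out.accessKey = m[1].str();
      out.secretKey = m[2].str();
   }
   return out;
}
```

Note that the secret key character class, as in the original pattern, accepts only letters, digits, '+' and '/'; a secret containing other characters would be truncated at the first such character.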


////////////////////////////////////////////////////////////////////////////////
/// Overrides TWebFile::GetHead() for retrieving the HTTP headers of this
/// file. Uses TS3HTTPRequest to generate an HTTP HEAD request which includes
/// the authorization header expected by the S3 server.

Int_t TS3WebFile::GetHead()
{
   fMsgGetHead = fS3Request.GetRequest(TS3HTTPRequest::kHEAD);
   return TWebFile::GetHead();
}


////////////////////////////////////////////////////////////////////////////////
/// Overrides TWebFile::SetMsgReadBuffer10() for setting the HTTP GET
/// request compliant with the authentication mechanism used by the S3
/// protocol. The GET request must contain an "Authorization" header with
/// the signature of the request, generated using the user's secret access
/// key.

void TS3WebFile::SetMsgReadBuffer10(const char* redirectLocation, Bool_t tempRedirect)
{
   TWebFile::SetMsgReadBuffer10(redirectLocation, tempRedirect);
   fMsgReadBuffer10 = fS3Request.GetRequest(TS3HTTPRequest::kGET, kFALSE) + "Range: bytes=";
   return;
}


////////////////////////////////////////////////////////////////////////////////

Bool_t TS3WebFile::ReadBuffers(char* buf, Long64_t* pos, Int_t* len, Int_t nbuf)
{
   // Overrides TWebFile::ReadBuffers() for reading specified byte ranges.
   // According to the kind of server this file is hosted by, we use a
   // single HTTP request with a multi-range header or we generate multiple
   // requests with a single range each.

   // Does this server support multi-range GET requests?
   if (fUseMultiRange)
      return TWebFile::ReadBuffers(buf, pos, len, nbuf);

   // Send multiple GET requests with a single range of bytes
   // Adapted from original version by Wang Lu
   for (Int_t i=0, offset=0; i < nbuf; i++) {
      TString rangeHeader = TString::Format("Range: bytes=%lld-%lld\r\n\r\n",
         pos[i], pos[i] + len[i] - 1);
      TString s3Request = fS3Request.GetRequest(TS3HTTPRequest::kGET, kFALSE) + rangeHeader;
      if (GetFromWeb10(&buf[offset], len[i], s3Request) == -1)
         return kTRUE;
      offset += len[i];
   }
   return kFALSE;
}
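
The single-range fallback boils down to one inclusive `Range` header per block plus a running offset into the output buffer. The sketch below shows only that bookkeeping, without any network access. `RangeRequest` and `BuildRangeRequests` are hypothetical names introduced here, not ROOT APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one single-range request.
struct RangeRequest {
   std::string header;     // "Range: bytes=<first>-<last>\r\n\r\n"
   std::size_t bufOffset;  // where the received bytes go in the output buffer
};

// Hypothetical helper: one request per (pos, len) block, in order.
std::vector<RangeRequest> BuildRangeRequests(const std::vector<long long>& pos,
                                             const std::vector<int>& len)
{
   assert(pos.size() == len.size());
   std::vector<RangeRequest> requests;
   std::size_t offset = 0;
   for (std::size_t i = 0; i < pos.size(); ++i) {
      // HTTP byte ranges are inclusive on both ends, hence the "- 1".
      const long long last = pos[i] + len[i] - 1;
      RangeRequest req;
      req.header = "Range: bytes=" + std::to_string(pos[i]) + "-" +
                   std::to_string(last) + "\r\n\r\n";
      req.bufOffset = offset;
      requests.push_back(req);
      // Blocks are stored back to back in the caller's buffer.
      offset += static_cast<std::size_t>(len[i]);
   }
   return requests;
}
```

For two blocks at positions 0 and 100 with lengths 10 and 50, the headers request bytes 0-9 and 100-149, and the second block is written at buffer offset 10.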


////////////////////////////////////////////////////////////////////////////////
/// This method is called by the super-class TWebFile when an HTTP header
/// for this file is retrieved. We scan the 'Server' header to detect the
/// type of S3 server this file is hosted on and to determine if it is
/// known to support multi-range HTTP GET requests. Some S3 servers (for
/// instance Amazon's) do not support that feature and when they
/// receive a multi-range request they send back the whole file contents.
/// For this class, if the server does not support multi-range requests
/// we issue multiple single-range requests instead.

void TS3WebFile::ProcessHttpHeader(const TString& headerLine)
{
   TPMERegexp rex("^Server: (.+)", "i");
   if (rex.Match(headerLine) != 2)
      return;

   // Extract the identity of this server and compare it to the
   // identity of the servers known to support multi-range requests.
   // The list of server identities is expected to be found in ROOT
   // configuration.
   TString serverId = rex[1].ReplaceAll("\r", "").ReplaceAll("\n", "");
   TString multirangeServers(gEnv->GetValue("TS3WebFile.Root.MultiRangeServer", ""));
   fUseMultiRange = multirangeServers.Contains(serverId, TString::kIgnoreCase) ? kTRUE : kFALSE;
}
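
The server detection is a case-insensitive substring test of the 'Server' header value against a configured list. A stand-alone sketch of that test follows. `SupportsMultiRange` and `ToLower` are hypothetical helpers, and the server names used with them ("ExampleStore", "OtherStore") are made up for illustration. Unlike the method above, which leaves `fUseMultiRange` untouched for other headers, this sketch simply returns false when the line is not a 'Server' header.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: lower-cases an ASCII string.
static std::string ToLower(std::string s)
{
   std::transform(s.begin(), s.end(), s.begin(),
                  [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
   return s;
}

// Hypothetical helper, not part of ROOT: true if headerLine is a 'Server'
// header whose value occurs, ignoring case, in multiRangeServers.
bool SupportsMultiRange(const std::string& headerLine, const std::string& multiRangeServers)
{
   static const std::regex rex("^Server: (.+)", std::regex::icase);
   std::smatch m;
   if (!std::regex_search(headerLine, m, rex))
      return false;

   // Strip any CR/LF left over from the raw header line.
   std::string serverId = m[1].str();
   serverId.erase(std::remove_if(serverId.begin(), serverId.end(),
                                 [](char c) { return c == '\r' || c == '\n'; }),
                  serverId.end());

   return ToLower(multiRangeServers).find(ToLower(serverId)) != std::string::npos;
}
```

Because this is a substring test, a short server identity can match inside a longer configured name; the original code has the same property.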


////////////////////////////////////////////////////////////////////////////////
/// Sets the access and secret keys from the environmental variables, if
/// they are both set. Sets the security session token if it is given.

Bool_t TS3WebFile::GetCredentialsFromEnv(const char* accessKeyEnv, const char* secretKeyEnv,
                                         const char* tokenEnv, TString& outAccessKey,
                                         TString& outSecretKey, TString& outToken)
{
   // Look first in the recommended environmental variables. Both variables
   // must be set.
   TString accKey = gSystem->Getenv(accessKeyEnv);
   TString secKey = gSystem->Getenv(secretKeyEnv);
   TString token = gSystem->Getenv(tokenEnv);
   if (!token.IsNull()) {
      outToken = token;
   }
   if (!accKey.IsNull() && !secKey.IsNull()) {
      outAccessKey = accKey;
      outSecretKey = secKey;
      if (gDebug > 0)
         Info("GetCredentialsFromEnv", "using authentication information from environmental variables '%s' and '%s'",
              accessKeyEnv, secretKeyEnv);
      return kTRUE;
   }

   // Look now in the legacy environmental variables, for keeping backwards
   // compatibility.
   accKey = gSystem->Getenv("S3_ACCESS_ID"); // Legacy access key
   secKey = gSystem->Getenv("S3_ACCESS_KEY"); // Legacy secret key
   if (!accKey.IsNull() && !secKey.IsNull()) {
      Warning("SetAuthKeys", "usage of S3_ACCESS_ID and S3_ACCESS_KEY environmental variables is deprecated.");
      Warning("SetAuthKeys", "please use S3_ACCESS_KEY and S3_SECRET_KEY environmental variables.");
      outAccessKey = accKey;
      outSecretKey = secKey;
      return kTRUE;
   }

   return kFALSE;
}
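
The lookup order, including the legacy fallback in which S3_ACCESS_KEY holds the secret rather than the access key, can be sketched without touching the real process environment. In the sketch below the environment is passed in as a map so the logic is easy to exercise. `S3Credentials`, `Env`, `Lookup` and `CredentialsFromEnv` are hypothetical names, not ROOT APIs.

```cpp
#include <map>
#include <string>

// Hypothetical result type, not part of ROOT.
struct S3Credentials {
   std::string accessKey;
   std::string secretKey;
   std::string token;
   bool found = false;   // true if a complete key pair was located
};

// Stand-in for the process environment.
using Env = std::map<std::string, std::string>;

// Returns the value of 'name', or an empty string if it is not set.
static std::string Lookup(const Env& env, const std::string& name)
{
   const auto it = env.find(name);
   return it == env.end() ? std::string() : it->second;
}

// Hypothetical counterpart of the lookup above.
S3Credentials CredentialsFromEnv(const Env& env)
{
   S3Credentials out;
   out.token = Lookup(env, "S3_SESSION_TOKEN");

   // Recommended variables first; both must be set.
   std::string acc = Lookup(env, "S3_ACCESS_KEY");
   std::string sec = Lookup(env, "S3_SECRET_KEY");
   if (acc.empty() || sec.empty()) {
      // Legacy naming: S3_ACCESS_ID is the access key and
      // S3_ACCESS_KEY is the secret key.
      acc = Lookup(env, "S3_ACCESS_ID");
      sec = Lookup(env, "S3_ACCESS_KEY");
   }
   if (!acc.empty() && !sec.empty()) {
      out.accessKey = acc;
      out.secretKey = sec;
      out.found = true;
   }
   return out;
}
```

Setting only S3_ACCESS_KEY yields no credentials under either naming scheme, since the recommended scheme lacks the secret and the legacy scheme lacks the access id.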