TS3WebFile.cxx
// @(#)root/net:$Id$
// Author: Fabio Hernandez 22/01/2013
// extending an initial version by Marcelo Sousa (class TAS3File)

/*************************************************************************
 * Copyright (C) 1995-2011, Rene Brun and Fons Rademakers.               *
 * All rights reserved.                                                  *
 *                                                                       *
 * For the licensing terms see $ROOTSYS/LICENSE.                         *
 * For the list of contributors see $ROOTSYS/README/CREDITS.             *
 *************************************************************************/

/**
\file TS3WebFile.cxx
\class TS3WebFile
\ingroup IO

A TS3WebFile is a TWebFile which retrieves the file contents from a
web server implementing the REST API of the Amazon S3 protocol. This
class is meant to be as generic as possible to be used with files
hosted not only by Amazon S3 servers but also by other providers
implementing the core of the S3 protocol.

The S3 protocol works on top of HTTPS (and HTTP) and imposes that
each HTTP request be signed using a specific convention: the request
must include an 'Authorization' header which contains the signature
of a concatenation of selected request fields. For signing the
request, an 'Access Key Id' and a 'Secret Access Key' need to be
known. These keys are used by the S3 servers to identify the client
and to authenticate the request as genuine.

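As a rough illustration of what "a concatenation of selected request
fields" can look like, the sketch below assembles a string to sign in the
layout of the AWS REST "signature version 2" scheme: verb, Content-MD5,
Content-Type, date, canonicalized x-amz- headers, canonicalized resource.
That TS3HTTPRequest follows exactly this layout is an assumption made here,
and `MakeStringToSign` is a hypothetical helper, not a ROOT function. The
final step of that scheme (HMAC-SHA1 with the secret key, then Base64) is
left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only; not part of ROOT.
// Builds the string to sign for a body-less request (GET or HEAD), assuming
// the AWS signature version 2 layout and path-style addressing.
inline std::string MakeStringToSign(const std::string &verb, const std::string &date,
                                    const std::string &bucket, const std::string &objectKey,
                                    const std::string &sessionToken = "")
{
   // The object key is expected to start with '/', e.g. "/path/to/my/file"
   assert(!objectKey.empty() && objectKey.front() == '/');

   std::string s = verb + "\n";
   s += "\n";          // Content-MD5: empty, the request has no body
   s += "\n";          // Content-Type: empty, the request has no body
   s += date + "\n";   // value of the 'Date' header
   if (!sessionToken.empty())
      s += "x-amz-security-token:" + sessionToken + "\n"; // temporary credentials
   s += "/" + bucket + objectKey;                         // canonicalized resource
   return s;
}
```

Under these assumptions, the 'Authorization' header would carry the access
key id together with the signature computed over the returned string.
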
As an end user, you must know the Access Key and Secret Access Key
in order to access each S3 file. They are provided to you by your S3
service provider. Those two keys can be provided to ROOT when
initializing an object of this class by two means:
a. by using the environmental variables S3_ACCESS_KEY and
   S3_SECRET_KEY, or
b. by specifying them when opening each file.

You can use AWS temporary security credentials (temporary access key
and secret access key), but you must also give the associated
session token. The token may be set in the S3_SESSION_TOKEN
environmental variable, or on open in the TOKEN option.

The first method is convenient if all the S3 files you want to
access are hosted by a single provider. The second one is more
flexible as it allows you to specify which credentials to use
on a per-file basis. See the documentation of the constructor of
this class for details on the syntax.

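To make the two mechanisms concrete, here is a standalone sketch (plain
C++, no ROOT dependency) of one plausible lookup order: credentials given
in the options string are used if present, otherwise the environment
variables are consulted. `Credentials` and `ResolveCredentials` are
hypothetical names, and the regular expressions are simplified
approximations of the "AUTH=accessKey:secretKey" and "TOKEN=sessionToken"
option syntax, not the expressions this class uses.

```cpp
#include <cassert>
#include <cstdlib>
#include <regex>
#include <string>

// Hypothetical container for illustration only; not a ROOT type.
struct Credentials {
   std::string accessKey;
   std::string secretKey;
   std::string token;
};

// Look for credentials in the options string first, then in the environment.
inline Credentials ResolveCredentials(const std::string &options)
{
   static const std::regex authRe("(?:^|\\s)AUTH=([A-Za-z0-9]+):([A-Za-z0-9+/]+)",
                                  std::regex::icase);
   static const std::regex tokenRe("(?:^|\\s)TOKEN=(\\S+)", std::regex::icase);

   Credentials c;
   std::smatch m;
   if (std::regex_search(options, m, authRe)) {
      assert(m.size() == 3); // whole match plus the two captured keys
      c.accessKey = m[1].str();
      c.secretKey = m[2].str();
   }
   if (std::regex_search(options, m, tokenRe))
      c.token = m[1].str();

   // Fall back to the environment only when the options carried no key pair.
   // Both variables must be set for the pair to be used.
   if (c.accessKey.empty()) {
      const char *ak = std::getenv("S3_ACCESS_KEY");
      const char *sk = std::getenv("S3_SECRET_KEY");
      if (ak && sk) {
         c.accessKey = ak;
         c.secretKey = sk;
      }
   }
   if (c.token.empty()) {
      if (const char *tok = std::getenv("S3_SESSION_TOKEN"))
         c.token = tok;
   }
   return c;
}
```

If the returned access key is still empty, neither source supplied a key
pair and the request would have to be sent unauthenticated.
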
For generating and signing the HTTP request, this class uses
TS3HTTPRequest.

For more information on the details of S3 protocol please refer to:
"Amazon Simple Storage Service Developer Guide":
http://docs.amazonwebservices.com/AmazonS3/latest/dev/Welcome.html

"Amazon Simple Storage Service REST API Reference"
 http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html

**/

#include "TS3WebFile.h"
#include "TROOT.h"
#include "TError.h"
#include "TSystem.h"
#include "TPRegexp.h"
#include "TEnv.h"



////////////////////////////////////////////////////////////////////////////////
/// Construct a TS3WebFile object. The path argument is a URL of one of the
/// following forms:
///
/// ```
/// s3://host.example.com/bucket/path/to/my/file
/// s3http://host.example.com/bucket/path/to/my/file
/// s3https://host.example.com/bucket/path/to/my/file
/// as3://host.example.com/bucket/path/to/my/file
/// ```
///
/// For files hosted by Google Storage, use the following forms:
///
/// ```
/// gs://storage.googleapis.com/bucket/path/to/my/file
/// gshttp://storage.googleapis.com/bucket/path/to/my/file
/// gshttps://storage.googleapis.com/bucket/path/to/my/file
/// ```
///
/// The 'as3' scheme is accepted for backwards compatibility but its usage is
/// deprecated.
///
/// The recommended way to create an instance of this class is through
/// TFile::Open, for instance:
///
/// ```c++
/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file");
/// TFile* f2 = TFile::Open("gs://storage.googleapis.com/bucket/path/to/my/file");
/// ```
///
/// The specified scheme (i.e. s3, s3http, s3https, ...) determines the underlying
/// transport protocol to use for downloading the file contents, namely HTTP or HTTPS.
/// The 's3', 's3https', 'gs' and 'gshttps' schemes imply using HTTPS as the transport
/// protocol. The 's3http', 'as3' and 'gshttp' schemes imply using HTTP as the transport
/// protocol.
///
/// The 'options' argument can contain 'NOPROXY' if you want to bypass
/// the HTTP proxy when retrieving this file's contents. As for any TWebFile-derived
/// object, the URL of the web proxy can be specified by setting an environmental
/// variable 'http_proxy'. If this variable is set, we ask that proxy to route our
/// HTTP(S) requests to the file server.
///
/// In addition, you can also use the 'options' argument to provide the access key
/// and secret key to be used for authentication purposes for this file by using a
/// string of the form "AUTH=myAccessKey:mySecretkey". This may be useful to
/// open several files hosted by different providers in the same program/macro,
/// where the environmental variables solution is not convenient (see below).
///
/// To use AWS temporary security credentials you need to specify the session
/// token. This can be added to the options argument with a string of the form
/// TOKEN=mySessionToken. The temporary access and secret keys must also be
/// available, either via the AUTH option or by environmental variable.
///
/// If you need to specify more than one option, separate them by ' '
/// (blank), for instance:
/// "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+"
///
/// Examples:
/// ```
/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
/// TFile* f2 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
/// TFile* f3 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "TOKEN=AQoDYXdzEM///////////wEa8AHEYmCinjD+TsGEjtgKSMAT6wnY");
/// ```
///
/// If there is no authentication information in the 'options' argument
/// (i.e. no AUTH="...." string), the values of the environmental variables
/// S3_ACCESS_KEY and S3_SECRET_KEY (if set) are expected to contain
/// the access key id and the secret access key, respectively. These
/// credentials are provided to you by your S3 service provider.
///
/// If neither the AUTH information is provided in the 'options' argument
/// nor the environmental variables are set, we try to open the file
/// without providing any authentication information to the server. This
/// is useful when the access control set on the file allows any
/// unidentified user to read it.

TS3WebFile::TS3WebFile(const char* path, Option_t* options)
           : TWebFile(path, "IO")
{
   // Make sure this is a valid S3 path. We accept 'as3' as a scheme, for
   // backwards compatibility
   TString token;
   TPMERegexp rex("^([a]?s3|s3http[s]?|gs|gshttp[s]?){1}://([^/]+)/([^/]+)/([^/].*)", "i");
   if (rex.Match(TString(path)) != 5) {
      errorMsg = TString::Format("invalid S3 path '%s'", path);
   }
   else if (!ParseOptions(options, accessKey, secretKey, token)) {
      errorMsg = TString::Format("could not parse options '%s'", options);
   }

   // Should we stop initializing this object?
   if (doMakeZombie) {
      Error("TS3WebFile", "%s", (const char*)errorMsg);
      MakeZombie();
      return;
   }

   // Set this S3 object's URL, the bucket name this file is located in
   // and the object key
   fS3Request.SetObjectKey(TString::Format("/%s", (const char*)rex[4]));

   // Initialize the super-class data members (fUrl is a data member of
   // the super-super class TFile)
   TString protocol = "https";
   if (rex[1].EndsWith("http", TString::kIgnoreCase) ||
       rex[1].EqualTo("as3", TString::kIgnoreCase))
      protocol = "http";
   fUrl.SetUrl(TString::Format("%s://%s/%s/%s", (const char*)protocol,
      (const char*)rex[2], (const char*)rex[3], (const char*)rex[4]));

   // Set S3-specific data members. If the access and secret keys are not
   // provided in the 'options' argument we look in the environmental
   // variables.
   const char* kAccessKeyEnv = "S3_ACCESS_KEY";
   const char* kSecretKeyEnv = "S3_SECRET_KEY";
   const char* kSessionToken = "S3_SESSION_TOKEN";
   if (accessKey.IsNull())
         accessKey, secretKey, token);

   // Initialize the S3 HTTP request
   if (accessKey.IsNull() || secretKey.IsNull()) {
      // We have no authentication information, neither in the options
      // nor in the environmental variables. Maybe this is a
      // world-readable file, so let's continue and see if
      // we can open it.
   } else {
      // Set the authentication information we need to use
      // for this file
      if (!token.IsNull())
      if (rex[1].BeginsWith("gs"))
      else
   }

   // Assume this server does not serve multi-range HTTP GET requests. We
   // will detect this when the HTTP headers of this file are retrieved
   // later in the initialization process

   // Call super-class initializer

   // Were there some errors opening this file?
   if (IsZombie() && (accessKey.IsNull() || secretKey.IsNull())) {
      // We could not open the file and we have no authentication information
      // so inform the user so that they can check.
      Error("TS3WebFile", "could not find authentication info in "\
         "'options' argument and at least one of the environment variables '%s' or '%s' is not set",
   }
}


////////////////////////////////////////////////////////////////////////////////
/// Extracts the S3 authentication key pair (access key and secret key)
/// from the options. The authentication credentials can be specified in
/// the options provided to the constructor of this class as a string
/// containing: "AUTH=<access key>:<secret key>" and can include other
/// options, for instance "NOPROXY" for not using the HTTP proxy for
/// accessing this file's contents.
/// For instance:
/// "NOPROXY AUTH=F38XYZABCDeFgHiJkLm:V+frt4re7J1euSNFnmaf8wwmI401234E7kzxZ/TTM+"
/// A security token may be given by the TOKEN option, in order to allow the
/// use of a temporary key pair.

{
   TString optStr = (const char*)options;
   if (optStr.IsNull())
      return kTRUE;

   if (optStr.Contains("NOPROXY", TString::kIgnoreCase))
      fNoProxy = kTRUE;
   CheckProxy();

   // Look in the options string for the authentication information.
   TPMERegexp rex_token("(^TOKEN=|^.* TOKEN=)([\\S]+)[\\s]*.*$", "i");
   if (rex_token.Match(optStr) == 3) {
      token = rex_token[2];
   }
   TPMERegexp rex("(^AUTH=|^.* AUTH=)([a-z0-9]+):([a-z0-9+/]+)[\\s]*.*$", "i");
   if (rex.Match(optStr) == 4) {
      accessKey = rex[2];
      secretKey = rex[3];
   }
   if (gDebug > 0)
      Info("ParseOptions", "using authentication information from 'options' argument");
   return kTRUE;
}


////////////////////////////////////////////////////////////////////////////////
/// Overrides TWebFile::GetHead() for retrieving the HTTP headers of this
/// file. Uses TS3HTTPRequest to generate an HTTP HEAD request which includes
/// the authorization header expected by the S3 server.



////////////////////////////////////////////////////////////////////////////////
/// Overrides TWebFile::SetMsgReadBuffer10() for setting the HTTP GET
/// request compliant with the authentication mechanism used by the S3
/// protocol. The GET request must contain an "Authorization" header with
/// the signature of the request, generated using the user's secret access
/// key.



////////////////////////////////////////////////////////////////////////////////

{
   // Overrides TWebFile::ReadBuffers() for reading specified byte ranges.
   // Depending on the kind of server this file is hosted by, we use a
   // single HTTP request with a multi-range header or we generate multiple
   // requests with a single range each.

   // Does this server support multi-range GET requests?
   if (fUseMultiRange)
      return TWebFile::ReadBuffers(buf, pos, len, nbuf);

   // Send multiple GET requests with a single range of bytes
   // Adapted from original version by Wang Lu
   for (Int_t i=0, offset=0; i < nbuf; i++) {
      TString rangeHeader = TString::Format("Range: bytes=%lld-%lld\r\n\r\n",
                                            pos[i], pos[i] + len[i] - 1);
      if (GetFromWeb10(&buf[offset], len[i], s3Request) == -1)
         return kTRUE;
      offset += len[i];
   }
   return kFALSE;
}


////////////////////////////////////////////////////////////////////////////////
/// This method is called by the super-class TWebFile when an HTTP header
/// for this file is retrieved. We scan the 'Server' header to detect the
/// type of S3 server this file is hosted on and to determine if it is
/// known to support multi-range HTTP GET requests. Some S3 servers (for
/// instance Amazon's) do not support that feature and when they
/// receive a multi-range request they send back the whole file contents.
/// For this class, if the server does not support multi-range requests
/// we issue multiple single-range requests instead.

{
   TPMERegexp rex("^Server: (.+)", "i");
   if (rex.Match(headerLine) != 2)
      return;

   // Extract the identity of this server and compare it to the
   // identity of the servers known to support multi-range requests.
   // The list of server identities is expected to be found in ROOT
   // configuration.
   TString serverId = rex[1].ReplaceAll("\r", "").ReplaceAll("\n", "");
   TString multirangeServers(gEnv->GetValue("TS3WebFile.Root.MultiRangeServer", ""));
}


////////////////////////////////////////////////////////////////////////////////
/// Sets the access and secret keys from the environmental variables, if
/// they are both set. Sets the security session token if it is given.

                                         const char* tokenEnv, TString& outAccessKey,
{
   // Look first in the recommended environmental variables. Both variables
   // must be set.
   TString token = gSystem->Getenv(tokenEnv);
   if (!token.IsNull()) {
      outToken = token;
   }
   if (!accKey.IsNull() && !secKey.IsNull()) {
      if (gDebug > 0)
         Info("GetCredentialsFromEnv", "using authentication information from environmental variables '%s' and '%s'",
      return kTRUE;
   }

   // Look now in the legacy environmental variables, for keeping backwards
   // compatibility.
   accKey = gSystem->Getenv("S3_ACCESS_ID"); // Legacy access key
   secKey = gSystem->Getenv("S3_ACCESS_KEY"); // Legacy secret key
   if (!accKey.IsNull() && !secKey.IsNull()) {
      Warning("SetAuthKeys", "usage of S3_ACCESS_ID and S3_ACCESS_KEY environmental variables is deprecated.");
      Warning("SetAuthKeys", "please use S3_ACCESS_KEY and S3_SECRET_KEY environmental variables.");
      return kTRUE;
   }

   return kFALSE;
}
