ROOT Reference Guide
 
TS3WebFile.cxx
// @(#)root/net:$Id$
// Author: Fabio Hernandez 22/01/2013
// extending an initial version by Marcelo Sousa (class TAS3File)

/*************************************************************************
 * Copyright (C) 1995-2011, Rene Brun and Fons Rademakers. *
 * All rights reserved. *
 * *
 * For the licensing terms see $ROOTSYS/LICENSE. *
 * For the list of contributors see $ROOTSYS/README/CREDITS. *
 *************************************************************************/

/**
\file TS3WebFile.cxx
\class TS3WebFile
\ingroup IO

A TS3WebFile is a TWebFile which retrieves the file contents from a
web server implementing the REST API of the Amazon S3 protocol. This
class is designed to be as generic as possible, so that it can be used
with files hosted not only by Amazon S3 servers but also by other
providers implementing the core of the S3 protocol.

The S3 protocol works on top of HTTPS (and HTTP) and requires that
each HTTP request be signed using a specific convention: the request
must include an 'Authorization' header which contains the signature
of a concatenation of selected request fields. To sign the request,
the client must know an 'Access Key Id' and a 'Secret Access Key'.
The S3 servers use these keys to identify the client and to
authenticate the request as genuine.
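
As an illustration, the sketch below assembles the string that is signed
under AWS Signature Version 2, the HMAC-SHA1 request-signing scheme of the
S3 REST API. It assumes that this is the scheme TS3HTTPRequest implements,
and it follows the published layout rather than TS3HTTPRequest's own code.
The helper name MakeStringToSign is hypothetical. The resulting string would
then be signed with HMAC-SHA1 using the Secret Access Key, Base64-encoded and
sent in a header of the form 'Authorization: AWS AccessKeyId:Signature'; the
cryptographic steps are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds an AWS Signature Version 2 "string to sign"
// for a request without a body (e.g. GET or HEAD), using path-style
// addressing (/bucket/key). Assumed layout, not copied from TS3HTTPRequest.
std::string MakeStringToSign(const std::string& verb,          // "GET", "HEAD", ...
                             const std::string& date,          // same value as the Date header
                             const std::string& bucket,
                             const std::string& objectKey,     // must start with '/'
                             const std::string& sessionToken)  // empty unless using temporary credentials
{
   assert(!objectKey.empty() && objectKey[0] == '/');
   std::string s;
   s += verb + "\n";
   s += "\n";  // Content-MD5: empty, no request body
   s += "\n";  // Content-Type: empty, no request body
   s += date + "\n";
   // Canonicalized x-amz-* headers: only the session token is used here.
   if (!sessionToken.empty())
      s += "x-amz-security-token:" + sessionToken + "\n";
   // Canonicalized resource.
   s += "/" + bucket + objectKey;
   return s;
}
```

Without a session token, only the verb, the date and the resource carry
information; the two empty fields stand for the unused Content-MD5 and
Content-Type values.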

To access an S3 file that requires authentication, you need an Access
Key Id and a Secret Access Key, which are provided to you by your S3
service provider. These two keys can be given to ROOT when initializing
an object of this class in two ways:
a. by setting the environment variables S3_ACCESS_KEY and
   S3_SECRET_KEY, or
b. by specifying them when opening each file.

You can use AWS temporary security credentials (a temporary access key
and secret access key), but you must also provide the associated
session token. The token may be set in the S3_SESSION_TOKEN
environment variable, or passed through the TOKEN option when opening
the file.

The first method is convenient if all the S3 files you want to
access are hosted by a single provider. The second one is more
flexible as it allows you to specify which credentials to use
on a per-file basis. See the documentation of the constructor of
this class for details on the syntax.
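
The precedence between these sources can be summarized with the sketch below,
a hypothetical ROOT-independent restatement of what the constructor and
GetCredentialsFromEnv() do: keys given with AUTH on open take precedence;
otherwise S3_ACCESS_KEY and S3_SECRET_KEY are used, with the deprecated legacy
pair S3_ACCESS_ID and S3_ACCESS_KEY as a fallback; and when AUTH is not given,
S3_SESSION_TOKEN (if set) takes precedence over a TOKEN option. If no key pair
is found, the file is opened without authentication. The names S3Credentials,
EnvLookup and ResolveCredentials are made up for this example.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Credentials resolved for one file; empty strings mean "not provided".
struct S3Credentials {
   std::string accessKey;
   std::string secretKey;
   std::string token;
   bool HasKeyPair() const { return !accessKey.empty() && !secretKey.empty(); }
};

// Returns the value of an environment variable, or nullptr if it is unset
// (std::getenv has this shape).
using EnvLookup = std::function<const char*(const char*)>;

// 'fromOptions' holds what was parsed from AUTH=... and TOKEN=... on open.
S3Credentials ResolveCredentials(const S3Credentials& fromOptions, const EnvLookup& lookup)
{
   assert(lookup);
   auto get = [&lookup](const char* name) {
      const char* value = lookup(name);
      return std::string(value ? value : "");
   };

   S3Credentials c = fromOptions;
   if (!c.accessKey.empty())
      return c;  // AUTH=... given on open: the environment is not consulted

   const std::string envToken = get("S3_SESSION_TOKEN");
   if (!envToken.empty())
      c.token = envToken;

   std::string acc = get("S3_ACCESS_KEY");
   std::string sec = get("S3_SECRET_KEY");
   if (acc.empty() || sec.empty()) {
      acc = get("S3_ACCESS_ID");   // legacy access key (deprecated)
      sec = get("S3_ACCESS_KEY");  // legacy secret key (deprecated)
   }
   if (!acc.empty() && !sec.empty()) {
      c.accessKey = acc;
      c.secretKey = sec;
   }
   // If c.HasKeyPair() is still false, the file is opened without authentication.
   return c;
}
```

Passing a lookup function instead of calling std::getenv directly keeps the
precedence rules testable without modifying the process environment.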

This class uses TS3HTTPRequest to generate and sign the HTTP requests.

For more details on the S3 protocol, please refer to:
"Amazon Simple Storage Service Developer Guide":
http://docs.amazonwebservices.com/AmazonS3/latest/dev/Welcome.html

"Amazon Simple Storage Service REST API Reference":
http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html

**/

#include "TS3WebFile.h"
#include "TROOT.h"
#include "TError.h"
#include "TSystem.h"
#include "TPRegexp.h"
#include "TEnv.h"


////////////////////////////////////////////////////////////////////////////////
/// Construct a TS3WebFile object. The path argument is a URL of one of the
/// following forms:
///
/// ```
/// s3://host.example.com/bucket/path/to/my/file
/// s3http://host.example.com/bucket/path/to/my/file
/// s3https://host.example.com/bucket/path/to/my/file
/// as3://host.example.com/bucket/path/to/my/file
/// ```
///
/// For files hosted by Google Storage, use the following forms:
///
/// ```
/// gs://storage.googleapis.com/bucket/path/to/my/file
/// gshttp://storage.googleapis.com/bucket/path/to/my/file
/// gshttps://storage.googleapis.com/bucket/path/to/my/file
/// ```
///
/// The 'as3' scheme is accepted for backwards compatibility but its usage is
/// deprecated.
///
/// The recommended way to create an instance of this class is through
/// TFile::Open, for instance:
///
/// ```c++
/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file");
/// TFile* f2 = TFile::Open("gs://storage.googleapis.com/bucket/path/to/my/file");
/// ```
///
/// The specified scheme (s3, s3http, s3https, ...) determines the underlying
/// transport protocol used for downloading the file contents, namely HTTP or HTTPS.
/// The 's3', 's3https', 'gs' and 'gshttps' schemes imply HTTPS as the transport
/// protocol. The 's3http', 'as3' and 'gshttp' schemes imply HTTP as the transport
/// protocol.
///
/// The 'options' argument can contain 'NOPROXY' if you want to bypass
/// the HTTP proxy when retrieving this file's contents. As for any TWebFile-derived
/// object, the URL of the web proxy can be specified by setting the environment
/// variable 'http_proxy'. If this variable is set, we ask that proxy to route our
/// HTTP(S) requests to the file server.
///
/// In addition, you can use the 'options' argument to provide the access key
/// and secret key used to authenticate the requests for this file, with a
/// string of the form "AUTH=myAccessKey:mySecretKey". This may be useful to
/// open several files hosted by different providers in the same program/macro,
/// where the environment variables solution is not convenient (see below).
///
/// To use AWS temporary security credentials you need to specify the session
/// token. This can be added to the options argument with a string of the form
/// "TOKEN=mySessionToken". The temporary access and secret keys must also be
/// available, either via the AUTH option or via the environment variables.
///
/// If you need to specify more than one option, separate them by ' '
/// (blank), for instance:
/// "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+"
///
/// Examples:
/// ```
/// TFile* f1 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "NOPROXY AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
/// TFile* f2 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "AUTH=F38XYZABCDeFgH4D0E1F:V+frt4re7J1euSNFnmaf8wwmI4AAAE7kzxZ/TTM+");
/// TFile* f3 = TFile::Open("s3://host.example.com/bucket/path/to/my/file",
///                         "TOKEN=AQoDYXdzEM///////////wEa8AHEYmCinjD+TsGEjtgKSMAT6wnY");
/// ```
///
/// If there is no authentication information in the 'options' argument
/// (i.e. no AUTH=...), the values of the environment variables
/// S3_ACCESS_KEY and S3_SECRET_KEY (if set) are expected to contain
/// the access key id and the secret access key, respectively. These
/// credentials are provided to you by your S3 service provider.
///
/// If the AUTH information is neither provided in the 'options' argument
/// nor available from the environment variables, we try to open the file
/// without providing any authentication information to the server. This
/// is useful when the file's access control allows any anonymous user
/// to read it.

ROOT::Deprecated::TS3WebFile::TS3WebFile(const char* path, Option_t* options) : TWebFile(path, "IO")
{
   // Make sure this is a valid S3 path. We accept 'as3' as a scheme, for
   // backwards compatibility
   Bool_t doMakeZombie = kFALSE;
   TString errorMsg;
   TString accessKey;
   TString secretKey;
   TString token;
   TPMERegexp rex("^([a]?s3|s3http[s]?|gs|gshttp[s]?){1}://([^/]+)/([^/]+)/([^/].*)", "i");
   if (rex.Match(TString(path)) != 5) {
      errorMsg = TString::Format("invalid S3 path '%s'", path);
      doMakeZombie = kTRUE;
   }
   else if (!ParseOptions(options, accessKey, secretKey, token)) {
      errorMsg = TString::Format("could not parse options '%s'", options);
      doMakeZombie = kTRUE;
   }

   // Should we stop initializing this object?
   if (doMakeZombie) {
      Error("TS3WebFile", "%s", (const char*)errorMsg);
      MakeZombie();
      gDirectory = gROOT;
      return;
   }

   // Set this S3 object's URL, the bucket name this file is located in
   // and the object key
   fS3Request.SetBucket(rex[3]);
   fS3Request.SetObjectKey(TString::Format("/%s", (const char*)rex[4]));

   // Initialize super-classes data members (fUrl is a data member of
   // super-super class TFile)
   TString protocol = "https";
   if (rex[1].EndsWith("http", TString::kIgnoreCase) ||
       rex[1].EqualTo("as3", TString::kIgnoreCase))
      protocol = "http";
   fUrl.SetUrl(TString::Format("%s://%s/%s/%s", (const char*)protocol,
      (const char*)rex[2], (const char*)rex[3], (const char*)rex[4]));

   // Set S3-specific data members. If the access and secret keys are not
   // provided in the 'options' argument we look in the environment
   // variables.
   const char* kAccessKeyEnv = "S3_ACCESS_KEY";
   const char* kSecretKeyEnv = "S3_SECRET_KEY";
   const char* kSessionToken = "S3_SESSION_TOKEN";
   if (accessKey.IsNull())
      GetCredentialsFromEnv(kAccessKeyEnv, kSecretKeyEnv, kSessionToken,
                            accessKey, secretKey, token);

   // Initialize the S3 HTTP request
   fS3Request.SetHost(fUrl.GetHost());
   if (accessKey.IsNull() || secretKey.IsNull()) {
      // We have no authentication information, neither in the options
      // nor in the environment variables. So maybe this is a
      // world-readable file: let's continue and see if
      // we can open it.
      fS3Request.SetAuthType(TS3HTTPRequest::kNoAuth);
   } else {
      // Set the authentication information we need to use
      // for this file
      fS3Request.SetAuthKeys(accessKey, secretKey);
      if (!token.IsNull())
         fS3Request.SetSessionToken(token);
      if (rex[1].BeginsWith("gs"))
         fS3Request.SetAuthType(TS3HTTPRequest::kGoogle);
      else
         fS3Request.SetAuthType(TS3HTTPRequest::kAmazon);
   }

   // Assume this server does not serve multi-range HTTP GET requests. We
   // will detect this when the HTTP headers of this file are retrieved
   // later in the initialization process
   fUseMultiRange = kFALSE;

   // Call super-class initializer
   TWebFile::Init(kFALSE);

   // Were there some errors opening this file?
   if (IsZombie() && (accessKey.IsNull() || secretKey.IsNull())) {
      // We could not open the file and we have no authentication information
      // so inform the user so that they can check.
      Error("TS3WebFile", "could not find authentication info in "
            "'options' argument and at least one of the environment variables '%s' or '%s' is not set",
            kAccessKeyEnv, kSecretKeyEnv);
   }
}

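/*
The path handling in the constructor above can be reproduced outside ROOT,
for instance to validate URLs before calling TFile::Open. The sketch below is
a hypothetical standalone equivalent that uses std::regex (ECMAScript grammar,
case-insensitive) instead of TPMERegexp: it applies the same pattern, splits
the path into host, bucket and object key, and chooses HTTP or HTTPS with the
same rule. One small difference: the Google check is made on the lower-cased
scheme, whereas the constructor's BeginsWith("gs") test is case-sensitive.
The names S3Location and ParseS3Url are made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <regex>
#include <string>

// Pieces of an S3/GS URL as split by the constructor's regular expression.
struct S3Location {
   std::string transport;  // "http" or "https"
   std::string host;
   std::string bucket;
   std::string objectKey;  // without the leading '/'
   bool google = false;    // true for the gs, gshttp and gshttps schemes
};

std::optional<S3Location> ParseS3Url(const std::string& path)
{
   static const std::regex rex("^([a]?s3|s3http[s]?|gs|gshttp[s]?)://([^/]+)/([^/]+)/([^/].*)",
                               std::regex::ECMAScript | std::regex::icase);
   std::smatch m;
   if (!std::regex_search(path, m, rex))
      return std::nullopt;
   assert(m.size() == 5);  // whole match plus four groups, as checked by Match() == 5

   std::string scheme = m[1].str();
   for (char& ch : scheme)
      ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));

   // HTTPS by default; HTTP for schemes ending in "http" and for legacy "as3".
   const bool endsWithHttp = scheme.size() >= 4 && scheme.compare(scheme.size() - 4, 4, "http") == 0;

   S3Location loc;
   loc.transport = (endsWithHttp || scheme == "as3") ? "http" : "https";
   loc.host = m[2].str();
   loc.bucket = m[3].str();
   loc.objectKey = m[4].str();
   loc.google = scheme.compare(0, 2, "gs") == 0;
   return loc;
}
```

ParseS3Url("s3://host.example.com/bucket/path/to/my/file") yields transport
"https", host "host.example.com", bucket "bucket" and object key
"path/to/my/file"; a path without an object key yields std::nullopt.
*/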

////////////////////////////////////////////////////////////////////////////////
/// Extracts the S3 authentication key pair (access key and secret key)
/// from the options. The authentication credentials can be specified in
/// the options provided to the constructor of this class as a string
/// containing "AUTH=<access key>:<secret key>", possibly combined with
/// other options, for instance "NOPROXY" for not using the HTTP proxy for
/// accessing this file's contents.
/// For instance:
/// "NOPROXY AUTH=F38XYZABCDeFgHiJkLm:V+frt4re7J1euSNFnmaf8wwmI401234E7kzxZ/TTM+"
/// A session token may be given with the TOKEN option, in order to allow the
/// use of a temporary key pair.

Bool_t
ROOT::Deprecated::TS3WebFile::ParseOptions(Option_t* options, TString& accessKey, TString& secretKey, TString& token)
{
   TString optStr = (const char*)options;
   if (optStr.IsNull())
      return kTRUE;

   fNoProxy = kFALSE;
   if (optStr.Contains("NOPROXY", TString::kIgnoreCase))
      fNoProxy = kTRUE;
   CheckProxy();

   // Look in the options string for the authentication information.
   TPMERegexp rex_token("(^TOKEN=|^.* TOKEN=)([\\S]+)[\\s]*.*$", "i");
   if (rex_token.Match(optStr) == 3) {
      token = rex_token[2];
   }
   TPMERegexp rex("(^AUTH=|^.* AUTH=)([a-z0-9]+):([a-z0-9+/]+)[\\s]*.*$", "i");
   if (rex.Match(optStr) == 4) {
      accessKey = rex[2];
      secretKey = rex[3];
   }
   if (gDebug > 0)
      Info("ParseOptions", "using authentication information from 'options' argument");
   return kTRUE;
}

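/*
The option parsing above can be tried out independently of ROOT. The sketch
below is a hypothetical port of ParseOptions() to std::regex: it uses the same
AUTH= and TOKEN= patterns (with the letter ranges written out as A-Za-z instead
of relying on case-insensitive ranges) and a case-insensitive substring test
for NOPROXY. It only parses the string; unlike the original, it does not touch
any proxy settings. The names S3Options and ParseS3Options are made up for
this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// What the 'options' string of TFile::Open may carry for an S3 file.
struct S3Options {
   bool noProxy = false;
   std::string accessKey;
   std::string secretKey;
   std::string token;
};

S3Options ParseS3Options(const std::string& options)
{
   S3Options out;
   if (options.empty())
      return out;

   // NOPROXY may appear anywhere in the string, in any case.
   std::string upper = options;
   std::transform(upper.begin(), upper.end(), upper.begin(),
                  [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
   out.noProxy = upper.find("NOPROXY") != std::string::npos;

   const auto flags = std::regex::ECMAScript | std::regex::icase;
   static const std::regex tokenRex(R"((^TOKEN=|^.* TOKEN=)(\S+)\s*.*$)", flags);
   static const std::regex authRex(R"((^AUTH=|^.* AUTH=)([A-Za-z0-9]+):([A-Za-z0-9+/]+)\s*.*$)", flags);

   std::smatch m;
   if (std::regex_search(options, m, tokenRex))
      out.token = m[2].str();
   if (std::regex_search(options, m, authRex)) {
      assert(m.size() == 4);  // whole match plus three groups, as checked by Match() == 4
      out.accessKey = m[2].str();
      out.secretKey = m[3].str();
   }
   return out;
}
```

For "NOPROXY AUTH=ak:sk" this returns noProxy set, accessKey "ak" and
secretKey "sk", with an empty token.
*/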

////////////////////////////////////////////////////////////////////////////////
/// Overrides TWebFile::GetHead() for retrieving the HTTP headers of this
/// file. Uses TS3HTTPRequest to generate an HTTP HEAD request which includes
/// the authorization header expected by the S3 server.

Int_t ROOT::Deprecated::TS3WebFile::GetHead()
{
   fMsgGetHead = fS3Request.GetRequest(TS3HTTPRequest::kHEAD);
   return TWebFile::GetHead();
}


////////////////////////////////////////////////////////////////////////////////
/// Overrides TWebFile::SetMsgReadBuffer10() for setting the HTTP GET
/// request compliant with the authentication mechanism used by the S3
/// protocol. The GET request must contain an "Authorization" header with
/// the signature of the request, generated using the user's secret access
/// key.

void ROOT::Deprecated::TS3WebFile::SetMsgReadBuffer10(const char* redirectLocation, Bool_t tempRedirect)
{
   TWebFile::SetMsgReadBuffer10(redirectLocation, tempRedirect);
   fMsgReadBuffer10 = fS3Request.GetRequest(TS3HTTPRequest::kGET, kFALSE) + "Range: bytes=";
   return;
}


////////////////////////////////////////////////////////////////////////////////

Bool_t ROOT::Deprecated::TS3WebFile::ReadBuffers(char* buf, Long64_t* pos, Int_t* len, Int_t nbuf)
{
   // Overrides TWebFile::ReadBuffers() for reading specified byte ranges.
   // Depending on the kind of server hosting this file, we use a
   // single HTTP request with a multi-range header or we generate multiple
   // requests with a single range each.

   // Does this server support multi-range GET requests?
   if (fUseMultiRange)
      return TWebFile::ReadBuffers(buf, pos, len, nbuf);

   // Send multiple GET requests with a single range of bytes
   // Adapted from original version by Wang Lu
   for (Int_t i=0, offset=0; i < nbuf; i++) {
      TString rangeHeader = TString::Format("Range: bytes=%lld-%lld\r\n\r\n",
                                            pos[i], pos[i] + len[i] - 1);
      TString s3Request = fS3Request.GetRequest(TS3HTTPRequest::kGET, kFALSE) + rangeHeader;
      if (GetFromWeb10(&buf[offset], len[i], s3Request) == -1)
         return kTRUE;
      offset += len[i];
   }
   return kFALSE;
}

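/*
When fUseMultiRange is false, ReadBuffers() above turns the nbuf requested
blocks into one single-range GET each. The sketch below is a hypothetical,
ROOT-independent helper that computes only the Range headers and the buffer
offsets that loop uses, so that the bookkeeping can be checked in isolation;
sending the signed request (GetFromWeb10() in the original) is left out. The
names RangeRequest and PlanSingleRangeReads are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One single-range GET request, as issued when multi-range requests are not used.
struct RangeRequest {
   std::string rangeHeader;  // appended to the signed GET request headers
   std::int64_t bufOffset;   // where the returned bytes go in the caller's buffer
   int length;               // number of bytes requested
};

// Blocks are stored back to back in the caller's buffer, in request order.
std::vector<RangeRequest> PlanSingleRangeReads(const std::int64_t* pos, const int* len, int nbuf)
{
   std::vector<RangeRequest> plan;
   std::int64_t offset = 0;
   for (int i = 0; i < nbuf; ++i) {
      assert(len[i] > 0);  // an empty block would produce an invalid byte range
      char header[64];
      // HTTP byte ranges are inclusive: the last byte is pos + len - 1.
      std::snprintf(header, sizeof(header), "Range: bytes=%lld-%lld\r\n\r\n",
                    static_cast<long long>(pos[i]),
                    static_cast<long long>(pos[i] + len[i] - 1));
      plan.push_back({header, offset, len[i]});
      offset += len[i];
   }
   return plan;
}
```

Because byte ranges are inclusive, a 100-byte block starting at offset 0 is
requested as "Range: bytes=0-99".
*/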

////////////////////////////////////////////////////////////////////////////////
/// This method is called by the super-class TWebFile when an HTTP header
/// line for this file is retrieved. We scan the 'Server' header to detect the
/// type of S3 server hosting this file and to determine whether it is
/// known to support multi-range HTTP GET requests. Some S3 servers (for
/// instance Amazon's) do not support that feature: when they
/// receive a multi-range request, they send back the whole file contents.
/// If the server does not support multi-range requests, this class
/// issues multiple single-range requests instead.

void ROOT::Deprecated::TS3WebFile::ProcessHttpHeader(const TString& headerLine)
{
   TPMERegexp rex("^Server: (.+)", "i");
   if (rex.Match(headerLine) != 2)
      return;

   // Extract the identity of this server and compare it to the
   // identity of the servers known to support multi-range requests.
   // The list of server identities is expected to be found in the ROOT
   // configuration (resource TS3WebFile.Root.MultiRangeServer).
   TString serverId = rex[1].ReplaceAll("\r", "").ReplaceAll("\n", "");
   TString multirangeServers(gEnv->GetValue("TS3WebFile.Root.MultiRangeServer", ""));
   fUseMultiRange = multirangeServers.Contains(serverId, TString::kIgnoreCase) ? kTRUE : kFALSE;
}

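/*
The decision made in ProcessHttpHeader() above can be sketched without ROOT
as below. This is a hypothetical port: ExtractServerId() plays the role of
the "^Server: (.+)" match followed by the removal of CR and LF characters, and
IsMultiRangeServer() the case-insensitive TString::Contains() test against the
value of the TS3WebFile.Root.MultiRangeServer resource. In this sketch an id
that is empty after stripping is rejected, since an empty string would be
found in any list by a plain substring test. The function names are made up
for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

namespace {
std::string ToLower(std::string s)
{
   std::transform(s.begin(), s.end(), s.begin(),
                  [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
   return s;
}
}  // namespace

// Returns the server identity if 'headerLine' is a "Server: ..." header
// (prefix matched case-insensitively), with CR and LF removed; std::nullopt otherwise.
std::optional<std::string> ExtractServerId(const std::string& headerLine)
{
   const std::string prefix = "server: ";
   if (headerLine.size() <= prefix.size() ||
       ToLower(headerLine.substr(0, prefix.size())) != prefix)
      return std::nullopt;
   std::string id = headerLine.substr(prefix.size());
   id.erase(std::remove(id.begin(), id.end(), '\r'), id.end());
   id.erase(std::remove(id.begin(), id.end(), '\n'), id.end());
   if (id.empty())
      return std::nullopt;
   return id;
}

// Case-insensitive substring test, like TString::Contains(..., TString::kIgnoreCase).
bool IsMultiRangeServer(const std::string& serverId, const std::string& multiRangeServers)
{
   assert(!serverId.empty());  // guaranteed by ExtractServerId()
   return ToLower(multiRangeServers).find(ToLower(serverId)) != std::string::npos;
}
```
*/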

////////////////////////////////////////////////////////////////////////////////
/// Sets the access and secret keys from the environment variables, if
/// they are both set. Sets the security session token if it is given.
/// If the recommended variables are not both set, the deprecated legacy
/// variables S3_ACCESS_ID and S3_ACCESS_KEY are tried instead.
/// Returns kTRUE if a key pair was found, kFALSE otherwise.

Bool_t ROOT::Deprecated::TS3WebFile::GetCredentialsFromEnv(const char* accessKeyEnv, const char* secretKeyEnv,
                                                           const char* tokenEnv, TString& outAccessKey,
                                                           TString& outSecretKey, TString& outToken)
{
   // Look first in the recommended environment variables. Both variables
   // must be set.
   TString accKey = gSystem->Getenv(accessKeyEnv);
   TString secKey = gSystem->Getenv(secretKeyEnv);
   TString token = gSystem->Getenv(tokenEnv);
   if (!token.IsNull()) {
      outToken = token;
   }
   if (!accKey.IsNull() && !secKey.IsNull()) {
      outAccessKey = accKey;
      outSecretKey = secKey;
      if (gDebug > 0)
         Info("GetCredentialsFromEnv", "using authentication information from environmental variables '%s' and '%s'",
              accessKeyEnv, secretKeyEnv);
      return kTRUE;
   }

   // Look now in the legacy environment variables, for keeping backwards
   // compatibility.
   accKey = gSystem->Getenv("S3_ACCESS_ID"); // Legacy access key
   secKey = gSystem->Getenv("S3_ACCESS_KEY"); // Legacy secret key
   if (!accKey.IsNull() && !secKey.IsNull()) {
      Warning("SetAuthKeys", "usage of S3_ACCESS_ID and S3_ACCESS_KEY environmental variables is deprecated.");
      Warning("SetAuthKeys", "please use S3_ACCESS_KEY and S3_SECRET_KEY environmental variables.");
      outAccessKey = accKey;
      outSecretKey = secKey;
      return kTRUE;
   }

   return kFALSE;
}