The Will Will Web

Will's learning notes and technical write-ups from the online world

Notes and a tutorial for the new AzCopy v10

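While using AzCopy recently, I noticed it behaved very differently from what I remembered. Then I realized there had been a major rewrite, and the command-line arguments are completely different from before. The changes are substantial, and for someone used to the old version, the first impression of the new one is honestly not great. While digging into it I stepped on quite a few landmines but also found many improvements, so I felt it was worth writing up my notes.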

Installation

On Windows I always install it with Chocolatey:

  • Old version (v8)

    Installing the old AzCopy does not add it to the PATH environment variable. Even after installation you have to run it as C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe, unless you add it to PATH yourself.

    choco install azcopy -y
    

    To show the help for the old version, run:

    azcopy /?
    
  • New version (v10)

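    Today I was puzzled that after installing with choco install azcopy, it didn't even set up PATH. Only later did I realize that AzCopy had shipped a new version (v10). If you install AzCopy v10 through the azcopy10 package instead, the executable is put on the default PATH automatically.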

    choco install azcopy10 -y
    

    To show the help for the new version, run:

    azcopy --help
    

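To check which of the two situations above you are in, a small Python sketch can look for AzCopy on PATH and fall back to the v8 default folder quoted above. This is only an illustration: the function name `locate_azcopy` is hypothetical. A hit on PATH does not by itself tell you which version was found, so confirm it with `azcopy --help` (v10) or `azcopy /?` (v8).

```python
import os
import shutil
from typing import Optional

# Default install location of AzCopy v8 on Windows, as given in the article.
V8_DEFAULT_PATH = r"C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe"


def locate_azcopy() -> Optional[str]:
    """Hypothetical helper: return a path to an AzCopy executable, or None."""
    # The azcopy10 Chocolatey package puts azcopy on PATH, so check PATH first.
    on_path = shutil.which("azcopy")
    if on_path:
        return on_path
    # The v8 package does not touch PATH; try its default install folder instead.
    if os.path.isfile(V8_DEFAULT_PATH):
        return V8_DEFAULT_PATH
    return None


if __name__ == "__main__":
    found = locate_azcopy()
    print(found if found else "AzCopy not found on PATH or in the v8 default folder")
```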
Using the new AzCopy v10

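True to the Taiwanese tradition of never reading the manual and just swinging at whatever pitch comes, the first thing I did after installing azcopy10 was run azcopy login --help to see how to log in. The most frustrating part of AzCopy v10 is that even logging in has a learning curve. The process is complicated and there are many authorization options. Even the simplest option doesn't work smoothly, and when something goes wrong the error messages give no clear guidance.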

  1. Log in with an Azure AD account

    The simplest way to log in with AzCopy is with an Azure AD account: just run azcopy login. Yet no matter what I tried, I couldn't log in!

    I then read the document Authorize access to blobs with AzCopy and Azure Active Directory (Azure AD). It says that if azcopy login doesn't work, you should log in with azcopy login --tenant-id=<tenant-id> instead, and that did get me logged in!

  2. List the files in a specific Blob Storage container

    Even though login succeeded, running azcopy list https://xxxxx.blob.core.windows.net/site returned the following error:

    INFO: Authenticating to source using Azure AD
    
    failed to traverse container: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/azure-storage-blob-go@v0.10.1-0.20210407023846-16cf969ec1c3/azblob/zc_storage_error.go:42
    ===== RESPONSE ERROR (ServiceCode=AuthorizationPermissionMismatch) =====
    Description=This request is not authorized to perform this operation using this permission.
    RequestId:76f314d5-c01e-002f-064b-9763e0000000
    Time:2021-08-22T11:49:43.1775028Z, Details:
      Code: AuthorizationPermissionMismatch
      GET https://xxxxx.blob.core.windows.net/site?comp=list&delimiter=%2F&include=metadata&restype=container&timeout=901
      Authorization: REDACTED
      User-Agent: [AzCopy/10.11.0 Azure-Storage/0.13 (go1.15; Windows_NT)]
      X-Ms-Client-Request-Id: [6de2cb93-6ca7-4adc-450f-2ec3b65c058d]
      X-Ms-Version: [2019-12-12]
      --------------------------------------------------------------------------------
      RESPONSE Status: 403 This request is not authorized to perform this operation using this permission.
      Content-Length: [279]
      Content-Type: [application/xml]
      Date: [Sun, 22 Aug 2021 11:49:42 GMT]
      Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
      X-Ms-Client-Request-Id: [6de2cb93-6ca7-4adc-450f-2ec3b65c058d]
      X-Ms-Error-Code: [AuthorizationPermissionMismatch]
      X-Ms-Request-Id: [76f314d5-c01e-002f-064b-9763e0000000]
      X-Ms-Version: [2019-12-12]
    

    What a hard-to-understand message! 🙄

    Running azcopy login --help again, the first paragraph contains the key point:

    To be authorized to your Azure Storage account, you must assign the Storage Blob Data Contributor role to your user account in the context of either the Storage account, parent resource group or parent subscription.

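    So being an Owner or Service administrator is not enough. You also have to grant yourself the Storage Blob Data Contributor role on the Azure Storage account. However, even after I assigned this role to myself, it still didn't work. It turned out I had to log in again with azcopy login --tenant-id=<tenant-id> to get full access! 🔥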

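    The official docs say you need at least the Storage Blob Data Contributor or Storage Blob Data Owner role to upload files, but they never told me to log in again! The reason is that after an RBAC role assignment changes, you need a new access token to pick up the new permissions. Logging in again is how you get that new access token.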

  3. Upload a folder to a specific Blob Storage container

    To upload every file in the current folder, including all subfolders and their files, use:

    azcopy cp * https://xxxxx.blob.core.windows.net/site/ --recursive
    

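    Note: with azcopy cp, do not use . to mean the current folder, because that also copies the current folder's own name. For example, if you run the command below from C:\site, the files are uploaded under https://xxxxx.blob.core.windows.net/site/site/, which is probably not what you want.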

    azcopy cp . https://xxxxx.blob.core.windows.net/site/ --recursive
    

    If you mix / and \ in a path, AzCopy panics and exits with exit code 2:

    C:\> azcopy sync D:\a\r1\a/QA-CI/drop/site "https://xxxxx.blob.core.windows.net/site"
    panic: inconsistent path separators. Some are forward, some are back. This is not supported.
    
  4. Delete all files in a specific Blob Storage container

    azcopy rm https://xxxxx.blob.core.windows.net/site/* --recursive
    
  5. Create a blob container

    Static website hosting in Azure Storage requires a blob container named $web, which you can easily create with azcopy make.

    In Command Prompt, run:

    azcopy make "https://xxxxx.blob.core.windows.net/$web"
    

    In PowerShell you must wrap the URL in single quotes. Otherwise $web is read as a reference to a variable named web:

    azcopy make 'https://xxxxx.blob.core.windows.net/$web'
    

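    To actually turn on Static website hosting in Azure Storage, you also need to run the following Azure CLI commands. They are written in bash syntax; in PowerShell, end each continued line with a backtick instead of \: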

    # Enable static website hosting
    az storage blob service-properties update --auth-mode login \
        --account-name 'xxxxx' \
        --static-website \
        --404-document 'index.html' \
        --index-document 'index.html'
    
    # Get the primary public web endpoint (URL)
    az storage account show --name 'xxxxx' --query 'primaryEndpoints.web' -o tsv
    
  6. Synchronize two folders

    AzCopy v10 currently supports the following four source/destination combinations:

    1. Local <-> Azure Blob / Azure File (either SAS or OAuth authentication can be used)
    2. Azure Blob <-> Azure Blob (Source must include a SAS or is publicly accessible; either SAS or OAuth authentication can be used for destination)
    3. Azure File <-> Azure File (Source must include a SAS or is publicly accessible; SAS authentication should be used for destination)
    4. Azure Blob <-> Azure File

    Sync a local folder to a blob container:

    azcopy sync . https://xxxxx.blob.core.windows.net/site/
    

    Sync a local folder to a blob container, and also delete remote files that have been deleted locally:

    azcopy sync . https://xxxxx.blob.core.windows.net/site/ --delete-destination=true
    

    Keep the following in mind when using sync:

    1. Sync works on whole folders, so --recursive is enabled by default.
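    2. Sync decides what to transfer mainly by comparing the last-modified times of the source and destination files. If the destination file is newer, the file is not transferred, which can save a lot of time.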
  7. List the available environment variables

    AzCopy v10 has many environment variables that change its behavior. For the full list, see the official AzCopy v10 configuration settings (Azure Storage) documentation.

    azcopy env
    
  8. Manage AzCopy jobs

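    When AzCopy runs, it records the planned copy work as a job. If an error interrupts a job, AzCopy remembers how far it got. You can resume the job later with azcopy jobs resume, and it keeps retrying the files that have not been copied yet. Job management is therefore useful in several situations.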

    List all AzCopy jobs

    azcopy jobs list
    

    Show the details of a specific AzCopy job

    azcopy jobs show [jobID]
    

    Resume a specific AzCopy job

    azcopy jobs resume [jobID]
    

    Remove a specific AzCopy job

    azcopy jobs remove [jobID]
    

    Clear all AzCopy job records

    azcopy jobs clean
    
  9. Log out of AzCopy (clear the cached credentials)

    azcopy logout
    
  10. Generate AzCopy documentation in Markdown

    You can specify a folder in which to save the complete AzCopy documentation (in Markdown format). This makes it easy to look up detailed AzCopy usage!

    azcopy doc --output-location ./doc
    

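The separator panic in step 3 usually comes from joining a Windows path with a forward-slash fragment, as in the pipeline path shown there. One way to avoid it is to normalize the local path before building the command, sketched here in Python under the assumption that you drive AzCopy from a script. `PureWindowsPath` from the standard library accepts both separators and always renders backslashes. The helper names are hypothetical, and the example only builds the command without executing it.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List


def normalize_windows_path(path: str) -> str:
    """Render a Windows path with backslashes only (mixed input is accepted)."""
    return str(PureWindowsPath(path))


def build_sync_command(local_dir: str, container_url: str) -> List[str]:
    """Hypothetical helper: the argv for `azcopy sync` with a normalized source."""
    return ["azcopy", "sync", normalize_windows_path(local_dir), container_url]


def run_sync(local_dir: str, container_url: str) -> int:
    """Run the command and return AzCopy's exit code (2 was seen for the panic)."""
    # A list argv bypasses the shell, so each argument reaches AzCopy unchanged.
    completed = subprocess.run(build_sync_command(local_dir, container_url))
    return completed.returncode


if __name__ == "__main__":
    print(build_sync_command(r"D:\a\r1\a/QA-CI/drop/site",
                             "https://xxxxx.blob.core.windows.net/site"))
```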
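The quoting problem in step 5 is not unique to PowerShell. bash and zsh also expand `$web` inside double quotes, so single quotes are the safe choice there too. If you call AzCopy from Python instead, passing the arguments as a list avoids the shell entirely. `shlex.quote` shows the single-quoted form a POSIX shell needs. This is a minimal sketch; the helper names are hypothetical and `xxxxx` is the same placeholder account used above.

```python
import shlex
from typing import List


def web_container_url(account: str) -> str:
    """URL of the `$web` container used by static website hosting."""
    return f"https://{account}.blob.core.windows.net/$web"


def make_web_container_argv(account: str) -> List[str]:
    """Hypothetical helper: argv for `azcopy make`; no shell means no $ expansion."""
    return ["azcopy", "make", web_container_url(account)]


def as_posix_command_line(argv: List[str]) -> str:
    """Quote each argument for bash/zsh; '$' forces single quotes around the URL."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    print(as_posix_command_line(make_web_container_argv("xxxxx")))
    # azcopy make 'https://xxxxx.blob.core.windows.net/$web'
```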
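The sync rules in step 6 can be modelled in a few lines. This is a simplified illustration of the behaviour described there, not AzCopy's actual algorithm. Each side is a hypothetical mapping from relative path to last-modified time. A file is copied when it is missing at the destination or the source copy is newer. Extra destination files are deleted only when `--delete-destination=true` is in effect.

```python
from typing import Dict, List, Tuple


def plan_sync(source: Dict[str, float],
              destination: Dict[str, float],
              delete_destination: bool = False) -> Tuple[List[str], List[str]]:
    """Return (files to copy, files to delete) under the simplified rules."""
    # Copy when the destination lacks the file or holds an older copy.
    to_copy = sorted(
        path for path, mtime in source.items()
        if path not in destination or mtime > destination[path]
    )
    # Extra destination files are removed only with --delete-destination=true.
    to_delete = sorted(p for p in destination if p not in source) if delete_destination else []
    return to_copy, to_delete


if __name__ == "__main__":
    local = {"index.html": 200.0, "app.js": 100.0, "new.css": 50.0}
    remote = {"index.html": 150.0, "app.js": 120.0, "old.png": 10.0}
    print(plan_sync(local, remote, delete_destination=True))
    # (['index.html', 'new.css'], ['old.png']): app.js is skipped because the remote copy is newer
```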
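One concrete use of the environment variables from step 7 is a build script that redirects AzCopy's log files and job plan files to folders of its choosing. The job plan files are the records behind `azcopy jobs` in step 8. `AZCOPY_LOG_LOCATION` and `AZCOPY_JOB_PLAN_LOCATION` are listed by `azcopy env`. The Python helper below and the folder names are hypothetical, and the command in the usage comment is not executed here.

```python
import os
from typing import Dict, Mapping, Optional


def azcopy_env(log_dir: str, plan_dir: str,
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy the environment and point AzCopy's files elsewhere."""
    env = dict(os.environ if base is None else base)
    env["AZCOPY_LOG_LOCATION"] = log_dir        # where AzCopy writes its log files
    env["AZCOPY_JOB_PLAN_LOCATION"] = plan_dir  # where job plan files are stored
    return env


# Usage (requires AzCopy on PATH):
#   import subprocess
#   subprocess.run(["azcopy", "jobs", "list"], env=azcopy_env("./azlogs", "./azplans"))
```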
Notes on using SAS

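SAS (shared access signature) is the only way to access Azure Storage without logging in. However, a blob SAS token can contain % characters. That is fine when you run the command directly in Command Prompt, but not in a batch file (*.bat). % is a special character in batch files, so writing the token into one as-is produces the following error at run time: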

AuthenticationErrorDetail: Signature size is invalid

In the batch file, replace every % in the blob URL (the blob SAS token) with %% and it will run correctly!
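If the batch file is generated by a script, the doubling can be done automatically. The following is a minimal Python sketch built around a hypothetical helper that writes one AzCopy line for a .bat file. It doubles every `%` and also keeps the URL in double quotes, because cmd would otherwise treat the `&` characters in a SAS query string as command separators. Only apply this to batch files; at an interactive prompt the URL is used as-is. The SAS values in the example are made up.

```python
def escape_for_batch(url: str) -> str:
    """Double every % so cmd keeps it literal inside a .bat file."""
    return url.replace("%", "%%")


def batch_copy_line(local_pattern: str, sas_url: str) -> str:
    """Hypothetical helper: one `azcopy cp` line for a .bat file."""
    # Double quotes stop cmd from treating the & in the query string as a separator.
    return f'azcopy cp {local_pattern} "{escape_for_batch(sas_url)}" --recursive'


if __name__ == "__main__":
    url = ("https://xxxxx.blob.core.windows.net/site"
           "?sv=2019-12-12&se=2021-08-22T11%3A49%3A43Z&sp=rwl&sig=abc%2Bdef%3D")
    print(batch_copy_line("*", url))
```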

Notes on Azure Pipelines

The hosted agents in Azure Pipelines ship with two different AzCopy versions:

  1. Old AzCopy v3.1.0

    "%AGENT_HOMEDIRECTORY%\externals\azcopy\azcopy.exe" /?
    
    D:\a\r1\a>"C:\agents\2.190.0\externals\azcopy\azcopy.exe" /?
    ------------------------------------------------------------------------------
    AzCopy 3.1.0 Copyright (c) 2014 Microsoft Corp. All Rights Reserved.
    ------------------------------------------------------------------------------
    
    AzCopy </Source:> </Dest:> [/SourceKey:] [/DestKey:] [/V:] [/Z:] [/@:] [/Y]
          [/SourceSAS:] [/DestSAS:] [/SourceType:] [/DestType:] [/S] [/Pattern:]
          [/CheckMD5] [/L] [/MT] [/XN] [/XO] [/A] [/IA] [/XA] [/NC:] [/BlobType:]
          [/Delimiter:] [/Snapshot] [/SyncCopy] [/SetContentType]
    
    
    /Source:<source>              Specifies the source data from which to copy.
                                  The source can be a file system directory, a
                                  blob container or a blob virtual directory.
    
    /Dest:<destination>           Specifies the destination to copy to. The
                                  destination can be a file system directory,
                                  a blob container or a blob virtual directory.
    
    /SourceKey:<storage-key>      Specifies the storage account key for the
                                  source resource.
    
    /DestKey:<storage-key>        Specifies the storage account key for the
                                  destination resource.
    
    /V:[verbose-log-file]         Outputs verbose status messages into a log
                                  file.
                                  By default, the verbose log file is named
                                  AzCopyVerbose.log in
                                  %LocalAppData%\Microsoft\Azure\AzCopy. If you
                                  specify an existing file location for this
                                  option, the verbose log will be appended to
                                  that file.
    
    /Z:[journal-file-folder]      Specifies a journal file folder for resuming an
                                  operation.
                                  AzCopy always supports resuming if an
                                  operation has been interrupted.
                                  If this option is not specified, or it is
                                  specified without a folder path, then AzCopy
                                  will create the journal file in the default
                                  location, which is
                                  %LocalAppData%\Microsoft\Azure\AzCopy.
                                  Each time you issue a command to AzCopy, it
                                  checks whether a journal file exists in the
                                  default folder, or whether it exists in a
                                  folder that you specified via this option. If
                                  the journal file does not exist in either
                                  place, AzCopy treats the operation as new and
                                  generates a new journal file.
                                  If the journal file does exist, AzCopy will
                                  check whether the command line that you input
                                  matches the command line in the journal file.
                                  If the two command lines match, AzCopy resumes
                                  the incomplete operation. If they do not match,
                                  you will be prompted to either overwrite the
                                  journal file to start a new operation, or to
                                  cancel the current operation.
                                  The journal file is deleted upon successful
                                  completion of the operation.
                                  Note that resuming an operation from a journal
                                  file created by a previous version of AzCopy
                                  is not supported.
    
    /@:<parameter-file>           Specifies a file that contains parameters.
                                  AzCopy processes the parameters in the file
                                  just as if they had been specified on the
                                  command line.
                                  In a response file, you can either specify
                                  multiple parameters on a single line, or
                                  specify each parameter on its own line. Note
                                  that an individual parameter cannot span
                                  multiple lines.
                                  Response files can include comments lines that
                                  begin with the # symbol.
                                  You can specify multiple response files.
                                  However, note that AzCopy does not support
                                  nested response files.
    
    /Y                            Suppresses all AzCopy confirmation prompts.
    
    /SourceSAS:<SAS-Token>        Specifies a Shared Access Signature with READ
                                  and LIST permissions for the source (if
                                  applicable). Surround the SAS with double
                                  quotes, as it may contains special command-line
                                  characters.
                                  If the source resource is a blob container,
                                  and neither a key nor a SAS is provided, then
                                  the blob container will be read via anonymous
                                  access.
    
    /DestSAS:<SAS-Token>          Specifies a Shared Access Signature (SAS) with
                                  READ and WRITE permissions for the
                                  destination (if applicable).
                                  Surround the SAS with double quotes, as it may
                                  contains special command-line characters.
                                  If the destination resource is a blob
                                  container, you can either specify this option
                                  followed by the SAS token, or you can specify
                                  the SAS as part of the destination blob
                                  container, without this option.
                                  If the source and destination are both blobs,
                                  then the destination blob must reside within
                                  the same storage account as the source blob.
    
    /SourceType:<blob>            Specifies that the source resource is a blob
                                  available in the local development environment,
                                  running in the storage emulator.
    
    /DestType:<blob>              Specifies that the destination resource is a
                                  blob available in the local development
                                  environment, running in the storage emulator.
    
    /S                            Specifies recursive mode for copy operations.
                                  In recursive mode, AzCopy will copy all blobs
                                  that match the specified file pattern,
                                  including those in subfolders.
    
    /Pattern:<file-pattern>       Specifies a file pattern that indicates which
                                  files to copy.
                                  The behavior of the /Pattern parameter is
                                  determined by the location of the source data,
                                  and the presence of the recursive mode option.
                                  Recursive mode is specified via option /S.
    
                                  If the specified source is a directory in
                                  the file system, then standard wildcards are
                                  in effect, and the file pattern provided is
                                  matched against files within the directory.
                                  If option /S is specified, then AzCopy also
                                  matches the specified pattern against all
                                  files in any subfolders beneath the directory.
    
                                  If the specified source is a blob container or
                                  virtual directory, then wildcards are not
                                  applied. If option /S is specified, then AzCopy
                                  interprets the specified file pattern as a blob
                                  prefix. If option /S is not specified, then
                                  AzCopy matches the file pattern against exact
                                  blob names.
    
                                  The default file pattern used when no file
                                  pattern is specified is *.* for a file system
                                  location or an empty prefix for an Azure
                                  Storage location.
                                  Specifying multiple file patterns is not
                                  supported.
    
    /CheckMD5                     Calculates an MD5 hash for downloaded data and
                                  verifies that the MD5 hash stored in the blob
                                  or file's Content-MD5 property matches the
                                  calculated hash. The MD5 check is turned off by
                                  default, so you must specify this option to
                                  perform the MD5 check when downloading data.
                                  Note that Azure Storage doesn't guarantee that
                                  the MD5 hash stored for the blob is
                                  up-to-date. It is client's responsibility to
                                  update the MD5 whenever the blob is
                                  modified.
                                  AzCopy always sets the Content-MD5 property for
                                  an Azure blob after uploading it to the
                                  service.
    
    /L                            Specifies a listing operation only; no data is
                                  copied.
    
    /MT                           Sets the downloaded file's last-modified time
                                  to be the same as the source blob's.
    
    /XN                           Excludes a newer source resource. The resource
                                  will not be copied if the source is newer than
                                  destination.
    
    /XO                           Excludes an older source resource. The resource
                                  will not be copied if the source resource is
                                  older than destination.
    
    /A                            Uploads only files that have the Archive
                                  attribute set.
    
    /IA:[RASHCNETOI]              Uploads only files that have any of the
                                  specified attributes set.
                                  Available attributes include:
                                  R     Read-only files
                                  A     Files ready for archiving
                                  S     System files
                                  H     Hidden files
                                  C     Compressed file
                                  N     Normal files
                                  E     Encrypted files
                                  T     Temporary files
                                  O     Offline files
                                  I     Not content indexed Files
    
    /XA:[RASHCNETOI]              Excludes files from upload that have any of the
                                  specified attributes set.
                                  Available attributes include:
                                  R     Read-only files
                                  A     Files ready for archiving
                                  S     System files
                                  H     Hidden files
                                  C     Compressed file
                                  N     Normal files
                                  E     Encrypted files
                                  T     Temporary files
                                  O     Offline files
                                  I     Not content indexed Files
    
    /NC:<number-of-concurrent>    Specifies the number of concurrent operations.
                                  AzCopy by default starts a certain number of
                                  concurrent operations to increase the data
                                  transfer throughput.
                                  Note that large number of concurrent operations
                                  in a low-bandwidth environment may overwhelm
                                  the network connection and prevent the
                                  operations from fully completing. Throttle
                                  concurrent operations based on actual available
                                  network bandwidth.
                                  The upper limit for concurrent operations is
                                  512.
    
    /BlobType:<page | block>      Specifies whether the destination blob is a
                                  block blob or a page blob.
                                  If the destination is a blob and this option
                                  is not specified, then by default AzCopy will
                                  create a block blob.
    
    /Delimiter:<delimiter>        Indicates the delimiter character used to
                                  delimit virtual directories in a blob name.
                                  By default, AzCopy uses / as the delimiter
                                  character. However, AzCopy supports using any
                                  common character (such as @, #, or %) as a
                                  delimiter. If you need to include one of these
                                  special characters on the command line, enclose
                                  it with double quotes.
                                  This option is only applicable for downloading
                                  blobs.
    
    /Snapshot                     Indicates whether to transfer snapshots. This
                                  option is only valid when the source is a blob.
                                  The transferred blob snapshots are renamed in
                                  this format: [blob-name] (snapshot-time)
                                  [extension].
                                  By default, snapshots are not copied.
    
    /SyncCopy                     Indicates whether to synchronously copy blobs
                                  among two Azure Storage end points.
                                  AzCopy by default uses server-side asynchronous
                                  copy. Specify this option to download the blobs
                                  from the service to local memory and then
                                  upload them to the service.
    
    /SetContentType:[content-
    type]                         Specifies the content type of the destination
                                  blobs.
                                  AzCopy by default uses
                                  "application/octet-stream" as the content type
                                  for the destination blobs. If option
                                  /SetContentType is specified without a value
                                  for "content-type", then AzCopy will set each
                                  blob's content type according to its file
                                  extension. To set same content type for all the
                                  blobs, you must explicitly specify a value for
                                  "content-type".
    
    
    ##
    ## Samples ##
    ##
    
    
    #1 - Download a blob from Blob storage to the file system, for example,
    download 'https://myaccount.blob.core.windows.net/mycontainer/abc.txt'
    to 'D:\test\'
        AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer/
        /Dest:D:\test\ /SourceKey:key /Pattern:"abc.txt"
    
    #2 - Copy a blob within a storage count
        AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer1/
        /Dest:https://myaccount.blob.core.windows.net/mycontainer2/
        /SourceKey:key /DestKey:key /Pattern:"abc.txt"
    
    #3 - Upload files and subfolders in a directory to a container, recursively
        AzCopy /Source:D:\test\
        /Dest:https://myaccount.blob.core.windows.net/mycontainer/
        /DestKey:key /S
    
    #4 - Upload files matching the specified file pattern to a container,
    recursively.
        AzCopy /Source:D:\test\
        /Dest:https://myaccount.blob.core.windows.net/mycontainer/ /DestKey:key
        /Pattern:*ab* /S
    
    #5 - Download blobs with the specified prefix to the file system, recursively
        AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer/
        /Dest:D:\test\ /SourceKey:key /Pattern:"a" /S
    
    ------------------------------------------------------------------------------
    Learn more about AzCopy at
    http://aka.ms/azcopy.
    ------------------------------------------------------------------------------
    
  2. New AzCopy v10.11.0 (already on the default PATH)

    C:\ProgramData\Chocolatey\bin\azcopy.exe --help
    
    AzCopy 10.11.0
    Project URL: github.com/Azure/azure-storage-azcopy
    
    AzCopy is a command line tool that moves data into and out of Azure Storage.
    To report issues or to learn more about the tool, go to github.com/Azure/
    
    The general format of the commands is: 'azcopy [command] [arguments] --[flag-name]=
    
    Usage:
      azcopy [command]
    
    Available Commands:
      bench       Performs a performance benchmark
      copy        Copies source data to a destination location
      doc         Generates documentation for the tool in Markdown format
      env         Shows the environment variables that you can use to configure the behavior
      help        Help about any command
      jobs        Sub-commands related to managing jobs
      list        List the entities in a given resource
      login       Log in to Azure Active Directory (AD) to access Azure Storage resources.
      logout      Log out to terminate access to Azure Storage resources.
      make        Create a container or file share.
      remove      Delete blobs or files from an Azure storage account
      sync        Replicate source to the destination location
    
    Flags:
          --cap-mbps float                      Caps the transfer rate, in megabits per ughput might vary slightly from the cap. If this option is set to zero, or it is omitted,
      -h, --help                                help for azcopy
          --output-type string                  Format of the command's output. The choices ult value is 'text'. (default "text")
          --trusted-microsoft-suffixes string   Specifies additional domain suffixes where tokens may be sent.  The default is '*.core.windows.net;*.core.chinacloudapi.cn;*.core.api.net;*.storage.azure.net'. Any listed here are added to the default. For security, you re domains here. Separate multiple entries with semi-colons.
      -v, --version                             version for azcopy
    
    Use "azcopy [command] --help" for more information about a command.
    
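In a pipeline script you can tell the two copies apart by where they live. The old one sits under the agent's home directory, while v10 is found through PATH. The following small Python sketch builds the old path from the `AGENT_HOMEDIRECTORY` variable used above. It uses `ntpath` so the Windows-style join works on any OS. The helper name is hypothetical.

```python
import ntpath
import os
from typing import Mapping, Optional


def bundled_azcopy_v3(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: path of the old AzCopy bundled with the agent."""
    env = os.environ if env is None else env
    home = env.get("AGENT_HOMEDIRECTORY")
    if not home:
        return None  # not running on an Azure Pipelines agent
    return ntpath.join(home, "externals", "azcopy", "azcopy.exe")


if __name__ == "__main__":
    print(bundled_azcopy_v3({"AGENT_HOMEDIRECTORY": r"C:\agents\2.190.0"}))
```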

Afterword

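I noticed that AzCopy v10 runs much faster than before and is more capable, and its sync feature is very convenient. The CLI commands are also far simpler to use than before, which is a big plus.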

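The old AzCopy could access files in blob storage directly with the storage account's access key, with no extra authorization. The new version no longer supports that. Without an extra role assignment, the only way to get access is a SAS (shared access signature). Every other access method needs an explicit role assignment before it can reach the data, even if you hold the Owner role. That includes a user identity, a managed identity (system-assigned or user-assigned), and a service principal (with a client secret or a certificate). That is probably my only real complaint! The bar is high for Azure newcomers, but I also think it is the more secure design.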
