This article is about a couple of methods in the Azure Storage .NET Client Library that are often misused. I have seen several cases where people used them improperly, and doing so introduced bugs that you wouldn't see on day one but that appear much later, which is the most dangerous kind.
The first method, ListBlobsSegmentedAsync, lists the blobs (files) in a container (folder or bucket) of an Azure Storage account. The second method, ListContainersSegmentedAsync, lists the containers (folders) in the account.
Since they do almost the same thing, I will talk only about ListBlobsSegmentedAsync.
This method has a blocking counterpart called ListBlobs. The obvious difference between the two is that one uses asynchronous networking to perform async I/O and the other does not, so ListBlobs blocks the calling thread while it waits (something bad if you're running on a thread pool).
However, the more significant difference shows up when your container has more than 5,000 blobs to list. Because 5,000 results per request is the API limit, the results are paginated.
In this case, ListBlobs requests the pages one after another and blocks the thread during each request, so enumerating the whole listing keeps the thread tied up with that one job until every page has been downloaded. That can easily take several minutes if the container holds a six-digit number of blobs.
The ListBlobsSegmentedAsync method, on the other hand, initiates the REST API request using asynchronous
networking, and because it is implemented with .NET's async/await, the thread is released to run other tasks
while the request is in flight. When the task completes, it returns at most 5,000 blobs plus a continuation
token if there are more results. To get "page 2", you are supposed to call the method again with that token,
and keep doing so until no token comes back. This is where people usually forget to call it again, because
while they're testing there are probably fewer than 5,000 blobs in the container. Later, once the results are
paginated, the code silently sees only the first 5,000 results.
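To make the pitfall concrete, here is a minimal sketch that simulates the segmented contract in memory instead of calling Azure. FakeListBlobsSegmentedAsync is a hypothetical stand-in, not the SDK method: it returns at most 5,000 names per call and uses the next offset as its continuation token (the real token is an opaque BlobContinuationToken object). A single call sees only the first page, while the loop drains all 12,000 names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory "container" with 12,000 blobs. This only mimics the
// paging behaviour described above; it is NOT the Azure client library.
const int PageSize = 5000;
var allBlobNames = Enumerable.Range(0, 12_000).Select(i => $"blob-{i:D5}").ToList();

// Hypothetical stand-in for ListBlobsSegmentedAsync: returns one page of at most
// PageSize names plus the next offset as a token, or null when nothing is left.
async Task<(List<string> Results, int? ContinuationToken)> FakeListBlobsSegmentedAsync(int? token)
{
    await Task.Yield(); // pretend this is a network call
    int start = token ?? 0;
    var page = allBlobNames.Skip(start).Take(PageSize).ToList();
    int next = start + page.Count;
    int? nextToken = next < allBlobNames.Count ? (int?)next : null;
    return (page, nextToken);
}

// The common mistake: one call, which only ever returns the first page.
var firstCall = await FakeListBlobsSegmentedAsync(null);
Console.WriteLine($"Single call: {firstCall.Results.Count} blobs"); // prints "Single call: 5000 blobs"

// The correct pattern: keep passing the returned token back until it is null.
var everything = new List<string>();
int? continuationToken = null;
do
{
    var segment = await FakeListBlobsSegmentedAsync(continuationToken);
    everything.AddRange(segment.Results);
    continuationToken = segment.ContinuationToken;
}
while (continuationToken != null);
Console.WriteLine($"Loop: {everything.Count} blobs"); // prints "Loop: 12000 blobs"
```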
Why is this a problem?
Naming. If I were designing this library and there were a method called ListBlobs, I would provide
its async equivalent as ListBlobsAsync, built on top of ListBlobsSegmentedAsync,
and it would look like this:
public async Task<List<IListBlobItem>> ListBlobsAsync()
{
    // No token yet: the first call returns the first page.
    BlobContinuationToken continuationToken = null;
    List<IListBlobItem> results = new List<IListBlobItem>();
    do
    {
        // Fetch one segment (up to 5,000 items) starting at the current token.
        var response = await ListBlobsSegmentedAsync(continuationToken);
        continuationToken = response.ContinuationToken;
        results.AddRange(response.Results);
    }
    // A null token means there are no more pages to fetch.
    while (continuationToken != null);
    return results;
}
This way, developers could keep coding instead of stepping back to figure out what "segmented" means
and how these two methods differ. They would just pick ListBlobsAsync, assume by naming convention
that it is the exact async equivalent of ListBlobs, and move on.
The same goes for ListContainersSegmentedAsync; a more helpful ListContainersAsync
method would look like this:
public async Task<List<CloudBlobContainer>> ListContainersAsync()
{
    // Container listings are paged with the same token type as blob listings.
    BlobContinuationToken continuationToken = null;
    List<CloudBlobContainer> results = new List<CloudBlobContainer>();
    do
    {
        var response = await ListContainersSegmentedAsync(continuationToken);
        continuationToken = response.ContinuationToken;
        results.AddRange(response.Results);
    }
    while (continuationToken != null);
    return results;
}
You can add these as extension methods to the CloudBlobContainer and CloudBlobClient classes, respectively.
They are arguably just missing helper methods in the client library, and their absence, again arguably,
introduces some friction and does not quite follow the principle of least astonishment. People usually end up
misusing the API by making a single call to ListBlobsSegmentedAsync
and hoping it will return all the blobs at once.
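Since ListBlobsAsync and ListContainersAsync differ only in which segmented method they call, the loop could also be written once. Below is a minimal sketch of a generic helper, ListAllAsync (a hypothetical name, not part of the client library), exercised against a fake in-memory source so it runs without a storage account. FakeSegmentAsync and its string offsets standing in for continuation tokens are assumptions for illustration only.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical generic helper that drains any segmented listing API.
// fetchSegment receives the current token (null on the first call) and returns
// one page of results plus the next token, or null when there are no more pages.
async Task<List<TItem>> ListAllAsync<TItem, TToken>(
    Func<TToken?, Task<(IEnumerable<TItem> Results, TToken? ContinuationToken)>> fetchSegment)
    where TToken : class
{
    var results = new List<TItem>();
    TToken? token = null;
    do
    {
        var segment = await fetchSegment(token);
        results.AddRange(segment.Results);
        token = segment.ContinuationToken;
    }
    while (token != null);
    return results;
}

// Hypothetical fake listing of 7,500 container names, paged 5,000 at a time,
// with the next offset (as a string) standing in for a continuation token.
var names = Enumerable.Range(0, 7_500).Select(i => $"container-{i}").ToList();

async Task<(IEnumerable<string> Results, string? ContinuationToken)> FakeSegmentAsync(string? token)
{
    await Task.Yield(); // pretend this is a network call
    int start = token is null ? 0 : int.Parse(token);
    var page = names.Skip(start).Take(5000).ToList();
    int next = start + page.Count;
    string? nextToken = next < names.Count ? next.ToString() : null;
    return (page, nextToken);
}

var allNames = await ListAllAsync<string, string>(FakeSegmentAsync);
Console.WriteLine(allNames.Count); // prints 7500
```

Against the real client library, each helper in this post could delegate to ListAllAsync by passing a lambda that awaits the segmented call with the given token and returns (response.Results, response.ContinuationToken), the same two properties the snippets above read. That wiring is a sketch and has not been run against a live account.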
So now you know how to use these somewhat cryptic methods correctly. Happy coding!