Speeding Up Search in Azure Table Storage


Azure Table Storage is cheap and in some simple uses, as good as CosmosDB. However, when searching a single table storage (Standard Performance) with millions of rows, then the key to speeding up searching for specific entities or traversal through all entities, requires you to use both the partition key and row key values, but none of the other field values. Obviously, if you have multiple different values in either or both of the partition key and row key fields, you’ll have a problem with the speed of a search. In that case, CosmosDB will be a better option.


The code below shows how to search the a storage table using the partition key and row key fields. A look up table is used for the partition key and the date is used for the row key. Obviously, the storage table, which holds the different partition keys, needs to be maintained. The “duplicated” date in the row key mimics the timestamp field, but you can use pretty much any date type and format instead, as long as you a have simple way of searching this field.


There’s also an upload function, if you want to take advantage of a very cheap storage option, even for million of entities…


I have an Azure function that takes care of updating the partition keys table, using a timer trigger on a monthly basis. I will add this shortly.


public class AzureTableStorageReader {
    private static readonly CloudStorageAccount StorageAccount = CloudStorageAccount.Parse(
        "DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxxx==;TableEndpoint=https://xxxxx.table.core.windows.net/;");
    private static readonly CloudTableClient TableClient = StorageAccount.CreateCloudTableClient();

    public async Task<IEnumerable<TableLog>> ReadTableLogsByDateTimes(DateTime[] logsDates) {
        if (!logsDates.Any() && logsDates.Count() != 2)
            throw new ArgumentException("Log dates are not provided.");

        try {
            var storageTable = TableClient.GetTableReference("logs");
            var storagePartKeysTable = TableClient.GetTableReference("logsPartitionKeys");
            TableContinuationToken contToken = null;

            var filter = string.Empty;

            // Get all partition keys
            foreach (var apk in await storagePartKeysTable.ExecuteQuerySegmentedAsync(new TableQuery<TableLogPartitionKey>(), contToken)) {
                filter += (filter != string.Empty) ? " or " : "(";

                // Is the full partition key text stored in the RowKey?
                filter += (apk.FullText == "0") ? "PartitionKey eq '" + apk.RowKey + "'" : "PartitionKey eq '" + apk.RowKey + "%60'";
            }

            // Add start date and end date we're searching for (both inclusive)
            filter += ") and (RowKey ge '" +
                      logsDates[0].ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) +
                      "' and RowKey lt '" + logsDates[1].AddDays(1).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + "')";

            var storageTableQuery = new TableQuery<TableLog>();
            var fetchedLogs = new List<TableLog>();

            contToken = null;

            // Execute the query async and segmented, fetching rows in chunks, until the last segment is fetched
            do {
                var seq = await storageTable.ExecuteQuerySegmentedAsync(storageTableQuery.Where(filter), contToken);

                fetchedLogs.Capacity += seq.Count();
                contToken = seq.ContinuationToken;
                fetchedLogs.AddRange(seq.Select(a => a));
            } while (contToken != null);

            if (fetchedLogs.Count == 0) {
                // Log error...
            }

            // Process the rows as needed...
            return fetchedLogs;
        }
        catch (Exception ex) {
            // Log error...
            return null;
        }
    }
}
public class AzureTableStorageWriter {
    private static readonly CloudStorageAccount StorageAccount = CloudStorageAccount.Parse(
        "DefaultEndpointsProtocol=https;AccountName=logsxx;AccountKey=xxx==;TableEndpoint=https://logsxx.table.core.windows.net/;");
    private static readonly CloudTableClient TableClient = StorageAccount.CreateCloudTableClient();

    public async void UploadTableLogs() {
        try {
            var storageTable = TableClient.GetTableReference("logsxx");
            var storagePartKeysTable = TableClient.GetTableReference("sogsPartitionKeys");
            var contToken = new TableContinuationToken();

            // Get all partition keys
            var lpk = await storagePartKeysTable.ExecuteQuerySegmentedAsync(new TableQuery<TableLogPartitionKey>(),
                contToken);

            // Execute the query async
            for (var counter = 0; counter <= 1000000; counter++) {
                var date = DateTime.UtcNow.ToString("yyyy-MM-ddThh:mm:ss.fffZ", CultureInfo.InvariantCulture);
                try {
                    await storageTable.ExecuteAsync(TableOperation.Insert(new TableLog(
                        date, (from l in lpk orderby Guid.NewGuid() select l.PartitionKey).FirstOrDefault(),
                        date)));
                }
                catch (Exception ex) {
                    // Log error...
                    if (ex.Message == "Conflict") continue;
                }

            };
        }
        catch (Exception ex) {
            // Log error...
            Console.WriteLine(ex.Message);
        }
    }
}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s