Beware: Hadoop C# SDK Inserts Line Breaks, Tabs That Break Your Queries

After banging my head against the wall for many hours, I finally figured out that .NET was adding escaped line breaks (\r\n) to the queries sent to HDInsight, which caused the queries to fail. My code was loading the queries from files on disk like this:

string query = string.Empty;
using (var fs = new StreamReader("CreateTempTable.hql"))
{
    query = fs.ReadToEnd();
}

I figured this out [Read More]
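The excerpt stops before the fix is described, so the following is only a minimal sketch of one possible workaround, not necessarily what the full post does: collapse every run of whitespace in the loaded query into a single space before submitting it. The class name HqlQueryCleaner, the method Normalize, and the sample table tmp are hypothetical names made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Hadoop SDK): flattens a multi-line
// HQL query into a single line before it is handed to the job client.
public static class HqlQueryCleaner
{
    // Replaces each run of whitespace (\r, \n, \t, spaces) with one space
    // and trims the ends.
    public static string Normalize(string query)
    {
        if (query == null)
        {
            throw new ArgumentNullException(nameof(query));
        }
        return Regex.Replace(query, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Stand-in for text read from a .hql file; the table is made up.
        string raw = "CREATE TABLE tmp (\r\n\tid INT\r\n);";
        Console.WriteLine(Normalize(raw)); // CREATE TABLE tmp ( id INT );
    }
}
```

This is a blunt instrument: it also rewrites whitespace inside string literals, and a `--` line comment in the file would swallow everything after it once the line breaks are gone, so it only suits simple query files.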

Submitting HDInsight Jobs From An Azure Webjob or WorkerRole Using the C# Hadoop SDK

All the samples for submitting jobs programmatically to HDInsight assume that you are doing so from a desktop workstation that has been set up with a management certificate. The code gets your cert out of the cert store and creates a JobSubmissionCertificateCredential like so:

// Get the certificate object from certificate store using the friendly name to identify it
X509Store store = new X509Store();
store.Open(OpenFlags.ReadOnly);
X509Certificate2 cert = store. [Read More]

HDInsight Hadoop Hive Job Decompresses CSV GZIP Files By Default

Been working with Hadoop (2.4.0) and Hive (0.13.0) on HDInsight (3.1), and Hive decompresses GZIP files into CSV by default. Nice! So, loading data with a Hive query in PowerShell:

$response = Invoke-Hive -Query @"
LOAD DATA INPATH 'wasb://$container@$storageAccountName.blob.core.windows.net/file.csv.gz' INTO TABLE logs;
"@

No additional work or arguments to pass. I thought I had to do something [Read More]