One of the more useful, yet simple, classes of .NET is System.Net.WebClient. It basically simulates a simple browser to interact with web pages within your code. You can use it to GET pages or POST data to pages over http or https (SSL). The page response can be saved into a byte array, a string or a file. WebClient can even be used to pass authentication values to pages that require it, you know, those pages that pop up a user and password window before letting you in. For that, you’d supply the values to the Credentials property of WebClient and fire away.
Recently I was trying to access a protected page using WebClient. The code was pretty straightforward:
WebClient wc=new WebClient();
Pretty simple, eh? I've done this a million times, but in this case (the actual site shall remain nameless) it was throwing an exception. After some investigation, I noticed that the page was returning a 404 code (page not found) prompting the exception error. I also discovered that the call wasn't sending the appropriate Authorization header to the page. The site's documentation was clear about accessing the page using basic authorization header, but there was no getting past the exception. What was going on here?
After some fruitless troubleshooting, I decided to forego the Credentials property and manually craft the Authorization header. To do that I wrote the following code:
WebClient wc=new WebClient();
This time, to my delight, the page obliged and the string variable "a" received the page's content. Not being satisfied with merely solving the problem using a different route, I decided to dig in and find out why the original (more proper) code was failing. First I discovered that the page was designed to return a 404 code rather than the customary 401 (Not Authorized) when the credentials were missing.
I'm not sure what the RFC's position on this is, but according to MSDN documentation, when a protected URL receives no authorization header from a client, it should return a 401 code, signaling to the client that authentication is required. The client should then provide the authorization header with each access, satisfying the URL's demand. The WebClient class with its Credentials property is designed to do just that, but not in a straightforward manner.
Under the hood, WebClient constructs a HtttpWebRequest object and sends a plain request to the specified page. Upon receiving a 401 code, it crafts the authorization header using the Credentials property and hits the page again. That's two round trips for every request. Worse yet, it does that for every subsequent request. I fail to see why WebClient insists on not sending the authorization header in the first place. After all, if the coder specifies the Credentials property, he must already know that the page requires authorization and WebClient should just obey and send in the header without the fuss.
In this case, the site's response aggravated matters by sending back a 404 (rather than a 401) after the first request, sending the whole process into a tailspin and causing an exception to be raised.
If you use the WebClient class in your .NET code to access protected URLs, watch out for this little stumper. It wasted quite a bit of my time. Maybe my loss will be your gain.