What every dev must know about URL Encoding.

I recently stumbled on one of Stéphane Epardaud's tech notes on URL encoding pitfalls and can't recommend it enough.

Let's see how this translates to the .NET world.





using System;

namespace OneKStrongOxen
{
    public static class UriDemo
    {
        public static string Encode(this string str, bool doIt)
        {
            return doIt ? Uri.EscapeDataString(str) : str;
        }

        private static void Main(string[] args)
        {
            CreateUri("Encoding: pass, query, frag", true, true, true);
            CreateUri("Encoding: pass, query, !frag", true, true, false);
            CreateUri("Encoding: pass, !query, !frag", true, false, false);
            // CreateUri("Encoding: !pass, !query, !frag", false, false, false); // throws exception when parsing host

            string stephsMonstruosity =
                "http://example.com/:@-._~!$&'()*+,=;:@-._~!$&'()*+,=:@-._~!$&'()*+,==?/?:@-._~!$'()*+,;=/?:@-._~!$'()*+,;==#/?:@-._~!$&'()*+,;";
            Dump(new Uri(stephsMonstruosity));
            Dump(
                new Uri(
                    "http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454"));
        }

        private static Uri CreateUri(string name, bool encodePass, bool encodeQuery, bool encodeFragment)
        {
            Console.WriteLine(name);
            UriBuilder b = new UriBuilder();
            b.Scheme = "http";
            b.Host = "www.example.com";
            b.UserName = "Tony the pony";
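            // Each part is escaped as a whole. Note that escaping the entire query
            // string also escapes its '=' and '&' separators.
            b.Password = "H#C@més!".Encode(encodePass);
            b.Query = "HE COMES=THE PONY COMES&+++?==".Encode(encodeQuery);
            b.Fragment = "C# Regexes can parse Uri!".Encode(encodeFragment);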
            Dump(b.Uri);
            Console.Out.WriteLine("--Cloned");
            Dump(new Uri(b.Uri.ToString()));


            return b.Uri;
        }

        private static void Dump(Uri u)
        {
            Console.WriteLine(u);
            Console.WriteLine(" Scheme:{0}", u.Scheme);
            Console.WriteLine(" Host:{0}", u.Host);
            Console.WriteLine(" UserInfo:{0}", u.UserInfo);
            Console.WriteLine(" Path:{0}", String.Concat(u.Segments));

            string[] logAndPass = u.UserInfo.Split(':');
            for (int i = 0; i < logAndPass.Length; i++)
                Console.Out.WriteLine(" log/pass[{0}]='{1}' ==> Decoded:'{2}'", i, logAndPass[i],
                                      Uri.UnescapeDataString(logAndPass[i]));

            Console.WriteLine(" Query:{0}", u.Query);
            foreach (var pair in u.Query.Split('&'))
            {
                Console.Out.WriteLine("  Key/Value pair: " + pair);
                foreach (var part in pair.Split('='))
                {
                    if (!string.IsNullOrEmpty(part))
                        Console.Out.WriteLine("  raw:{0} - decoded {1}", part, Uri.UnescapeDataString(part));
                }
            }
            Console.WriteLine(" Fragment:{0}", u.Fragment);
        }
    }
}

So we've got a decent UriBuilder, but its support for encoding and decoding values is a bit lacking: we have to fall back on the static methods of System.Uri. Still, the order of operations is the same. To create a URI, encode the parts, then build it. To make sense of a URI, parse it, then decode the parts.
But I'm not as masochistic as Stéphane, so I won't delve into UTF encoding, especially since he covered it with brio, and I remember fudging up the password part of a Java URL builder once...
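
Here is a minimal sketch of that order of operations for a single query pair. The class and method names (QueryRoundTrip, BuildPair, ParsePair) are made up for this illustration and are not part of the framework; only Uri.EscapeDataString, Uri.UnescapeDataString and UriBuilder are real APIs. It assumes the query holds exactly one key/value pair.

```csharp
using System;

public static class QueryRoundTrip
{
    // Create: escape the key and the value separately, then join them with '='.
    // (Hypothetical helper, written for this example only.)
    public static string BuildPair(string key, string value)
    {
        return Uri.EscapeDataString(key) + "=" + Uri.EscapeDataString(value);
    }

    // Read: split on the structural '=' first, then unescape each half.
    // (Hypothetical helper, written for this example only.)
    public static string[] ParsePair(string rawPair)
    {
        string[] halves = rawPair.Split(new[] { '=' }, 2);
        for (int i = 0; i < halves.Length; i++)
            halves[i] = Uri.UnescapeDataString(halves[i]);
        return halves;
    }

    public static void Main()
    {
        // Encode the parts, then build the URI.
        var builder = new UriBuilder("http", "www.example.com");
        builder.Query = BuildPair("HE COMES", "THE PONY COMES&+++?==");
        Uri uri = builder.Uri;
        Console.WriteLine(uri.AbsoluteUri);

        // Parse the URI, then decode the parts.
        string[] decoded = ParsePair(uri.Query.TrimStart('?'));
        Console.WriteLine("{0} -> {1}", decoded[0], decoded[1]);
    }
}
```

Because the value is escaped before the pair is assembled, the '&', '?' and '=' it contains can no longer be mistaken for separators when the query is split again.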

1 comment:

  1. Eh, thanks for the link :)

    You missed the point that each URI part has its own encoding rules and its own set of special characters that lose their meaning when URL-encoded. I don't believe a single Uri.EscapeDataString is right for every URI part.

