Validating user input is always a great idea from both a usability and a security standpoint. URLs are complex data, but they must follow a strict, well-defined pattern. From a validation perspective this is great news: we can check for what we want (a whitelist) instead of trying to detect everything we don't.

Unfortunately, a lot of URLs in the wild don't follow RFC 1738 very closely. Specifically, I'm looking at you, .NET developers who insist on putting UUIDs wrapped in curly brackets into query strings and the like. According to RFC 1738, curly brackets are "unsafe" within URLs and must always be percent-encoded, as %7B and %7D respectively.
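To make the RFC 1738 rule concrete, here is a minimal PHP sketch that reports which of the characters RFC 1738 (section 2.2) calls "unsafe" appear unencoded in a URL. The function find_unsafe_chars() and the constant RFC1738_UNSAFE are hypothetical names, and the list deliberately leaves out "#" and "%": RFC 1738 flags those too, but real URLs use them as the fragment delimiter and the escape character.

```php
<?php
// Characters RFC 1738 (section 2.2) describes as "unsafe", minus "#" and "%",
// which are excluded here because URLs legitimately use them structurally.
const RFC1738_UNSAFE = ['{', '}', '|', '\\', '^', '~', '[', ']', '`', ' ', '<', '>', '"'];

/**
 * Hypothetical helper: returns the unsafe characters that appear unencoded
 * in $url, in the order they are listed in RFC1738_UNSAFE.
 */
function find_unsafe_chars(string $url): array
{
    $found = [];
    foreach (RFC1738_UNSAFE as $char) {
        if (strpos($url, $char) !== false) {
            $found[] = $char;
        }
    }
    return $found;
}

$url = 'http://example.com/page.aspx?id={1B4E28BA-2FA1-11D2-883F-0016D3CCA427}';
echo implode(' ', find_unsafe_chars($url)), "\n"; // { }
```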

So, technically, curly brackets are fine in URLs as long as they're encoded. But when a user pastes a URL containing raw curly brackets into your site and you run it through your (probably regex-based) validation routine, it will most likely fail, because unencoded curly brackets aren't allowed. Sure, you could pass the entire URL through urlencode() or rawurlencode(), but those encode everything, including the ":", "/", "?" and "=" characters that give the URL its structure, so the result is no longer a usable URL. What I do is simple: I replace just the curly brackets with their encoded equivalents and then continue on my merry way:

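// Percent-encode the curly brackets only; the rest of the URL is left untouched.
$replace = array('{' => '%7B', '}' => '%7D');
$newval = str_replace(array_keys($replace), array_values($replace), $user_url);
// my_url_validation_func() and my_error_function() stand in for whatever
// validation and error-reporting routines your site already uses.
if (!my_url_validation_func($newval)) {
  my_error_function("Hey, you need a valid URL!");
}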
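Here is a self-contained version of that snippet which runs as-is. It is a sketch under stated assumptions: encode_curly_brackets() and simple_url_is_valid() are hypothetical names, and the regular expression is a deliberately simplified whitelist (http/https only, no user info, no fragments, no IPv6 hosts) standing in for my_url_validation_func(), not a complete RFC 1738 validator. It uses strtr() with an array, which performs the same replacement as the str_replace() call.

```php
<?php
/**
 * Percent-encode only the curly brackets, leaving the rest of the URL alone.
 */
function encode_curly_brackets(string $url): string
{
    return strtr($url, ['{' => '%7B', '}' => '%7D']);
}

/**
 * Hypothetical stand-in for my_url_validation_func(). After the scheme, host
 * and optional port, it only accepts what RFC 1738 allows unencoded
 * (alphanumerics, "$-_.+!*'(),", the reserved ";/?:@=&") or %XX escapes.
 * It is NOT a complete URL validator.
 */
function simple_url_is_valid(string $url): bool
{
    $pattern = '~^https?://[A-Za-z0-9.-]+(?::[0-9]+)?'
             . '(?:/(?:[A-Za-z0-9$\-_.+!*\'(),;/?:@=&]|%[0-9A-Fa-f]{2})*)?$~';
    return preg_match($pattern, $url) === 1;
}

$user_url = 'http://example.com/page.aspx?id={1B4E28BA-2FA1-11D2-883F-0016D3CCA427}';

// Raw curly brackets are not in the whitelist, so validation fails.
var_dump(simple_url_is_valid($user_url)); // bool(false)

// After encoding just the brackets, the same URL passes.
$newval = encode_curly_brackets($user_url);
echo $newval, "\n"; // http://example.com/page.aspx?id=%7B1B4E28BA-2FA1-11D2-883F-0016D3CCA427%7D
var_dump(simple_url_is_valid($newval)); // bool(true)

// For comparison, rawurlencode() also encodes ":", "/", "?" and "=", so the
// result is no longer a usable absolute URL.
echo rawurlencode($user_url), "\n";
// http%3A%2F%2Fexample.com%2Fpage.aspx%3Fid%3D%7B1B4E28BA-2FA1-11D2-883F-0016D3CCA427%7D
```

Because only "{" and "}" are touched, the scheme, host, and query separators survive intact, and running encode_curly_brackets() on a URL that is already encoded changes nothing.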

If you're saving the URL, you should probably store the encoded version. You don't have to, as long as you encode the curly brackets again when you output the URL, so that the output adheres to RFC 1738 (although most modern browsers cope fine if you don't).
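If you store the raw URL, one way to stay within RFC 1738 on output is to apply the same curly-bracket encoding right before rendering. The sketch below assumes the URL ends up in an HTML href attribute; url_for_href() is a hypothetical helper, and the htmlspecialchars() call is an addition for safe HTML output rather than part of the bracket-encoding technique itself.

```php
<?php
/**
 * Hypothetical output helper: percent-encodes curly brackets in a stored URL
 * (raw or already encoded), then HTML-escapes the result for an href attribute.
 */
function url_for_href(string $stored_url): string
{
    // Same bracket-only encoding as on input; a no-op if already encoded.
    $encoded = strtr($stored_url, ['{' => '%7B', '}' => '%7D']);
    // Escape &, <, >, " and ' so the URL is safe inside an HTML attribute.
    return htmlspecialchars($encoded, ENT_QUOTES, 'UTF-8');
}

$stored = 'http://example.com/page.aspx?id={1B4E28BA-2FA1-11D2-883F-0016D3CCA427}&lang=en';
echo '<a href="' . url_for_href($stored) . '">Example</a>', "\n";
// <a href="http://example.com/page.aspx?id=%7B1B4E28BA-2FA1-11D2-883F-0016D3CCA427%7D&amp;lang=en">Example</a>
```

Applying the helper to a URL that was stored already encoded is harmless, since %7B and %7D contain no curly brackets, so the same helper works whichever storage choice you make.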

I've found this especially useful when validating URLs in Drupal, since its built-in URL validation routine (and those in contributed modules) all seem to adhere pretty closely to RFC 1738.
