Python web.py crashes due to malformed Google Analytics cookies with special characters

I’m running into a nasty issue with my Python web application built using the web.py framework. The problem happens when Google Analytics creates cookies that contain non-ASCII characters in the campaign tracking parameters.

When users visit my site through links that have Cyrillic text in the UTM campaign names, Google Analytics generates a __utmz cookie with these special characters. This causes my Python backend to throw a CookieError because the standard Cookie module can’t parse it properly.

Here’s the error I keep getting:

Traceback (most recent call last):
  File "/app/libs/web/sessions.py", line 82, in _initialize  
    self.user_session = web.cookies().get(session_cookie)
  File "/app/libs/web/utils.py", line 245, in cookies
    cookie_jar.load(ctx.env.get('HTTP_COOKIE', ''))
  File "/usr/lib/python2.7/Cookie.py", line 627, in load
    self._parse_cookie_data(cookie_string)
  File "/usr/lib/python2.7/Cookie.py", line 660, in _parse_cookie_data
    self._assign(name, raw_val, encoded_val)
  File "/usr/lib/python2.7/Cookie.py", line 580, in _assign
    cookie.set(name, actual_value, encoded_value)
  File "/usr/lib/python2.7/Cookie.py", line 455, in set
    raise CookieError("Invalid cookie name: %s" % name)
CookieError: Invalid cookie name: )|utmcmd

I tried to work around this by catching the exception:

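from Cookie import CookieError  # Python 2.7 (http.cookies on Python 3)
import web

def handle_cookie_error():
    try:
        web.cookies()
    except CookieError:
        # Only attempt the redirect once, otherwise the request could loop.
        if "fix_cookie" not in web.input():
            # expires=-1 makes web.py send an expiry date in the past,
            # which tells the browser to delete the cookie.
            web.setcookie("__utmz", "", expires=-1, domain=web.ctx.host)
            raise web.seeother(web.changequery(fix_cookie=1))
    # `render` is the application's own template renderer.
    return web.internalerror(render.templates.error500())

app.internalerror = handle_cookie_error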

This worked for a while in Firefox, but now Chrome is hitting the same issue. The problematic tracking parameters look like this:

utm_source=feedburner&utm_medium=social&utm_campaign=MyFeed%3A+blogname+%28%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B5+%D1%81%D0%BB%D0%BE%D0%B2%D0%B0%29

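The bad __utmz value ends up being this, with the Cyrillic campaign text (“Русские слова”, Russian for “Russian words”) appearing unencoded:

12345678.1234567890.3.2.utmcsr=feedburner|utmccn=MyFeed:%20blogname%20(Русские%20слова)|utmcmd=social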

Is this a bug in Python’s Cookie module or are browsers allowing invalid cookies? This seems like it could affect many Python websites using Google Analytics. What’s the best way to handle this?

web.py’s cookie parsing (it hands the header to Python’s Cookie module, as your traceback shows) is way too strict about RFC compliance, and GA doesn’t give a damn. I just monkey-patch the cookie parsing to skip the broken cookies instead of letting it crash. Quick fix: override web.cookies() to catch the CookieError and return an empty dict, then use a regex to manually parse whatever cookies you actually need.
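For the “parse what you need with a regex” part, a minimal sketch might look like this. `read_cookie` is a hypothetical helper, not part of web.py; it pulls a single cookie straight out of the raw header, so a malformed GA cookie elsewhere in the header can’t make it fail. It assumes plain name=value pairs (no quoted values containing semicolons).

```python
import re

try:
    from urllib.parse import unquote  # Python 3
except ImportError:  # Python 2.7
    from urllib import unquote


def read_cookie(http_cookie, name, default=None):
    """Return one cookie's URL-decoded value from a raw Cookie header.

    Hypothetical helper: it only looks for `name`, so malformed cookies
    elsewhere in the header (e.g. a broken __utmz) are simply ignored.
    """
    # The name must start the header or follow a ";" separator, so that
    # looking up "id" does not match the tail of "session_id".
    pattern = r'(?:^|;)\s*' + re.escape(name) + r'=([^;]*)'
    match = re.search(pattern, http_cookie)
    if match is None:
        return default
    return unquote(match.group(1).strip())
```

Inside web.py you would call something like read_cookie(web.ctx.env.get('HTTP_COOKIE', ''), session_cookie), reading the same header that web.cookies() loads in the traceback.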

Had this exact problem two years ago with Django, same root cause. Google Analytics cookie values contain pipes, parentheses and, with campaigns like yours, raw non-ASCII text. Python’s parser rejects the non-ASCII bytes, loses track of where the value ends, and treats the leftover fragment )|utmcmd as a cookie name, which is exactly what your traceback shows. That breaks the RFC 2965 naming rules the module enforces.

I fixed it by adding a cookie sanitization layer in front of the framework: custom middleware that grabs the HTTP_COOKIE header and strips out (or URL-encodes) the bad characters in the __utmz and __utma values before anything tries to parse them.

Your exception handler approach is reactive: you’re catching errors after they happen. Preprocess the cookie string in WSGI middleware instead, using a regex to find the Google Analytics cookies and clean their values before web.py’s parser sees them.

This isn’t really a Python bug. Browsers are just more forgiving than the HTTP spec, while Python’s Cookie module follows the standards strictly. Most production sites I’ve worked on need cookie sanitization for this reason.
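A minimal sketch of the URL-encoding variant of that middleware, assuming Python 3 WSGI (PEP 3333), where the Cookie header arrives as a latin-1-decoded string. `sanitize_ga_cookies` and `ga_cookie_sanitizer` are hypothetical names. It percent-encodes every character in __utm* values except a few GA delimiters and existing %XX escapes, so the analytics data survives in encoded form. It only rewrites the server-side copy of the header, not the cookie stored in the browser.

```python
import re

try:
    from urllib.parse import quote  # Python 3
except ImportError:  # Python 2.7
    from urllib import quote

# Characters left as-is inside a GA value: GA delimiters plus "%" so that
# existing escapes such as %20 are not double-encoded. quote() also keeps
# letters, digits and "_.-" (and "~" on Python 3.7+) unescaped.
_GA_SAFE = "|.=:%"

# A __utm* cookie at the start of the header or after a "; " separator.
_GA_COOKIE = re.compile(r'(^|;\s*)(__utm[a-z]+)=([^;]*)')


def _encode_value(value):
    if not isinstance(value, bytes):
        # PEP 3333: header values are native strings decoded as latin-1,
        # so encoding back to latin-1 recovers the raw bytes the browser sent.
        value = value.encode('latin-1')
    return quote(value, safe=_GA_SAFE)


def sanitize_ga_cookies(http_cookie):
    """Percent-encode the values of __utm* cookies in a raw Cookie header."""
    def _replace(match):
        prefix, name, value = match.groups()
        return prefix + name + '=' + _encode_value(value)
    return _GA_COOKIE.sub(_replace, http_cookie)


def ga_cookie_sanitizer(app):
    """WSGI middleware that rewrites HTTP_COOKIE before the app parses it."""
    def wrapper(environ, start_response):
        header = environ.get('HTTP_COOKIE')
        if header:
            environ['HTTP_COOKIE'] = sanitize_ga_cookies(header)
        return app(environ, start_response)
    return wrapper
```

Keeping `%` in the safe set is a deliberate choice: values GA already escaped (like %20) pass through unchanged instead of becoming %2520.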

Been dealing with this garbage for years at different companies. The issue isn’t just web.py - it’s Google Analytics being sloppy with cookie formatting while Python’s Cookie module actually follows the RFC specs.

Your current fix is way too heavy-handed. Don’t redirect users and clear cookies; just intercept the raw cookie header before web.py touches it.

I patch this at the WSGI level:

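import re

# Matches one GA cookie (__utma, __utmz, ...) whose value contains a pipe
# or closing parenthesis, plus the ";" and whitespace that follow it.
GA_COOKIE_RE = re.compile(r'__utm[^=]*=[^;]*[|)][^;]*;?\s*')

def clean_cookies_middleware(app):
    def wrapper(environ, start_response):
        if 'HTTP_COOKIE' in environ:
            # Remove problematic GA cookies entirely
            environ['HTTP_COOKIE'] = GA_COOKIE_RE.sub('', environ['HTTP_COOKIE'])
        return app(environ, start_response)
    return wrapper

# web.py applies any WSGI middleware passed to run(), e.g.:
# app.run(clean_cookies_middleware)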

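This strips out any GA cookie containing a pipe or closing parenthesis before web.py sees it. Note that a normal __utmz value such as utmcsr=(direct)|utmccn=(direct)|utmcmd=(none) contains both, so in practice this drops every __utmz cookie, while __utma (dot-separated numbers) survives. You lose some analytics data, but your app stays up.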
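A narrower variant, if you’d rather keep ordinary GA cookies: drop only the name=value pairs that actually contain non-ASCII characters, which is what trips the parser in the question. This is a sketch with hypothetical names (`drop_non_ascii_cookies`, `non_ascii_cookie_filter`); it assumes PEP 3333 header decoding (one character per byte) and that no cookie value contains a quoted semicolon, since it splits naively on “;”.

```python
def drop_non_ascii_cookies(http_cookie):
    """Return the Cookie header without any name=value pair that contains
    non-ASCII characters. All other pairs are kept in their original order."""
    kept = []
    for pair in http_cookie.split(';'):
        pair = pair.strip()
        if not pair:
            continue
        # Assumption: under PEP 3333 each header byte maps to one character
        # (latin-1), so ord() > 127 means the browser sent a non-ASCII byte.
        if all(ord(ch) < 128 for ch in pair):
            kept.append(pair)
    return '; '.join(kept)


def non_ascii_cookie_filter(app):
    """WSGI middleware that applies drop_non_ascii_cookies to HTTP_COOKIE."""
    def wrapper(environ, start_response):
        header = environ.get('HTTP_COOKIE')
        if header:
            environ['HTTP_COOKIE'] = drop_non_ascii_cookies(header)
        return app(environ, start_response)
    return wrapper
```

With this filter, a __utmz like utmcsr=(direct)|utmccn=(direct)|utmcmd=(none) is left alone, and only the Cyrillic-bearing cookie disappears.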

If you need to keep the GA data, URL decode and re-encode those cookie values to make them RFC compliant. But honestly, most of the time you don’t need those tracking cookies server-side anyway.