How to transfer cookies from Go Rod headless browser to Colly cookiejar?

I’m working on a Go project where I need to extract cookies from a headless browser session and then use them in Colly for further web scraping. The page sets some cookies through JavaScript, so I need to capture them from the browser first.

Right now I have this function that exports cookies to JSON:

func extractBrowserCookies() ([]byte, error) {
    // MustGetCookies returns every cookie the browser holds as []*proto.NetworkCookie
    sessionCookies := browserInstance.MustGetCookies()
    return json.Marshal(sessionCookies)
}

jsonData is an array of cookie objects, each with this structure:

{"cookieName":"testCookie","cookieValue":"testValue","domainName":"example.com","urlPath":"/","expirationTime":1234567890,"dataSize":42,"httpOnlyFlag":true,"secureFlag":false,"sessionFlag":true,"priorityLevel":"Medium","samePartyFlag":false,"schemeSource":"Secure","portSource":443}
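If you need to read that JSON back in Go, a struct whose tags copy the keys from the sample above is enough. This is a minimal sketch: the type name `exportedCookie` and the helper `decodeCookies` are hypothetical, only the fields needed to rebuild an `http.Cookie` are declared, and it assumes `jsonData` is an array of objects shaped like the one shown. If your real output uses different keys (for example Chrome DevTools Protocol names such as `name` and `domain`), adjust the tags accordingly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exportedCookie is a hypothetical type whose JSON tags copy the keys from
// the sample cookie in the question. Keys not declared here (dataSize,
// priorityLevel, ...) are simply ignored by json.Unmarshal.
type exportedCookie struct {
	Name     string  `json:"cookieName"`
	Value    string  `json:"cookieValue"`
	Domain   string  `json:"domainName"`
	Path     string  `json:"urlPath"`
	Expires  float64 `json:"expirationTime"`
	HTTPOnly bool    `json:"httpOnlyFlag"`
	Secure   bool    `json:"secureFlag"`
	Session  bool    `json:"sessionFlag"`
}

// decodeCookies parses a JSON array of exported cookies.
func decodeCookies(data []byte) ([]exportedCookie, error) {
	var cookies []exportedCookie
	if err := json.Unmarshal(data, &cookies); err != nil {
		return nil, err
	}
	return cookies, nil
}

func main() {
	data := []byte(`[{"cookieName":"testCookie","cookieValue":"testValue","domainName":"example.com","urlPath":"/","expirationTime":1234567890,"httpOnlyFlag":true,"secureFlag":false,"sessionFlag":true}]`)
	cookies, err := decodeCookies(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s=%s on %s%s\n", cookies[0].Name, cookies[0].Value, cookies[0].Domain, cookies[0].Path)
}
```

Decoding into a slice fails on a lone object, so if you only ever marshal a single cookie, decode into `exportedCookie` instead.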

How can I properly convert these cookies and add them to Colly’s cookie jar so I can continue scraping with the authenticated session?

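There’s actually an easier way: use Colly’s built-in cookie jar directly. Unmarshal your JSON cookies first, convert each one to an *http.Cookie, then hand them to the jar with c.SetCookies(rawURL, convertedCookies) before your first c.Visit. You don’t need a c.OnRequest callback for this, because Colly sends jar cookies automatically. (If you created the jar yourself and passed it in with c.SetCookieJar, calling jar.SetCookies(parsedURL, convertedCookies) on it works too.) Just make sure you handle the expiration time conversion properly: Rod stores expiry as seconds since the Unix epoch (-1 for session cookies), while http.Cookie expects a time.Time.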

You need to convert those JSON cookies to Go’s http.Cookie format before adding them to Colly’s jar. Here’s what worked for me:

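import (
    "net/http"
    "strings"
    "time"

    "github.com/go-rod/rod/lib/proto"
    "github.com/gocolly/colly/v2"
)

func transferCookiesToColly(sessionCookies []*proto.NetworkCookie, c *colly.Collector) error {
    for _, cookie := range sessionCookies {
        httpCookie := &http.Cookie{
            Name:     cookie.Name,
            Value:    cookie.Value,
            Domain:   cookie.Domain,
            Path:     cookie.Path,
            HttpOnly: cookie.HTTPOnly,
            Secure:   cookie.Secure,
        }
        // Rod reports session cookies with Expires == -1; a past expiry
        // would make the jar drop the cookie, so only copy real dates
        if cookie.Expires > 0 {
            httpCookie.Expires = time.Unix(int64(cookie.Expires), 0)
        }

        // Store the cookie in Colly's jar under a URL built from its domain and path
        u := "https://" + strings.TrimPrefix(cookie.Domain, ".") + cookie.Path
        if err := c.SetCookies(u, []*http.Cookie{httpCookie}); err != nil {
            return err
        }
    }
    return nil
}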

Map each Rod cookie field to the matching http.Cookie field. Watch out for the domain: it’s a bare host name such as example.com, and url.Parse("example.com") treats that as a path with no host, so prepend a scheme like https:// before handing it to the jar.

I faced a similar issue a while back, and I’ve found that going through Colly’s cookie jar with SetCookies is more reliable than setting the Cookie header by hand. Here’s how I solved it:

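import (
    "encoding/json"
    "net/http"
    "strings"

    "github.com/gocolly/colly/v2"
)

// YourCookieStruct mirrors the JSON keys shown in the question.
type YourCookieStruct struct {
    CookieName  string `json:"cookieName"`
    CookieValue string `json:"cookieValue"`
    DomainName  string `json:"domainName"`
    URLPath     string `json:"urlPath"`
}

func importRodCookiesToColly(jsonCookies []byte, collector *colly.Collector) error {
    var cookies []YourCookieStruct
    if err := json.Unmarshal(jsonCookies, &cookies); err != nil {
        return err
    }
    for _, cookie := range cookies {
        // The jar scopes cookies by URL; a leading dot would break the host name
        targetURL := "https://" + strings.TrimPrefix(cookie.DomainName, ".") + cookie.URLPath
        httpCookie := &http.Cookie{
            Name:   cookie.CookieName,
            Value:  cookie.CookieValue,
            Domain: cookie.DomainName,
            Path:   cookie.URLPath,
        }
        // Collector has no public Jar field; SetCookies writes into its http.CookieJar
        if err := collector.SetCookies(targetURL, []*http.Cookie{httpCookie}); err != nil {
            return err
        }
    }
    return nil
}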

Using the jar’s SetCookies method keeps the code simple, and because the jar does the matching, Colly only attaches each cookie to requests whose domain and path match it instead of sending every cookie to every host.
