I’m working on a Go project where I need to extract cookies from a headless browser session and then use them in Colly for further web scraping. The headless browser generates some JavaScript cookies that I need to capture first.
Right now I have this function that exports cookies to JSON:
There's actually an easier way: use Colly's built-in cookie jar directly. Unmarshal your JSON cookies first, then add them (a c.OnRequest callback is one place to do it). Something like jar.SetCookies(parsedURL, convertedCookies) should work. Just make sure you handle the expiration-time conversion properly: Rod uses a different format than standard HTTP cookies.
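On the expiration point: Rod (via the Chrome DevTools Protocol) reports cookie expiry as seconds since the Unix epoch in a float64, while http.Cookie wants a time.Time. A minimal conversion sketch, assuming that epoch-seconds format and the CDP convention of -1 for session cookies:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// toExpires converts an epoch-seconds expiry (float64, -1 for
// session cookies) into the time.Time that http.Cookie.Expires
// expects. The -1 convention is an assumption based on the
// Chrome DevTools Protocol, not something Colly itself defines.
func toExpires(epochSeconds float64) time.Time {
	if epochSeconds <= 0 {
		return time.Time{} // session cookie: leave Expires as the zero time
	}
	sec := int64(epochSeconds)
	nsec := int64((epochSeconds - float64(sec)) * 1e9)
	return time.Unix(sec, nsec).UTC()
}

func main() {
	c := &http.Cookie{Name: "sid", Value: "abc", Expires: toExpires(1700000000)}
	fmt.Println(c.Expires.Year()) // 2023
}
```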
Just map the Rod cookie fields to the corresponding http.Cookie fields. Watch out for the domain: url.Parse needs a scheme, so sometimes you'll have to prepend the protocol for URL parsing to work.
I faced a similar issue a while back, and I found that going through Colly's cookie jar with SetCookies is much more reliable than manually setting Cookie headers, since the jar takes care of domain and path matching for you. Here's how I resolved it:
// importRodCookiesToColly parses the exported JSON and stores each
// cookie in the collector's jar via Collector.SetCookies.
func importRodCookiesToColly(jsonCookies []byte, collector *colly.Collector) error {
	var cookies []YourCookieStruct
	if err := json.Unmarshal(jsonCookies, &cookies); err != nil {
		return fmt.Errorf("unmarshal cookies: %w", err)
	}
	for _, cookie := range cookies {
		// url.Parse needs a scheme, so prepend one to the domain.
		targetURL, err := url.Parse(fmt.Sprintf("https://%s%s", cookie.DomainName, cookie.URLPath))
		if err != nil {
			return fmt.Errorf("parse cookie URL: %w", err)
		}
		httpCookie := &http.Cookie{
			Name:   cookie.CookieName,
			Value:  cookie.CookieValue,
			Domain: cookie.DomainName,
			Path:   cookie.URLPath,
		}
		// Collector.SetCookies is Colly's exported API for seeding the jar;
		// the collector's jar itself is not an exported field.
		if err := collector.SetCookies(targetURL.String(), []*http.Cookie{httpCookie}); err != nil {
			return fmt.Errorf("set cookies for %s: %w", targetURL, err)
		}
	}
	return nil
}
Going through SetCookies not only simplifies the process but also ensures that cookies are stored once and matched to the right domain on each of Colly's subsequent requests.