I’m trying to scrape AliExpress using a headless browser with chromedp, but I’m running into a problem. When I use my code to go to the main AliExpress page in English, it keeps sending me to the Arabic version instead.
I want to make sure I end up on the English page. Here’s a simplified version of what I’m doing:
chromedp.Run(ctx,
chromedp.Navigate(`https://www.aliexpress.com/?language=en&region=US`),
chromedp.WaitVisible(`body`),
chromedp.Sleep(5*time.Second),
chromedp.Evaluate(`
// Code to get all links
`, &links),
)
Is there a way to add headers or something else to make sure I get the English page? Any help would be great!
I’ve dealt with similar redirects, and a few things tend to help. First, route the browser through a proxy server in the US. AliExpress appears to choose the locale at least partly from the client’s IP address, so a US exit IP often gets you the English storefront. Second, set the browser language when Chrome is launched: create the context with chromedp.NewExecAllocator and pass a language flag such as chromedp.Flag("lang", "en-US"), so the browser reports en-US,en as its accepted languages. (intl.accept_languages is the name of the underlying browser preference; as far as I know chromedp has no dedicated option for it, so command-line flags are the practical route.) Third, start each session without stored cookies so the site cannot reuse a previously saved language preference. Finally, wait for the page to finish loading before scraping; waiting on a specific element is more reliable than a fixed sleep.
I’ve encountered similar issues when scraping AliExpress. One effective workaround I found is to set specific headers and cookies to mimic a US-based browser session. Here’s what worked for me:
Setting request headers and a locale cookie before navigating can push the site to serve the English version. In chromedp, extra request headers such as Accept-Language: en-US are set with network.SetExtraHTTPHeaders from the github.com/chromedp/cdproto/network package, and the user agent is set with the chromedp.UserAgent allocator option. Then set the ‘aep_usuc_f’ cookie on the aliexpress.com domain with the appropriate settings (e.g., site=glo, region=US, and the English language parameters). This makes the session appear US-based and should avoid the language redirect.
Give this a try and see if it consistently returns the English version.