Programmatic form submission with HtmlUnit headless browser in Java

I’m using HtmlUnit for the first time and need help following a chain of redirects to reach the final document.

I have a Crystal Reports server that exposes REST APIs for report generation. The process involves getting a URL from an API endpoint, and when I open this URL in a regular browser, it goes through about three redirects before finally displaying the PDF report.

I’m trying to replicate this browser behavior using HtmlUnit:

Page resultPage;
try (final WebClient client = new WebClient()) {
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setThrowExceptionOnScriptError(false);
    client.getOptions().setThrowExceptionOnFailingStatusCode(false);
    client.getOptions().setRedirectEnabled(true); // true is already the default
    resultPage = client.getPage(reportUrl);
}

This code gets me to the second redirect but stops there instead of reaching the actual PDF document. What’s the best approach to handle this situation? Should I manually capture the intermediate response and make another call with a fresh WebClient instance, or is there a simpler way to follow all redirects until the final document is reached?

Set your user agent to mimic a real browser; lots of servers block headless clients. Also double-check that HtmlUnit is handling the PDF content type correctly. You might need to set the Accept header manually. Crystal Reports can behave oddly with automated access.
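A minimal HtmlUnit sketch of those suggestions, assuming HtmlUnit 3.x or later (package org.htmlunit; on 2.x the same classes live under com.gargoylesoftware.htmlunit). ReportFetcher, fetchPdf, the Accept value and the 10-second waits are made up for illustration. HtmlUnit already sends a browser-like User-Agent by default; passing a BrowserVersion just makes the choice explicit, and addRequestHeader sets a custom Accept header. It also assumes, without confirmation from the thread, that the hop where things stop might be a meta-refresh or JavaScript redirect rather than an HTTP 3xx, so it waits for those and then reads the window's current page instead of the object getPage() returned.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Assumption: HtmlUnit 3.x or later (package org.htmlunit). On 2.x the same
// classes are under com.gargoylesoftware.htmlunit.
import org.htmlunit.BrowserVersion;
import org.htmlunit.Page;
import org.htmlunit.WaitingRefreshHandler;
import org.htmlunit.WebClient;
import org.htmlunit.WebResponse;

final class ReportFetcher {

    /** Hypothetical helper: follows reportUrl and saves the body if the chain ends in a PDF. */
    static boolean fetchPdf(String reportUrl, Path target) throws Exception {
        // The BrowserVersion decides which User-Agent HtmlUnit sends.
        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setThrowExceptionOnScriptError(false);
            client.getOptions().setThrowExceptionOnFailingStatusCode(false);
            // Custom header added to the requests this client sends (value is an assumption).
            client.addRequestHeader("Accept", "application/pdf,text/html;q=0.9,*/*;q=0.8");
            // Honour <meta http-equiv="refresh"> hops, waiting at most 10 s for each.
            client.setRefreshHandler(new WaitingRefreshHandler(10));

            client.getPage(reportUrl);
            // Give timer- or onload-driven JavaScript redirects a chance to run.
            client.waitForBackgroundJavaScript(10_000);

            // The window's current page is where the chain ended, which is not
            // necessarily the page object getPage() returned.
            Page finalPage = client.getCurrentWindow().getEnclosedPage();
            WebResponse response = finalPage.getWebResponse();
            System.out.println(response.getStatusCode() + " " + response.getContentType()
                    + " " + response.getWebRequest().getUrl());

            if (!"application/pdf".equalsIgnoreCase(response.getContentType())) {
                return false; // stalled on an HTML (or error) page
            }
            // HtmlUnit does not parse PDFs; copy the raw response bytes instead.
            // This happens inside the try so the client is still open.
            try (InputStream in = response.getContentAsStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        }
    }
}
```

The printed status, content type and URL tell you whether the chain reached the PDF or stopped on an intermediate HTML page, which narrows down which of the suggestions above actually matters for your server.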

I’ve run into this exact issue with HtmlUnit and report servers before. Usually the cause is timing or content type rather than the redirects themselves. Add a short delay between redirects so the server has time to finish processing before the next request arrives, and make sure your timeouts are long enough that a slow intermediate step isn’t cut off. Also check whether any intermediate response sets cookies or other session data that must be carried into the following requests.

What helped me most was logging the actual redirect URL at each step, since some servers require specific headers or parameters to be kept throughout the whole chain. With every intermediate URL logged alongside its response code, you can pinpoint exactly where the chain breaks.
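To log the raw chain outside HtmlUnit, here is a diagnostic sketch that swaps in the JDK's own java.net.http.HttpClient (Java 11+, Java 16+ for the record). Automatic redirects are switched off so every hop can be logged with its status code, Location header and content type. A CookieManager carries cookies from one hop to the next, and connect/request timeouts are set explicitly. RedirectTracer, Hop and the timeout values are hypothetical. Because this client runs no JavaScript and sends no browser headers, treat it as a rough check. If the chain ends on a 200 text/html page instead of application/pdf, the remaining redirect is presumably happening client-side.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class RedirectTracer {

    /** One step in the chain: the URL requested and what came back. */
    record Hop(URI uri, int status, String location, String contentType) {}

    /** Follows HTTP 3xx redirects by hand, logging each hop, for at most maxHops requests. */
    static List<Hop> trace(String startUrl, int maxHops) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // follow manually so each hop is visible
                .cookieHandler(new CookieManager())         // keep session cookies between hops
                .connectTimeout(Duration.ofSeconds(10))
                .build();

        List<Hop> hops = new ArrayList<>();
        URI current = URI.create(startUrl);
        for (int i = 0; i < maxHops; i++) {
            HttpRequest request = HttpRequest.newBuilder(current)
                    .timeout(Duration.ofSeconds(60)) // generous: report generation can be slow
                    .GET()
                    .build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());

            String location = response.headers().firstValue("Location").orElse(null);
            String type = response.headers().firstValue("Content-Type").orElse("");
            hops.add(new Hop(current, response.statusCode(), location, type));
            System.out.printf("%d %s -> %s [%s]%n", response.statusCode(), current,
                    location == null ? "-" : location, type);

            boolean isRedirect = response.statusCode() / 100 == 3 && location != null;
            if (!isRedirect) {
                break; // final response (or an error) reached
            }
            current = current.resolve(location); // Location may be relative
        }
        return hops;
    }
}
```

Calling RedirectTracer.trace(reportUrl, 10) prints one line per hop, so you can compare the sequence with what the browser's network tab shows for the same URL.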