File download REST service strips accented characters from filename

I’m working on a REST web service for file downloads using Java. The issue I’m facing is that when users download files through their browser, Spanish accented characters get removed from the filename.

@GET
@Path("/getfile")
@Produces(MediaType.APPLICATION_OCTET_STREAM)
public Response retrieveFile(@QueryParam("fileId") String fileId) throws IOException {
    try {
        DocumentStream result = DocumentService.fetchDocument(Integer.parseInt(fileId));
        ResponseBuilder builder = Response.ok((Object) result.getFileStream());
        System.out.println("Original name: " + result.getDocumentData().getFileName());
        System.out.println("Encoded name: " + DocumentModel.convertToISO(result.getDocumentData().getFileName()));
        builder.header("Content-Disposition", "attachment;filename=" + result.getDocumentData().getFileName());
        return builder.build();
    } catch (Exception ex) {
        return Response.status(500).entity("File download failed").build();
    }
}

Here’s my encoding utility:

public static String convertToISO(String text) {
    return new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
}

The database returns the correct filename: documentó año 2021.xlsx

But when I apply encoding: DocumentModel.convertToISO(...) I get: document� a�o 2021.xlsx

In the browser download, both approaches result in: document a o 2021.xlsx

I’ve tested different charset configurations including UTF-8 and ISO-8859-1 but the accented characters still disappear. What configuration am I missing to preserve Spanish characters in downloaded filenames?

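Your convertToISO method is corrupting the filename: it encodes the string as ISO-8859-1 bytes and then decodes those bytes as UTF-8, so every accented character turns into an invalid byte sequence. Separately, the Content-Disposition header needs the RFC 5987 filename* form to carry non-ASCII characters.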
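To see the corruption in isolation, here is a minimal sketch that runs the same conversion as the question's convertToISO on the sample filename. The class name CharsetRoundTripDemo is made up for illustration; the filename is written with \u escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTripDemo {

    // Same logic as convertToISO in the question: ISO-8859-1 bytes decoded as UTF-8.
    static String convertToISO(String text) {
        return new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "documentó año 2021.xlsx"
        String name = "document\u00F3 a\u00F1o 2021.xlsx";
        String mangled = convertToISO(name);

        // 0xF3 (ó) and 0xF1 (ñ) are not valid UTF-8 on their own, so the
        // decoder substitutes the replacement character U+FFFD for them.
        System.out.println(mangled.contains("\uFFFD")); // prints true
        System.out.println(mangled.equals(name));       // prints false
    }
}
```

A Java String is already Unicode, so no conversion is needed before building the header; only the header value itself has to be encoded.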

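Use the RFC 5987 filename* parameter (the HTTP profile of RFC 2231 encoding) instead:

builder.header("Content-Disposition", "attachment; filename*=UTF-8''" + URLEncoder.encode(result.getDocumentData().getFileName(), StandardCharsets.UTF_8).replace("+", "%20"));

This tells the browser the filename is percent-encoded UTF-8. The replace call matters because URLEncoder writes spaces as +, which filename* does not decode back to a space. The URLEncoder.encode overload that takes a Charset requires Java 10 or later; on older versions pass "UTF-8" as a string. I had the same problem with Portuguese characters and this fixed it completely. Just remove that convertToISO method entirely - it’s what’s causing the corruption in your logs.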
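URLEncoder targets HTML form encoding rather than header values, so if you would rather not depend on its quirks, a small hand-written encoder is one alternative. The sketch below percent-encodes every UTF-8 byte that is not in the attr-char set defined by RFC 5987. The class name Rfc5987 and its method names are hypothetical, not part of any library.

```java
import java.nio.charset.StandardCharsets;

public class Rfc5987 {

    // attr-char from RFC 5987: ALPHA / DIGIT / ! # $ & + - . ^ _ ` | ~
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Percent-encodes each UTF-8 byte of the value that is not an attr-char.
    static String encode(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (ATTR_CHARS.indexOf(c) >= 0) {
                sb.append(c);
            } else {
                sb.append('%').append(String.format("%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    // Builds a Content-Disposition value that uses only the extended parameter.
    static String contentDisposition(String fileName) {
        return "attachment; filename*=UTF-8''" + encode(fileName);
    }

    public static void main(String[] args) {
        // "documentó año 2021.xlsx"
        System.out.println(contentDisposition("document\u00F3 a\u00F1o 2021.xlsx"));
        // prints attachment; filename*=UTF-8''document%C3%B3%20a%C3%B1o%202021.xlsx
    }
}
```

In the resource method this would be used as builder.header("Content-Disposition", Rfc5987.contentDisposition(name)).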

Your charset handling is the problem. For Content-Disposition headers with international characters, you need both a fallback filename and RFC 5987 encoding to work across all browsers. I ran into the same issue with German umlauts in my enterprise app. The fix is using two filename parameters - an ASCII fallback and a proper UTF-8 version:

String originalName = result.getDocumentData().getFileName();
String asciiName = originalName.replaceAll("[^\\x00-\\x7F]", "");
// URLEncoder writes spaces as '+'; filename* expects %20
String encodedName = URLEncoder.encode(originalName, StandardCharsets.UTF_8).replace("+", "%20");

builder.header("Content-Disposition", 
    "attachment; filename=\"" + asciiName + "\"; filename*=UTF-8''" + encodedName);

This gives you backward compatibility: browsers that understand filename* prefer it and show the accented characters correctly, while older clients fall back to the plain ASCII name. Ditch your convertToISO method completely - it decodes ISO-8859-1 bytes as UTF-8, which corrupts the original filename.
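One possible refinement of the fallback: rather than deleting accented letters outright, decompose them with java.text.Normalizer and drop only the combining marks, so the sample name falls back to "documento ano 2021.xlsx". This is a sketch under a few assumptions: the class name DownloadHeaders and its methods are invented for illustration, quotes and backslashes are replaced with underscores to keep the quoted string valid, and the Charset overload of URLEncoder.encode needs Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class DownloadHeaders {

    // Produces a printable-ASCII name that is safe inside a quoted string.
    static String asciiFallback(String name) {
        // NFD splits "ó" into "o" plus a combining accent (U+0301).
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}", "")           // drop the combining marks
                .replaceAll("[^\\x20-\\x7E]", "_")  // anything still outside printable ASCII
                .replaceAll("[\"\\\\]", "_");       // quote and backslash would break the quoting
    }

    // Combines the ASCII fallback with the UTF-8 filename* parameter.
    static String contentDisposition(String name) {
        String encoded = URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
        return "attachment; filename=\"" + asciiFallback(name) + "\"; filename*=UTF-8''" + encoded;
    }

    public static void main(String[] args) {
        // "documentó año 2021.xlsx"
        System.out.println(contentDisposition("document\u00F3 a\u00F1o 2021.xlsx"));
    }
}
```

With this helper the header call becomes builder.header("Content-Disposition", DownloadHeaders.contentDisposition(originalName)).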

Had the same issue last month with French filenames. You’re mixing ISO and UTF-8 the wrong way. Try URLEncoder.encode(filename, "UTF-8").replace("+", "%20") and set your header like "attachment; filename*=UTF-8''" + encodedName - worked perfectly for château.pdf and other accented files. Note the filename* form: a percent-encoded value placed in the plain quoted filename parameter is not decoded consistently across browsers.
