Handling Unicode characters in Alfresco Web Service API with Java

Hey everyone! I’m stuck with a tricky issue while working on a Java project. I’m trying to import content into Alfresco using their web service API. The problem is that I need to set some NamedValue properties with UTF-8 strings containing Cyrillic characters.

When I run my code, I keep getting this annoying SAX parser exception:

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1b) was found in the element content of the document.

Here’s a simplified version of what I’m doing:

NamedValue[] props = new NamedValue[2];
props[0] = makeNamedValue("ItemName", itemName);
props[1] = makeNamedValue("CustomProperty", cyrillicText);

CreateItem create = new CreateItem("1", parentRef, null, null, null, itemType, props);
ItemList items = new ItemList();
items.setCreate(new CreateItem[]{create});
UpdateResult[] results = null;

try {
    results = RepoServiceFactory.getService().update(items);
} catch (Exception e) {
    // This is where the SAXParseException shows up
}

Does anyone have experience with this? How can I properly handle Unicode characters when using Alfresco’s web service API? Any help would be much appreciated!

Hey Finn, I’ve run into this before. Try Base64-encoding your Cyrillic text before passing it to the API. Something like this:

String encodedText = Base64.getEncoder().encodeToString(cyrillicText.getBytes(StandardCharsets.UTF_8));

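Then use encodedText in your NamedValue. The payload is plain ASCII at that point, so the XML parser has nothing to complain about. Keep in mind that the property will then hold the Base64 string, so anything that reads it has to decode it again. Good luck!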
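
In case it helps, here’s a minimal round-trip sketch. The class and method names (Base64PropertyCodec, encode, decode) are made up for illustration and are not part of Alfresco; the only real API used is java.util.Base64 from the JDK. It just shows that the text comes back unchanged after decoding:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not an Alfresco class.
final class Base64PropertyCodec {

    // Turn arbitrary text into an ASCII-only Base64 string.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of encode(): whoever reads the property needs to call this.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u041F\u0440\u0438\u0432\u0435\u0442" is a Cyrillic sample word written with escapes.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";
        String encoded = encode(original);
        System.out.println("encoded: " + encoded);
        System.out.println("round trip ok: " + decode(encoded).equals(original));
    }
}
```

The trade-off is that the stored value is no longer readable or searchable as text inside the repository.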

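I’ve encountered similar issues with Alfresco’s web service API and Unicode characters. The problem likely stems from how the text is decoded before it ever reaches the API. Re-encoding an existing String (getBytes("UTF-8") followed by new String(..., "UTF-8")) just gives you back the same text, so it can’t repair anything. What helps is decoding the original bytes with the correct charset at the point where you read them. Here rawBytes stands for whatever byte array your import source hands you:

String cyrillicText = new String(rawBytes, StandardCharsets.UTF_8);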

Additionally, ensure your Java environment is set to use UTF-8 encoding. You can do this by adding the following VM argument when running your application:

-Dfile.encoding=UTF-8
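
As an illustration of where charset problems usually creep in, here is a small sketch using only the JDK. The class and method names are hypothetical. It decodes the same two bytes once as UTF-8 and once as ISO-8859-1, and prints the JVM default charset so you can check what your runtime is actually using:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any Alfresco API.
final class CharsetDecodeDemo {

    // Decode raw bytes with an explicit charset instead of relying on the JVM default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of the Cyrillic capital letter Pe (U+041F).
        byte[] raw = {(byte) 0xD0, (byte) 0x9F};

        String good = decode(raw, StandardCharsets.UTF_8);
        String bad = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 decode length: " + good.length());      // one character
        System.out.println("ISO-8859-1 decode length: " + bad.length());  // two characters (mojibake)
    }
}
```

If the second variant matches what you see in your data, the text was already damaged before it reached the web service call.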

If the issue persists, consider using Alfresco’s REST API instead, which generally handles Unicode characters more reliably. Remember to properly URL-encode any parameters containing non-ASCII characters when making REST calls.
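
For the URL-encoding part, a minimal sketch with the JDK’s URLEncoder could look like this. The class name, the method and the parameter name "name" are placeholders I made up, not anything defined by Alfresco’s REST API, and the Charset overload of URLEncoder.encode needs Java 10 or newer:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building one query-string pair.
final class RestParamEncoder {

    // Percent-encode both key and value as UTF-8 (form encoding: a space becomes '+').
    static String queryParam(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Cyrillic sample value written with Unicode escapes.
        String value = "\u041F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(queryParam("name", value));
    }
}
```

Note that URLEncoder produces form encoding, which suits query parameters; path segments need a different encoder.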

I’ve dealt with similar Unicode headaches in Alfresco before. One thing that worked for me was using Apache Commons Lang’s StringEscapeUtils to escape the Cyrillic text before passing it to the API. Something like this:

String escapedCyrillicText = StringEscapeUtils.escapeXml(cyrillicText);

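Then use escapedCyrillicText in your NamedValue. If you are on Commons Lang 3, escapeXml is deprecated in favor of escapeXml10, which also drops characters that XML 1.0 cannot represent at all, such as the 0x1b from your exception. That should prevent those pesky SAX exceptions.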

Another approach is to use CDATA sections for your Cyrillic content:

String wrappedCyrillicText = "<![CDATA[" + cyrillicText + "]]>";

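This tells the XML parser to treat the content as literal character data, so markup characters such as < and & need no escaping. It does not make control characters legal, though: a 0x1b is rejected inside a CDATA section as well.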
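
Since the exception names 0x1b (the ESC control character) rather than a Cyrillic letter, it may also be worth filtering out characters that XML 1.0 does not allow before building the NamedValue. Here is a rough sketch, assuming that silently dropping such characters is acceptable for your data. XmlCharFilter and its methods are names I invented, and the ranges follow the Char production of the XML 1.0 specification:

```java
// Hypothetical helper, not an Alfresco class.
final class XmlCharFilter {

    // True if the code point is allowed in an XML 1.0 document.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Return a copy of the input without characters that XML 1.0 forbids.
    static String stripIllegal(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .filter(XmlCharFilter::isLegalXml10)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Two Cyrillic letters with an ESC (0x1B) between them.
        String dirty = "\u041F\u001B\u0440";
        String clean = stripIllegal(dirty);
        System.out.println("length before: " + dirty.length() + ", after: " + clean.length());
    }
}
```

You would then pass stripIllegal(cyrillicText) to makeNamedValue. Cyrillic letters pass through untouched; only the control characters are removed.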

If all else fails, you might want to look into using Alfresco’s CMIS API instead. It’s generally more robust when it comes to handling international characters and might save you some grief in the long run.