I’m working with Java to connect to an Alfresco content repository through its web service API. My goal is to upload documents and set custom properties that contain UTF-8 encoded text with Cyrillic characters.
However, I keep running into this SAX parser error when trying to process the request:
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1b) was found in the element content of the document.
Here’s a simplified version of my implementation:
NamedValue[] customProperties = new NamedValue[2];
customProperties[0] = Utils.createNamedValue(Constants.PROP_NAME, fileName);
customProperties[1] = Utils.createNamedValue("{custom.model}textProperty", russianText);
CMLCreate createNode = new CMLCreate("1", parentRef, null, null, null, nodeType, customProperties);
CML cmlRequest = new CML();
cmlRequest.setCreate(new CMLCreate[]{createNode});
UpdateResult[] response = null;
try {
    response = WebServiceFactory.getRepositoryService().update(cmlRequest);
} catch (Exception e) {
    // SAXParseException occurs here
}
Has anyone encountered this Unicode issue before? What’s the best approach to handle Cyrillic text in NamedValue properties?
Try encoding your Russian text with URLEncoder.encode() before creating the NamedValue. That fixed the same Cyrillic issue I had with Alfresco web services. Keep in mind that the property will then hold the percent-encoded string, so anything that reads it back has to call URLDecoder.decode(). Also check for hidden characters in your text string: copy-pasting from docs sometimes adds control characters that break XML parsing.
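To check for hidden characters, something along these lines can help. It is only a rough sketch: the class and method names are made up for illustration, and it flags Unicode control (Cc) and format (Cf) characters other than tab, line feed, and carriage return, reporting the index and code point of each.

```java
import java.util.ArrayList;
import java.util.List;

final class HiddenCharFinder {

    /** Lists control (Cc) and format (Cf) characters other than tab, LF, and CR. */
    static List<String> findHiddenChars(String text) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int type = Character.getType(cp);
            boolean ordinaryWhitespace = cp == '\t' || cp == '\n' || cp == '\r';
            if (!ordinaryWhitespace
                    && (type == Character.CONTROL || type == Character.FORMAT)) {
                // Report the char index and the code point in U+XXXX form
                hits.add(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return hits;
    }
}
```

Running HiddenCharFinder.findHiddenChars(russianText) just before the createNamedValue call should show whether the 0x1b from the exception is already present in the string at that point.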
I hit this exact error working with Alfresco’s web services a couple of years back. The problem is that the text ends up in the SOAP envelope as-is, and the XML parser on the other end rejects any character that XML 1.0 does not allow. Here’s what fixed it for me: validate your characters before creating the NamedValue objects. You’ll want to sanitize russianText by stripping out any characters below Unicode 0x20 (keep tab, line feed, and carriage return though). I built a simple method that loops through the string and filters invalid XML characters using Character.isISOControl(). Note that isISOControl() also returns true for tab, line feed, and carriage return, so those have to be whitelisted explicitly. Also make sure your web service client config specifies UTF-8 encoding in the SOAP binding properties. The default encoding was corrupting characters during transmission in my case. One more thing: if you’re pulling the Russian text from a file or database, double-check that the source encoding matches what you expect.
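A filter like the one described might look roughly like this. It is a sketch, not the original method: the class and method names are placeholders. Because Character.isISOControl() also covers U+007F to U+009F, this version drops those as well as everything below 0x20.

```java
final class XmlTextSanitizer {

    /** Drops ISO control characters except tab, line feed, and carriage return. */
    static String stripControlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder clean = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            boolean keep = cp == '\t' || cp == '\n' || cp == '\r'
                    || !Character.isISOControl(cp);
            if (keep) {
                // Cyrillic and other printable text passes through unchanged
                clean.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return clean.toString();
    }
}
```

In the code from the question, that would mean passing XmlTextSanitizer.stripControlChars(russianText) to Utils.createNamedValue instead of russianText itself.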
The character 0x1b is ESC, a control character that is not legal in XML 1.0, which is why the parser rejects the request. I have encountered similar problems while working with legacy Alfresco web services and non-ASCII content. Most likely your Russian text already contains control characters, or the encoding was altered somewhere before the request was built. To resolve this, strip any control characters below Unicode 0x20 (other than tab, line feed, and carriage return) from russianText before passing it to NamedValue; a regex is a convenient way to do that. Make sure your web service client is set to use UTF-8 explicitly, and verify that the SOAP request declares UTF-8 as well. Another solution that worked for me was Base64-encoding the Cyrillic text before sending it and decoding it on the repository side, if that’s feasible. Note that Base64 only hides the offending character from the parser; it will still be in the text after decoding.
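Both the regex filter and the Base64 round trip can be sketched as below. The class and method names are hypothetical, java.util.Base64 needs Java 8 or later, and the decoding half assumes you have custom code on the repository side that can call it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

final class PropertyTextCodec {

    // Control characters below 0x20, excluding tab (0x09), LF (0x0A), and CR (0x0D)
    private static final Pattern FORBIDDEN_CONTROLS =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    /** Removes the control characters matched by FORBIDDEN_CONTROLS. */
    static String removeForbiddenControls(String text) {
        return FORBIDDEN_CONTROLS.matcher(text).replaceAll("");
    }

    /** Encodes the text's UTF-8 bytes as Base64, which is plain ASCII. */
    static String toBase64(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses toBase64, interpreting the decoded bytes as UTF-8. */
    static String fromBase64(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

The regex route keeps the property readable in the repository, so it is probably the simpler choice unless the control characters actually need to be preserved.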