UTF-8

When I first developed the Project Organizer app in 2013, I was only focused on building the best app for my personal needs.

I needed to move data from one device to another, and iCloud did not exist. Email and SMS were available, so I decided to send the data as a file attachment. I was more familiar with XML than JSON, so I chose that format.

It became apparent that I needed to encode certain characters to avoid parsing errors, so I researched this and found a library written in C that I was able to rewrite in Objective-C to escape characters using ASCII encoding.

This worked for a while, until I began localizing and internationalizing my app into different Latin-based languages.

As I added each new language, I had to add special encoding to my export file to handle the new special characters. It was like playing whack-a-mole: I was not getting ahead of the curve, and I knew I could only continue to expand through Latin-based languages. This character set could not handle Chinese, Japanese, Arabic, or several other popular languages that are not Latin-based.

Again I began researching, and after much reading I found that UTF-8 should be able to support almost all languages. Converting my app to use UTF-8 was not as easy as I had hoped, but it could have been easier if I had had some guidelines up front.

This is what ultimately worked for me; it may or may not work for your needs.

Do not do any UTF-8 encoding or decoding while manipulating strings in your app. NSString has a convenience method to convert a string to UTF-8 encoding. Do not use it for this. NSString is conceptually a sequence of UTF-16 code units, so use NSString to format, concatenate, and parse strings as you need to in your app. This will avoid a lot of defects.
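
To illustrate the point, here is a minimal sketch (not code from the app) that builds a string purely with NSString APIs and only then compares its UTF-16 length with its UTF-8 byte count. The helper name LRMakeLabel and the sample values are hypothetical, and the sketch assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: assembles a label using only NSString APIs,
// with no UTF-8 conversion along the way.
static NSString *LRMakeLabel(NSString *title, NSString *owner) {
    return [NSString stringWithFormat:@"%@ (%@)", title, owner];
}

int main(void) {
    @autoreleasepool {
        NSString *label = LRMakeLabel(@"Café 🍎", @"Zoë");

        // length counts UTF-16 code units; the emoji alone takes two of them.
        NSLog(@"UTF-16 code units: %lu", (unsigned long)label.length);

        // The UTF-8 byte count is larger, because accented letters and emoji
        // need more than one byte each. Slicing by byte offsets would be risky.
        NSLog(@"UTF-8 bytes: %lu",
              (unsigned long)[label lengthOfBytesUsingEncoding:NSUTF8StringEncoding]);
    }
    return 0;
}
```

The two counts differ, which is why slicing or searching on UTF-8 bytes in the middle of your app logic invites defects.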

When you write your data to a file, encode it to UTF-8 at that point. UTF-8 will handle all your international characters like úáéöü. It can also handle the emoji and other special Unicode characters that Apple has added support for since 2013, like 🍎 🍏 ✅ 😐 😀. Most importantly, the libxml2 parser was able to handle these strings successfully.
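
As a rough sketch of encoding only at the file boundary, the following writes an assembled XML string as UTF-8 and reads it back. The helper LRWriteExport, the file name export.xml, and the sample document are assumptions for illustration, not the app's actual export code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: writes an already-assembled XML string to disk,
// converting to UTF-8 only at this point.
static BOOL LRWriteExport(NSString *xml, NSString *path, NSError **error) {
    return [xml writeToFile:path
                 atomically:YES
                   encoding:NSUTF8StringEncoding
                      error:error];
}

int main(void) {
    @autoreleasepool {
        NSString *xml = @"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                        @"<note>úáéöü 🍎 ✅</note>\n";
        // Assumed location: a scratch file in the temporary directory.
        NSString *path = [NSTemporaryDirectory()
                             stringByAppendingPathComponent:@"export.xml"];

        NSError *error = nil;
        if (!LRWriteExport(xml, path, &error)) {
            NSLog(@"Write failed: %@", error);
            return 1;
        }

        // Decode from UTF-8 only when reading the file back in.
        NSString *readBack = [NSString stringWithContentsOfFile:path
                                                       encoding:NSUTF8StringEncoding
                                                          error:&error];
        NSLog(@"Round trip intact: %@", [readBack isEqualToString:xml] ? @"YES" : @"NO");
    }
    return 0;
}
```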

The biggest challenge this left for me was handling the 5 characters used in XML markup that cause parsing errors when they occur in your data. They are & " ' < and >.

If I escaped them using named entities like &amp; &quot; &apos; &lt; and &gt;, the parser was not able to handle them without adding DOCTYPE and ENTITY tags to the document. Then I was back to the game of whack-a-mole and defects. This was largely because my prior libraries encoded many more characters than these 5, and named entities beyond these five are not predefined in XML and must be declared.

What I came up with, and what worked for me, was a small NSString category that escapes these characters directly into numeric character references such as &#38;, so the DOCTYPE and ENTITY tags are not required. The code is shown below.

LRNSString+UTF8.h

//
//  LRNSString+UTF8.h
//  Project Organizer
//
//  Created by Lawrence Ricker on 5/28/18.
//  Copyright © 2018 Mobile Developer. All rights reserved.
//

#ifndef LRNSString_UTF8_h
#define LRNSString_UTF8_h

#import <Foundation/Foundation.h>

@interface NSString (LRNSStringUTF8Additions)

// Returns a copy with & < > " and ' replaced by XML numeric character references.
- (NSString *)lr_stringByEscapingFromXML;

@end

#endif /* LRNSString_UTF8_h */

LRNSString+UTF8.m

//
//  LRNSString+UTF8.m
//  Project Organizer
//
//  Created by Lawrence Ricker on 5/28/18.
//  Copyright © 2018 Mobile Developer. All rights reserved.
//

#import <Foundation/Foundation.h>
#import "LRNSString+UTF8.h"

@implementation NSString (LRNSStringUTF8Additions)

/*
 *  Original character    XML entity replacement    XML numeric replacement
 *  &                     &amp;                     &#38;
 *  <                     &lt;                      &#60;
 *  >                     &gt;                      &#62;
 *  "                     &quot;                    &#34;
 *  '                     &apos;                    &#39;
 */
- (NSString *)lr_stringByEscapingFromXML {
    // Escape & first, so the ampersands added by the later replacements are not escaped again.
    return [[[[[self stringByReplacingOccurrencesOfString:@"&" withString:@"&#38;"]
                     stringByReplacingOccurrencesOfString:@"<" withString:@"&#60;"]
                     stringByReplacingOccurrencesOfString:@">" withString:@"&#62;"]
                     stringByReplacingOccurrencesOfString:@"'" withString:@"&#39;"]
                     stringByReplacingOccurrencesOfString:@"\"" withString:@"&#34;"];
}

@end
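
As a usage sketch, the snippet below escapes a value, wraps it in an element, and parses the result with no DOCTYPE or ENTITY declarations. To compile on its own, it uses a stand-in function LREscape that mirrors the category method. It also swaps in Foundation's NSXMLParser rather than calling libxml2 directly. LRTextCollector, LRRoundTrip, and the note element are hypothetical names, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Stand-in for lr_stringByEscapingFromXML so this sketch is self-contained.
static NSString *LREscape(NSString *s) {
    NSArray *pairs = @[
        @[@"&", @"&#38;"],   // must run first
        @[@"<", @"&#60;"],
        @[@">", @"&#62;"],
        @[@"'", @"&#39;"],
        @[@"\"", @"&#34;"]
    ];
    for (NSArray *pair in pairs) {
        s = [s stringByReplacingOccurrencesOfString:pair[0] withString:pair[1]];
    }
    return s;
}

// Hypothetical delegate that accumulates the character data the parser reports.
@interface LRTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation LRTextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}
// May be called several times for one element, so append rather than assign.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}
@end

// Hypothetical helper: escapes the value, wraps it in <note>, parses the
// document, and returns the decoded text (nil if parsing fails).
static NSString *LRRoundTrip(NSString *value) {
    NSString *xml = [NSString stringWithFormat:
        @"<?xml version=\"1.0\" encoding=\"UTF-8\"?><note>%@</note>", LREscape(value)];
    NSXMLParser *parser =
        [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    LRTextCollector *collector = [[LRTextCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? [collector.text copy] : nil;
}

int main(void) {
    @autoreleasepool {
        NSString *original = @"Tom & Jerry's <\"plan\"> ✅";
        NSLog(@"escaped: %@", LREscape(original));
        NSLog(@"parsed:  %@", LRRoundTrip(original));
    }
    return 0;
}
```

The parsed text should match the original string, since numeric character references are decoded by the parser without any extra declarations.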

Author: larryricker

Project Organizer and Project Organizer Pro schedule reminders, take notes, record minutes of your meetings, track status, record decisions, identify team members and partners and have the answers. Download a copy today on iPhone and iPad from the Apple App Store.
