In this article, we explain SMS encoding and explore its two most common methods: GSM-7 and UCS-2.
In the world of SMS messaging, understanding the nuances of GSM-7 and UCS-2 encoding is key to ensuring efficient and cost-effective communication.
When it comes to sending SMS messages, not all characters are created equal.
In fact, the way your text is encoded affects how many segments each message is split into, how much sending those messages costs, and even whether your recipients can display them properly.
What Is SMS Encoding?
SMS encoding is the process of converting text characters into bits that can be transmitted over a mobile network. A single SMS can carry at most 140 bytes (1,120 bits) of text, so the encoding determines how many characters fit into one message.
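GSM-7 Encoding
GSM-7 is the default SMS alphabet, defined in 3GPP TS 23.038. It stores each character in 7 bits, which keeps messages compact.
Its basic character set has 128 entries (one of which is an escape code), covering uppercase and lowercase Latin letters, digits, common punctuation marks, and a handful of accented letters and Greek capitals. An extension table adds a few more characters, such as €, [, ], {, } and ~. Each of these takes two character positions, because it is sent as the escape code followed by the character.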
However, not all characters are supported in GSM-7. Many accented letters (such as á or ê), non-Latin scripts (such as Cyrillic, Arabic, or Chinese), and emoji are not included in the GSM-7 character set, so they require a different encoding method.
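A message containing only GSM-7 characters can hold up to 160 characters in a single, non-segmented message. GSM-7 messages longer than 160 characters are split into segments of up to 153 characters each, because every segment reserves space for a header that lets the recipient's phone reassemble the message.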
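To make the 160 and 153 limits concrete, here is a minimal Python sketch that counts how many 7-bit positions (septets) a text needs under GSM-7 and how many segments that produces. It assumes the 3GPP TS 23.038 default alphabet and extension table; extension characters (such as € or {) count as two septets because they are sent as an escape code plus the character. The function names are hypothetical, not part of any SMS provider's API.

```python
import math
from typing import Optional

# 3GPP TS 23.038 default alphabet (the ESC code at 0x1B is omitted).
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character is sent as ESC + code, i.e. two septets.
GSM7_EXTENSION = frozenset("\f^{}\\[~]|€")


def gsm7_septets(text: str) -> Optional[int]:
    """Return the septets needed for text, or None if it is not GSM-7."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENSION:
            total += 2
        else:
            return None  # one unsupported character: the message needs UCS-2
    return total


def gsm7_segments(septets: int) -> int:
    """One message holds 160 septets; concatenated segments hold 153 each."""
    if septets <= 160:
        return 1
    return math.ceil(septets / 153)
```

For example, `gsm7_septets("Price: 5€")` returns 10, while `gsm7_septets("Привет")` returns `None`, meaning the message would fall back to UCS-2. The segment count is simplified: many senders also avoid splitting an escape pair across two segments, which can occasionally add a segment.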
UCS-2 Encoding
UCS-2 is a fixed-width encoding of Unicode. It covers the Unicode Basic Multilingual Plane, which includes the characters of most of the world's writing systems.
It uses 16 bits to represent each character, making it capable of encoding a much wider range of characters compared to GSM-7.
It is used when a message contains characters that GSM-7 does not support, such as non-Latin scripts, emoji, or certain special characters.
Messages with one or more characters outside the GSM-7 set can contain up to 70 characters in a single, non-segmented message. UCS-2 messages longer than 70 characters are split into segments of up to 67 characters each.
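The UCS-2 limits can be counted the same way, in 16-bit code units. This minimal sketch assumes the common practice of sending such text as UTF-16, in which most emoji (which lie outside the Basic Multilingual Plane, and therefore outside strict UCS-2) take two code units; exact handling can vary by carrier and handset. The function names are hypothetical.

```python
import math


def ucs2_units(text: str) -> int:
    """Count 16-bit code units, as sent when text is encoded as UTF-16."""
    # Characters outside the BMP (most emoji) become a surrogate pair: 2 units.
    return len(text.encode("utf-16-le")) // 2


def ucs2_segments(units: int) -> int:
    """One message holds 70 units; concatenated segments hold 67 each."""
    if units <= 70:
        return 1
    return math.ceil(units / 67)
```

Here `ucs2_units("Привет")` returns 6, but `ucs2_units("Hi 😀")` returns 5 rather than 4, because the emoji counts twice.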
Implications of GSM-7 and UCS-2 Encoding
The choice of encoding directly determines how many characters fit in a message, and therefore how many segments you send.
When using GSM-7, each character takes up 7 bits. Because a single SMS carries 140 bytes (1,120 bits) of text, this allows up to 160 characters per message (160 × 7 = 1,120).
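The 7-bit figure is literal: GSM-7 characters are packed end to end into octets rather than stored one per byte. The following minimal sketch shows that packing, least significant bit first, as described in 3GPP TS 23.038. The function name is hypothetical, and the sketch ignores the fill bits a concatenated segment needs after its header.

```python
from typing import Iterable


def pack_septets(septets: Iterable[int]) -> bytes:
    """Pack 7-bit GSM-7 codes into octets, least significant bit first."""
    out = bytearray()
    acc = 0   # bit accumulator
    bits = 0  # number of valid bits currently held in acc
    for s in septets:
        if not 0 <= s < 128:
            raise ValueError(f"not a 7-bit value: {s}")
        acc |= s << bits
        bits += 7
        # Emit every complete octet collected so far.
        while bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    if bits:
        out.append(acc & 0xFF)  # flush the final partial octet
    return bytes(out)
```

Packing 160 septets yields exactly 140 bytes. For lowercase Latin letters the GSM-7 codes equal their ASCII values, so `pack_septets([ord(c) for c in "hellohello"])` stores the ten characters in nine bytes (`E8329BFD4697D9EC37` in hex).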
However, when using UCS-2, each character takes up two bytes, reducing the maximum character count per message to 70.
As a result, UCS-2 messages can be more expensive: the same text takes twice as much space, so it is more likely to be split into multiple segments, and providers typically bill per segment.
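The single-message and per-segment limits all follow from the same 140-byte payload. The worked arithmetic below assumes the common 6-byte concatenation header with an 8-bit reference number; a 7-byte header with a 16-bit reference number lowers the per-segment limits to 152 and 66. Here $n$ is the message length in septets (GSM-7) or 16-bit code units (UCS-2).

```latex
\begin{aligned}
\text{payload} &= 140\ \text{bytes} = 1120\ \text{bits} \\
L_{\text{single}}^{\text{GSM-7}} &= \lfloor 1120 / 7 \rfloor = 160,
  \qquad L_{\text{single}}^{\text{UCS-2}} = 1120 / 16 = 70 \\
\text{header} &= 6\ \text{bytes} = 48\ \text{bits} \\
L_{\text{segment}}^{\text{GSM-7}} &= \lfloor (1120 - 48) / 7 \rfloor = \lfloor 153.1 \rfloor = 153,
  \qquad L_{\text{segment}}^{\text{UCS-2}} = (1120 - 48) / 16 = 67 \\
\text{segments}(n) &=
  \begin{cases}
    1 & n \le L_{\text{single}} \\
    \lceil n / L_{\text{segment}} \rceil & n > L_{\text{single}}
  \end{cases} \\
n = 200:\quad
  &\lceil 200 / 153 \rceil = 2 \ \text{(GSM-7)},
  \qquad \lceil 200 / 67 \rceil = 3 \ \text{(UCS-2)}
\end{aligned}
```

So the same 200-character text costs two segments in GSM-7 but three in UCS-2.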
It's important to note that if a message contains even one character outside the GSM-7 set, the entire message is typically encoded in UCS-2.
Tips for Ensuring Correct SMS Encoding
Here are some tips to avoid unexpected message splitting or higher costs due to UCS-2 encoding:
- Use GSM-7 encoding whenever possible: Stick to the GSM-7 character set for your messages to maximize the character count and minimize costs.
- Be aware of special characters: Special characters, accented letters, and non-Latin characters may not be supported in GSM-7, in which case they require UCS-2 encoding.
- Test your messages: Use tools or services that check whether your message can be encoded in GSM-7 or requires UCS-2.
- Be cautious with text editors: Some text editors automatically insert non-GSM-7 characters, such as curly (smart) quotes or non-breaking spaces. Make sure your editor is not inadvertently adding characters that trigger UCS-2 encoding.
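One way to act on the last two tips is to normalize common look-alike characters before sending. This minimal Python sketch uses a hypothetical replacement table; which substitutions are acceptable is an editorial choice, so review the mapping for your own content. It does not guarantee GSM-7 compatibility: other characters in the text may still require UCS-2.

```python
# Hypothetical mapping from common "smart" characters to GSM-7 equivalents.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
    "\u2009": " ",    # thin space
})


def normalize_for_gsm7(text: str) -> str:
    """Replace look-alike characters that editors often insert."""
    return text.translate(REPLACEMENTS)
```

For example, text pasted from a word processor such as `“Hi” there – it’s ok…` becomes `"Hi" there - it's ok...`, which fits entirely in the GSM-7 set.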