- Trending Categories
- Data Structure
- Networking
- RDBMS
- Operating System
- Java
- iOS
- HTML
- CSS
- Android
- Python
- C Programming
- C++
- C#
- MongoDB
- MySQL
- Javascript
- PHP

- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who

Unicode operations can be performed by first fetching the length of the strings, and setting this to other values (the default value is ‘byte’). The ‘encode’ method is used to convert vector of code points into encoded string scalar. This is done to determine the Unicode code points in every encoded string.

**Read More:**
What is TensorFlow and how Keras work with TensorFlow to create Neural Networks?

Models which process natural language handle different languages that have different character sets. Unicode is considered as the standard encoding system which is used to represent character from almost all the languages. Every character is encoded with the help of a unique integer code point that is between 0 and 0x10FFFF. A Unicode string is a sequence of zero or more code values.

Let us understand how to represent Unicode strings using Python, and manipulate those using Unicode equivalents. First, we separate the Unicode strings into tokens based on script detection with the help of the Unicode equivalents of standard string ops.

We are using the Google Colaboratory to run the below code. Google Colab or Colaboratory helps run Python code over the browser and requires zero configuration and free access to GPUs (Graphical Processing Units). Colaboratory has been built on top of Jupyter Notebook.

print("The final character takes about 4 bytes in UTF-8 encoding") thanks = u'Hello 😊'.encode('UTF-8') num_bytes = tf.strings.length(thanks).numpy() num_chars = tf.strings.length(thanks, unit='UTF8_CHAR').numpy() print('{} bytes; {} UTF-8 characters'.format(num_bytes, num_chars))

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

The final character takes about 4 bytes in UTF-8 encoding 10 bytes; 7 UTF-8 characters

- The tf.strings.length operation has a parameter unit that indicates the method in which lengths need to be computed.
- The unit default is "BYTE", but it can be set to other values, such as "UTF8_CHAR" or "UTF16_CHAR".
- This is done to find the number of Unicode codepoints in every encoded string.

- Related Questions & Answers
- How can Unicode strings be represented and manipulated in Tensorflow?
- How can Unicode string be split, and byte offset be specified with Tensorflow & Python?
- How can discrete Fourier transform be performed in SciPy Python?
- How can Tensorflow be used to compose layers using Python?
- How Decibel Operations are Performed Through Logarithms?
- How can element wise multiplication be done in Tensorflow using Python?
- How can Logistic Regression be implemented using TensorFlow?
- How can Linear Regression be implemented using TensorFlow?
- How can the preprocessed data be shuffled using Tensorflow and Python?
- How can Tensorflow be used to add two matrices using Python?
- How can Tensorflow be used to multiply two matrices using Python?
- How can Tensorflow be used to visualize the data using Python?
- How can Tensorflow be used to standardize the data using Python?
- How can Tensorflow be used to compile the model using Python?
- How can Tensorflow be used to train the model using Python?

Advertisements