Understanding Python String Conversion Between utf-8 and unicode_escape

Author: vlogommentary

Uploaded: 2026-01-09

Views: 0

Description:

Learn why mixing utf-8 encoding and unicode_escape decoding in Python can break your string, and how to properly convert strings to preserve Unicode characters.
---
This video is based on the question https://stackoverflow.com/q/79375917/ asked by the user 'Some Guy' ( https://stackoverflow.com/u/7376511/ ) and on the answer https://stackoverflow.com/a/79375997/ provided by the user 'Mark Tolonen' ( https://stackoverflow.com/u/235698/ ) at 'Stack Overflow' website. Thanks to these great users and Stackexchange community for their contributions.

Visit these links for original content and any more details, such as alternate solutions, latest updates/developments on topic, comments, revision history etc. For example, the original title of the Question was: Python: convert back and forth between utf-8 and unicode_escape, preserving character

Also, Content (except music) licensed under CC BY-SA https://meta.stackexchange.com/help/l...
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license.

If anything seems off to you, please feel free to drop me a comment under this video.
---
Introduction

When working with Python strings, you might run into trouble converting a string back and forth between utf-8 and unicode_escape, especially when it contains non-ASCII characters such as kanji. The round trip is often not reversible as expected and produces confusing escape sequences instead of the original characters.

This post explains why this happens and how to correctly convert strings while preserving Unicode characters.



Why do utf-8 and unicode_escape mix poorly?

Consider this example:

[[See Video to Reveal this Text or Code Snippet]]

Result:

[[See Video to Reveal this Text or Code Snippet]]

Instead of the original kanji character 隼, you get the text of its escaped UTF-8 bytes (\xe9\x9a\xbc, with literal backslashes). The escapes spell out the same byte values, but terminals, URLs, and other contexts expect the actual character, not byte escapes.
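The snippets themselves are not reproduced in this text. A minimal sketch of the kind of chain this section describes, reconstructed from the steps listed in the next section (so the exact code shown in the video is an assumption), might look like this:

```python
s = 'Hello隼'

# utf-8 bytes are fed to unicode_escape, escaped again, then read back as utf-8.
result = (
    s.encode('utf-8')
     .decode('unicode_escape')
     .encode('unicode_escape')
     .decode('utf-8')
)

print(result)        # Hello\xe9\x9a\xbc
print(result == s)   # False
print(len(result))   # 17
```

The result is 17 characters long because each of the three UTF-8 bytes of 隼 has become a four-character escape.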

Step-by-step explanation

Breaking down the chain:

Original string s = 'Hello隼'

s.encode('utf-8') converts to bytes representing UTF-8 encoding:

[[See Video to Reveal this Text or Code Snippet]]

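.decode('unicode_escape') scans these bytes for escape sequences and finds none. Because the codec maps every non-ASCII byte to the Latin-1 character with the same value, the three UTF-8 bytes of 隼 become three separate characters, U+00E9, U+009A, and U+00BC. The result displays as garbled text such as Helloé¼ (the middle character is a control character and may appear as š or not at all).

Re-encoding with .encode('unicode_escape') turns each of those Latin-1 characters into a literal backslash escape, producing the ASCII bytes b'Hello\\xe9\\x9a\\xbc'.

Decoding those ASCII bytes with .decode('utf-8') yields the escaped string Hello\xe9\x9a\xbc (with literal backslashes), not your original text.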
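To make the breakdown concrete, the intermediate values can be checked one at a time. This is a sketch only; the names step1 to step4 are illustrative and not taken from the original post:

```python
s = 'Hello隼'

step1 = s.encode('utf-8')                # bytes: three UTF-8 bytes for 隼
step2 = step1.decode('unicode_escape')   # str: each byte >= 0x80 became one Latin-1 character
step3 = step2.encode('unicode_escape')   # bytes: those characters written as literal \xNN escapes
step4 = step3.decode('utf-8')            # str: the bytes are pure ASCII, so utf-8 changes nothing

assert step1 == b'Hello\xe9\x9a\xbc'
assert step2 == 'Hello\u00e9\u009a\u00bc' and len(step2) == 8
assert step3 == b'Hello\\xe9\\x9a\\xbc'
assert step4 == r'Hello\xe9\x9a\xbc'

# ascii() keeps the output readable regardless of the terminal encoding.
for name, value in [('step1', step1), ('step2', step2), ('step3', step3), ('step4', step4)]:
    print(name, ascii(value))
```

Note that step1 and step3 look similar when printed but differ: step1 holds three raw bytes after "Hello", while step3 holds twelve ASCII bytes that spell out escapes.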

What happens internally?

unicode_escape expects ASCII bytes with escape sequences like \uXXXX or \xXX.

Passing raw UTF-8 encoded bytes to unicode_escape breaks that expectation.
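A short sketch of that difference, using inputs chosen here for illustration: escape sequences written in ASCII are decoded as intended, while raw bytes above 0x7F are simply mapped to the code point with the same value, exactly as a latin-1 decode would do.

```python
# ASCII bytes containing an escape sequence: the input unicode_escape is designed for.
escaped = b'Hello\\u96bc'
assert escaped.decode('unicode_escape') == 'Hello隼'

# b'\\xe9' is four ASCII bytes (backslash, x, e, 9) and is read as an escape.
assert b'\\xe9'.decode('unicode_escape') == '\xe9'

# b'\xe9' is one raw byte. It is not an escape, yet it also comes out as U+00E9,
# because non-ASCII bytes are mapped to the code point with the same value.
assert b'\xe9'.decode('unicode_escape') == '\xe9'

# For every byte value 0x80..0xFF the result matches a latin-1 decode.
high_bytes = bytes(range(0x80, 0x100))
assert high_bytes.decode('unicode_escape') == high_bytes.decode('latin-1')
```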



Correct approach

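Use latin-1 encoding instead of unicode_escape

Because .decode('unicode_escape') maps each non-ASCII byte to the Latin-1 character with the same value, the way to undo it is to encode with latin-1, which restores the original UTF-8 bytes, and then decode those bytes as utf-8: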

[[See Video to Reveal this Text or Code Snippet]]

This recovers the original UTF-8 bytes that .decode('unicode_escape') had turned into Latin-1 characters, rather than escaping them a second time.
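The snippet is not reproduced in this text. One plausible form of the fix, assuming the string has already been through .decode('unicode_escape') and contains no characters above U+00FF, is sketched here (the names mangled and restored are illustrative):

```python
s = 'Hello隼'

mangled = s.encode('utf-8').decode('unicode_escape')   # the problematic step

# latin-1 maps U+0000..U+00FF back to the bytes 0x00..0xFF one-to-one,
# so this restores the original UTF-8 bytes, which utf-8 can then decode.
restored = mangled.encode('latin-1').decode('utf-8')

assert restored == s
```

This repair has limits. If the mis-decoded string contains a character above U+00FF, .encode('latin-1') raises UnicodeEncodeError, and any backslash sequences in the original bytes will already have been interpreted as escapes.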

Better: avoid mixing utf-8 with unicode_escape

If you want reversible conversion using unicode_escape, work directly with text, not UTF-8 bytes:

[[See Video to Reveal this Text or Code Snippet]]

This cleanly escapes and unescapes Unicode code points without losing character integrity.
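The snippet is not reproduced in this text. A minimal sketch of such a text-level round trip follows, with variable names chosen for illustration and an extra ascii step that is an assumption about how the escaped form might be stored as a str:

```python
s = 'Hello隼'

escaped_bytes = s.encode('unicode_escape')     # ASCII-only bytes
escaped_text = escaped_bytes.decode('ascii')   # same content as a str, e.g. for logging

assert escaped_bytes == b'Hello\\u96bc'
assert escaped_text == r'Hello\u96bc'

# Reverse: turn the escaped text back into ASCII bytes, then unescape.
roundtrip = escaped_text.encode('ascii').decode('unicode_escape')
assert roundtrip == s
```

Because the escaped form is pure ASCII, the intermediate str/bytes conversions use ascii rather than utf-8, which makes it obvious if non-ASCII data slips in.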



Summary

Don't use .decode('unicode_escape') on UTF-8 bytes; it misinterprets bytes as Latin-1, corrupting your string.

To undo a .decode('unicode_escape') that was applied to UTF-8 bytes, encode with 'latin-1' and then decode as 'utf-8', rather than calling .encode('unicode_escape').

For reversible Unicode escaping, call .encode('unicode_escape') on the string itself and reverse it with .decode('unicode_escape'), without a utf-8 step in between.

This approach preserves your Unicode characters accurately across conversions.
