Bio line break will fail for certain unicode characters and change them into other characters

Impact

The Bio tokenURI function splits biography text into lines. Since the Bio contract supports unicode, the algorithm tries to avoid splitting the line in between bytes that represent a single unicode character.

However, the implementation fails for certain unicode characters and will insert a line break in the middle of a single character, which changes the character into others before or after the line break. See PoC for a better understanding of the issue.

Proof of Concept

In the following test, we use a series of the "👁️‍🗨️" character as the biography text.

Note: the snippet shows only the relevant code for the test. Full test file can be found here.

function test_Bio_tokenURI_BadLineBreak1() public {
    // The following text will be badly splitted
    string memory text = unicode"👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️👁️‍🗨️";
    bio.mint(string(text));
    console.log(bio.tokenURI(1));
}

As we can see in the resulting SVG, the algorithm splitted lines in between the character, which changes it into other unicode characters:

<svg xmlns="http://www.w3.org/2000/svg" preserveAspectRatio="xMinYMin meet" viewBox="0 0 400 100"><style>text { font-family: sans-serif; font-size: 12px; }</style><text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle"><tspan x="50%" dy="20">👁️‍🗨️👁️‍🗨️👁️‍🗨</tspan><tspan x="50%" dy="20">️👁️‍🗨️👁️‍🗨</tspan><tspan x="50%" dy="20">️👁️‍🗨️👁️‍🗨️👁</tspan><tspan x="50%" dy="20">️‍🗨️👁️‍🗨️👁️‍🗨</tspan><tspan x="50%" dy="20">️👁️‍🗨️</tspan></text></svg>
``

In this other test, we use the england flag character "🏴󠁧󠁢󠁥󠁮󠁧󠁿":

```solidity
function test_Bio_tokenURI_BadLineBreak2() public {
    // The following text will be badly splitted
    string memory text = unicode"🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿";
    bio.mint(string(text));
    console.log(bio.tokenURI(1));
}

Again we see how the character is changed in between line breaks in the SVG:

<svg xmlns="http://www.w3.org/2000/svg" preserveAspectRatio="xMinYMin meet" viewBox="0 0 400 100"><style>text { font-family: sans-serif; font-size: 12px; }</style><text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle"><tspan x="50%" dy="20">🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢</tspan><tspan x="50%" dy="20">󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧</tspan><tspan x="50%" dy="20">󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧</tspan><tspan x="50%" dy="20">󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁥󠁮</tspan><tspan x="50%" dy="20">󠁧󠁿🏴󠁧󠁢󠁥󠁮󠁧󠁿</tspan></text></svg>

Recommendation

The tokenURI function should implement proper unicode support if its intention is to manually handle line breaks. As this is a cumbersome task, the recommendation is to avoid splitting the text (or delegate it to the SVG in case such a feature exists).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

M-02.md

M-02.md

Bio line break will fail for certain unicode characters and change them into other characters

Impact

Proof of Concept

Recommendation

Files

M-02.md

Latest commit

History

M-02.md

File metadata and controls

Bio line break will fail for certain unicode characters and change them into other characters

Impact

Proof of Concept

Recommendation