The Bio tokenURI
function splits biography text into lines. Since the Bio contract supports unicode, the algorithm tries to avoid splitting the line in between bytes that represent a single unicode character.
However, the implementation fails for certain unicode characters and will insert a line break in the middle of a single character, which changes the character into others before or after the line break. See PoC for a better understanding of the issue.
In the following test, we use a series of the "👁️🗨️" character as the biography text.
Note: the snippet shows only the relevant code for the test. Full test file can be found here.
function test_Bio_tokenURI_BadLineBreak1() public {
// The following text will be badly splitted
string memory text = unicode"👁️🗨️👁️🗨️👁️🗨️👁️🗨️👁️🗨️👁️🗨️👁️🗨️👁️🗨️👁️🗨️👁️🗨️👁️🗨️";
bio.mint(string(text));
console.log(bio.tokenURI(1));
}
As we can see in the resulting SVG, the algorithm splitted lines in between the character, which changes it into other unicode characters:
<svg xmlns="http://www.w3.org/2000/svg" preserveAspectRatio="xMinYMin meet" viewBox="0 0 400 100"><style>text { font-family: sans-serif; font-size: 12px; }</style><text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle"><tspan x="50%" dy="20">👁️🗨️👁️🗨️👁️🗨</tspan><tspan x="50%" dy="20">️👁️🗨️👁️🗨</tspan><tspan x="50%" dy="20">️👁️🗨️👁️🗨️👁</tspan><tspan x="50%" dy="20">️🗨️👁️🗨️👁️🗨</tspan><tspan x="50%" dy="20">️👁️🗨️</tspan></text></svg>
``
In this other test, we use the england flag character "🏴":
```solidity
function test_Bio_tokenURI_BadLineBreak2() public {
// The following text will be badly splitted
string memory text = unicode"🏴🏴🏴🏴🏴🏴🏴";
bio.mint(string(text));
console.log(bio.tokenURI(1));
}
Again we see how the character is changed in between line breaks in the SVG:
<svg xmlns="http://www.w3.org/2000/svg" preserveAspectRatio="xMinYMin meet" viewBox="0 0 400 100"><style>text { font-family: sans-serif; font-size: 12px; }</style><text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle"><tspan x="50%" dy="20">🏴🏴</tspan><tspan x="50%" dy="20">🏴</tspan><tspan x="50%" dy="20">🏴🏴</tspan><tspan x="50%" dy="20">🏴</tspan><tspan x="50%" dy="20">🏴</tspan></text></svg>
The tokenURI
function should implement proper unicode support if its intention is to manually handle line breaks. As this is a cumbersome task, the recommendation is to avoid splitting the text (or delegate it to the SVG in case such a feature exists).