Skip to content

fix: preserve leading newline of multiline string built with string()#551

Open
Sanjays2402 wants to merge 1 commit into
python-poetry:masterfrom
Sanjays2402:fix/multiline-string-leading-newline
Open

fix: preserve leading newline of multiline string built with string()#551
Sanjays2402 wants to merge 1 commit into
python-poetry:masterfrom
Sanjays2402:fix/multiline-string-leading-newline

Conversation

@Sanjays2402

Copy link
Copy Markdown

Problem

tomlkit.string(value, multiline=True) produces output that silently loses a leading newline on round-trip.

>>> import tomlkit
>>> s = tomlkit.string("\nfoo", multiline=True)
>>> str(s)                 # the String value is correct
'\nfoo'
>>> s.as_string()          # ...but its serialized form is not
'"""\nfoo"""'
>>> str(tomlkit.parse(f'k = {s.as_string()}\n')["k"])
'foo'                      # <- leading newline dropped!

Per the TOML spec: "A newline immediately following the opening delimiter will be trimmed." So """\nfoo""" decodes to foo, not \nfoo. The String object reports the right value, but its rendered form is wrong, so the data loss only appears after the output is parsed again.

This is a genuine serialization bug, not a TOML 1.0 vs 1.1 difference — both tomllib (1.0.0) and tomlkit's own parser (1.1.0) decode the old output to the wrong value:

string(value, multiline=True).as_string() reparses to (before) reparses to (after)
"\n""""\n""" "" "\n"
"\nabc""""\nabc""" "abc" "\nabc"
"\r\nx""""\r\nx""" "x" "\r\nx"
"abc\n""""abc\n""" "abc\n" ✅ (unchanged) "abc\n"

Affects both basic (""") and literal (''') multiline strings, and values starting with \r\n.

Root cause

In String.from_raw (tomlkit/items.py), the rendered body is placed directly between the delimiters. For a multiline type, \n/\r are intentionally kept literal (not escaped) for readability — but nothing accounts for the parser's rule that trims the first newline after the opening delimiter, so a body beginning with a newline is one newline short after re-parsing.

Fix

When the string is multiline and its rendered body begins with a newline, emit an extra leading newline — the one the parser trims — so the intended value survives the round-trip. This is exactly the representation the spec prescribes for such values. Single-line strings and multiline strings that don't start with a newline are unchanged (surgical, +7 lines).

Tests

  • An existing test_create_string case asserted the old lossy output ("""\nMy\nString\n"""). It only checked as_string(), never a round-trip, so it had locked in the bug. Updated it to the correct, round-trippable rendering.
  • Added test_create_multiline_string_with_leading_newline_round_trips, covering basic/literal multiline and \r\n-leading values, asserting parse(render(v)) == v.

Proof the regression test guards the fix (source change stashed, tests kept):

# without the source fix:
6 failed, 32 passed        # 5 new round-trip cases + the corrected case
# with the fix restored:
38 passed

Full suite: 1040 passed (was 1035), ruff check clean, ruff format --check clean on changed files.

`tomlkit.string(value, multiline=True)` rendered the value as
`"""<value>"""`. When `value` starts with a newline the output looks
like `"""\n..."""`, and per the TOML spec a newline immediately
following the opening delimiter is trimmed on parse. So a value that
begins with a newline was silently lost on the very next round-trip:

    >>> s = tomlkit.string("\nfoo", multiline=True)
    >>> str(s)
    '\nfoo'
    >>> s.as_string()
    '"""\nfoo"""'
    >>> str(tomlkit.parse(f"k = {s.as_string()}")["k"])
    'foo'                      # leading newline dropped

The value the String object reports is correct; only its serialized
form was wrong, so the corruption only surfaced after re-parsing.
Both tomllib and tomlkit's own parser agree the old output decodes to
the wrong value, so this is a genuine serialization bug (not a TOML
1.0 vs 1.1 difference). Affects basic and literal multiline strings,
including values that start with `\r\n`.

Fix: in `String.from_raw`, when the string is multiline and its
rendered body starts with a newline, emit an extra leading newline
(the one the parser trims) so the value survives round-tripping. This
is exactly how the spec suggests representing such a value. Single-line
strings and multiline strings without a leading newline are unchanged.

An existing `test_create_string` case asserted the old lossy output
(`"""\nMy\nString\n"""`); it only checked `as_string()`, never a
round-trip, so it had locked in the bug. Updated it to the correct,
round-trippable rendering and added a dedicated round-trip regression
test covering basic/literal multiline and `\r\n`-leading values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant