Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reworked player-cli's parse_dockerfile_cmd-Function #7

Merged
merged 4 commits into from
Sep 16, 2023
Merged

Reworked player-cli's parse_dockerfile_cmd-Function #7

merged 4 commits into from
Sep 16, 2023

Conversation

MarkusOstermayer
Copy link
Contributor

This pull-requests reworks the parse_dockerfile_cmd-Function in the player-cli.
This was mainly done to improve the readability of the code. A nice side-effect that the runtime of the code improved.
For the performance test I used the template for the python exploit.

Please see the following code for the performance test.

import re
import timeit

EXAMPLE_DOCKERFILE = """FROM python:latest

RUN pip install --no-cache-dir --upgrade pip

RUN pip install --no-cache-dir pwntools cryptography beautifulsoup4 requests

WORKDIR /exploit

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD [ "python3", "exploit.py", "arg1" , "arg2","arg3"]
"""

CHECK_FOR_CMD = re.compile(r'CMD\s?\[\s?(.+)\s?\]')
EXTRACT_CMD = re.compile(r'"([\w.]+)"')

def parse_dockerfile_cmd_old(content):
    exec_args = None

    for line in content.splitlines():
        line = line.strip()

        if not line.startswith('CMD '):
            continue
        line = line[4:].strip()

        if line[0] != '[' or line[-1] != ']':
            continue
        line = line[1:-1].strip()

        args = []
        for arg in line.split(','):
            arg = arg.strip()
            if arg[0] != '"' or arg[-1] != '"':
                break
            args.append(arg[1:-1])
        else:
            exec_args = args

    return exec_args

def parse_dockerfile_cmd_new(content: str) -> list[str] | None:
    matches = CHECK_FOR_CMD.findall(content)
    if matches:
        return EXTRACT_CMD.findall(matches[0])
    return None

assert parse_dockerfile_cmd_new(EXAMPLE_DOCKERFILE) == parse_dockerfile_cmd_old(EXAMPLE_DOCKERFILE)
print("Performance with the Dockerfile containing CMD")
print("Old Function", timeit.timeit(
    "parse_dockerfile_cmd_old(EXAMPLE_DOCKERFILE)",
    setup="from __main__ import parse_dockerfile_cmd_old, EXAMPLE_DOCKERFILE")
)
print("New Function", timeit.timeit(
    "parse_dockerfile_cmd_new(EXAMPLE_DOCKERFILE)",
    setup="from __main__ import parse_dockerfile_cmd_new, EXAMPLE_DOCKERFILE")
)
print("Performance without the Dockerfile containing CMD")
print("Old Function", timeit.timeit(
    "parse_dockerfile_cmd_old(EXAMPLE_DOCKERFILE[:-55])",
    setup="from __main__ import parse_dockerfile_cmd_old, EXAMPLE_DOCKERFILE")
)
print("New Function", timeit.timeit(
    "parse_dockerfile_cmd_new(EXAMPLE_DOCKERFILE[:-55])",
    setup="from __main__ import parse_dockerfile_cmd_new, EXAMPLE_DOCKERFILE")
)
Performance with the Dockerfile containing CMD
Old Function 4.199408577002032
New Function 2.311218070997711
Performance without the Dockerfile containing CMD
Old Function 2.3191767030002666
New Function 0.45896310999887646

In my opinion, alphabetically sorted imports also increase the readability. Thats why I also sorted them.
Any feedback is appreciated.

@ItsShadowCone
Copy link
Member

Thank you for your contribution, this looks awesome!

Might take a while tho until I can merge it, I'm a little busy right now.

@aljazmedic
Copy link

This looks like a really nice speed up and readability update :)

Could we just change the matches[0] to take the last match of CMD?
As per this reference

I also tried json.loads, so we don't have to worry about escaped characters, such as

CMD [   "python", "script.py", "--blacklist-chars", "\"+'\t"   ]

It is a trade-off for speed however, depends on the wanted use-cases.

Also, what do you think about multi-whitespace regex ..\s+.. around the braces? since some of the folks don't have linters on all the time.

@MarkusOstermayer
Copy link
Contributor Author

MarkusOstermayer commented Aug 2, 2023

Thanks for the feedback guys.

@DaClemens: Take your time, it's just a readability improvement.

@aljazmedic:
Thanks for the Feedback, I somehow was under the impression that Docker will use the first CMD it encounters, I'll change it accordingly.
Also changing the whitespace matches to \s+ sounds like a good idea.

Regarding the special characters:
I experimented a little bit with the regex for matching special characters, however I wasn't really satisfied with anything I came up with.
What we can do is to find the CMD-Line using regex and then extract the arguments using string-functions (namely split and partition)
What do you think about the following approach?

CHECK_FOR_CMD = re.compile(r'CMD\s+\[\s+(.+)\s+\]')

def parse_dockerfile_cmd_new(content: str) -> list[str] | None:
    matches = CHECK_FOR_CMD.findall(content)
    if matches:
        ret_arguments = []
        for argument in matches[-1].split(","):
            ret_arguments.append(
                # Partition the string on the first "
                argument.partition("\"")[2]\
                # and on the last "
                .rpartition("\"")[0]
            )

        return ret_arguments
    return None

The performance isn't too bad after all.

Performance with the Dockerfile containing CMD
Old Function 4.655844093002088
New Function 2.8222548350022407

Performance without the Dockerfile containing CMD
Old Function 3.314049700005853
New Function 0.9741615590028232

@aljazmedic
Copy link

The partition trick is really neat!
I did not even know that such function exists in python.

But I think there are still some (really really rare) edge cases, e.g.
CMD ["python3", "exploit.py", "--ignore-chars", "\"'],;.CMD"] (Added a comma)
While the speed is much faster using splits and partitions, wouldn't it make sense to sacrifice a second on each runlocal and perhaps try with json library parser?
For the match parsing json.loads usually takes around 2.45s, but it's unexpected-character safe.

My last iteration for regex is currently:
re.compile(r"^CMD\s+(\[.*\])(?:[\s;])*(?:#.*)?$", re.MULTILINE) (With ^$ we eliminate fluke ]).
It would still choke on something like:
CMD ["ha!"] # [TODO]: won't work
But maybe if we strive for perfect match with docker parsing, we could just try to build local image and get its command through docker? Just an idea.

@ItsShadowCone
Copy link
Member

Since I'm still on vacation, I'll just leave two things to keep in mind here:

  • I don't think a perfect match for docker is desirable here, I prefer code simplicity, as long as common cases are gracefully handled (such as double white space, different quotations, etc).
  • Furthermore, if it's possible, I'd prefer a one line comment documenting the parser in the example Dockerfiles (that's a TODO for later, I have a couple changes improving the whole templating process that isn't upstreamed yet)
  • is this function performance-critical? While I never mind more efficiency, it shouldn't come at the expense of maintainability and simplicity. Iirc the function is regularly called only once per tick, per runlocal (if not once per runlocal overall). So unless performance for a single call is in the seconds range, I don't think that's an important argument; feel free to correct me tho, I'm only writing this off the top of my head.

@ItsShadowCone
Copy link
Member

by the looks, this shouldn't interfere with my recently published updates, but would you mind to rebase / update it anyways?

@MarkusOstermayer
Copy link
Contributor Author

Ups, I accidentally closed the PR while updating it.

@MarkusOstermayer
Copy link
Contributor Author

I found a small issue with the previously used regex and added a docstring.
If you do not want any further changes you can merge the PR.

@ItsShadowCone ItsShadowCone merged commit 51ef5f2 into OpenAttackDefenseTools:main Sep 16, 2023
@ItsShadowCone
Copy link
Member

Thank you very much!

ItsShadowCone pushed a commit that referenced this pull request Jul 10, 2024
Reordered Imports, reworked parse_dockerfile_cmd, added a docstring (incl. doctests)
ItsShadowCone pushed a commit that referenced this pull request Jul 10, 2024
Reordered Imports, reworked parse_dockerfile_cmd, added a docstring (incl. doctests)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants