OPy Compiler and Byterun

The OPy compiler is a Python bytecode compiler written in Python. See Building Oil with the OPy Bytecode Compiler. It's currently used to translate Python source code in Oil to .pyc files.

The byterun/ directory is a fork of byterun. It's an experiment for learning what it will take to write a minimal interpreter for Oil. It can currently run all Oil unit tests, but isn't otherwise used.

Very Rough Outline of Future Plans

  • Use opy/callgraph.py to find the minimum amount of code we need to compile. (Tree shaking).
  • Produce a statically typed AST.
    • Add type annotations for functions/constructors, possibly with .pyi files generated by tests. NOTE: Dropbox's pyannotate has some restrictions related to the sys.setprofile() hook. byterun might be better!
    • Add type annotations for class members. Possibly with something like attrs?
    • Add type inference for local variables.
  • Use the statically typed AST to generate type-specialized bytecodes. For example, we could have variants of BINARY_ADD that know the types of their arguments.
  • Implement these bytecodes in byterun as a prototype. Do the spec tests still pass?
  • Rewrite specialized bytecodes in our own interpreter loop in C.
    • It will not use import.c. Imports are resolved at compile time.
    • This means we also get rid of the app bundle .zip format, which is causing problems.

Also:

  • Strip Oil of runtime dependencies like the Python re module. Rewrite with re2c.
  • Copy the stuff we use out of posixmodule.c, pwdmodule.c, etc.
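
To make the type-specialized bytecode idea from the outline concrete, here is a minimal sketch of a toy stack machine. It is not byterun's code. The opcode names BINARY_ADD_INT and BINARY_ADD_STR, the run() and specialize() helpers, and the operand_types map (standing in for whatever a typed AST would provide) are all hypothetical.

```python
# Toy stack machine illustrating type-specialized variants of BINARY_ADD.
# Assumption: BINARY_ADD_INT / BINARY_ADD_STR are made-up opcode names.


def run(code, consts):
    """Execute a list of (opname, arg) pairs; return the top of the stack."""
    stack = []
    for op, arg in code:
        if op == 'LOAD_CONST':
            stack.append(consts[arg])
        elif op == 'BINARY_ADD':
            # Generic variant: dispatches on the operand types at runtime.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == 'BINARY_ADD_INT':
            # Specialized variant: the compiler claims both operands are ints.
            right = stack.pop()
            left = stack.pop()
            stack.append(int.__add__(left, right))
        elif op == 'BINARY_ADD_STR':
            # Specialized variant: the compiler claims both operands are strs.
            right = stack.pop()
            left = stack.pop()
            stack.append(str.__add__(left, right))
        else:
            raise ValueError('unknown opcode %r' % (op,))
    return stack[-1]


def specialize(code, operand_types):
    """Rewrite BINARY_ADD where operand types are known.

    operand_types maps an instruction index to a (left, right) type pair.
    """
    out = []
    for i, (op, arg) in enumerate(code):
        if op == 'BINARY_ADD':
            types = operand_types.get(i)
            if types == (int, int):
                op = 'BINARY_ADD_INT'
            elif types == (str, str):
                op = 'BINARY_ADD_STR'
        out.append((op, arg))
    return out


PROGRAM = [('LOAD_CONST', 0), ('LOAD_CONST', 1), ('BINARY_ADD', None)]
SPECIALIZED = specialize(PROGRAM, {2: (int, int)})
```

Running SPECIALIZED and PROGRAM on the same constants should give the same result. In a C interpreter loop the specialized case could skip the dynamic dispatch that the generic case needs.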

Getting started

Do the "Quick Start" in https://github.com/oilshell/oil/wiki/Contributing .

Then build the py27.grammar file:

$ make _build/opy/py27.grammar.pickle

After Oil is set up, we can try out OPy. Run these commands (and let me know if any of them doesn't work):

oil$ cd opy
opy$ ../bin/opyc run gold/hello_py2.py  # basic test of compiler and runtime

Compile Oil with the OPy compiler:

$ ./build.sh oil-repo  # makes _tmp/repo-with-opy and _tmp/repo-with-cpython

Run Oil unit tests, compiled with OPy, under CPython:

$ ./test.sh oil-unit

Run Oil unit tests, compiled with OPy, under byterun:

$ ./test.sh oil-unit-byterun   # Run Oil unit tests, compiled with OPy, under byterun

Gold tests in gold/ compare the output of CPython vs. byterun:

$ ./test.sh gold

Oil spec tests under byterun (slow):

opy$ ./test.sh spec smoke  # like $REPO_ROOT/test/spec.sh smoke
opy$ ./test.sh spec all    # like $REPO_ROOT/test/spec.sh all

FYI, the gold tests can be run manually like this:

$ gold/regex_compile.py  # run with CPython
$ ../bin/opyc run gold/regex_compile.py

Demo of the speed difference between OSH under CPython and OSH under byterun:

./demo.sh osh-byterun-speed

OPy Compiler Regtest

This uses an old snapshot of the repo in _regtest/.

./regtest.sh compile
./regtest.sh verify-golden

Notes on Three OPy Builds

  • $REPO_ROOT/_build/oil/bytecode-opy: Bytecode for the release binary. Built by Makefile.
  • $REPO_ROOT/opy/_tmp/repo-with-opy: The entire repo with OPy. For running Oil unit/spec tests under byterun, etc. Built by ./build.sh oil-repo.
  • $REPO_ROOT/opy/_tmp/regtest: The snapshot of Python files in opy/_regtest is compiled, so we are insensitive to repo changes. Built by ./regtest.sh compile.

opy/callgraph.py demo

This is currently completely separate from the rest of the OPy compiler. The idea is to find the exact set of symbols that the compiler needs to handle by walking a static callgraph using bytecode disassembly heuristics. This means that if we import os, we don't need to compile everything in os.py, etc.

oil$ scripts/count.sh oil-python-symbols
oil$ scripts/count.sh opy-python-symbols
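
The callgraph idea can be sketched roughly as follows. This is not the code in opy/callgraph.py. It assumes CPython 3's dis module rather than OPy's own tools, and referenced_globals(), reachable(), and the demo functions are hypothetical names. The heuristic is that any global name a function loads, and that resolves to another Python function, counts as a call edge.

```python
import dis
import types


def referenced_globals(func):
    """Return global names loaded by func's bytecode, nested code included."""
    names = set()
    todo = [func.__code__]
    while todo:
        code = todo.pop()
        for inst in dis.get_instructions(code):
            if inst.opname in ('LOAD_GLOBAL', 'LOAD_NAME'):
                names.add(inst.argval)
        # Lambdas and nested functions live in co_consts as code objects.
        todo.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return names


def reachable(root):
    """Return the set of Python functions reachable from root."""
    seen = set()
    todo = [root]
    while todo:
        func = todo.pop()
        if func in seen:
            continue
        seen.add(func)
        for name in referenced_globals(func):
            target = func.__globals__.get(name)
            # Heuristic: follow only plain Python functions, not builtins,
            # classes, or attribute lookups like os.path.join.
            if isinstance(target, types.FunctionType):
                todo.append(target)
    return seen


# Demo functions (hypothetical): unused() should not be reached from main().
def helper():
    return 1


def used():
    return helper() + 1


def unused():
    return 0


def main():
    return used()


REACHED = sorted(f.__name__ for f in reachable(main))
```

A real tool would also need to handle methods, attribute access on modules, and dynamic dispatch, which this sketch ignores.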

OPy Compiler Divergences from CPython

Lexer

  • I don't remember where exactly, but I ran into a bug lexing the CPython test suite. IIRC, CPython's lexer was more lenient about adjacent tokens without spaces than tokenize.py.
  • heapq.py had -*- coding: latin-1 -*-, which causes problems. OPy should require utf-8 source anyway.

Parser

  • I ran into a bug where a file like d = {}, without a trailing newline, gives a parse error. Adding the newline fixes it.
  • print statements aren't allowed; we force Python 3-style print(x, y, file=sys.stderr). I think this is because the parser doesn't know about __future__ statements, so it can't change the parsing mode on the fly.

Bytecode Compiler

  • I think there are no LOAD_FAST bytecodes generated? TODO: Make a bytecode histogram using opy/misc/inspect_pyc.
  • The OPy bytecode is bigger than the CPython bytecode! Why is that?
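
A bytecode histogram like the one in the TODO above could be sketched as follows. This is an assumption-laden illustration, not opy/misc/inspect_pyc. It uses CPython 3's dis module and compiles a source string with the host interpreter, so it shows the counting technique rather than actual OPy output. opcode_histogram() and SOURCE are hypothetical names.

```python
import collections
import dis
import types


def opcode_histogram(code):
    """Count opcode names in a code object and all nested code objects."""
    counts = collections.Counter()
    todo = [code]
    while todo:
        co = todo.pop()
        counts.update(inst.opname for inst in dis.get_instructions(co))
        # Function bodies are stored as code objects in co_consts.
        todo.extend(c for c in co.co_consts if isinstance(c, types.CodeType))
    return counts


SOURCE = '''
def add(a, b):
    total = a + b
    return total
'''

HIST = opcode_histogram(compile(SOURCE, '<example>', 'exec'))

if __name__ == '__main__':
    # Print opcodes from most to least frequent.
    for name, n in HIST.most_common():
        print('%-24s %d' % (name, n))
```

Comparing such a histogram for the same module compiled by CPython and by OPy would show whether local variable access uses LOAD_FAST in one and something else in the other, which may also help explain the size difference.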