The OPy compiler is a Python bytecode compiler written in Python. See "Building Oil with the OPy Bytecode Compiler". It's currently used to translate Python source code in Oil to .pyc files.

The byterun/ directory is a fork of byterun. It's an experiment for learning what it will take to write a minimal interpreter for Oil. It can currently run all Oil unit tests, but isn't otherwise used.
- Use opy/callgraph.py to find the minimum amount of code we need to compile. (Tree shaking).
- Produce a statically typed AST.
  - Add type annotations for functions/constructors, possibly with .pyi files generated by tests. NOTE: Dropbox's pyannotate has some restrictions related to the sys.setprofile() hook. byterun might be better!
  - Add type annotations for class members. Possibly with something like attrs?
  - Add type inference for local variables.
- Use the statically typed AST to generate type-specialized bytecodes. For example, we could have variants of BINARY_ADD that know the types of their arguments.
  - Implement these bytecodes in byterun as a prototype. Do the spec tests still pass?
- Rewrite specialized bytecodes in our own interpreter loop in C.
  - It will not use import.c. Imports are resolved at compile time.
  - This means we also get rid of the app bundle .zip format, which is causing problems.
Also:
- Strip Oil of runtime dependencies like the Python re module. Rewrite with re2c.
- Copy the stuff we use out of posixmodule.c, pwdmodule.c, etc.
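The type-specialized bytecode idea above can be illustrated with a toy stack machine. This is only a sketch under stated assumptions: BINARY_ADD_INT is a hypothetical opcode name invented here, the opcode numbering is arbitrary, and nothing below reflects how byterun or OPy actually encode instructions.

```python
import operator

# Opcode numbers are arbitrary. BINARY_ADD_INT is a hypothetical
# specialized variant; it is not an opcode in CPython or OPy.
LOAD_CONST, BINARY_ADD, BINARY_ADD_INT, RETURN_VALUE = range(4)


def run(code, consts):
    """Evaluate a list of (opcode, arg) pairs on a value stack."""
    stack = []
    for op, arg in code:
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == BINARY_ADD:
            # Generic path: the operand types are only known at run time.
            right = stack.pop()
            left = stack.pop()
            stack.append(operator.add(left, right))
        elif op == BINARY_ADD_INT:
            # Specialized path: the compiler is assumed to have proven
            # that both operands are ints, so no dispatch is needed.
            right = stack.pop()
            left = stack.pop()
            stack.append(int.__add__(left, right))
        elif op == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError("unknown opcode %r" % (op,))
    raise ValueError("code ended without RETURN_VALUE")
```

A typed AST would let the compiler emit BINARY_ADD_INT only where both operands are known to be ints, and fall back to BINARY_ADD everywhere else.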
Do the "Quick Start" in https://github.com/oilshell/oil/wiki/Contributing .
Then build the py27.grammar file:
$ make _build/opy/py27.grammar.pickle
After Oil is set up, we can try out OPy. Run these commands (and let me know if any of them fails):
oil$ cd opy
opy$ ../bin/opyc run gold/hello_py2.py # basic test of compiler and runtime
Compile Oil with the OPy compiler:
$ ./build.sh oil-repo # makes _tmp/repo-with-opy and _tmp/repo-with-cpython
Run Oil unit tests, compiled with OPy, under CPython:
$ ./test.sh oil-unit
Run Oil unit tests, compiled with OPy, under byterun:
$ ./test.sh oil-unit-byterun
Gold tests in gold/ compare the output of CPython vs. byterun:
$ ./test.sh gold
Oil spec tests under byterun (slow):
opy$ ./test.sh spec smoke # like $REPO_ROOT/test/spec.sh smoke
opy$ ./test.sh spec all # like $REPO_ROOT/test/spec.sh all
FYI, the gold tests can be run manually like this:
$ gold/regex_compile.py # run with CPython
$ ../bin/opyc run gold/regex_compile.py
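Conceptually, a gold test runs the same script under two interpreters and checks that the results match. The sketch below shows that comparison in Python 3. The helper names are hypothetical and this is not how test.sh is implemented.

```python
import subprocess


def run_and_capture(argv):
    """Run a command and return (exit status, stdout bytes)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE)
    return proc.returncode, proc.stdout


def gold_compare(reference_argv, candidate_argv):
    """Return (matches, reference_result, candidate_result).

    The two runs match when both the exit status and stdout are identical.
    """
    reference = run_and_capture(reference_argv)
    candidate = run_and_capture(candidate_argv)
    return reference == candidate, reference, candidate
```

With the commands above, the call would look like gold_compare(["gold/regex_compile.py"], ["../bin/opyc", "run", "gold/regex_compile.py"]), run from the opy directory.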
Demo of the speed difference between OSH under CPython and OSH under byterun:
./demo.sh osh-byterun-speed
The compiler regression test uses an old snapshot of the repo in _regtest/:
./regtest.sh compile
./regtest.sh verify-golden
- $REPO_ROOT/_build/oil/bytecode-opy: Bytecode for the release binary. Built by Makefile.
- $REPO_ROOT/opy/_tmp/repo-with-opy: The entire repo compiled with OPy. For running Oil unit/spec tests under byterun, etc. Built by ./build.sh oil-repo.
- $REPO_ROOT/opy/_tmp/regtest: The compiled snapshot of Python files in opy/_regtest. Because it is a snapshot, the result is insensitive to repo changes. Built by ./regtest.sh compile.
The callgraph tool (opy/callgraph.py) is currently completely separate from the rest of the OPy compiler. The idea is to find the exact set of symbols that the compiler needs to handle by walking a static callgraph using bytecode disassembly heuristics. This means that if we import os, we don't need to compile everything in os.py, etc.
oil$ scripts/count.sh oil-python-symbols
oil$ scripts/count.sh opy-python-symbols
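To make the callgraph idea concrete, here is a minimal sketch of one possible disassembly heuristic, written for Python 3's dis module. It is an illustration only, not a description of what opy/callgraph.py does: the function names are hypothetical, and it only follows names loaded with LOAD_GLOBAL or LOAD_NAME, so it misses attribute access, methods, and dynamic calls.

```python
import dis
import types


def referenced_globals(code):
    """Return names a code object loads as globals, including nested code."""
    names = set()
    for instr in dis.get_instructions(code):
        if instr.opname in ("LOAD_GLOBAL", "LOAD_NAME"):
            names.add(instr.argval)
    # Nested functions and classes live in co_consts as code objects.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= referenced_globals(const)
    return names


def reachable_functions(entry, namespace):
    """Walk from `entry`, following global names that resolve to functions.

    `namespace` maps names to objects, e.g. a module's __dict__.
    Returns the set of reachable function names.
    """
    seen = {}
    work = [entry]
    while work:
        func = work.pop()
        if func.__name__ in seen:
            continue
        seen[func.__name__] = func
        for name in referenced_globals(func.__code__):
            target = namespace.get(name)
            if isinstance(target, types.FunctionType):
                work.append(target)
    return set(seen)
```

Anything not in the returned set is a candidate for being left out of the build, subject to the blind spots listed above.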
- I don't remember where exactly, but I ran into a bug lexing the CPython test suite. IIRC, CPython's lexer was more lenient about adjacent tokens without spaces than tokenize.py.
- heapq.py had -*- coding: latin-1 -*-, which causes problems. OPy should require utf-8 source anyway.
- I ran into a bug where a file like d = {}, without a trailing newline, gives a parse error. Adding the newline fixes it.
- print statements aren't allowed; we force Python 3-style print(x, y, file=sys.stderr). I think this is because the parser doesn't know about __future__ statements, so it can't change the parsing mode on the fly.
- I think there are no LOAD_FAST bytecodes generated? TODO: Make a bytecode histogram using opy/misc/inspect_pyc.
- The OPy bytecode is bigger than the CPython bytecode! Why is that?
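The bytecode histogram mentioned in the TODO could look roughly like the sketch below. It assumes Python 3's dis module and an in-memory code object from the running interpreter. It does not read .pyc files and does not use opy/misc/inspect_pyc, and opcode_histogram is a hypothetical name.

```python
import collections
import dis
import types


def opcode_histogram(code):
    """Count opcode names in a code object and all nested code objects."""
    counts = collections.Counter()
    for instr in dis.get_instructions(code):
        counts[instr.opname] += 1
    # Function and class bodies are stored as code objects in co_consts.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            counts.update(opcode_histogram(const))
    return counts


def has_fast_loads(counts):
    """True if any LOAD_FAST-style opcode appears in the histogram."""
    return any(name.startswith("LOAD_FAST") for name in counts)
```

Comparing the histograms of the same source compiled by two compilers would show whether one of them never emits LOAD_FAST, and which opcodes account for a size difference.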