Experiments and dataset for the paper Rethinking Evaluation Methodology for Audio-to-Score Alignment.
To get started, first clone a copy of the Bach WTC scores:
https://github.com/humdrum-tools/bach-wtc
You'll also need a copy of the MAESTRO (v2.0) dataset:
https://magenta.tensorflow.org/datasets/maestro#v200
After downloading the scores and the MAESTRO dataset, you can extract the alignment dataset by calling the extract script from the root of this repository:
python3 extract.py {path-to-scores}/bach-wtc/ {path-to-maestro}/maestro-v2.0.0
The script will extract pairs of KernScores and MAESTRO performances to the data/ subfolder.
To generate the ground-truth alignments, run the following:
python3 align.py ground data/score data/perf N
The first argument specifies the alignment algorithm (results are written to an output directory of the same name). The fourth argument 'N' specifies the number of parallel processes to run (N = 0 runs without parallelism).
You can compute audio-to-score alignments by specifying a particular alignment algorithm:
python3 align.py {spectra,chroma,cqt} data/score data/perf N
The alignments generated by the alignment script are stored in align/{ground,spectra,chroma,cqt} as plaintext files with two columns: the first column indicates time in the score, and the second column indicates time in the performance.
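These two-column files can be loaded and applied with standard tools. Here is a minimal sketch of loading an alignment and mapping score times to performance times by interpolating between alignment anchor points; the function names and the use of numpy are illustrative, not part of the repository:

```python
import numpy as np

def load_alignment(path):
    """Load a two-column plaintext alignment file.

    Returns (score_times, perf_times) as 1-D arrays:
    column 1 is time in the score, column 2 is time in the performance.
    """
    data = np.loadtxt(path)
    return data[:, 0], data[:, 1]

def score_to_perf(score_times, align_score, align_perf):
    """Map score times to performance times by linear interpolation
    between the alignment's anchor points."""
    return np.interp(score_times, align_score, align_perf)
```

For example, `score_to_perf(onsets, *load_alignment('align/ground/...'))` would warp score-time note onsets into performance time.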
To evaluate the results of a particular alignment algorithm:
python3 eval.py {spectra,chroma,cqt} data/score data/perf
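To illustrate what such an evaluation can look like, one generic approach is to evaluate the candidate alignment at the ground-truth score times and measure the absolute deviation in performance time. The sketch below is a hypothetical deviation metric for two alignment files in the format described above, not necessarily the metric implemented by eval.py:

```python
import numpy as np

def alignment_error(gt_path, cand_path, eps=0.05):
    """Compare a candidate alignment file against a ground-truth
    alignment file (both two-column: score time, performance time).

    Returns (mean absolute deviation in seconds, fraction of
    ground-truth anchors within eps seconds). Illustrative only."""
    gt = np.loadtxt(gt_path)
    cand = np.loadtxt(cand_path)
    # Evaluate the candidate alignment at the ground-truth score times.
    cand_perf = np.interp(gt[:, 0], cand[:, 0], cand[:, 1])
    dev = np.abs(cand_perf - gt[:, 1])
    return dev.mean(), (dev <= eps).mean()
```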
To understand the behavior of the ground-truth alignments, we can visually compare the piano-roll performance (subplot 1) captured by the Yamaha Disklavier to the performance-aligned score created by warping the score according to the ground-truth alignment (subplot 2). In the comparison plot (subplot 3), red identifies notes that are indicated by the performance-aligned score but not performed, and yellow identifies notes that are performed but not indicated by the performance-aligned score. This example visualizes the beginning of a performance of Bach's Prelude and Fugue in G-sharp minor (BWV 863).
We can also use these visualizations to compare the results of a candidate alignment algorithm to the ground-truth alignment. In each case, red identifies notes indicated by the candidate alignment algorithm but not by the ground-truth alignment, and yellow identifies notes indicated by the ground-truth alignment but not by the candidate alignment.
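The red/yellow comparison amounts to a set difference over piano-roll cells. A minimal sketch of that logic, assuming boolean piano-roll matrices (pitch x time); this representation is an assumption for illustration, not necessarily the one used by the repository's plotting code:

```python
import numpy as np

def compare_rolls(roll_a, roll_b):
    """Given two boolean piano rolls of the same shape (pitch x time),
    return the cells active in roll_a but not roll_b ("red") and the
    cells active in roll_b but not roll_a ("yellow")."""
    red = roll_a & ~roll_b
    yellow = roll_b & ~roll_a
    return red, yellow
```

The two returned masks can then be overlaid on the piano roll in red and yellow to produce a comparison plot like subplot 3 above.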
To reference this work, please cite:
@article{thickstun2020rethinking,
  author = {John Thickstun and Jennifer Brennan and Harsh Verma},
  title = {Rethinking Evaluation Methodology for Audio-to-Score Alignment},
  journal = {arXiv preprint arXiv:2009.14374},
  year = {2020},
}