This is an implementation for vision and language multimodal research developed on top of Pythia

Model Implementation

We have proposed few major improvements over Pythia, which is an implementation for the LoRRA model. The dynamic answering space is expanded by adding an Instance segmentation module. We have also replaced the existing OCR with spell correcting OCR and add the spatial features of the OCR. We have also implemented n-gram to the modified OCR.

Test

For testing run our notebook

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Model Implementation

Test

Files

README.md

Latest commit

History

README.md

File metadata and controls

Model Implementation

Test