Skip to content

Latest commit

 

History

History
30 lines (9 loc) · 893 Bytes

README.md

File metadata and controls

30 lines (9 loc) · 893 Bytes

This is an implementation for vision and language multimodal research developed on top of Pythia

Pythia

Model Implementation

We have proposed few major improvements over Pythia, which is an implementation for the LoRRA model. The dynamic answering space is expanded by adding an Instance segmentation module. We have also replaced the existing OCR with spell correcting OCR and add the spatial features of the OCR. We have also implemented n-gram to the modified OCR.

Pythia

Test

For testing run our notebook