In this paper, we present GraphDoc, a multimodal graph attention-based model for various document understanding tasks. GraphDoc learns a generic representation from only 320k unlabeled documents via the Masked Sentence Modeling task. The code is available at
https://github.com/ZZR8066/GraphDoc.
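To make the two ideas summarized above concrete, the following is a minimal sketch of (1) attention restricted to a document graph over sentence nodes and (2) a masked-sentence-modeling objective. All names, the masking ratio, and the layer sizes are illustrative assumptions and are not taken from the released GraphDoc code.

```python
# Hypothetical sketch: graph attention over sentence nodes plus a
# masked-sentence-modeling (MSM) style reconstruction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head attention restricted to each node's graph neighborhood."""

    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, adj):
        # x: (num_nodes, dim) sentence embeddings; adj: (num_nodes, num_nodes) 0/1 mask
        scores = self.q(x) @ self.k(x).T / x.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        return F.softmax(scores, dim=-1) @ self.v(x)


def masked_sentence_modeling_loss(encoder, x, adj, mask_ratio=0.15):
    """Mask a fraction of sentence embeddings and regress the originals."""
    target = x.clone()
    masked = torch.rand(x.size(0)) < mask_ratio
    masked[0] = True  # ensure at least one sentence is masked in this toy example
    x = x.clone()
    x[masked] = 0.0   # replace masked sentence embeddings with a zero vector
    pred = encoder(x, adj)
    return F.mse_loss(pred[masked], target[masked])


if __name__ == "__main__":
    num_sentences, dim = 8, 64
    x = torch.randn(num_sentences, dim)             # sentence embeddings of one document
    adj = torch.ones(num_sentences, num_sentences)  # fully connected toy graph
    encoder = GraphAttentionLayer(dim)
    loss = masked_sentence_modeling_loss(encoder, x, adj)
    loss.backward()
    print(f"toy MSM loss: {loss.item():.4f}")
```

The sketch omits the multimodal inputs (visual features and layout boxes) and the Transformer depth of the actual model; it only illustrates how a reconstruction loss over masked sentence nodes can drive self-supervised pre-training on unlabeled documents.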