We built HRDoc, a large-scale dataset of 2,500 multi-page documents with nearly 2 million semantic units. We also proposed an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle the document structure reconstruction task. The code and dataset are available at https://github.com/jfma-USTC/HRDoc.