    Repositories list

    • [PR 2025] The official GitHub page of "MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories"
      Python
      57820 Updated Apr 13, 2026
    • AutoHDR

      Public
      [ACL 2025 main] The official GitHub page of "Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration"
      Python
      55820 Updated Apr 13, 2026
    • [arXiv 25] OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities
      Python
      Apache License 2.0
      4000 Updated Apr 9, 2026
    • [PRCV 25] Towards Real-World Document Specular Highlight Removal: The DocHighlight Dataset and DocSHRNet Method
      0400 Updated Jan 14, 2026
    • HisDoc1B

      Public
      12110 Updated Dec 18, 2025
    • WenMind

      Public
      WenMind benchmark.
      Python
      1800 Updated Dec 17, 2025
    • MCS-Bench

      Public
      Python
      1500 Updated Dec 17, 2025
    • ACP-RAG

      Public
      [NAACL 2025] Large-Scale Corpus Construction and Retrieval-Augmented Generation for Ancient Chinese Poetry: New Method and Data Insights (ACP-Corpus; ACP-QA; AC…
      Python
      0600 Updated Dec 17, 2025
    • [ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
      Python
      Apache License 2.0
      37420 Updated Dec 17, 2025
    • TongGu-VL

      Public
      A Multimodal large language model for Classical Chinese Studies
      0100 Updated Dec 16, 2025
    • TVSIP

      Public
      [ACM MM 2025] The official GitHub page of "From Pixels to Semantics: A Novel MLLM-Driven Approach for Explainable Tampered Text Detection"
      Python
      01010 Updated Dec 10, 2025
    • A Comprehensive Benchmark for Chinese Long Historical Document Understanding
      Python
      0500 Updated Sep 23, 2025
    • MCCD

      Public
      [ICDAR 2025] The official GitHub page of "MCCD: A Multi-Attribute Chinese Calligraphy Character Dataset Annotated with Script Styles, Dynasties, and Calligraphe…
      Python
      02720 Updated Sep 2, 2025
    • DOLPHIN

      Public
      [IEEE TIFS 2024] Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach
      Python
      GNU General Public License v3.0
      15710 Updated Aug 3, 2025
    • PAVENet

      Public
      [IEEE TPAMI 2025] Privacy-Preserving Biometric Verification With Handwritten Random Digit String
      Python
      GNU General Public License v3.0
      06710 Updated Aug 3, 2025
    • SigBench

      Public
      GNU General Public License v3.0
      0000 Updated Jun 19, 2025
    • [PR 2026] The official GitHub page of "AutoScaler: Self Scale Alignment for Handwritten Mathematical Expression Recognition"
      Python
      0910 Updated Jun 8, 2025
    • C3bench

      Public
      C3 benchmark
      0310 Updated Mar 30, 2025
    • Algorithms, papers, datasets, performance comparisons for Document AI.
      920700 Updated Mar 1, 2025
    • DCOH-120K

      Public
      1500 Updated Feb 20, 2025
    • RFUND

      Public
      [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document …
      02100 Updated Dec 4, 2024
    • [EMNLP 2024] TongGu, a classical Chinese language model.
      46560 Updated Sep 28, 2024
    • .github

      Public
      0000 Updated Jun 4, 2024
    • SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, which consists of 545 examination paper images. The dataset is ra…
      01900 Updated Dec 5, 2023
    • Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
      Python
      412600 Updated Nov 13, 2023
    • A CNN model built with PyTorch that reaches 99.7% accuracy
      Python
      3500 Updated May 1, 2021