TFG: Visual Document Understanding

Mar 2025

Finished

Bachelor's thesis (TFG) on Visual Document Understanding (VDU): extracting structured data as JSON from document images. It implements and compares three approaches: (1) an Azure OCR + LLM pipeline for prompt-guided extraction; (2) custom OCR models fine-tuned for the specific task; and (3) DONUT (Document Understanding Transformer, from Clova AI), an end-to-end encoder-decoder transformer that processes the document image directly and generates the structured output. DONUT is trained with PyTorch Lightning, using a custom tokenizer and image processor. The project also includes validation against ground truth, edit-distance metrics, and scripts for syncing with a private company repository. The whole setup is containerized with Docker Compose.
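As an illustration of the edit-distance metrics mentioned above, the following is a minimal sketch of how predicted JSON fields could be scored against ground truth. It is not the thesis code: the function names (levenshtein, normalized_similarity, score_fields) and the sample field names (invoice_id, total) are hypothetical, and the normalization by the longer string's length is an assumption about how such a metric might be defined.

```python
import json


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(pred: str, truth: str) -> float:
    """1.0 for identical strings, 0.0 when every character differs.

    Assumption: the distance is normalized by the longer string's length.
    """
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, truth) / longest


def score_fields(pred: dict, truth: dict) -> dict:
    """Score each ground-truth field; a missing prediction counts as ''."""
    return {
        key: normalized_similarity(str(pred.get(key, "")), str(value))
        for key, value in truth.items()
    }


if __name__ == "__main__":
    # Hypothetical ground truth and model output for one document.
    truth = json.loads('{"invoice_id": "INV-001", "total": "12.50"}')
    pred = json.loads('{"invoice_id": "INV-0O1"}')
    for field, score in score_fields(pred, truth).items():
        print(f"{field}: {score:.2f}")
```

Scoring per field rather than on the whole serialized JSON string keeps a single wrong value from being hidden by many correct ones, and it makes missing fields show up explicitly with a score of 0.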

Technologies
AI

Docker

HuggingFace

Jupyter

Matplotlib

NumPy

Pandas

Plotly

Python

PyTorch

https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image.png
https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image (1).png
https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image (2).png
https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image (3).png
https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image (4).png
https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image (5).png
https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image (6).png
https://raw.githubusercontent.com/MiquelGomezCorral/TFG_Miquel/main/readme-images/image (7).png