PDF Parser

This project is a PDF parser that extracts text from a PDF exam file and returns the resulting PaperExam ID.

Installation

To use this project, follow these steps:

1. Prepare the PDF exam and save it in a folder, for example: pdf_parser/pdf_exams/exam_001/exam.pdf
2. Run the following command and wait for the text extraction to finish. When it completes, you will get a PaperExam ID (PaperExam is a system for storing structured exams).

python scripts/pipeline.py --pdf_file_path <path to pdf file> --prompt_collection_path <path to prompt collection file>
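
If you prefer to start the pipeline from another Python script, the command above can be wrapped as in the sketch below. This is only an illustration: the helper names `build_pipeline_command` and `run_pipeline` are hypothetical and not part of this project, and it assumes the script prints its result (including the PaperExam ID) to standard output, which this README does not confirm.

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline_command(pdf_file_path, prompt_collection_path):
    """Build the argument list for the command documented above.

    Hypothetical helper: only the script path and the two flags
    come from this README.
    """
    return [
        sys.executable,  # the Python interpreter currently in use
        "scripts/pipeline.py",
        "--pdf_file_path", str(pdf_file_path),
        "--prompt_collection_path", str(prompt_collection_path),
    ]


def run_pipeline(pdf_file_path, prompt_collection_path):
    """Run the pipeline and return whatever it wrote to stdout.

    Assumption: the PaperExam ID appears somewhere in that output.
    """
    if not Path(pdf_file_path).is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_file_path}")
    result = subprocess.run(
        build_pipeline_command(pdf_file_path, prompt_collection_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout.strip()
```

Run it from the repository root so that the relative path scripts/pipeline.py resolves, for example `run_pipeline("pdf_parser/pdf_exams/exam_001/exam.pdf", "prompts.json")`, where `prompts.json` is a placeholder name for your prompt collection file.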
