Key Libraries Employed
DataFrame Creation
Pandas
Text Extraction
Regex
PDF Reading & Extraction
PDF Query
Robust. Responsive. Reliable.
An efficient, error-minimizing PDF parser.
250+ lines.
PDF Extraction
- We employ PDF Query to extract information from 9000+ CommonApp student applications, each having 15+ pages.
- 213,000,000+ words extracted.


Text Extraction
- We employ Regex to generate specific patterns to extract key words to be later inputted into a data frame.
- 78+ Regex patterns written.
DataFrame Creator
- We employ Pandas to ready a dataframe for final delivery.
- 101 Columns.

Due to privacy concerns, the final sheet cannot be shown. I implore you to place your trust in me instead, and perhaps now visit the gallery!
Excel Converter
- We employ Pandas to convert dataframe to Excel for delivery.
- 9000+ PDFs =>9000+ Rows