Near duplicate detection in an academic digital library