In this article, you will learn how Cleardox automatically runs OCR on your documents so that they become searchable and can be redacted efficiently. It also covers the benefits and limitations of OCR.
What is OCR?
OCR stands for optical character recognition, a technology that converts images of text (such as scanned pages) into machine-readable text. This makes a PDF searchable even when it contains no text layer. Cleardox automatically performs OCR when you upload documents.
To improve performance, Cleardox runs OCR only on the pages that need it, by detecting whether text content is already present. This speeds up processing and is more environmentally friendly, since OCR is resource-intensive.
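The idea of running OCR only where it is needed can be illustrated with a minimal sketch. This is not Cleardox's actual implementation. It assumes the text of each page has already been extracted (for example with a PDF library) into a list of strings, and the function name pages_needing_ocr and the min_chars threshold are hypothetical.

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 1) -> List[int]:
    """Return 1-based page numbers whose extracted text is effectively empty.

    page_texts is assumed to hold the text layer of each page, in order.
    min_chars is a hypothetical threshold for "has real text content".
    """
    needs_ocr = []
    for number, text in enumerate(page_texts, start=1):
        # Count only non-whitespace characters, so a page that contains
        # nothing but spaces or line breaks still counts as having no text.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            needs_ocr.append(number)
    return needs_ocr


# Hypothetical three-page document: page 2 is a scan without a text layer.
example_pages = ["Contract between A and B", "   \n", "Signed in Copenhagen"]
flagged = pages_needing_ocr(example_pages)
print(flagged)
```

In this sketch, only the flagged pages would be sent to the OCR engine, and the same list could be used to mark pages that deserve extra attention during review.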
Important Considerations
Be aware that OCR-processed documents carry a higher risk of data leakage than native text PDFs (e.g., PDFs generated from a Word file). OCR can misread or miss characters, so even the best OCR engine may fail to capture sensitive content in poor-quality documents (e.g., handwritten pages or low-quality scans), and text that is not recognized cannot be detected automatically.
Therefore, you may need to manually anonymize the content. Learn how to manually anonymize content here.
To make this risk visible, Cleardox flags all OCR-processed pages so you know where to be particularly careful.
Downloading OCR-Processed Documents
For security reasons, Cleardox offers different options when you download a document that has already been OCR-processed. You can choose to keep the existing OCR text. However, because the hidden text underneath the page can differ from what you see in the PDF (for example, it may be misaligned), we recommend downloading with Cleardox OCR instead. Alternatively, you can download the document as an image, in which case the PDF is no longer searchable.
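The risk of a hidden text layer that differs from the visible page can be illustrated with a short sketch. It is not how Cleardox works internally. It assumes you have two strings for the same page, the existing hidden text and the output of a fresh OCR run, and it compares them with Python's standard difflib module. The function names and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase the text and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def text_layer_similarity(hidden_text: str, fresh_ocr_text: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for the two texts."""
    matcher = SequenceMatcher(None, normalize(hidden_text), normalize(fresh_ocr_text))
    return matcher.ratio()


def is_text_layer_suspect(hidden_text: str, fresh_ocr_text: str,
                          threshold: float = 0.9) -> bool:
    """Flag a page when the hidden text differs noticeably from fresh OCR.

    The threshold is an assumption chosen for illustration only.
    """
    return text_layer_similarity(hidden_text, fresh_ocr_text) < threshold


# Same content apart from case and spacing: not suspect.
print(is_text_layer_suspect("John Smith, 12 Main Street",
                            "john smith,  12 main street"))

# Hidden text unrelated to what is visible on the page: suspect.
print(is_text_layer_suspect("Invoice 2021",
                            "John Smith lives at 12 Main Street"))
```

A page flagged this way is one where redacting based on the existing hidden text alone could leave visible content untouched, which is why replacing that text with a fresh OCR run, or downloading as an image, is the safer choice.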
Summary
In this article, you have learned that Cleardox automatically runs OCR on your documents, making them searchable. You now understand the benefits and limitations of OCR, and how to handle OCR-processed documents securely. Cleardox keeps processing efficient and environmentally friendly, but you should always stay vigilant with OCR-processed pages, since they carry a higher risk of data leaks.