DocuFilter is an SDK that reliably extracts text and image data from a wide range of documents and compressed files. It supports most common document formats, including MS Office, Hancom Office, OpenOffice, PDF, EML, and MSG, as well as more than 10 compressed formats. This lets you put document data to effective use in systems such as document-based search, data analysis, and data loss prevention (DLP).
Key Advantages
01. Proven stability and performance
Text and image extraction performance proven to be fast and stable in comparison with existing commercial products.
02. Wide range of document format support
Supports text and image extraction from a variety of document and compressed formats, such as MS Office, PDF, and EML/MSG.
03. Image data extraction support
Improves the usability of unstructured data by extracting the image data contained in documents.
04. Large file processing
Supports fast and stable text and image extraction even for large documents exceeding 2 GB.
05. Compatibility with various platforms
Supports Windows and Linux in both 32-bit and 64-bit environments, so it can be used without platform restrictions.
06. Various development interfaces
Easy SDK integration through interfaces for multiple programming languages, including C/C++, Java, Python, and C#.
Core Functions
Extract document text
Accurately extract all text content contained within a document
Extract document images
Extract image data contained in documents
Parsing multiple document formats
Analyze and extract content from various document formats, including MS Office, OpenOffice, PDF, EML, and MSG.
Encryption/DRM document detection
Identify and process encrypted and DRM-enabled documents
Internal processing of compressed files
Analyze and process documents inside various compressed formats such as ZIP, EGG, and ALZ.
Language/format independent extraction
Extract content regardless of the language or special formats used within the document, broadening the scope of extraction.
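To make the compressed-file, format-parsing, and encryption-detection functions above more concrete, here is a minimal sketch of the general technique using only the Python standard library. It does not use or represent the DocuFilter API, which is not shown here. The names `SIGNATURES`, `sniff_format`, and `scan_archive` are hypothetical, only ZIP archives are handled (EGG and ALZ would need dedicated parsers), and only three file signatures are recognized.

```python
import io
import zipfile

# Leading-byte signatures for a few common containers (illustrative, not exhaustive).
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for DOCX, XLSX, PPTX, ODT
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy DOC/XLS/PPT, MSG
]


def sniff_format(data: bytes) -> str:
    """Guess a container format from the first bytes instead of the file extension."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def scan_archive(data: bytes, path: str = "", max_depth: int = 3):
    """Return (member_path, format, encrypted) for every file entry in a ZIP.

    Nested .zip members are scanned recursively up to max_depth levels.
    Encrypted members are reported but not read.
    """
    results = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            member = f"{path}/{info.filename}" if path else info.filename
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                results.append((member, "unknown", True))
                continue
            payload = zf.read(info)
            fmt = sniff_format(payload)
            results.append((member, fmt, False))
            is_nested_zip = fmt == "zip" and info.filename.lower().endswith(".zip")
            if is_nested_zip and max_depth > 0:
                results.extend(scan_archive(payload, member, max_depth - 1))
    return results
```

Calling `scan_archive(open("bundle.zip", "rb").read())` on a hypothetical `bundle.zip` returns one tuple per file entry, including entries inside nested ZIP files. A production extractor would stream large members rather than read them fully into memory.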
Analysis Process
Deployment Environment
Preview file attachments in web, groupware, and e-mail systems
Prevent document leakage in environments with separated internal and external networks
Integrate with document centralization and personal data protection solutions