DocuFilter Solution

DocuFilter is a document text extraction SDK whose reliability and technical capability have been tested and proven.
It supports most document formats, including MS Office, HancomOffice, OpenOffice, PDF, EML, MSG, and compressed files (10 types), and can also extract images embedded in documents.

Document filtering process

Document text extraction SDK solution

Let's take a look at the key features.

Fast and reliable performance

  • Filtering speeds many times faster than existing commercial products

  • Stable performance backed by years of research and analysis experience

  • Filter large files over 2GB

  • No memory leaks, with exception handling for stability

Identify and extract various document formats

  • Extract text from various document format types

  • Extract image data embedded within documents

  • Detect encrypted document files

  • Identify DRM-protected files (10 types)

  • Filter compressed files (10 types, including Alz and Egg)
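
As an illustration of how format identification can work in general, the sketch below checks the leading bytes of a file against a few well-known signatures. It is not the DocuFilter API: the function name `sniff_format`, the table `MAGIC_SIGNATURES`, and the returned labels are hypothetical, and a real product would cover far more formats and inspect container contents as well.

```python
from typing import Optional

# Well-known leading byte signatures. The labels are illustrative only
# and are not DocuFilter identifiers.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    # ZIP, and ZIP-based containers such as DOCX/XLSX/PPTX, ODF, EPUB, JAR
    (b"PK\x03\x04", "zip-container"),
    # OLE2 compound file: legacy DOC/XLS/PPT and MSG
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2-compound"),
    (b"\x1f\x8b", "gzip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Return a coarse format label based on the first bytes, or None."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return None
```

A signature match only narrows the candidates. Telling a DOCX from a plain ZIP, or detecting encryption, requires reading the container's internal structure.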

Supports various OS and platforms

  • Supports Windows and Linux (32-bit and 64-bit)

  • Available for mobile environments (Android, iOS)

Easy and convenient interface

  • Provides interfaces for C/C++, Java, Python, C#, and other languages

  • Provides libraries and executable files suitable for your environment

  • Supports memory and file interfaces
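
To make the difference between a memory interface and a file interface concrete, here is a minimal sketch. The names `filter_memory` and `filter_file` are hypothetical and do not come from the DocuFilter SDK, and the "extraction" is a stand-in that simply decodes the input as UTF-8 text.

```python
import tempfile
from pathlib import Path


def filter_memory(data: bytes) -> str:
    """Hypothetical memory interface: the caller passes bytes it already holds."""
    # Stand-in for real extraction: treat the input as UTF-8 text.
    return data.decode("utf-8", errors="replace")


def filter_file(path: str) -> str:
    """Hypothetical file interface: the caller passes a path to a file on disk."""
    return filter_memory(Path(path).read_bytes())


if __name__ == "__main__":
    # Memory interface: useful for e-mail attachments or uploads held in RAM.
    print(filter_memory(b"text held in memory"))

    # File interface: useful for documents that already live on disk.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
        tmp.write(b"text stored on disk")
    print(filter_file(tmp.name))
    Path(tmp.name).unlink()
```

A memory interface avoids writing temporary files for data that arrives over the network, while a file interface suits very large documents that should not be loaded into memory all at once.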

Usage environment

Linux Server
  • Operating system: Ubuntu 16.04.3, CentOS 7.0, RHEL 7.0; Kernel 2.6.18.xx or later; GCC Lib 2.x ~ 4.x
  • CPU: Intel Xeon 4Core or higher
  • Memory: above 8GB
  • Hard disk: more than 1TB of free space
  • Interface: C/C++, Java, Python, C# etc.

Windows Server
  • Operating system: Windows 2003 SP2 or later
  • CPU: Intel Xeon 4Core or higher
  • Memory: above 8GB
  • Hard disk: more than 500GB of free space
  • Interface: C/C++, Java, Python, C# etc.

Windows PC
  • Operating system: Windows 7 or later
  • CPU: Intel Core i3 2.9GHz or higher
  • Memory: above 4GB
  • Hard disk: more than 10GB of free space

Frequently asked questions and answers

Where should I use document filters?

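Anywhere you need to preview or inspect document content, such as internal privacy protection, search, and mail systems. Here are some examples:

  • Previewing attachments in web, groupware, and e-mail systems

  • Preventing document leakage when internal and external networks are separated

  • Integrating with document centralization and personal information protection solutions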

What file formats are supported?

Document editors

  • MS Word (97, 2003, 2007, 2010, 2013, 2016)

  • OpenOffice Writer document (ODT)

  • Hancom HWP (2007, 2010, 2014), including distribution-only documents

  • Ichitaro

Spreadsheet

  • MS Excel (97, 2003, 2007, 2010, 2013, 2016) - supports xlsb, xlsm

  • OpenOffice Calc document (ODS)

  • Hancom CELL (2007, 2010, 2014)

Presentation

  • MS PowerPoint (97, 2003, 2007, 2010, 2013, 2016)

  • OpenOffice Impress document (ODP)

  • Hancom SHOW (2007, 2010, 2014)

Compression

  • Zip, Egg, Alz, gzip, Tar, 7z, gz, rar, tbz, jar
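
Filtering a compressed file generally means walking its members and handing each one to the extractor. The sketch below shows this for ZIP only, using Python's standard `zipfile` module. The function name `iter_archive_members`, the `!/` path separator, and the `max_depth` limit are illustrative assumptions, not DocuFilter behavior. Formats such as Egg, Alz, 7z, and rar would need other libraries.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_archive_members(
    data: bytes, prefix: str = "", max_depth: int = 3
) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for each file in a ZIP, descending into nested ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            payload = archive.read(info)
            path = prefix + info.filename
            # Descend into nested archives, bounded so a crafted file
            # cannot trigger unbounded recursion.
            if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(payload)):
                yield from iter_archive_members(payload, path + "!/", max_depth - 1)
            else:
                yield path, payload
```

Because DOCX, XLSX, PPTX, and ODF files are themselves ZIP containers, this sketch would expand them into their internal parts. A real filter would recognize such documents first and pass them to the matching extractor instead.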

Viewers

  • Portable Document Format (PDF)

  • Electronic Publication Format (EPUB)

Other

  • OpenOffice ODF files

  • Filtering of embedded OLE object documents

  • Tag filtering for HTML documents

  • eml, rtf, msg, mp3, mime, chm

  • Files whose file format is unknown but whose internal strings can be extracted
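
For files of unknown format, one common fallback is to scan the raw bytes for runs of printable characters, much like the Unix `strings` utility. The sketch below illustrates that general technique and is not DocuFilter's implementation. The name `extract_strings` and the `min_length` default are assumptions, and only ASCII is handled, so Korean or UTF-16 text would need additional decoding passes.

```python
import re
from typing import List


def extract_strings(data: bytes, min_length: int = 4) -> List[str]:
    """Return runs of at least min_length printable ASCII characters."""
    # Printable ASCII spans 0x20 (space) through 0x7e (tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    blob = b"\x00\x01Hello, world\x00ab\x00\xffDocuFilter\x02"
    print(extract_strings(blob))
```

Raising `min_length` reduces noise from random byte sequences at the cost of missing short words.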

Image extraction formats

  • HWP, DOC, DOCX, XLS, XLSX, PPT, PPTX, PDF

  • ODT, ODS, ODP, MP3