Convert Normal and Tagged PDF to XML and also Convert PDF to DOCX

**aspose** · June 14th, 2013, 05:00 AM

The long-awaited Aspose.Pdf for .NET 8.1.0 has been released. This release includes some great features, such as converting a normal PDF document to a searchable PDF, converting tagged PDF to XML, and getting the dimensions of a source SVG file before converting it to PDF.

To convert a normal, image-based PDF to a searchable PDF file, OCR must be performed on the images inside the PDF file. For recognition, you may use any external OCR engine that supports the hOCR standard.

Another feature introduced in this release is PDF to DOCX conversion. PDF to DOC conversion has been supported for quite some time, and we are very excited to announce support for PDF to DOCX conversion as well. A new property named Format has been added to the DocSaveOptions class; it specifies the format of the output document, DOC or DOCX. To convert a PDF file to DOCX, set this property to Docx from the DocSaveOptions.DocFormat enumeration.

Alongside these features, we have also improved the PDF to HTML, PDF to XPS, PDF to TIFF and PDF to DOC conversions, as well as printing of PDF documents. This release includes plenty of new and improved features, listed below:

- Support for PDF to DOCX conversion
- Convert normal PDF to searchable PDF
- Create searchable (indexed) PDF documents
- SVG to PDF: get source SVG dimensions
- Tagged PDF to XML conversion implemented
- PDF text extraction not respecting white space is fixed
- PdfPageStamp now supports setting its dimensions
- PDF to HTML: spaces being ignored is fixed
- Printing a PDF file containing non-English characters no longer corrupts the contents
- Extra space issue in PDF to HTML conversion is resolved
- Loss of some details when converting PDF to JPG is fixed
- PDF to XPS: contents are corrected
- Corruption of the resultant document when replacing text using the PdfContentEditor class is resolved
- PDF to TIFF: fidelity of the resultant TIFF is improved
- Selecting a page range from a PDF producing a file the same size as the input document is fixed
- Incorrect text extraction is resolved
- First character of every subsequent line in a Stamp object being chopped off is fixed
- Traditional Chinese characters issue when converting PDF to TIFF is resolved
- Korean characters are now rendered properly when converting PDF to TIFF
- PDF to DOC: header text copied to the bottom of the previous page is fixed
- PDF printing issues
- Differences in table, row and cell borders between versions 6.4 and 7.9 are resolved
- Portfolio creation feature is improved
- Metadata properties changing after stamping the document is fixed
- Text wrapping support in the TextFragment class
- Failure when stamping and saving a scanned PDF is corrected
- Problem with RowSpan for a cell in the first row of a table is resolved
- Multipage TIFF to PDF conversion issue is fixed

Other recent bug fixes are also included in this release.
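The PDF to DOCX conversion described in the announcement can be sketched in C# as follows. This is a minimal, hedged example rather than official sample code: it assumes the `Aspose.Pdf` namespace exposes `Document` and `DocSaveOptions` with the new `Format` property, and that the `DocSaveOptions.DocFormat` member is spelled `DocX` (the announcement writes it as "Docx", so check the exact casing in the version you install). The file names `input.pdf` and `output.docx` are placeholders.

```csharp
using Aspose.Pdf;

public static class PdfToDocxExample
{
    // Converts the PDF at inputPath into a DOCX file at outputPath.
    public static void ConvertPdfToDocx(string inputPath, string outputPath)
    {
        // Load the source PDF document.
        Document pdfDocument = new Document(inputPath);

        // DocSaveOptions controls Word output; its Format property selects DOC or DOCX.
        DocSaveOptions saveOptions = new DocSaveOptions();

        // Assumption: the enumeration member is named DocX (the release notes say "Docx").
        saveOptions.Format = DocSaveOptions.DocFormat.DocX;

        // Save with the options so the output is written in DOCX format.
        pdfDocument.Save(outputPath, saveOptions);
    }

    public static void Main()
    {
        // Placeholder file names; input.pdf must exist in the working directory.
        ConvertPdfToDocx("input.pdf", "output.docx");
    }
}
```

Because the format is chosen through the save options rather than a separate method, switching back to DOC output should only require assigning the DOC member of the same enumeration instead.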
