How to Split Word File into Multiple PDF Files using C#

I came across this topic while browsing codeproject.com, and it made me curious enough to look for a solution. After a few hours of effort, I arrived at the following two solutions, both based on Free Spire.Office for .NET.

The first solution converts the Word document to PDF and splits the PDF into multiple files using the Split() method. However, it doesn't let you specify a name for each generated file. The second solution is for anyone who also wants to choose the file names while splitting.

Note: After installing the library, add Spire.Doc.dll and Spire.Pdf.dll as references in your Visual Studio project.

Solution 1:


using System;
using Spire.Doc;
using Spire.Pdf;
using System.IO;

namespace SplitWord
{
    class Program
    {
        static void Main(string[] args)
        {
            //load word document
            Document doc = new Document("WordDocument.docx");
            //save to stream in pdf file format
            var stream = new MemoryStream();
            doc.SaveToStream(stream, Spire.Doc.FileFormat.PDF);
            //load pdf from stream
            PdfDocument pdf = new PdfDocument(stream);
            //split pdf to multiple documents based on every single page
            String pattern = "SplitDocument-{0}.pdf";
            pdf.Split(pattern);
            //close documents
            doc.Close();
            pdf.Close();
        }
    }
}
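
The pattern passed to Split() is an ordinary composite format string with a single {0} placeholder. As a rough preview of the names it yields, the sketch below formats the same pattern with String.Format. It assumes Split() numbers the files from 0, which is worth confirming against the Spire.Pdf documentation. SplitNamePreview and PreviewNames are hypothetical names used for illustration, not part of the library.

```csharp
using System;

static class SplitNamePreview
{
    //hypothetical helper, not part of Spire.Pdf: lists the names a pattern
    //would produce for pageCount pages, assuming numbering starts at startNumber
    public static string[] PreviewNames(string pattern, int pageCount, int startNumber = 0)
    {
        var names = new string[pageCount];
        for (int i = 0; i < pageCount; i++)
        {
            //fill the {0} placeholder with the running page number
            names[i] = String.Format(pattern, startNumber + i);
        }
        return names;
    }
}
```

For example, PreviewNames("SplitDocument-{0}.pdf", 3) returns SplitDocument-0.pdf, SplitDocument-1.pdf and SplitDocument-2.pdf.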


Solution 2:


using Spire.Doc;
using Spire.Pdf;
using System.IO;


namespace SplitWord
{
    class Program
    {
        static void Main(string[] args)
        {
            //load word document
            Document doc = new Document("WordDocument.docx");
            //save to stream in pdf file format
            var stream = new MemoryStream();
            doc.SaveToStream(stream, Spire.Doc.FileFormat.PDF);
            //load pdf from stream
            PdfDocument pdf = new PdfDocument(stream);
            //get page count
            int pageCount = pdf.Pages.Count;
            PdfPageBase page;
            //define file names
            string[] destFileName = new string[] { "ABC", "DEF", "GHI" };
            for (int i = 0; i < pageCount; i++)
            {
                //create new pdf document
                PdfDocument newPdf = new PdfDocument();
                //add a page to newPdf
                page = newPdf.Pages.Add(pdf.Pages[i].Size, new Spire.Pdf.Graphics.PdfMargins(0));
                //draw content on the page
                pdf.Pages[i].CreateTemplate().Draw(page, new System.Drawing.PointF(0, 0));
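                //save file with specified name; if there are more pages than names,
                //fall back to a numbered name (the "Page-" prefix is an arbitrary choice)
                string name = i < destFileName.Length ? destFileName[i] : "Page-" + (i + 1);
                newPdf.SaveToFile(name + ".pdf");
                newPdf.Close();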
            }
            //close documents
            doc.Close();
            pdf.Close();
        }
    }
}



Split Word Documents by Page Breaks or Section Breaks in C#

Page breaks and section breaks are two useful features for controlling page layout in MS Word and other desktop publishing programs. A page break ends a page without filling it with text. A section break allows formatting changes (e.g., different margins or page number styles) within the same document.

Sometimes we may want to pull out the parts of a document that are separated by page breaks or section breaks and store them as individual files. This article provides two C# solutions for splitting a Word document, one by page breaks and one by section breaks.

Split by Page Breaks

using System;
using Spire.Doc;
using Spire.Doc.Documents;

namespace Split_Word_Document_by_Page_Break
{
    class Program
    {
        static void Main(string[] args)
        {
            //load the source document
            Document original = new Document();
            original.LoadFromFile("New Zealand.docx");
            //create the first output document with an empty section
            Document newWord = new Document();
            Section section = newWord.AddSection();

            int index = 0;
            foreach (Section sec in original.Sections)
            {
                foreach (DocumentObject obj in sec.Body.ChildObjects)
                {
                    if (obj is Paragraph)
                    {
                        Paragraph para = obj as Paragraph;
                        section.Body.ChildObjects.Add(para.Clone());

                        foreach (DocumentObject parobj in para.ChildObjects)
                        {
                            if (parobj is Break && (parobj as Break).BreakType == BreakType.PageBreak)
                            {
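                                //index of the page break within the paragraph
                                int i = para.ChildObjects.IndexOf(parobj);
                                //trim the copy in the current file: drop the break and everything after it,
                                //because that trailing content is carried into the next file below
                                var tail = section.Body.LastParagraph.ChildObjects;
                                while (tail.Count > i)
                                {
                                    tail.RemoveAt(i);
                                }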
                                newWord.SaveToFile(String.Format("result/out-{0}.docx", index), FileFormat.Docx);
                                index++;

                                newWord = new Document();
                                section = newWord.AddSection();
                                section.Body.ChildObjects.Add(para.Clone());
                                if (section.Paragraphs[0].ChildObjects.Count == 0)
                                {
                                    section.Body.ChildObjects.RemoveAt(0);
                                }
                                else
                                {
                                    while (i >= 0)
                                    {
                                        section.Paragraphs[0].ChildObjects.RemoveAt(i);
                                        i--;
                                    }
                                }
                            }
                        }
                    }
                    if (obj is Table)
                    {
                        section.Body.ChildObjects.Add(obj.Clone());
                    }
                }
            }
            newWord.SaveToFile(String.Format("result/out-{0}.docx", index), FileFormat.Docx);
        }
    }
}

Split by Section Breaks

using System;
using Spire.Doc;

namespace Split_Word_Document
{
    class Program
    {
        static void Main(string[] args)
        {
            Document document = new Document();
            document.LoadFromFile("Test.doc");
            Document newWord;
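            //copy each section into its own document and save it
            for (int i = 0; i < document.Sections.Count; i++)
            {
                newWord = new Document();
                newWord.Sections.Add(document.Sections[i].Clone());
                //state the output format explicitly, matching the .docx extension
                newWord.SaveToFile(String.Format(@"test\out_{0}.docx", i), FileFormat.Docx);
            }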
        }
    }
}
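
Both examples write into a subfolder (result/ and test\), and it is not stated here whether SaveToFile creates a missing folder. A small sketch of one way to prepare the folder first, using only System.IO, is shown below. OutputPaths and PrepareOutputPath are hypothetical names introduced for illustration, and Path.Combine is used so that the path separator is not hard-coded.

```csharp
using System;
using System.IO;

static class OutputPaths
{
    //hypothetical helper, not part of Spire.Doc: makes sure the folder exists
    //and returns a path of the form <folder>/<prefix><index>.docx
    public static string PrepareOutputPath(string folder, string prefix, int index)
    {
        //CreateDirectory does nothing if the folder is already there
        Directory.CreateDirectory(folder);
        return Path.Combine(folder, String.Format("{0}{1}.docx", prefix, index));
    }
}
```

The returned path could then be passed to SaveToFile in place of the String.Format call, for example newWord.SaveToFile(OutputPaths.PrepareOutputPath("result", "out-", index), FileFormat.Docx).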

Note: These solutions rely on a free .NET Word component. You can download it from here and reference the DLL file in your own project.