How to Extract Text from Word Document with Silverlight

Generally speaking, we format text in a Word document to make it look more appealing. However, when we copy content from Word into a blog, unknown characters may appear and the formatting may be lost. In that case, plain text is much more useful than formatted text, so we need to extract the text from Word.

In this post, I will share how to extract text from a Word document and save it as a Notepad (.txt) file with Silverlight. To do this more easily and quickly, I use a component, Spire.Doc for Silverlight, so I have added its DLL file as a reference in my project.

STEPS

Step 1. Design UserControl

Rename MainPage.xaml to TextExtraction.xaml and double-click it to open the designer (the class inside is still named MainPage, as the full code shows). First, add a Label and set its content to the topic of this post, “Extract Text from Word”, then set its font family, size and color. Second, add a Button that runs the extraction. Finally, set a background image for the whole UserControl.

Step 2. Declare SaveFileDialog

Declare a SaveFileDialog for saving the extracted text, then set its Filter. Because I want to save the text as a Notepad file, the filter only allows .txt files.
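In a dialog filter string, display text and file patterns alternate, separated by the | character, so "Plain Text (*.txt)|*.txt" is one description followed by one pattern. To make that layout concrete, here is a small sketch in plain .NET (not Silverlight-specific). FilterParser, Parse and Matches are hypothetical names for illustration, and only simple *.ext and *.* patterns are handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Silverlight) that illustrates the
// "Description|Pattern|Description|Pattern..." layout of a dialog filter.
public static class FilterParser
{
    // Splits a filter string into (description, pattern) pairs.
    public static List<KeyValuePair<string, string>> Parse(string filter)
    {
        string[] parts = filter.Split('|');
        if (parts.Length % 2 != 0)
            throw new ArgumentException("A filter needs an even number of '|'-separated parts.");

        var pairs = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < parts.Length; i += 2)
            pairs.Add(new KeyValuePair<string, string>(parts[i], parts[i + 1]));
        return pairs;
    }

    // Checks a file name against "*.*" or a simple "*.ext" pattern (case-insensitive).
    public static bool Matches(string fileName, string pattern)
    {
        if (pattern == "*.*")
            return true;
        if (!pattern.StartsWith("*."))
            throw new ArgumentException("Only \"*.ext\" and \"*.*\" patterns are handled in this sketch.");

        string extension = pattern.Substring(1); // "*.txt" -> ".txt"
        return fileName.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, FilterParser.Parse("Plain Text (*.txt)|*.txt") yields a single pair whose pattern, *.txt, accepts notes.txt but not notes.docx.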

Step 3. Load Document

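Right-click the project name and choose Add > Existing Item to add the Word document, then set its Build Action to Embedded Resource. Double-click the run button to create its Click handler and write the code there. Declare a Document. Then use a foreach loop over the manifest resource names in the assembly; when a name matches the embedded resource (the code checks that it ends with “Antarctic.docx”), load the document from that resource stream.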
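Embedded resource names in a C# project are typically formed from the default namespace, any folder path and the file name, joined with dots (for example ExtractText.Antarctic.docx), which is why the lookup matches on the end of the name. The sketch below isolates that search as a pure function so it can be tried without Silverlight; ResourceLookup and FindResourceName are hypothetical names, and the example resource names are made up.

```csharp
using System;
using System.Collections.Generic;

public static class ResourceLookup
{
    // Returns the first manifest resource name that refers to fileName,
    // or null when none does. Matching the bare name or a name ending in
    // "." + fileName avoids false hits such as "ExtractText.MyAntarctic.docx".
    public static string FindResourceName(IEnumerable<string> resourceNames, string fileName)
    {
        foreach (string name in resourceNames)
        {
            if (name == fileName || name.EndsWith("." + fileName, StringComparison.Ordinal))
                return name;
        }
        return null;
    }
}
```

In the click handler, the call could be FindResourceName(assembly.GetManifestResourceNames(), "Antarctic.docx"), with the result passed to assembly.GetManifestResourceStream() only when it is not null.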

Step 4. Extract Text

First, declare a StringBuilder to hold the text. Then use a foreach loop over each section in the document and a nested foreach loop over each paragraph in that section, calling sb.AppendLine() to append each paragraph's text.
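The nested-loop pattern can be tried without Spire.Doc by standing in for the document model with plain strings. In this sketch (TextCollector and Collect are hypothetical names), each section is just a sequence of paragraph texts; with Spire.Doc the loops run over document.Sections and section.Paragraphs instead.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TextCollector
{
    // Stand-in for the document model: each section is a sequence of
    // paragraph texts, visited section by section, paragraph by paragraph.
    public static string Collect(IEnumerable<IEnumerable<string>> sections)
    {
        var sb = new StringBuilder();
        foreach (IEnumerable<string> section in sections)
        {
            foreach (string paragraphText in section)
            {
                // AppendLine adds the text followed by Environment.NewLine,
                // so every paragraph ends up on its own line.
                sb.AppendLine(paragraphText);
            }
        }
        return sb.ToString();
    }
}
```

Because AppendLine is used, an empty paragraph in the source becomes an empty line in the output, which keeps the spacing between blocks of text.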

Step 5. Save Text

First, convert the StringBuilder's content to a byte array. Second, show the SaveFileDialog and check whether the user confirmed it. Third, if the result is true, call stream.Write() to save the text. Its three parameters are the byte buffer, the int offset in the buffer at which to start, and the int count of bytes to write.
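Here is a minimal sketch of the encode-and-write step against any Stream, assuming standard .NET (a MemoryStream works for trying it out). TextSaver and Save are hypothetical names. The optional byte-order mark is an addition to the original approach: new UnicodeEncoding() produces little-endian UTF-16, and GetBytes() does not include the FF FE preamble, so writing it first can help editors such as Notepad recognize the encoding.

```csharp
using System.IO;
using System.Text;

public static class TextSaver
{
    // Encodes text as little-endian UTF-16, as new UnicodeEncoding() does,
    // and writes it to the stream. When writeBom is true, the encoding's
    // preamble (the byte-order mark FF FE) is written first.
    public static void Save(Stream stream, string text, bool writeBom)
    {
        Encoding encoding = new UnicodeEncoding();
        if (writeBom)
        {
            byte[] preamble = encoding.GetPreamble();
            stream.Write(preamble, 0, preamble.Length);
        }

        byte[] bytes = encoding.GetBytes(text);
        // Write(buffer, offset, count): copy 'count' bytes from 'buffer',
        // starting at index 'offset', into the stream.
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

Saving "Hi" with writeBom set to true writes six bytes: FF FE 48 00 69 00.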

Full Code (C# code-behind of the UserControl)

using System;
using System.IO;
using System.Reflection;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using Spire.Doc;
using Spire.Doc.Documents;

namespace ExtractText
{
    public partial class MainPage : UserControl
    {
        //Declare SaveFileDialog used to choose where the extracted text is saved
        private SaveFileDialog saveFileDialog = new SaveFileDialog();

        public MainPage()
        {
            InitializeComponent();
            //Offer only .txt files so the result opens in Notepad
            this.saveFileDialog.Filter = "Plain Text (*.txt)|*.txt";
        }

        private void button1_Click(object sender, RoutedEventArgs e)
        {
            //Load Document: find the embedded Word file among the assembly's resources
            Document document = new Document();
            Assembly assembly = this.GetType().Assembly;
            foreach (String name in assembly.GetManifestResourceNames())
            {
                if (name.EndsWith("Antarctic.docx"))
                {
                    using (Stream docStream = assembly.GetManifestResourceStream(name))
                    {
                        document = new Document(docStream, FileFormat.Docx);
                    }
                    break; //resource found, stop searching
                }
            }

            //Extract Text: one line per paragraph, section by section
            StringBuilder sb = new StringBuilder();
            foreach (Section section in document.Sections)
            {
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    sb.AppendLine(paragraph.Text);
                }
            }

            //Save Text: encode as UTF-16 and write only if the user confirmed the dialog
            Byte[] bytes = (new UnicodeEncoding()).GetBytes(sb.ToString());
            bool? result = this.saveFileDialog.ShowDialog();
            if (result == true)
            {
                using (Stream stream = this.saveFileDialog.OpenFile())
                {
                    stream.Write(bytes, 0, bytes.Length);
                }
            }
        }
    }
}



Author: janewdaisy

.NET programming beginner. Sharing how to use C#/VB.NET to export data, work with Word, Excel and PDF, and other useful skills.

1 thought on “How to Extract Text from Word Document with Silverlight”

  1. Thank you for this post. I have being struggling with how to read word documents in Silverlight for over a week now. And your post has saved the day. Thanks again and keep up the good work.
