2005.02.14 11:00 AM

Deleting Word Document Sections

[Updated 8/29/05: JL's comment (see below) caused me to look harder at something I only skimmed during my first investigation of Word sections. Specifically, JL pointed out that section headers/footers are not preserved like other section characteristics when programmatically "deleting" the last section in the manner described by my sample code. I don't have time to write up why (I think) this happens, but I did make a slight mod to the sample code to handle the easiest case. See the code's comments for more info.]

The following Microsoft Word document contains three sections:

Let's try to delete the third section.

Move the insertion point to the end of the contents of the second section, just after the section's last paragraph mark but before the section break line that reads "=== Section Break (Next Page) ===", and press the Delete key.

Here's the result:

Oops. We deleted section 2, not section 3.

In my experience, most Word users are surprised by this behavior. Normally, the Delete key removes things ahead of the insertion point, so they naturally expect that pressing the Delete key just before a section break that marks the beginning of the third section should get them a document having two sections, with the second containing the contents of the now deleted third, like so:

Unfortunately, that's not the way sections work in Word.

The section break we lined up behind didn't mark the beginning of section 3 (contrary to its description); it marked the end of section 2. So when we deleted it, we actually deleted the definition of section 2. With section 2 gone, the beginning of section 3 became the end of section 1. And, because the content in section 2, which we deleted, was located after the end of section 1, it became a part of section 3, which, if you're counting, is now section 2. Got it?

I think this odd section handling behavior makes a little more sense if one first tries to understand how Word handles sections internally. Of course, I could be wrong, both about whether it helps and about how they're handled internally, but I'll take a stab.

Many of the entities Word employs to organize and define document content, like sections, pages, paragraphs, and styles, can be described by the following rules:

  1. Entity starts with the document and continues until it is stopped.
  2. Entity can only be stopped by starting another entity of the same type or by the end of the document.
  3. If another entity is started, return to #2 and repeat.

According to these rules, every new Word document should begin life with entities already in place representing a section, page, paragraph, and style. We can confirm this by opening a new Word document and examining its definition from the VBA Immediate window:

?ActiveDocument.Sections.Count
 1 
?ActiveDocument.ActiveWindow.Panes(1).Pages.Count
 1
?ActiveDocument.Paragraphs.Count
 1 
?Selection.Style
Normal

According to the rules, short of deleting the document, it should be impossible to eliminate these first entity instances. In other words, it should be impossible to produce even one of the following results:

?ActiveDocument.Sections.Count
 0 
?ActiveDocument.ActiveWindow.Panes(1).Pages.Count
 0
?ActiveDocument.Paragraphs.Count
 0 
?Selection.Style

Also, according to the rules, these entities should remain in effect until they are terminated by a new entity of the same type or we reach the end of the document.

Finally, the rules make it impossible for there to be gaps between entities. In other words, there can be no content between paragraphs, paragraphs between sections, content between pages, or content without a governing style.

Interestingly, because of these rules, Word does not have to keep track of where in the content each entity begins. It only has to keep track of where each one ends, and it doesn't even have to do that for the last entity, whose end always coincides with the document's end. This means Word can manage the data describing these entities with a simple array holding nothing more than the position at which each entity terminates relative to the content. (Actually, it must also keep a pointer to a data structure describing each entity's properties, and it must update the end positions as content is added and removed, but those things aren't important to us right now.)

The point is, with only a simple array of end positions, Word can quickly calculate everything it needs to know about these types of entities.

Here's what an array of section entities might look like for a new document (note that I've used a 1-based array for clarity; an end position of -1 denotes the end of the document):

Section    End Position
1          -1

Our array has a single element representing a single section that continues to the end of the document.

If we press Enter a few times, so we're on the fourth line of the document, and insert a new continuous section (ignore for a moment how it looks on screen), our array will look like this:

Section    End Position
1          4
2          -1

The first section was given an actual end position, and a new element was added to the end of the array. Using this information, Word can tell us the following things about the document's sections (for the moment, ignore the fact that the first section is 4 characters long when it clearly only encompasses 3 carriage returns):

?ActiveDocument.Sections.Count
 2 
?ActiveDocument.Sections(1).Range.Start, ActiveDocument.Sections(1).Range.End
 0             4 
?ActiveDocument.Sections(2).Range.Start, ActiveDocument.Sections(2).Range.End
 4             5

Note that there is no gap between the sections. A section's starting position is always calculated by Word as being equal to the prior section's ending position or, if it's the first section, the start of the document (position 0).
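The start-from-the-previous-end calculation can be sketched in a few lines of VBA. This is only a model of the guessed-at bookkeeping, not Word's actual implementation; the names SectionStartFromEnds, SectionEndFromEnds, and DemoSectionBounds are made up for illustration, and the document's end position (5) is supplied by hand.

```vba
' Hypothetical model: Ends() is a 1-based array of section end positions,
' where -1 stands for "runs to the end of the document".

Public Function SectionStartFromEnds(Ends() As Long, ByVal Index As Long) As Long
  ' The first section starts at position 0; every other section starts
  ' where the section before it ends.
  If Index = LBound(Ends) Then
    SectionStartFromEnds = 0
  Else
    SectionStartFromEnds = Ends(Index - 1)
  End If
End Function

Public Function SectionEndFromEnds(Ends() As Long, ByVal Index As Long, _
                                   ByVal DocumentEnd As Long) As Long
  ' Resolve the -1 marker to the document's actual end position.
  If Ends(Index) = -1 Then
    SectionEndFromEnds = DocumentEnd
  Else
    SectionEndFromEnds = Ends(Index)
  End If
End Function

Public Sub DemoSectionBounds()
  Dim Ends(1 To 2) As Long
  Dim i As Long

  Ends(1) = 4     ' section 1 ends at position 4
  Ends(2) = -1    ' section 2 runs to the end of the document

  For i = LBound(Ends) To UBound(Ends)
    Debug.Print i, SectionStartFromEnds(Ends, i), SectionEndFromEnds(Ends, i, 5)
  Next i
End Sub
```

Running DemoSectionBounds prints a start and end of 0 and 4 for the first element and 4 and 5 for the second, the same pairs reported in the Immediate window above.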

Now, if we move up a couple of lines, so we're back in section 1 on line 2, and insert another new continuous section, our array will look like this:

Section    End Position
1          2
2          5
3          -1

The first section's end position was changed to reflect its new point of termination, and a new array element was inserted in the second position to represent the new section. The new section ends at the same position the first one did before it was terminated by the new section (again, ignore the extra character for a moment). The last section's array element did not change, except that it now occupies the third position in the array.
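Here is a rough VBA sketch of that array update. It is a model built on assumptions, not Word's real code: InsertSectionEnd and DemoInsertSectionEnd are invented names, positions are 0-based character offsets, and the break is assumed to add exactly one character at the insertion point.

```vba
' Hypothetical model: insert a section break at character position Pos.
' Ends() must be a dynamic, 1-based array; -1 means "end of document".

Public Sub InsertSectionEnd(Ends() As Long, ByVal Pos As Long)
  Dim i As Long
  Dim Slot As Long

  ' Find the section that contains Pos (the first one ending after it).
  Slot = UBound(Ends)
  For i = LBound(Ends) To UBound(Ends)
    If Ends(i) = -1 Or Ends(i) > Pos Then
      Slot = i
      Exit For
    End If
  Next i

  ' Make room, moving later elements down one slot.  Real end positions
  ' grow by one to account for the newly inserted break character.
  ReDim Preserve Ends(LBound(Ends) To UBound(Ends) + 1)
  For i = UBound(Ends) To Slot + 1 Step -1
    If Ends(i - 1) = -1 Then
      Ends(i) = -1
    Else
      Ends(i) = Ends(i - 1) + 1
    End If
  Next i

  ' The new section ends just after its break character.
  Ends(Slot) = Pos + 1
End Sub

Public Sub DemoInsertSectionEnd()
  Dim Ends() As Long
  Dim i As Long

  ReDim Ends(1 To 1)
  Ends(1) = -1                 ' new document: one section

  InsertSectionEnd Ends, 3     ' break on line 4, after three carriage returns
  InsertSectionEnd Ends, 1     ' break on line 2, after one carriage return

  For i = LBound(Ends) To UBound(Ends)
    Debug.Print i, Ends(i)
  Next i
End Sub
```

After the two calls the model holds end positions 2, 5, and -1, matching the table above.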

We now have a 3 section document that looks like this:

Using only the information in our array, Word can now tell us the following about the document's sections:

?ActiveDocument.Sections.Count
 3 
?ActiveDocument.Sections(1).Range.Start, ActiveDocument.Sections(1).Range.End
 0             2 
?ActiveDocument.Sections(2).Range.Start, ActiveDocument.Sections(2).Range.End
 2             5 
?ActiveDocument.Sections(3).Range.Start, ActiveDocument.Sections(3).Range.End
 5             6
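
Rather than typing one query per section, the same information can be listed with a short loop. This is a minimal sketch that uses only the Sections, Range.Start, and Range.End members already shown above; the routine name ListSectionRanges is made up.

```vba
Public Sub ListSectionRanges()

  ' Prints each section's start and end position to the Immediate window.

  Dim i As Long

  With ActiveDocument
    For i = 1 To .Sections.Count
      Debug.Print "Section " & i & ": " & _
                  .Sections(i).Range.Start & " to " & .Sections(i).Range.End
    Next i
  End With

End Sub
```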

Now let's address why the first and second sections appear to be one character longer than we know them to be based on what we typed and what we see on screen. First, let's confirm that section 1 is indeed one character longer than expected:

?Len(ActiveDocument.Sections(1).Range.Text)
 2 

Let's see what those two characters are by examining their character codes. Here's the first one:

?AscW(Mid$(ActiveDocument.Sections(1).Range.Text, 1, 1))
 13 

Character code 13, a carriage return, as expected. It's represented on screen with a paragraph mark. Here's the second one:

?AscW(Mid$(ActiveDocument.Sections(1).Range.Text, 2, 1))
 12   

Character code 12, not expected. Seems it was added by Word to represent the end of the section. This explains why everything keeps getting one character longer with each section addition. Normally in Word, character code 12 represents a page break. However, if the position of a character code 12 corresponds with the end position of a section (per the section array), Word treats it like a section break and renders it on screen as a double dotted line.
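
To avoid checking characters one at a time, a small helper can list every character code in a section. This is a sketch only; DumpSectionCharCodes is an invented name, and it relies on nothing beyond Range.Text, Mid$, and AscW as used above.

```vba
Public Sub DumpSectionCharCodes(ByVal SectionNumber As Long)

  ' Prints the position and character code of every character in the section.

  Dim SectionText As String
  Dim i As Long

  SectionText = ActiveDocument.Sections(SectionNumber).Range.Text

  For i = 1 To Len(SectionText)
    Debug.Print i, AscW(Mid$(SectionText, i, 1))
  Next i

End Sub
```

Call it from the Immediate window with, for example, DumpSectionCharCodes 1. For section 1 of the example document it should list codes 13 and 12, the same values found by the two checks above.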

If we go on and examine the characters in section 2, we will find that it also has a trailing character code 12, as in section 1, making its length 3 instead of 2. However, we won't find a character code 12 at the end of section 3, and its length is reported as just one character, as expected. Why?

Well, if the end of the last section were marked with a character code 12, it would have to come after the carriage return that marks the end of the document, so it would never be accessible or rendered. However, my guess is that it doesn't have one anyway, as it isn't really needed. After all, this is the end of the document. And, more importantly, the last section cannot be deleted, at least not in the sense of deleting a section by deleting its trailing character code 12. To understand why, let's revisit our section array and see what happens when we delete a section.

Here is how we left our sections array:

Section    End Position
1          2
2          5
3          -1

We can adjust this array two different ways to reflect the removal of section 2.

The first way would be to remove the second element from the array and change the end position of section 1 (the first element) to what was the end of section 2, less one character to account for the deleted section's character code 12:

Section    End Position
1          4
2          -1

Using this approach, the contents of what was section 2 would now appear to become part of section 1.

The second way would be to simply remove the second element:

Section    End Position
1          2
2          -1

Using this approach, the contents of what was section 2 would now appear to be part of what was section 3.

As we saw in our first experiment, Word uses the latter approach.
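
The two adjustments can be compared side by side with a small VBA model. As before, this is a sketch under assumptions rather than a description of Word's internals: RemoveSectionEnd and DemoRemoveSectionEnd are invented names, and the rule that later end positions shrink by one (for the removed break character) is my extrapolation from the three-section example.

```vba
' Hypothetical model: remove section Index from Ends(), a dynamic 1-based
' array of end positions in which -1 means "end of document".
'   FoldBack = False: content joins the following section (Word's approach)
'   FoldBack = True:  content joins the preceding section (the first way)
' Returns False if the removal is not possible in the chosen mode.

Public Function RemoveSectionEnd(Ends() As Long, ByVal Index As Long, _
                                 ByVal FoldBack As Boolean) As Boolean
  Dim i As Long

  If Index < LBound(Ends) Or Index > UBound(Ends) Then Exit Function
  If UBound(Ends) = LBound(Ends) Then Exit Function          ' only one section
  If FoldBack And (Index = LBound(Ends)) Then Exit Function  ' nothing before it
  If (Not FoldBack) And (Index = UBound(Ends)) Then Exit Function ' nothing after it

  If FoldBack Then
    ' The preceding section takes over the removed section's end,
    ' less one character for the break that disappears.
    If Ends(Index) = -1 Then
      Ends(Index - 1) = -1
    Else
      Ends(Index - 1) = Ends(Index) - 1
    End If
  End If

  ' Close the gap in the array.  Real end positions that follow shrink
  ' by one because one break character has been removed.
  For i = Index To UBound(Ends) - 1
    If Ends(i + 1) = -1 Then
      Ends(i) = -1
    Else
      Ends(i) = Ends(i + 1) - 1
    End If
  Next i

  ReDim Preserve Ends(LBound(Ends) To UBound(Ends) - 1)
  RemoveSectionEnd = True
End Function

Public Sub DemoRemoveSectionEnd()
  Dim Ends() As Long
  Dim i As Long

  ReDim Ends(1 To 3)
  Ends(1) = 2: Ends(2) = 5: Ends(3) = -1

  ' Change False to True to try the first way instead.
  If RemoveSectionEnd(Ends, 2, False) Then
    For i = LBound(Ends) To UBound(Ends)
      Debug.Print i, Ends(i)
    Next i
  End If
End Sub
```

With FoldBack set to False the model is left with end positions 2 and -1; with True it is left with 4 and -1, matching the two tables above. Note that in the fold-forward mode the function refuses to remove the last element, because there is no following section to take over its content.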

In a strange-except-to-a-programmer kind of way, this approach makes some sense. For one thing, it's easier. But more importantly, it's consistent with Word's approach to managing these types of document organizing entities by their ends instead of their beginnings. Given this approach, one could reasonably expect that when an entity is deleted, the content it covered will naturally flow to the next entity's end, which means that the content always becomes part of the entity that follows the one that was deleted.

Unfortunately, Word doesn't consistently adopt this approach for all document organizing entities. For example, suppose you have a document with two paragraphs. The first has before and after spacing of 12 pts, and the second has before and after spacing of 24 pts. If the paragraph marker is deleted from the end of the first paragraph, according to the logic just described, one would expect the contents of the first paragraph to be joined with the second paragraph and to assume the second paragraph's before and after spacing. But it doesn't. Instead, the remaining paragraph reflects the first paragraph's before and after spacing, which means the deletion pulled the content back to the first paragraph. This doesn't mean that Word isn't tracking paragraphs by their ends, but it does indicate that it is managing their arrays and properties a bit differently than with sections.
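
The paragraph experiment can be set up with a short macro for anyone who wants to try it. This is a sketch with assumptions worth stating: ParagraphMergeExperiment is an invented name, the sample text is arbitrary, and deleting the paragraph mark through a Range may not behave exactly like pressing the Delete key in the user interface (or the same way in every Word version), so no particular result is claimed here.

```vba
Public Sub ParagraphMergeExperiment()

  ' Builds a two-paragraph document with different spacing, deletes the
  ' first paragraph's mark, and reports the surviving paragraph's spacing.

  Dim Doc       As Word.Document
  Dim MarkRange As Word.Range

  Set Doc = Documents.Add
  Doc.Content.Text = "First paragraph" & vbCr & "Second paragraph"

  With Doc.Paragraphs(1)
    .SpaceBefore = 12
    .SpaceAfter = 12
  End With

  With Doc.Paragraphs(2)
    .SpaceBefore = 24
    .SpaceAfter = 24
  End With

  ' Narrow a range to the first paragraph's mark (its last character)
  ' and delete it, joining the two paragraphs.
  Set MarkRange = Doc.Paragraphs(1).Range
  MarkRange.Start = MarkRange.End - 1
  MarkRange.Delete

  Debug.Print "Paragraphs: " & Doc.Paragraphs.Count
  Debug.Print "SpaceBefore: " & Doc.Paragraphs(1).SpaceBefore
  Debug.Print "SpaceAfter: " & Doc.Paragraphs(1).SpaceAfter

  Set MarkRange = Nothing

End Sub
```

A reported spacing of 12 means the merged paragraph kept the first paragraph's formatting; 24 means it adopted the second's.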

Anyhow, here's the important implication of the approach adopted by Word for managing section deletions. Because the content previously covered by the deleted section is taken over by the section that follows the deleted section (i.e., the following section's beginning will now be calculated from the end of the section preceding the deleted one), you can never delete the last section.

There always has to be a last section and its end always corresponds with the end of the document. In a multi-section document, the only way to get rid of the last section is to remove its contents, essentially backing the end of the document up until it corresponds with the end of the next-to-last section. Once that happens, the next-to-last section becomes the last section.

Now, let's return to Word's use of character code 12 to mark section ends. As previously noted, that character code represents the end of a section whenever the character code's position in the content corresponds with a section end position in the sections array. This means that when Word renders a double dotted section break line at a section-ending character code 12, what it is really depicting is the end of a section, not the start of a section. If you move the insertion point up to one of those section break lines (i.e., up to the character code 12) and delete it, you are deleting the section that precedes the line, not the one that follows it. When that happens, the section deletion approach described above takes place and the content preceding the deleted line is subsequently covered by the section that followed the line.

Most users could probably live with this behavior, except for one thing. For some reason, Word labels section break lines with information about the sections that follow them, not with information about the sections they actually represent, which are the ones that precede them. This small fudge causes untold grief among Word users.
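
The mismatch can be made visible by listing each break alongside the section it ends and the section its label describes. This is a sketch that assumes the break's on-screen type comes from the following section's PageSetup.SectionStart setting; ListSectionBreaks and SectionStartName are invented names.

```vba
Public Sub ListSectionBreaks()

  ' For each section break, prints its position, the section it terminates,
  ' and the start type of the section that follows it.

  Dim i As Long

  With ActiveDocument
    For i = 1 To .Sections.Count - 1
      Debug.Print "Break at position " & (.Sections(i).Range.End - 1) & _
                  " ends section " & i & _
                  "; labeled for section " & (i + 1) & " (" & _
                  SectionStartName(.Sections(i + 1).PageSetup.SectionStart) & ")"
    Next i
  End With

End Sub

Private Function SectionStartName(ByVal StartType As Long) As String

  ' Translates a WdSectionStart value into readable text.

  Select Case StartType
    Case wdSectionContinuous: SectionStartName = "Continuous"
    Case wdSectionNewColumn:  SectionStartName = "New Column"
    Case wdSectionNewPage:    SectionStartName = "Next Page"
    Case wdSectionEvenPage:   SectionStartName = "Even Page"
    Case wdSectionOddPage:    SectionStartName = "Odd Page"
    Case Else:                SectionStartName = "Unknown (" & StartType & ")"
  End Select

End Function
```

The last section never appears as the "ends section" entry, since it has no trailing break.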

If Microsoft really wanted Word to depict on screen the beginnings of sections and make these beginnings actionable, why didn't they just do it? Why this half-hearted, inconsistent implementation? It seems to me they could have done the right thing years ago without even having to change how sections are internally managed by Word. But what do I know. I'm not quite sure how they would handle representing the first section's starting point, except to always start documents with a visible section break (in the margin above the heading in print view?). As far as that goes, they don't show the first section's beginning now, even as they try to masquerade every subsequent section ending as a section beginning, so a missing first section break line wouldn't be any worse than the current approach.

Alternatively, they could have just labeled section breaks with information about the sections they actually represent. At least then users would know which sections they were deleting. This would require Microsoft to rework Word's handling and rendering of the last section, which does not end with a character code 12. In addition to having to display a section break line at the end of the document where there hasn't been one before (as with the missing section break at the beginning of the first section), the last section's break would also have to be protected from deletion. Or, they could allow it to be deleted by automatically stretching the next-to-last section's end position to the end of the document, assuming there was more than one section to begin with. Or, even better, they could do what Excel does when partial rows and/or columns are deleted and ask the user what they would like done with the content belonging to the deleted section.

By the way, this is what led to this post in the first place. I wanted a way to delete sections, including the last one, in a way that would leave the deleted section's contents covered by the preceding section rather than by the following section. The VBA routine below does this.

When executed, the routine deletes the current section (determined by the position of the insertion point in the content), but not before copying the deleted section's contents to the end of the preceding section. Being the opposite of Word's section deletion approach, the routine allows the deletion of the last section, but disallows deletion of section 1. To use the routine, drop it into Normal.dot and wire it up to a new toolbar icon.

Public Sub DeleteSectionContentBack()

  ' Removes the section containing the insertion point by folding it into the preceding section,
  ' rather than into the next section, which is what Word does by default.  This also makes it
  ' possible to delete the last section.

  Dim SectionNumber As Long
  Dim WorkingRange  As Word.Range
  
  ' Only run if there is nothing selected (to avoid section ambiguity), and the insertion point is in
  ' the body of the document, and there are multiple sections, and we're not in the first section.

  If Selection.Type <> wdSelectionIP Then Exit Sub
  If Selection.StoryType <> wdMainTextStory Then Exit Sub
  If ActiveDocument.Sections.Count = 1 Then Exit Sub
  
  SectionNumber = Selection.Information(wdActiveEndSectionNumber)
  
  If SectionNumber = 1 Then Exit Sub
  
  ' Copy the contents of the section being deleted.  If not the last section, adjust the range end
  ' to exclude the section's trailing character 12.
  
  Set WorkingRange = ActiveDocument.Sections(SectionNumber).Range
  
  If SectionNumber < ActiveDocument.Sections.Count Then
    Call WorkingRange.MoveEnd(wdCharacter, -1)
  End If
  
  Call WorkingRange.Copy
  
  Set WorkingRange = Nothing
  
  ' Paste the copied contents into the previous section at the section's end.
  
  Set WorkingRange = ActiveDocument.Sections(SectionNumber - 1).Range
  
  Call WorkingRange.EndOf(wdSection, wdMove)
  
  Call WorkingRange.Paste
  
  Set WorkingRange = Nothing
  
  ' Delete the original section.  If deleting the last section, adjust the range start to include
  ' the preceding section's trailing character 12.
  
  Set WorkingRange = ActiveDocument.Sections(SectionNumber).Range
  
  If SectionNumber = ActiveDocument.Sections.Count Then
  
    Call WorkingRange.MoveStart(wdCharacter, -1)
  
    ' Unique last section headers and footers aren't replaced by those in the prior section like
    ' other section characteristics (e.g., orientation, paper size, borders) when the last section is
    ' programmatically "deleted".  One way to resolve this is to first void the uniqueness of the last
    ' section's headers and footers by linking them with the prior section's headers and footers.  This
    ' causes the last section's header/footer content to reflect the prior section's header/footer content.
    ' If the prior section's headers and footers are also derived from the section that precedes it (in
    ' other words, it too is linked to its predecessor), then we're done.  However, if the section prior
    ' to the last one is not linked to its predecessor, then the last section, whose headers/footers will
    ' be folded back over the previous section's content, must be unlinked in order to preserve the content
    ' it just adopted from its predecessor when it was (temporarily) linked with it.  The code below does
    ' this for the simplest case where the last section and its predecessor do not incorporate unique
    ' odd/even and/or first page headers and footers.  Much more work would be required to handle all the
    ' possible header and footer combinations in a general way.
  
    ActiveDocument.Sections(SectionNumber).Headers(wdHeaderFooterPrimary).LinkToPrevious = True
    If Not ActiveDocument.Sections(SectionNumber - 1).Headers(wdHeaderFooterPrimary).LinkToPrevious Then
      ActiveDocument.Sections(SectionNumber).Headers(wdHeaderFooterPrimary).LinkToPrevious = False
    End If
    
    ActiveDocument.Sections(SectionNumber).Footers(wdHeaderFooterPrimary).LinkToPrevious = True
    If Not ActiveDocument.Sections(SectionNumber - 1).Footers(wdHeaderFooterPrimary).LinkToPrevious Then
      ActiveDocument.Sections(SectionNumber).Footers(wdHeaderFooterPrimary).LinkToPrevious = False
    End If
    
  End If
  
  Call WorkingRange.Delete
  
  Set WorkingRange = Nothing

End Sub


Comments

I forgot to mention what Microsoft has to say about Word sections:

http://support.microsoft.com/?kbid=291184

And about deleting them:

http://support.microsoft.com/?kbid=303333

ewbi.develops | 2005.02.15 05:15 PM

For some reason the details about how Word handles entities cause my eyes to glaze over, but since I think this odd, not so intuitive behavior is one of the weak points in Word, I am trying to get through it. In your first example of this post, you showed the strange deleting behavior by having a continuous section break (CSB) followed by a new page section break (NSB), and then deleting the NSB. Well, I tried 4 sections (NSB1, CSB, NSB2), then deleted NSB2, with some strange results, in that 2 section breaks were removed. I also tried your original experiment with the NSB and CSB: I deleted the CSB, restored it, then deleted the NSB, and in both cases only a CSB remained. Maybe as I study your explanation, this is covered, but it seems odder than you explain :)

Darnley | 2005.02.23 09:05 AM

This is an incredibly good article that solves the problem of section breaks.

| 2005.03.13 12:38 PM

Thanks for saying so. I'm afraid, though, that the "problem of section breaks" is probably too great for a single post like this. I only hoped to explore them more and write about the way in which I imagine they are handled in an effort to improve my own Word-ability. Hopefully the routine I provided will help some folks, though. Thanks again.

ewbi.develops | 2005.03.13 06:04 PM

Hey Darnley,

You and I have talked about this so I just wanted to note that, in fact, you did get the same results as me. However, I can't argue with your perception that Word section handling is likely "odder" than I explain. ;)

ewbi.develops | 2005.03.13 06:06 PM

Dear ewbi.develops:

Excellent article and most helpful in elucidating one of the trickier aspects of the Word object model. I found your site in searching for a solution to the following issue, which is a slight variation on the problem you solve and I'm wondering if you have some insight into it. I'm using Word XP (2002):

Create a document with multiple sections (as per your example), and create different footers or headers and other section-specific parameters in the last section (something obviously different, like a graphic in the footer).

Now suppose you want to delete the LAST section. I don't want to preserve the text that is in it, I'm trying to get rid of a portion of the document that includes this (old) last section. I want the NEW document's last section to be the old document's second-to-last section (or any section other than the last). Your example is easily modified to disable the copying of the text in the last section so that both section and text is removed.

However, when I delete the last section, the NEW last section takes on the properties of the section I just deleted! The end-of-document paragraph mark stores the properties of the section, and it can't be deleted, it just moves to the end of the new doc with all its baggage. I can't figure out a way to easily delete it or get it to assume all the properties of the second-to-last section (including header and footer settings). Copying the "good" section break and inserting it at the end of the doc leaves an unwanted blank page at the end of the doc, that last paragraph is still the "bad" section break.

For now I am resorting to redefining the section via VBA, but it seems like there should be an easier way to get a given section break to convert to an end-of-document section break.

JL@Farpoint | 2005.08.29 08:15 AM

JL,

Thanks for taking the time to describe this so clearly. I didn't notice that section headers/footers don't behave in the same way as all other section-related characteristics when "deleting" sections in the way I described. After another quick look I don't see there being a simple general solution, but there are some fairly simple solutions for specific cases. I've modified the post above (including the VBA code) to reflect one approach. If I get time I'll try to explain further.

Thanks again!

ewbi.develops | 2005.08.29 12:34 PM

Dear
How can I get the count of the header and footer of a Word document? I want to count the number of characters the header and footer have.
plz mail me to nagu_anagani@yahoo.com
Regards
nagarjuna

nagarjuna | 2005.11.14 11:49 PM

Nagarjuna,

If I understand what you're asking, it should be as easy as this:

ThisDocument.StoryRanges(WhichStoryRange).Characters.Count

Where WhichStoryRange can be any of the following WdStoryType enumerated constants:

wdPrimaryHeaderStory
wdPrimaryFooterStory
wdFirstPageHeaderStory
wdFirstPageFooterStory
wdEvenPagesHeaderStory
wdEvenPagesFooterStory

Hope that helps.

ewbi.develops | 2005.11.28 05:47 PM

How to remove my End of unprotected section of word document.

ZAKHI | 2005.12.22 06:54 PM

ZAKHI,

Not sure I follow your question/point. Can you be more specific?

ewbi.develops | 2005.12.22 11:37 PM

I'm having a vexing problem that perhaps you could help with.
I simply want to programmatically create a Word document with 3 sections, each with a separate heading. When I use the code generated by Word's record macro function, it does not work; it puts the headings above the wrong sections.

Help would be appreciated!


Sub Macro27()
'
' Macro27 Macro
' Macro recorded 2006-09-08 by Administrator
'
Dim SectionNumber As Long

If ActiveWindow.View.SplitSpecial <> wdPaneNone Then
ActiveWindow.Panes(2).Close
End If
If ActiveWindow.ActivePane.View.Type = wdNormalView Or ActiveWindow. _
ActivePane.View.Type = wdOutlineView Then
ActiveWindow.ActivePane.View.Type = wdPrintView
End If
ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader
Selection.TypeText Text:="heading 1"
ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument
Selection.TypeText Text:="text 1"
Selection.TypeParagraph
Selection.InsertBreak Type:=wdSectionBreakNextPage

'SectionNumber = Selection.Information(wdActiveEndSectionNumber)
'ActiveDocument.Sections(SectionNumber).Headers(wdHeaderFooterPrimary).LinkToPrevious = False
'If Not ActiveDocument.Sections(SectionNumber - 1).Headers(wdHeaderFooterPrimary).LinkToPrevious Then
'ActiveDocument.Sections(SectionNumber).Headers(wdHeaderFooterPrimary).LinkToPrevious = False
'End If

If ActiveWindow.View.SplitSpecial <> wdPaneNone Then
ActiveWindow.Panes(2).Close
End If
If ActiveWindow.ActivePane.View.Type = wdNormalView Or ActiveWindow. _
ActivePane.View.Type = wdOutlineView Then
ActiveWindow.ActivePane.View.Type = wdPrintView
End If
ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader
Selection.HeaderFooter.LinkToPrevious = False
Selection.EndKey Unit:=wdLine, Extend:=wdExtend
Selection.Delete Unit:=wdCharacter, Count:=1
Selection.TypeText Text:="heading 2"
ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument


Selection.TypeText Text:="text 2"
Selection.TypeParagraph
Selection.InsertBreak Type:=wdSectionBreakNextPage

'SectionNumber = Selection.Information(wdActiveEndSectionNumber)
'ActiveDocument.Sections(SectionNumber).Headers(wdHeaderFooterPrimary).LinkToPrevious = False
'If Not ActiveDocument.Sections(SectionNumber - 1).Headers(wdHeaderFooterPrimary).LinkToPrevious Then
'ActiveDocument.Sections(SectionNumber).Headers(wdHeaderFooterPrimary).LinkToPrevious = False
'End If

If ActiveWindow.View.SplitSpecial <> wdPaneNone Then
ActiveWindow.Panes(2).Close
End If
If ActiveWindow.ActivePane.View.Type = wdNormalView Or ActiveWindow. _
ActivePane.View.Type = wdOutlineView Then
ActiveWindow.ActivePane.View.Type = wdPrintView
End If
ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader
Selection.HeaderFooter.LinkToPrevious = Not Selection.HeaderFooter. _
LinkToPrevious
Selection.EndKey Unit:=wdLine, Extend:=wdExtend
Selection.Delete Unit:=wdCharacter, Count:=1
Selection.TypeText Text:="Heading 3"
ActiveWindow.ActivePane.View.SeekView = wdSeekMainDocument
Selection.TypeText Text:="text 3"
Selection.TypeParagraph

End Sub

James Hunter | 2006.09.08 11:55 AM

Hi James, I've addressed your question in a new post:

http://ewbi.blogs.com/develops/2006/09/programmatic_cr.html

Hope it helps. Good luck!

ewbi.develops | 2006.09.12 09:43 AM

Sorry for being a bit off topic, but I need help creating a macro. Could you please give me a hint? The problem is as follows. I've got a text which consists of several parts, separated by page breaks. For each part in this document I have to
1) add an alpha-numeric anchor to the first paragraph of it, make it bold and increase its font size;
2) copy the second and the first paragraphs of each of these parts to another word document (sort of a contents list)
The question is if there are any convenient ways to address those parts, separated by page breaks?
I've been looking for it for the whole evening - all in vain. Do not know what to start with.

Redkaa | 2008.07.17 02:14 PM

Good Post.
My question is how to delete everything after a particular section using a macro. I don't need it any more.

Hope for the reply

Manoj | 2009.09.23 12:38 AM

Hi Manoj,

I'm afraid I don't have time to code something to share regarding your request, but I'm pretty sure everything you need to do this can be found in the code above. That code includes logic for determining the current section, for deleting sections, and for sliding section formatting info forward/back, if necessary. I'll add this to my list, but can't be sure when I might get it done.

Good luck!

ewbi.develops | 2009.09.23 09:05 AM

