Count characters in each paragraph and return result to top of eac

Guest

Thanks in advance for any suggestions. I have 3.5 million characters in a single document. Originally there were no spaces between the characters but I have used the find/replace function to locate a specific 6 character string and insert a line break and new paragraph (replace: ^l^p) after the first letter of said 6 character string. Now I have 582 paragraphs, each with a different number of characters. What I need to know is how to return the number of characters in each paragraph to a header line just above each paragraph (more precisely the value will be used as part of a unique identifier for each 'paragraph' located within a header line). See below for an illustration of the problem:

Original sequence:
ATGCGGTGATTCGGCTATAGCTAGGGATACGGTAGGCT...

Specific 6 character string (find/replace with: ^l^p):
GTGCAC

Resulting 'paragraphs':
ATGCGGTGATTCG

TGCACAGCTG
TGCACACGGTAGGCT...

Thanks again. PRM.
 
Jean-Guy Marcil

Hello,

In this message, < tiger_PRM > wrote:

|| Thanks in advance for any suggestions. I have 3.5 million characters in a single document.
|| Originally there were no spaces between the characters but I have used the find/replace function
|| to locate a specific 6 character string and insert a line break and new paragraph (replace:
|| ^l^p) after the first letter of said 6 character string. Now I have 582 paragraphs, each with a
|| different number of characters. What I need to know is how to return the number of characters
|| in each paragraph to a header line just above each paragraph (more precisely the value will be
|| used as part of a unique identifier for each 'paragraph' located within a header line). See
|| below for an illustration of the problem:
||
|| Original sequence:
|| ATGCGGTGATTCGGCTATAGCTAGGGATACGGTAGGCT...
||
|| Specific 6 character string (find/replace with: ^l^p):
|| GTGCAC
||
|| Resulting 'paragraphs':
||
|| ATGCGGTGATTCG
||
|| TGCACAGCTG
||
|| TGCACACGGTAGGCT...
||

Why did you do a find/replace to add ^l^p? Why not just ^p? Was it to make
sure that your new paragraphs were properly spaced? If so, it was not
necessary.
First, I would do a find/replace to remove all ^l, replacing them with
nothing.
Then I would select all the paragraphs and either apply a previously created
style or use the Normal style and add space after each paragraph (Format >
Paragraph... > Spacing After: 12 pt). Use whatever amount of space you find
aesthetically pleasing.
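
If you would rather do the two clean-up steps above in code than through the dialogs, something like the following should do it. Take it as a rough sketch: the macro name is made up for illustration, and it assumes the sequences sit in the main body of the active document and that 12 points is the spacing you want.

```vba
'_______________________________________
Sub CleanUpBeforeCounting()

    'Hypothetical helper, not part of the original macro.
    'Step 1: remove every manual line break (^l) in the document body
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^l"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With

    'Step 2: add 12 points of space after every paragraph
    '(assumed value, use whatever spacing you prefer)
    ActiveDocument.Paragraphs.SpaceAfter = 12

End Sub
'_______________________________________
```

Run it once before the counting macro, so that the "-1" adjustment for the paragraph mark is the only correction the count needs.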

Once this is done, I would format the Heading 1 style the way I want it for
the new heading paragraphs to be inserted before each of the half DNA
sequences (I don't remember what you call those...).

Then I would run this code to do the job. All told, maybe a big 3-minute
job... OK, 5 if you are not familiar with styles...

'_______________________________________
Sub CharacterCount()

    Dim oPara As Paragraph
    Dim oParaRg As Range
    Dim CaraCount As Long

    Application.ScreenUpdating = False

    For Each oPara In ActiveDocument.Paragraphs
        With oPara
            Set oParaRg = .Range
            'The "-1" in the next line is to remove each paragraph mark
            'from the count... yes, it is counted!
            'If you decided to keep all the ^l, then change it to a 2
            CaraCount = oParaRg.Characters.Count - 1
            'Insert the count as a new paragraph above the sequence
            oParaRg.InsertBefore CStr(CaraCount) & Chr(13)
            'Narrow the range to the new paragraph and make it a heading
            Set oParaRg = oParaRg.Paragraphs(1).Range
            oParaRg.Style = wdStyleHeading1
        End With
    Next oPara

    Application.ScreenRefresh
    Application.ScreenUpdating = True

End Sub
'_______________________________________

Extra note to anyone reading this:

I figured that the procedure below would be slower. But, out of curiosity, I
wrote the code and timed it. I found that I had to add the initial number of
paragraphs to
For i = 1 To .Paragraphs.Count Step 2
to get
For i = 1 To .Paragraphs.Count + ParaCount Step 2
or not all paragraphs would be processed.

My first instinct was to have
For i = 1 To .Paragraphs.Count + 2 Step 2
because I am adding paragraphs with the procedure, so I thought this would be
enough. (Stepping through the code, I saw that .Paragraphs.Count is
incremented dynamically during the process; if it weren't, I would
understand.)

Is it because even though .Paragraphs.Count is dynamically incremented, the
line
For i = 1 To .Paragraphs.Count Step 2
is processed with the initial value of .Paragraphs.Count at each pass?

'_______________________________________
Sub CharacterCount()

    Dim i As Long
    Dim ParaCount As Long
    Dim oParaRg As Range
    Dim CaraCount As Long

    Application.ScreenUpdating = False

    With ActiveDocument
        'Initial number of paragraphs, before any headings are inserted
        ParaCount = .Paragraphs.Count
        For i = 1 To .Paragraphs.Count + ParaCount Step 2
            Set oParaRg = .Paragraphs(i).Range
            'The "-1" in the next line is to remove each paragraph mark
            'from the count... yes, it is counted!
            'If you decided to keep all the ^l, then change it to a 2
            CaraCount = oParaRg.Characters.Count - 1
            'Insert the count as a new paragraph above the sequence
            oParaRg.InsertBefore CStr(CaraCount) & Chr(13)
            'Narrow the range to the new paragraph and make it a heading
            Set oParaRg = oParaRg.Paragraphs(1).Range
            oParaRg.Style = wdStyleHeading1
        Next i
    End With

    Application.ScreenRefresh
    Application.ScreenUpdating = True

End Sub
'_______________________________________
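
On the question above about the For line: as far as I know, VBA evaluates the start, end and step expressions of a For...Next loop only once, when the loop is entered. It does not re-read them at each pass. That would explain why the loop stops at the initial value of .Paragraphs.Count even though the property itself keeps growing. The small test below is only an illustration (the procedure and variable names are made up) and needs no document content to run:

```vba
'_______________________________________
Sub ForBoundDemo()

    Dim i As Long
    Dim n As Long
    Dim Passes As Long

    n = 3
    For i = 1 To n
        'Changing n here does not move the loop's upper bound
        n = n + 1
        Passes = Passes + 1
    Next i

    'Prints: 3 passes, n is now 6
    Debug.Print Passes & " passes, n is now " & n

End Sub
'_______________________________________
```

This is why the upper bound has to be computed up front as the initial count plus the number of paragraphs that will be inserted, which is what .Paragraphs.Count + ParaCount does.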

--
Cheers!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
 
