Using Command Line tools for advanced find and replace

  • Thread starter Matthew Schwarz

Matthew Schwarz

Howdy,

Since I can't find any programs to do what I'd like to do, I tried to
experiment using DOS commands from within WinXP.

I have 1,262 .htm files. Almost all of them have a headline that is
surrounded by the exact same strings --
In the front: <font face="Arial" size="5"><b>
And in the back: </b></font>

Some have something slightly different --
In the front: <b><font face="Arial" size="5">
And in the back: </font></b>

Also, every single one of them has near the top of the file this:
<title>News -- Commander, U.S. 7th Fleet</title>

I'd like to replace the "News -- Commander, U.S. 7th Fleet" with the
headline that is in between the tags I mentioned at the beginning of this
post.

So I tried a quick test using FIND.

Something like: find *.htm "<title>" > "</title>" > test.txt

Voila, everything that began with a <title> and ended with </title> was
input into test.txt, and each file name was included too.

I wasn't too worried that test.txt contained not only what was INSIDE the
<title> tags but the tags themselves, because I can use FrontPage's "find and
replace" later on to clean that up.

When I tried something similar using the font tags I mentioned above,
everything got kind of hairy. I believe it is because strings are supposed to
be identified inside quotes and the tags contain quotes themselves, so FIND
got confused.

Even if it worked I wouldn't know what to do next. I would have a file
called test.txt that contained all the headlines I wanted, but I wouldn't
know how to put them inside the <title> tags.

Can anyone help?

Thank you very much.
 
Pegasus (MVP)

Finding strings in a text file isn't too hard under DOS/Windows, but replacing
them with the built-in commands is almost impossible, especially when the text
file contains "poison" characters such as ", ', <, >, ?, | etc. The standard
tool for this task is sed. You can download it from here:
http://sed.sourceforge.net/sedfaq2.html.
Look for the words "DOS binaries". If you object to using third-party
utilities, then you could use a script solution, e.g. one based on VB Script.
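
To give an idea of what the script route involves, here is a rough sketch. It is written in Python rather than VB Script, purely for illustration, and it is not a tested tool for this exact set of files. It assumes the headline is the first text wrapped in either of the two tag orders from the question, that every file has a single <title> element, and that reading the files as Latin-1 is acceptable. The names retitle and retitle_folder are made up for this example. Work on a copy of the folder first.

```python
import re
from pathlib import Path

# Headline wrappers described in the question; either tag order is accepted.
HEADLINE_RE = re.compile(
    r'(?:<font face="Arial" size="5"><b>|<b><font face="Arial" size="5">)'
    r'(.*?)'
    r'(?:</b></font>|</font></b>)',
    re.IGNORECASE | re.DOTALL,
)
TITLE_RE = re.compile(r"<title>.*?</title>", re.IGNORECASE | re.DOTALL)


def retitle(html):
    """Return html with its <title> replaced by the first headline, or None."""
    match = HEADLINE_RE.search(html)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces inside the headline.
    headline = " ".join(match.group(1).split())
    if not headline:
        return None
    new_title = "<title>" + headline + "</title>"
    # A function replacement keeps any backslashes in the headline literal.
    return TITLE_RE.sub(lambda _m: new_title, html, count=1)


def retitle_folder(folder):
    """Rewrite every .htm file in folder; return names of files left unchanged."""
    skipped = []
    for path in sorted(Path(folder).glob("*.htm")):
        # Latin-1 maps every byte to one character, so the round trip is lossless.
        text = path.read_bytes().decode("latin-1")
        updated = retitle(text)
        if updated is None:
            skipped.append(path.name)
        else:
            path.write_bytes(updated.encode("latin-1"))
    return skipped
```

Calling retitle_folder on the copied folder returns the names of the files where no headline was found, so those can be fixed by hand. If a headline contains further tags of its own, they would be copied into the title as-is and would need cleaning up afterwards.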
 
