Re: How to remove duplicate words from text

Posted by:  Mark Szlazak (mszlaz…@aol.com)
Date: 23 Jul 2003

Now, for a change of pace, see if this or a similar regex solution
fits your needs.

        inText = 'Aberdeen Aberdeen Aberdeen Edinburg Edinburg Inverness etc.';
        rxDoubled = /\b(\w+)(?:(\s+)\1\b)+/g;
        outText = inText.replace(rxDoubled,'$1$2');
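
For anyone who wants to try this, here is a self-contained sketch of the same pattern. The helper name collapseRepeats is made up for illustration and is not part of the original suggestion. One thing to watch: '$1$2' puts the last captured separator back in front of the whitespace that already follows the match, so each collapsed run ends up followed by two spaces. If that matters for your text, replacing with '$1' alone keeps the original spacing.

```javascript
// \b(\w+)          a whole word, captured as $1
// (?:(\s+)\1\b)+   one or more repeats of: whitespace ($2), then the same word
var rxDoubled = /\b(\w+)(?:(\s+)\1\b)+/g;

// Hypothetical helper: applies the pattern with a caller-chosen replacement.
function collapseRepeats(text, replacement) {
    return text.replace(rxDoubled, replacement);
}

var inText = 'Aberdeen Aberdeen Aberdeen Edinburg Edinburg Inverness etc.';

// As posted: the word plus the last separator captured inside the match.
var keptSeparator = collapseRepeats(inText, '$1$2');

// Variant: the word only, leaving the whitespace after the run untouched.
var wordOnly = collapseRepeats(inText, '$1');

console.log(JSON.stringify(keptSeparator)); // "Aberdeen  Edinburg  Inverness etc."
console.log(JSON.stringify(wordOnly));      // "Aberdeen Edinburg Inverness etc."
```

The match is case-sensitive as written, so "The the" is left alone unless you add the i flag.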

If your regex engine does not support non-capturing groups, (?:...),
use a capturing group instead and adjust the group number in the
replacement:

        rxDoubled = /\b(\w+)((\s+)\1\b)+/g;
        outText = inText.replace(rxDoubled,'$1$3');

Also, if you need to allow other non-word content between the repeated
words, such as an HTML tag, replace (\s+) with ((?:\s|<[^>]+>)+)
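
A quick sketch of that tag-aware variant, assuming the tags are simple <...> runs with no ">" inside attribute values. The names rxDoubledTags and collapseRepeatsAcrossTags are hypothetical. Here keeping $2 in the replacement is useful, because it preserves any tag that sat between the two copies of the word.

```javascript
// Same idea as before, but the separator ($2) may be any mix of
// whitespace and <...> tags.
var rxDoubledTags = /\b(\w+)(?:((?:\s|<[^>]+>)+)\1\b)+/g;

// Hypothetical helper: drops the repeated word, keeps the separator/tags.
function collapseRepeatsAcrossTags(text) {
    return text.replace(rxDoubledTags, '$1$2');
}

var tagged = collapseRepeatsAcrossTags('the <b>the</b> cat');
console.log(tagged); // the <b></b> cat
```

The second "the" is removed while the opening <b> stays paired with its closing </b>. This is pattern matching rather than HTML parsing, so treat it as a rough cleanup and not a general solution for arbitrary markup.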



In response to

How to remove duplicate words from text posted by carlit…@websurfer.co.za (Voetleuce en fênsievry) on 22 Jul 2003