
Finding text that is not an HTML tag via regular expressions

October 10th, 2009

Filed under: RegExp — Alex Bylim @ 1:10 am

Hi, guys. Today I ran into an interesting task. I have some HTML text, for example:

<strong>"Skwak"</strong><p> is an awesome French.<p>

I want to fetch the text that is not part of an HTML tag, which is a kind of HTML tag stripping. You can easily find all the tags and remove them. A Google search turns up plenty of links and descriptions of how to do this. The regular expression for it is pretty simple:

<[^>]*>
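For reference, here is a minimal sketch of that tag-stripping approach. The post names no language, so Python's `re` module and the helper name `strip_tags` are assumptions chosen for illustration:

```python
import re

# The tag pattern from above: '<', then any run of non-'>' characters, then '>'.
TAG = re.compile(r'<[^>]*>')

def strip_tags(html):
    """Hypothetical helper: delete every tag and keep whatever text remains."""
    return TAG.sub('', html)

sample = '<strong>"Skwak"</strong><p> is an awesome French.<p>'
print(strip_tags(sample))  # "Skwak" is an awesome French.
```

This removes the tags but leaves you with one merged string, so you lose track of where each piece of text started and ended.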

But what if you want to match “Skwak” or “is an awesome French.” directly? The regular expression for that is considerably more complex. I knew I needed lookahead and lookbehind assertions, but I could not find a quick answer on the internet, so it took me a while to solve this problem. Now I want to share my solution with you. Here it is:

(?:(?!>).(?!([^<]+)?>))
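This expression matches a single character at a time: any character that is not `>` and is not followed by `[^<]*>`, that is, one that does not sit inside a tag. Below is a minimal Python sketch that uses it. Python's `re` module and the hypothetical helper `text_runs` are assumptions, since the post names no language. The helper joins adjacent one-character matches back into whole text runs:

```python
import re

# The lookahead pattern from the post. Each match is exactly one character:
# (?!>) rejects '>', and (?!([^<]+)?>) rejects characters followed by "...>",
# i.e. characters that sit inside a tag.
NOT_TAG_CHAR = re.compile(r'(?:(?!>).(?!([^<]+)?>))')

def text_runs(html):
    """Hypothetical helper: merge consecutive one-character matches into runs."""
    runs = []
    start = end = None
    for m in NOT_TAG_CHAR.finditer(html):
        if start is not None and m.start() == end:
            end = m.end()  # this character continues the current run
        else:
            if start is not None:
                runs.append(html[start:end])  # close the previous run
            start, end = m.start(), m.end()
    if start is not None:
        runs.append(html[start:end])
    return runs

sample = '<strong>"Skwak"</strong><p> is an awesome French.<p>'
print(text_runs(sample))  # ['"Skwak"', ' is an awesome French.']
```

A caveat: `.` does not match newlines unless `re.DOTALL` is set, and plain text that contains a literal `>` would be partly dropped. Treat this as a quick trick rather than a real HTML parser.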