What is strip_tags in PHP

(PHP 4, PHP 5, PHP 7, PHP 8)


strip_tags — Strip HTML and PHP tags from a string
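As a quick orientation, here is a minimal sketch of the two common call forms. The input string and variable names are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical input containing nested tags and an HTML comment.
$html = '<p>Hello <b>World</b><!-- note --></p>';

// No second argument: all tags and HTML comments are removed.
$plain = strip_tags($html);

// Second argument: the listed tags are kept, everything else is removed.
$keepBold = strip_tags($html, '<b>');

var_dump($plain);    // string(11) "Hello World"
var_dump($keepBold); // string(18) "Hello <b>World</b>"
```

Only the tags themselves are removed; the text between them is left in place.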

mariusz.tarnaski at wp dot pl ¶

13 years ago

Hi. I made a function that removes HTML tags along with their contents:

<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  // Extract the tag names from the $tags argument, e.g. '<b><i>' gives array('b', 'i')
  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if(is_array($tags) AND count($tags) > 0) {
    if($invert == FALSE) {
      // Remove every element except the listed ones
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      // Remove only the listed elements
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif($invert == FALSE) {
    // No tags given: remove every element together with its content
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>



Sample text:

$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):

sample text with tags

Result for strip_tags_content($text):

 text with

Result for strip_tags_content($text, '<b>'):

<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE);

 text with <div>tags</div>

I hope someone finds this useful :)

bzplan at web dot de ¶

9 years ago

An HTML snippet like this:

<?php
$html = '

color is blue size is huge material is wood

';
?>

with <?php $str = strip_tags($html); ?> ... the result is:

$str = 'color is bluesize is huge material is wood';

Notice: the words 'blue' and 'size' run together :( and the line breaks are still in the new string $str.

If you need a space between the words (and no line breaks), use my function: <?php $str = rip_tags($html); ?> ... the result is:

$str = 'color is blue size is huge material is wood';

The function:

<?php
// --------------------------------------------------------------
function rip_tags($string) {

    // ----- remove HTML TAGs -----
    $string = preg_replace ('/<[^>]*>/', ' ', $string);

    // ----- remove control characters -----
    $string = str_replace("\r", '', $string);    // --- replace with empty string
    $string = str_replace("\n", ' ', $string);   // --- replace with space
    $string = str_replace("\t", ' ', $string);   // --- replace with space

    // ----- remove multiple spaces -----
    $string = trim(preg_replace('/ {2,}/', ' ', $string));

    return $string;
}
// --------------------------------------------------------------
?>

The KEY is the regex pattern '/<[^>]*>/' in place of strip_tags() ... then remove control characters and multiple spaces :)
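To make the difference concrete, here is a small sketch with hypothetical markup (the original snippet's tags are not reproduced here). It collapses all whitespace with a single `\s+` pattern rather than separate str_replace() calls, which is a simplification of the function above and not the author's exact code.

```php
<?php
// Hypothetical markup: two adjacent paragraphs with no whitespace between them.
$html = "<div>\n<p>color is blue</p><p>size is <span>huge</span></p>\n<p>material is wood</p>\n</div>";

// strip_tags() deletes the tags and nothing else, so "blue" and "size" touch
// and the line breaks from the source stay in the result.
$stripped = strip_tags($html);

// Replace each tag with a space, then collapse every run of whitespace
// (\r, \n, \t and repeated spaces) into a single space.
$spaced = trim(preg_replace('/\s+/', ' ', preg_replace('/<[^>]*>/', ' ', $html)));

var_dump($stripped); // "\ncolor is bluesize is huge\nmaterial is wood\n"
var_dump($spaced);   // "color is blue size is huge material is wood"
```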
doug at exploittheweb dot com ¶
7 years ago
"5.3.4 strip_tags() no longer strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags."
This is poorly worded.
The above seems to be saying that, since 5.3.4, if you don't specify "<br/>" in allowable_tags then "<br/>" will not be stripped... but that's not actually what they're trying to say.
What it means is that, in versions prior to 5.3.4, it "strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags", and that since 5.3.4 this is no longer the case.
So what reads as "no longer strips self-closing tags (unless the self-closing XHTML tag is also given in allowable_tags)" is actually saying "no longer (strips self-closing tags unless the self-closing XHTML tag is also given in allowable_tags)".

i.e.
pre-5.3.4: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br>' // strips <br/> because it wasn't explicitly specified in allowable_tags
5.3.4 and later: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br><br/>' // does not strip <br/> because PHP matches it with <br> in allowable_tags
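The post-5.3.4 behavior described above can be checked with a short sketch. It assumes PHP 5.3.4 or later; the input string is a hypothetical one that mixes the plain and the self-closing form of the same tag.

```php
<?php
// Hypothetical input with both forms of the line-break tag.
$input = 'Hello World<br><br/>';

// Only the non-self-closing form is listed as allowed.
$result = strip_tags($input, '<br>');

// On PHP 5.3.4 and later both forms survive.
var_dump($result); // string(20) "Hello World<br><br/>"
```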
Dr. Gianluigi "Zane" Zanettini ¶
6 years ago
A word of caution. strip_tags() can actually be used for input validation as long as you remove EVERY tag. As soon as you accept a single tag (2nd parameter), you are opening up a security hole, because strip_tags() leaves the attributes of accepted tags untouched, event handlers included.
Plus: regexing away attributes or code blocks is really not the right solution. For effective input validation when using strip_tags() with even a single tag accepted, http://htmlpurifier.org/ is the way to go.
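A minimal sketch of the problem, using a hypothetical piece of untrusted input: the allowed tag keeps its event-handler attribute, and the text inside the removed script element is still present.

```php
<?php
// Hypothetical untrusted input: an allowed tag carrying an event handler.
$input = '<p onmouseover="alert(1)">hover me</p><script>alert(2)</script>';

// Allowing <p> keeps the whole opening tag, attributes included.
$result = strip_tags($input, '<p>');
var_dump($result); // "<p onmouseover="alert(1)">hover me</p>alert(2)"

// If plain text is all you need, strip every tag and escape on output.
$plain = htmlspecialchars(strip_tags($input), ENT_QUOTES, 'UTF-8');
var_dump($plain);  // "hover mealert(2)"
```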
stever at starburstpublishing dot com dot au ¶
6 years ago
Since strip_tags does not remove attributes and thus creates a potential XSS security hole, here is a small function I wrote to allow only specific tags with specific attributes and strip all other tags and attributes.
If you only allow formatting tags such as b, i, and p, and styling attributes such as class, id and style, this will strip all javascript including event triggers in formatting tags.
Note that allowing anchor tags or href attributes opens another potential security hole that this solution won't protect against. You'll need more comprehensive protection if you plan to allow links in your text.
<?php
function stripUnwantedTagsAndAttrs($html_str){
  $xml = new DOMDocument();
  //Suppress warnings: proper error handling is beyond scope of example
  libxml_use_internal_errors(true);
  //List the tags you want to allow here, NOTE you MUST allow html and body otherwise entire string will be cleared
  $allowed_tags = array("html", "body", "b", "br", "em", "hr", "i", "li", "ol", "p", "s", "span", "table", "tr", "td", "u", "ul");
  //List the attributes you want to allow here
  $allowed_attrs = array ("class", "id", "style");
  if (!strlen($html_str)){return false;}
  if ($xml->loadHTML($html_str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)){
    //Copy the live node list into an array first: removing nodes while iterating a live list skips elements
    foreach (iterator_to_array($xml->getElementsByTagName("*")) as $tag){
      if (!in_array($tag->tagName, $allowed_tags)){
        if ($tag->parentNode){
          $tag->parentNode->removeChild($tag);
        }
      }else{
        //Same precaution for the attribute map
        foreach (iterator_to_array($tag->attributes) as $attr){
          if (!in_array($attr->nodeName, $allowed_attrs)){
            $tag->removeAttribute($attr->nodeName);
          }
        }
      }
    }
  }
  return $xml->saveHTML();
}
?>
abe ¶
1 year ago
Note that strip_tags will remove anything that looks like a tag, not just real tags. If you have tags inside attribute values, they may be removed too,
e.g.

<?php
$test='<div a="abc <b>def</b> hij" b="1">x<b>y</b>z</div>';
echo strip_tags($test, "<div><b>");
?>

will result in

<div a="abc bdef/b hij" b="1">x<b>y</b>z</div>

roger dot keulen at vaimo dot com ¶
3 years ago
https://bugs.php.net/bug.php?id=78346
After upgrading from v7.3.3 to v7.3.7 it appears nested "php tags" inside a string are no longer being stripped correctly by strip_tags().
This is still working in v7.3.3, v7.2 & v7.1. I've added a simple test below.
Test script:
---------------
<?php
$str = '<?= \'<?= 1 ?>\' ?>2';
var_dump(strip_tags($str));

Expected result:
----------------
string(1) "2"

Actual result:
--------------
string(5) "' ?>2"
CEO at CarPool2Camp dot org ¶

13 years ago
Note the different outputs from different versions of the same tag:
<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br>');
var_dump($new); // OUTPUTS string(21) "<br>EachNew<br />Line"

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br/>');
var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br />');
var_dump($new); // OUTPUTS string(11) "EachNewLine"
?>
Trititaty ¶
6 years ago
Features:
* allowable tags (as in strip_tags),
* optional stripping of attributes from the allowable tags,
* optional comment preserving,
* deleting broken and unclosed tags and comments,
* optional callback function called for every piece processed, allowing flexible replacements.

<?php
function better_strip_tags( $str, $allowable_tags = '', $strip_attrs = false, $preserve_comments = false, callable $callback = null ) {
  $allowable_tags = array_map( 'strtolower', array_filter( // lowercase
      preg_split( '/(?:>|^)\\s*(?:<|$)/', $allowable_tags, -1, PREG_SPLIT_NO_EMPTY ), // get tag names
      function( $tag ) { return preg_match( '/^[a-z][a-z0-9_]*$/i', $tag ); } // filter broken
  ) );
  $comments_and_stuff = preg_split( '/(<!--.*?(?:-->|$))/', $str, -1, PREG_SPLIT_DELIM_CAPTURE );
  foreach ( $comments_and_stuff as $i => $comment_or_stuff ) {
    if ( $i % 2 ) { // html comment
      if ( !( $preserve_comments && preg_match( '/<!--.*?-->/', $comment_or_stuff ) ) ) {
        $comments_and_stuff[$i] = '';
      }
    } else { // stuff between comments
      $tags_and_text = preg_split( "/(<(?:[^>\"']++|\"[^\"]*+(?:\"|$)|'[^']*+(?:'|$))*(?:>|$))/", $comment_or_stuff, -1, PREG_SPLIT_DELIM_CAPTURE );
      foreach ( $tags_and_text as $j => $tag_or_text ) {
        $is_broken = false;
        $is_allowable = true;
        $result = $tag_or_text;
        if ( $j % 2 ) { // tag
          if ( preg_match( "%^(</?)([a-z][a-z0-9_]*)\\b(?:[^>\"'/]++|/+?|\"[^\"]*\"|'[^']*')*?(/?>)%i", $tag_or_text, $matches ) ) {
            $tag = strtolower( $matches[2] );
            if ( in_array( $tag, $allowable_tags ) ) {
              if ( $strip_attrs ) {
                $opening = $matches[1];
                $closing = ( $opening === '</' ) ? '>' : $matches[3];
                $result = $opening . $tag . $closing;
              }
            } else {
              $is_allowable = false;
              $result = '';
            }
          } else {
            $is_broken = true;
            $result = '';
          }
        } else { // text
          $tag = false;
        }
        if ( !$is_broken && isset( $callback ) ) {
          // allow result modification
          call_user_func_array( $callback, array( &$result, $tag_or_text, $tag, $is_allowable ) );
        }
        $tags_and_text[$j] = $result;
      }
      $comments_and_stuff[$i] = implode( '', $tags_and_text );
    }
  }
  $str = implode( '', $comments_and_stuff );
  return $str;
}
?>

Callback arguments:
* &$result: contains the text to be placed instead of the original piece (e.g. an empty string for forbidden tags); it can be changed;
* $tag_or_text: the original piece of text or a tag (see below);
* $tag: false for text between tags, the lowercase tag name for tags;
* $is_allowable: boolean telling whether a tag is allowed (to avoid double checking), always true for text between tags.

The callback function isn't called for comments and broken tags.
Caution: the function doesn't fully validate tags (let alone the HTML itself); it just force-strips those that are obviously broken (in addition to stripping forbidden tags). If you want valid tags, use the strip_attrs option, though it doesn't guarantee that tags are balanced or used in the appropriate context. For complex logic, consider using a DOM parser.
Anonymous ¶
5 years ago
Just bzplan's function, with the option to choose what the tags are replaced with:

<?php
function rip_tags($string, $rep = ' ') {

    // ----- remove HTML TAGs -----
    $string = preg_replace ('/<[^>]*>/', $rep, $string);

    // ----- remove control characters -----
    $string = str_replace("\r", '', $string);     // --- replace with empty string
    $string = str_replace("\n", $rep, $string);   // --- replace with $rep
    $string = str_replace("\t", $rep, $string);   // --- replace with $rep

    // ----- remove multiple spaces -----
    $string = trim(preg_replace('/ {2,}/', $rep, $string));

    return $string;
}
?>
D Mo ¶
4 years ago
When processing a large number of strings, stripping tags together with their content by means of regular expressions is very slow. This function may help:

<?php
/**
 * Removes passed tags with their content.
 *
 * @param array $tagsToRemove List of tags to remove
 * @param $haystack String to cleanup
 * @return string
 */
function removeTagsWithTheirContent(array $tagsToRemove, $haystack)
{
    $currTag = '';
    $currPos = false;

    // Find the earliest opening tag among the tags still to be removed
    $initSearch = function (&$currTag, &$currPos, $tagsToRemove, $haystack) {
        $currTag = '';
        $currPos = false;
        foreach ($tagsToRemove as $tag) {
            $tempPos = stripos($haystack, '<'.$tag);
            if ($tempPos !== false && ($currPos === false || $tempPos < $currPos)) {
                $currPos = $tempPos;
                $currTag = $tag;
            }
        }
    };

    // Case-insensitive substr_count()
    $substri_count = function ($haystack, $needle, $offset, $length) {
        $haystack = strtolower($haystack);
        return substr_count($haystack, $needle, $offset, $length);
    };

    $initSearch($currTag, $currPos, $tagsToRemove, $haystack);
    while ($currPos !== false) {
        $minTagLength = strlen($currTag) + 2;
        $tempPos = $currPos + $minTagLength;
        $tagEndPos = stripos($haystack, '</'.$currTag.'>', $tempPos);
        // process nested tags
        if ($tagEndPos !== false) {
            $nestedCount = $substri_count($haystack, '<' . $currTag, $tempPos, $tagEndPos - $tempPos);
            for ($i = $nestedCount; $i > 0; $i--) {
                $lastValidPos = $tagEndPos;
                $tagEndPos = stripos($haystack, '</' . $currTag . '>', $tagEndPos + 1);
                if ($tagEndPos === false) {
                    $tagEndPos = $lastValidPos;
                    break;
                }
            }
        }

        if ($tagEndPos === false) {
            // invalid html, end search for current tag
            $tagsToRemove = array_diff($tagsToRemove, [$currTag]);
        } else {
            // remove current tag with its content
            $haystack = substr($haystack, 0, $currPos)
                // get string after "</$currTag>"
                .substr($haystack, $tagEndPos + strlen($currTag) + 3);
        }

        $initSearch($currTag, $currPos, $tagsToRemove, $haystack);
    }

    return $haystack;
}
?>
cesar at nixar dot org ¶
16 years ago
Here is a recursive function for strip_tags, like the one shown on the stripslashes manual page.

<?php
function strip_tags_deep($value)
{
  return is_array($value) ?
    array_map('strip_tags_deep', $value) :
    strip_tags($value);
}

// Example
$array = array('Foo', 'Bar', array('Foo', 'Bar'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>
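For comparison, the same effect can be sketched with the built-in array_walk_recursive(), which visits every leaf of a nested array. This is an alternative technique, not the author's function; the sample input with tags is hypothetical.

```php
<?php
// Hypothetical nested input containing markup.
$array = ['<b>Foo</b>', '<i>Bar</i>', ['<b>Foo</b>', '<i>Bar</i>']];

// The callback receives each leaf value by reference and strips it in place.
// Non-string leaves (integers, null, ...) are left untouched.
array_walk_recursive($array, function (&$value) {
    if (is_string($value)) {
        $value = strip_tags($value);
    }
});

print_r($array); // Foo, Bar and a nested array holding Foo, Bar
```

Unlike the recursive version, this modifies the array in place and cannot be applied to a plain string.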

obeyer at popsugar dot com ¶
8 years ago
Actually, for PHP 5.4.19, if you want to add line breaks to the allowable tags, you should use "<br>". Both "<br />" and "<br/>" in the allowable tags won't do anything, and line breaks will be stripped.
fernando at zauber dot es ¶
7 years ago
As you probably know, the native function strip_tags doesn't work very well with malformed HTML when you use the allowed tags parameter. This is a very simple but effective function to remove HTML tags. It takes a list (array) of allowed tags as its second parameter:

<?php
function flame_strip_tags($html, $allowed_tags=array()) {
  // 'strtolower' must be quoted: a bare name is an undefined constant
  $allowed_tags=array_map('strtolower',$allowed_tags);
  $rhtml=preg_replace_callback('/<\/?([^>\s]+)[^>]*>/i', function ($matches) use (&$allowed_tags) {
    return in_array(strtolower($matches[1]),$allowed_tags)?$matches[0]:'';
  },$html);
  return $rhtml;
}
?>

The function works reasonably well with invalid or badly formatted HTML.
Use:

<?php
$allowed_tags=array("h2","a");
$html=<<<EOD
Example
Getting Started Introduction A simple tutorial Language Reference Basic syntax Predefined Interfaces and Classes
EOD;
echo flame_strip_tags($html,$allowed_tags);
?>

The output will be:
Example
Getting Started Introduction A simple tutorial Language Reference Basic syntax Predefined Interfaces and Classes
tom at cowin dot us ¶
12 years ago
With most web-based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this function over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you, great. If you can improve on it or have a better way, please post it...

<?php
function strip_word_html($text, $allowed_tags = '^_')
{
    mb_regex_encoding('UTF-8');
    //replace MS special characters first
    $search = array('/‘/u', '/’/u', '/“/u', '/”/u', '/—/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);
    //make sure _all_ html entities are converted to the plain ascii equivalents - it appears
    //in some MS headers, some html entities are encoded and some aren't
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    //try to strip out any C style comments first, since these, embedded in html comments, seem to
    //prevent strip_tags from removing html comments (MS Word introduced combination)
    if(mb_stripos($text, '/*') !== FALSE){
        $text = mb_eregi_replace('#/\*.*?\*/#s', '', $text, 'm');
    }
    //introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be
    //'<1' becomes '< 1'(note: somewhat application specific)
    $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
    $text = strip_tags($text, $allowed_tags);
    //eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
    //strip out inline css and simplify style tags
    $search = array('#<(strong|b)[^>]*>(.*?)#isu', '#<(em|i)[^>]*>(.*?)#isu', '#]*>(.*?)#isu');
    $replace = array('$2', '$2', '$1');
    $text = preg_replace($search, $replace, $text);
    //on some of the ?newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
    //that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
    //some MS Style Definitions - this last bit gets rid of any leftover comments */
    $num_matches = preg_match_all("/\
