PHP: truncate string by words

Use the wordwrap function. It splits the text into multiple lines such that the maximum width is the one you specified, breaking at word boundaries. After splitting, you simply take the first line:

substr($string, 0, strpos(wordwrap($string, $your_desired_width), "\n"));

One thing this one-liner doesn't handle is the case when the text itself is shorter than the desired width: wordwrap then inserts no newline, strpos returns false, and substr returns an empty string. To handle this edge case, do something like:

if (strlen($string) > $your_desired_width)
{
    $string = wordwrap($string, $your_desired_width);
    $string = substr($string, 0, strpos($string, "\n"));
}
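
For illustration, the guarded snippet above can be wrapped in a small helper. This is only a sketch: the function name truncateAtWord and its parameter names are made up here, and it still stops early at a newline that was already in the text.

```php
<?php
// Hypothetical helper wrapping the guarded wordwrap approach.
function truncateAtWord(string $string, int $width): string
{
    // Short strings are returned untouched: wordwrap would add no newline
    // and strpos would return false.
    if (strlen($string) <= $width) {
        return $string;
    }
    $wrapped = wordwrap($string, $width);
    $pos = strpos($wrapped, "\n");
    // Text without any space is never broken by wordwrap (with $cut = false),
    // so there may be no newline at all.
    return $pos === false ? $wrapped : substr($wrapped, 0, $pos);
}

echo truncateAtWord("The quick brown fox jumps over the lazy dog", 20), "\n";
```

The explicit check for false is an addition of this sketch; without it a long text with no spaces would come back empty.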

The above solution has the problem of prematurely cutting the text if it contains a newline before the actual cut point. Here is a version that solves this problem:

function tokenTruncate($string, $your_desired_width) {
  // Split into words and whitespace runs, keeping the whitespace as separate parts.
  $parts = preg_split('/([\s\n\r]+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
  $parts_count = count($parts);

  $length = 0;
  $last_part = 0;
  // Accumulate parts until adding the next one would exceed the width.
  for (; $last_part < $parts_count; ++$last_part) {
    $length += strlen($parts[$last_part]);
    if ($length > $your_desired_width) { break; }
  }

  return implode(array_slice($parts, 0, $last_part));
}

Also, here is the PHPUnit test class used to test the implementation:

// Note: PHPUnit 6 and later name the base class PHPUnit\Framework\TestCase.
class TokenTruncateTest extends PHPUnit_Framework_TestCase {
  public function testBasic() {
    $this->assertEquals("1 3 5 7 9 ",
      tokenTruncate("1 3 5 7 9 11 14", 10));
  }

  public function testEmptyString() {
    $this->assertEquals("",
      tokenTruncate("", 10));
  }

  public function testShortString() {
    $this->assertEquals("1 3",
      tokenTruncate("1 3", 10));
  }

  public function testStringTooLong() {
    $this->assertEquals("",
      tokenTruncate("toooooooooooolooooong", 10));
  }

  public function testContainingNewline() {
    $this->assertEquals("1 3\n5 7 9 ",
      tokenTruncate("1 3\n5 7 9 11 14", 10));
  }
}

EDIT:

Special UTF-8 characters like 'à' are not handled. Add the 'u' modifier at the end of the regex to handle them:

$parts = preg_split('/([\s\n\r]+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

Konrad Kiss

answered Sep 17, 2008 at 4:27

Grey Panther

This returns at most the first 200 characters, cut back to the last whole word:

preg_replace('/\s+?(\S+)?$/', '', substr($string, 0, 201));
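
To make the role of the extra character (201 rather than 200) easier to see, here is a sketch of the same expression inside a function. The name truncateWords and the $limit parameter are hypothetical, and the length guard is an addition: without it the regex would also strip the last word of a string that is already short enough.

```php
<?php
// Hypothetical wrapper; $limit generalises the hard-coded 200/201 above.
function truncateWords(string $string, int $limit = 200): string
{
    // Guard added in this sketch: leave short strings alone.
    if (strlen($string) <= $limit) {
        return $string;
    }
    // Take one extra byte: if it is whitespace, the word before it is complete
    // and only that trailing whitespace is removed. Otherwise the partial
    // last word is removed together with the whitespace in front of it.
    return preg_replace('/\s+?(\S+)?$/', '', substr($string, 0, $limit + 1));
}

echo truncateWords("hello world foo", 8), "\n";
echo truncateWords("hello world foo", 11), "\n";
```

A string with no whitespace in its first $limit + 1 bytes is returned one byte over the limit, since the regex then has nothing to match.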

answered Sep 17, 2008 at 4:41

mattmac

$WidgetText = substr($string, 0, strrpos(substr($string, 0, 200), ' '));

And there you have it: a reliable method of truncating any string to the nearest whole word, while staying under the maximum string length.

I've tried the other examples above and they did not produce the desired results.
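
Two inputs are worth guarding against with the strrpos approach: a string that is already short enough (its last word would be dropped) and a string with no space in its first 200 bytes (strrpos returns false and the result is empty). A possible sketch, where the name cutAtLastSpace, the $limit parameter and the hard-cut fallback are all assumptions rather than part of the answer:

```php
<?php
// Hypothetical helper; $limit plays the role of the hard-coded 200 above.
function cutAtLastSpace(string $string, int $limit = 200): string
{
    if (strlen($string) <= $limit) {
        return $string;              // nothing to truncate
    }
    $head = substr($string, 0, $limit);
    // If the byte right after the cut is a space, $head already ends on a whole word.
    if ($string[$limit] === ' ') {
        return $head;
    }
    $pos = strrpos($head, ' ');
    // No space at all: fall back to a hard cut rather than an empty string.
    return $pos === false ? $head : substr($head, 0, $pos);
}

echo cutAtLastSpace("hello world foo", 8), "\n";
```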

answered Jan 12, 2011 at 4:29

Dave

The following solution was born when I noticed the $break parameter of the wordwrap function:

string wordwrap ( string $str [, int $width = 75 [, string $break = "\n" [, bool $cut = false ]]] )

Here is the solution:

/**
 * Truncates the given string at the specified length.
 *
 * @param string $str The input string.
 * @param int $width The number of chars at which the string will be truncated.
 * @return string
 */
function truncate($str, $width) {
    return strtok(wordwrap($str, $width, "...\n"), "\n");
}

Example #1.

print truncate("This is very long string with many chars.", 25);

The above example will output:

This is very long string...

Example #2.

print truncate("This is short string.", 25);

The above example will output:

This is short string.

answered Jul 25, 2013 at 8:10

Keep in mind, whenever you're splitting by "word", that some languages such as Chinese and Japanese do not use a space character to separate words. Also, a malicious user could simply enter text without any spaces, or use a Unicode look-alike of the standard space character, in which case any solution you use may end up displaying the entire text anyway. A way around this is to check the string length after splitting on spaces as normal and then, if the string is still above an abnormal limit (maybe 225 characters in this case), to split it dumbly at that limit.
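
As a rough sketch of that safety net: run whatever word-based truncation you prefer first, then pass the result through a hard cap. The function name enforceHardLimit and the default of 225 are illustrative only, and this version counts bytes, not characters.

```php
<?php
// Hypothetical safety net applied after any word-based truncation.
// $hardLimit is the "abnormal limit" mentioned above; it is measured in bytes here.
function enforceHardLimit(string $truncated, int $hardLimit = 225): string
{
    if (strlen($truncated) > $hardLimit) {
        // Still too long (for example, text without any spaces): cut it dumbly.
        return substr($truncated, 0, $hardLimit);
    }
    return $truncated;
}

echo strlen(enforceHardLimit(str_repeat('a', 300))), "\n";
```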

One more caveat with things like this when it comes to non-ASCII characters: strings containing them may be interpreted by PHP's standard strlen() as being longer than they really are, because a single character may take two or more bytes instead of just one. If you just use the strlen()/substr() functions to split strings, you may split a string in the middle of a character! When in doubt, mb_strlen()/mb_substr() are a little more foolproof.
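
A small demonstration of the difference, assuming the mbstring extension is installed and the text is UTF-8; the variable names are just for illustration:

```php
<?php
// "voilà" written with an escape so the example does not depend on file encoding.
// In UTF-8 the letter "à" takes two bytes.
$text = "voil\u{E0}";

echo strlen($text), "\n";               // counts bytes
echo mb_strlen($text, 'UTF-8'), "\n";   // counts characters

// substr cuts by bytes, so this slices "à" in half and leaves invalid UTF-8.
$broken = substr($text, 0, 5);
var_dump(mb_check_encoding($broken, 'UTF-8'));

// mb_substr cuts by characters and keeps the string valid.
$ok = mb_substr($text, 0, 5, 'UTF-8');
var_dump($ok === $text);
```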

answered Sep 17, 2008 at 6:08

Garrett Albright

Use strpos and substr:
