Monday, May 7, 2012

How to split Tamil characters in a string in PHP


How do I split Tamil characters in a string?



When I use preg_match_all('/./u', $str, $results), I get the characters "த", "ம", "ி", "ழ" and "்".



How do I get the combined characters "த", "மி" and "ழ்"?


Source: Tips4all

1 comment:

  1. I think you should be able to use the grapheme_extract function to iterate over the combined characters (which are technically called "grapheme clusters").

    Alternatively, if you prefer the regex approach, I think you can use this:

    preg_match_all('/\pL\pM*|./u', $str, $results)


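    where \pL matches any Unicode "letter", \pM* matches zero or more Unicode "marks" (combining characters) that follow it, and the final |. picks up every other character (digits, spaces, punctuation) one at a time.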
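A minimal sketch of the regex approach, run against the word from the question. The input string is an assumption based on the characters the question lists; it is written with code point escapes (PHP 7 or later) so the exact sequence is unambiguous.

```php
<?php
// "தமிழ்" spelled out as code points: TA, MA, vowel sign I, LLLA, virama.
$str = "\u{0BA4}\u{0BAE}\u{0BBF}\u{0BB4}\u{0BCD}";

// One letter plus any combining marks that follow it,
// or else any single character that is not a letter.
preg_match_all('/\pL\pM*|./u', $str, $results);

// $results[0] holds three strings: "த", "மி" and "ழ்".
print_r($results[0]);
```

Two caveats: without the s modifier, . does not match a newline, so line breaks in $str are silently skipped, and a mark that does not follow a letter comes out as its own match. PCRE also has \X, which matches one extended grapheme cluster; that is a different technique from the one suggested here.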

    (Disclaimer: I have not tested either of these approaches.)
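The grapheme_extract suggestion could look roughly like the sketch below. It assumes PHP 7 or later with the intl extension loaded; split_graphemes is a hypothetical helper name, not something from the original post. On a build with a reasonably recent ICU it should yield the same three clusters for the word from the question, though the exact boundaries depend on the ICU version.

```php
<?php
// Hypothetical helper (not from the original post): splits a UTF-8 string
// into grapheme clusters using grapheme_extract() from the intl extension.
function split_graphemes(string $str): array
{
    $clusters = [];
    $offset = 0;            // byte offset where the next cluster starts
    $length = strlen($str); // length in bytes, because offsets are byte-based

    while ($offset < $length) {
        // Extract exactly one cluster starting at $offset; $next receives
        // the byte offset just past that cluster.
        $cluster = grapheme_extract($str, 1, GRAPHEME_EXTR_COUNT, $offset, $next);
        if ($cluster === false || $cluster === '') {
            break; // stop on failure rather than loop forever
        }
        $clusters[] = $cluster;
        $offset = $next;
    }

    return $clusters;
}

if (function_exists('grapheme_extract')) {
    // "தமிழ்" spelled out as code points.
    print_r(split_graphemes("\u{0BA4}\u{0BAE}\u{0BBF}\u{0BB4}\u{0BCD}"));
} else {
    echo "The intl extension is not loaded.\n";
}
```

Unlike the regex, this follows Unicode's grapheme cluster rules rather than a letter-plus-marks pattern, so it also handles sequences that do not start with a letter.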
