[personal profile] pphaneuf
Oh, my goodness. Having fought in the FidoNet charset wars (I was part of the NETDEV echo, way back then), I thought Unicode was supposed to be my saviour, or something.

Behold: I'm trying to keep a two-way rsync of my music library between my Mac OS X laptop and my Linux workstation. Besides the obvious duplication induced by such genius as the interaction between the case-preserving but case-insensitive filesystem of Mac OS X and the case-sensitive filesystem of Linux (yeah, "U2" and "u2" are two totally different bands, didn't you know?), charsets come back to bite my arse once more, as if I hadn't done my share already.

Some bands, albums or songs with accented characters in their names were in ISO-8859-1 in one place and in UTF-8 in another. At this point, I was all happy about Linux distributions finally having switched over to UTF-8, and Mac OS X being UTF-8 as well, thinking those were old leftovers (they were) and that I just needed to rename them to UTF-8 in order to regain my sanity.

No. Of course not. How dumb was I?

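Accented Latin characters can be represented in two ways in Unicode, and therefore in UTF-8: as a single precomposed code point (for the ones that exist in ISO-8859-1, the same number as in that charset), or as the plain letter followed by a combining accent, some sort of "dead character". The two forms encode to different bytes, which means that strcmp will mark two identical-looking strings as being different.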
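To make that concrete, here is a minimal sketch in Python (my choice of language for illustration, nothing the tools involved actually run). The name "café" and the helper `same_name` are made up for the example; `unicodedata.normalize` is the standard library call for converting between the precomposed (NFC) and decomposed (NFD) forms.

```python
import unicodedata

# The same visible name, spelled two ways.
precomposed = "caf\u00e9"   # é as the single code point U+00E9
decomposed = "cafe\u0301"   # plain e followed by U+0301 COMBINING ACUTE ACCENT


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison behaves like strcmp: it sees different code points
# (and different UTF-8 bytes) and reports a mismatch.
print(precomposed == decomposed)
print(precomposed.encode("utf-8"), decomposed.encode("utf-8"))

# Normalizing both sides first makes the two spellings compare equal.
print(same_name(precomposed, decomposed))
```

Normalizing to NFD on both sides would work just as well; what matters is that both strings go through the same form before being compared.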

Now, what would be your guess on Mac OS X and Linux using the same method to represent accented Latin characters? Or, say, the chances of either of them using something more sophisticated than strcmp to compare strings (not that I blame them, this sounds like a ridiculously complicated problem, of the kind we were trying to get rid of by kissing the charsets goodbye)?
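For what it's worth, something smarter than strcmp does not have to be huge. The following is a rough sketch, again in Python, of how one might spot names that would collide once case and accent representation are ignored. The functions `collision_key` and `find_collisions` and the sample names are hypothetical, and NFC followed by `casefold()` is only an approximation of what a case-insensitive filesystem really does.

```python
import unicodedata
from collections import defaultdict


def collision_key(name: str) -> str:
    """Hypothetical key: NFC-normalize, then fold case.

    This only approximates filesystem behaviour; real filesystems use
    their own normalization and case-folding tables.
    """
    return unicodedata.normalize("NFC", name).casefold()


def find_collisions(names):
    """Group names that share a key and return the groups with duplicates."""
    groups = defaultdict(list)
    for name in names:
        groups[collision_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Made-up directory listing: two case variants and two accent variants.
listing = ["U2", "u2", "Beyonce\u0301", "Beyonc\u00e9", "Rush"]
for group in find_collisions(listing):
    print([name.encode("utf-8") for name in group])
```

Printing the UTF-8 bytes rather than the names themselves is deliberate: the colliding names look identical on screen, which is the whole problem.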

*sobs*

P.S.: Thank Bob for FireWire and the bandwidth of a hard disk in an enclosure.