We developed Transformer while rescuing some very old linguistic data. A long-running dictionary project had accumulated a great deal of data created with Lexware and WordPerfect for DOS, which relied on a set of obscure fonts that are no longer available, and we had to convert it to Unicode and XML. The files contained complicated sequences of characters that WordPerfect used to encode font and style information, and these had in turn been used to represent IPA characters under DOS. To convert this information to Unicode code points, we had to build a complex sequence of search/replace operations that would identify specific representations in the WordPerfect files and replace them with Unicode characters. Transformer was developed to make it easier to build and test these sequences, and then to apply them to a large number of files in batch. This is the tool's main purpose, although it can of course be used for any search/replace operation on Unicode text files.
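The core idea, an ordered list of search/replace rules applied to every file in a batch, can be sketched in Python. This is not Transformer's implementation (Transformer is a Delphi application). The `{IPA}...{/IPA}` markers and the helper names `apply_rules` and `transform_files` are hypothetical stand-ins; the real WordPerfect font-switch sequences are not reproduced here.

```python
import re

# Hypothetical rules. The {IPA}...{/IPA} markers stand in for WordPerfect
# font-switch sequences and are invented purely for illustration.
RULES = [
    (r"\{IPA\}e\{/IPA\}", "\u0259"),  # "e" in the IPA font -> U+0259 schwa
    (r"\{IPA\}n\{/IPA\}", "\u014b"),  # "n" in the IPA font -> U+014B eng
    (r"\{IPA\}", ""),                 # strip any leftover opening markers
    (r"\{/IPA\}", ""),                # strip any leftover closing markers
]


def apply_rules(text, rules):
    """Apply each (pattern, replacement) pair in order.

    Each rule operates on the output of the previous one, so the order
    of the list is significant.
    """
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def transform_files(paths, rules):
    """Batch mode: rewrite each file in place.

    Assumes the input files are UTF-8; output is written as UTF-8.
    """
    for path in paths:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(apply_rules(text, rules))
```

Order matters: `apply_rules("t{IPA}e{/IPA}st", RULES)` returns `"təst"`, but if the two marker-stripping rules ran first, the schwa rule would never match and the result would be `"test"`. Catching mistakes like this is why a long rule sequence needs to be re-tested quickly after every change.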
Transformer works only with Unicode text; when a file is loaded, it is converted internally into a stream of 16-bit characters (32-bit Unicode is not currently supported). When loading a file, the program decides what to do as follows:
UTF-8 files without byte order marks (BOMs) are common, because some systems and applications cannot handle them. If you know your files are UTF-8 but they have no BOMs, you can add a BOM to them as follows:
All your files will then have a UTF-8 BOM, so the program will be able to read them correctly. You can remove BOMs from UTF-8 files in a similar way, by pressing Control + Alt + C.
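At the byte level, adding or removing a UTF-8 BOM just means prepending or stripping the three bytes `EF BB BF` at the start of the file. The Python sketch below illustrates that operation; it is not Transformer's code, and the helper names (`add_utf8_bom`, `remove_utf8_bom`, `rewrite_file`) are hypothetical. Like the procedure above, it assumes the files really are UTF-8: adding a BOM to a file in some other encoding would only make it falsely claim to be UTF-8.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def has_utf8_bom(data: bytes) -> bool:
    return data.startswith(UTF8_BOM)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless one is already present (never doubles it)."""
    return data if has_utf8_bom(data) else UTF8_BOM + data


def remove_utf8_bom(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM if present; otherwise return the data unchanged."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


def rewrite_file(path, transform):
    """Apply a bytes -> bytes transform (e.g. add_utf8_bom) to a file in place."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(transform(data))
```

Python's built-in `utf-8-sig` codec follows the same convention: encoding with it writes a BOM, and decoding with it strips one if present, which is a convenient way to get either behaviour when saving files from your own scripts.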
When you run batch transformations or save files from the Output Text box, you can choose whether or not to add BOMs to your UTF-8 files. The choice depends on what you plan to do with the files later: if you know they will NOT be used in contexts where a BOM causes problems, we recommend adding one.
If you want to use Transformer on ASCII files and get ASCII files as output, simply save them as UTF-8 without a BOM. The first 128 characters (U+0000 to U+007F) are encoded in UTF-8 as single bytes identical to the ASCII character set, so a file containing only these characters is byte-for-byte the same in ASCII and in UTF-8 without a BOM. The output stays pure ASCII only as long as your transformations do not introduce any characters outside that range. If you want to work on ANSI files and keep them in ANSI format, Transformer is not the right tool for the job.
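The relationship between ASCII, UTF-8 and ANSI can be checked directly in Python. The sketch below (helper names are hypothetical) shows that ASCII-only text produces identical bytes in both encodings, that a non-ASCII IPA character such as U+0283 (esh) needs two bytes in UTF-8, and that a typical ANSI byte sequence, here "café" in Windows-1252 where é is the single byte `E9`, is not valid UTF-8 at all.

```python
def utf8_equals_ascii(text: str) -> bool:
    """True when the UTF-8 bytes of `text` are identical to its ASCII bytes."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains characters above U+007F, which ASCII cannot encode.
        return False


def is_valid_utf8(data: bytes) -> bool:
    """True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print("cat".encode("utf-8"))      # b'cat' (one byte per character)
print("\u0283".encode("utf-8"))   # b'\xca\x83' (IPA esh: two bytes)
print(is_valid_utf8(b"caf\xe9"))  # False (Windows-1252 "café" is not UTF-8)
```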
This explanation from the Unicode Consortium Website may help to clarify some of these issues and the terminology used:
In our experience, Delphi's Unicode string processing is very fast compared with other environments such as Java. As sequences of search/replace operations grow longer and more complicated, it becomes increasingly important that test runs complete as quickly as possible, so that the user can see the results of changes immediately. For this kind of work, we found a Windows application written in Delphi to be the most practical and efficient approach.