Converting files to UTF-8 without BOM in ANT

In the process of upgrading 10g BPEL processes to 11g composites we encountered "Content is not allowed in prolog. ORABPEL-01501" and "XML-20109: (Fatal Error) PI with the name 'xml' can occur only in the beginning of the document." errors. The main cause is that the original BPEL processes contain schemas delivered by external organizations in a variety of encodings. Some of these schemas contained characters that are not allowed in their encoding, or a byte order mark in front of the XML declaration. 10g accepted them, but they do not pass the stricter schema validation in 11g. To head off these problems I wrote the following Ant target, which is called during the upgrade process and corrects most (not all!) of these file encoding issues.

<target name="pre_and_post_upgrade_utf8_encoding_fix">
  <echo>Replace illegal utf-8 characters in XSD</echo>
  <!-- create working dir -->
  <mkdir dir="${utf8.fix.basedir}/tmp_utf8"/>
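  <!-- copy the XSDs to the tmp dir, re-encoding them to UTF-8; without an
       encoding attribute Ant reads each source in the JVM default encoding -->
  <copy todir="${utf8.fix.basedir}/tmp_utf8" outputencoding="UTF-8" overwrite="true">
   <fileset dir="${utf8.fix.basedir}">
    <include name="**/*.xsd"/>
    <!-- skip leftovers of an earlier, aborted run -->
    <exclude name="tmp_utf8/**"/>
   </fileset>
  </copy>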
  <!-- copy files back to original location -->
  <copy todir="${utf8.fix.basedir}" overwrite="true">
   <fileset dir="${utf8.fix.basedir}/tmp_utf8">
    <include name="**/*.*"/>
   </fileset>
  </copy>
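  <!-- Remove the Byte Order Mark (or anything else) in front of the XML declaration;
       the files are UTF-8 now, so read and write them as UTF-8 -->
  <replaceregexp match="^.*(&lt;\?xml.*)" replace="\1" byline="true" encoding="UTF-8">
   <fileset dir="${utf8.fix.basedir}">
    <include name="**/*.xsd"/>
    <exclude name="tmp_utf8/**"/>
   </fileset>
  </replaceregexp>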
  <!-- delete working dir -->
  <delete dir="${utf8.fix.basedir}/tmp_utf8"/>
</target>
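The target's name suggests it runs both before and after the upgrade. A minimal sketch of such a caller, assuming the fix target lives in the same build file; the target name `upgrade_bpel_process` and the properties `bpel10g.src.dir` and `composite11g.dir` are hypothetical placeholders, not part of the original upgrade scripts:

```xml
<!-- Hypothetical caller; upgrade_bpel_process, bpel10g.src.dir and
     composite11g.dir are placeholder names -->
<target name="upgrade_bpel_process">
  <!-- pre-upgrade: fix the schemas in the 10g process sources -->
  <antcall target="pre_and_post_upgrade_utf8_encoding_fix">
    <param name="utf8.fix.basedir" value="${bpel10g.src.dir}"/>
  </antcall>

  <!-- run the actual 10g to 11g upgrade step here -->

  <!-- post-upgrade: fix the schemas copied into the 11g composite -->
  <antcall target="pre_and_post_upgrade_utf8_encoding_fix">
    <param name="utf8.fix.basedir" value="${composite11g.dir}"/>
  </antcall>
</target>
```

`<antcall>` is used here rather than `depends` because each call runs in a fresh copy of the project, so `utf8.fix.basedir` can take a different value per call; an ordinary Ant property cannot be changed once it is set.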

