Pretty print XML in Java 8
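I have an XML file stored as a DOM Document and I would like to pretty print it to the console, preferably without using an external library. I know this question has been asked many times on this site, but none of the previous answers have worked for me. I am using Java 8, so perhaps this is where my code differs from the earlier questions. I have also tried setting the transformer manually using code I found on the web, but that only produced a "not found" error.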
Here is my code. It currently just prints each XML element on a new line, flush against the left side of the console.
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Test {
    public Test() {
        try {
            //java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");
            DocumentBuilderFactory dbFactory;
            DocumentBuilder dBuilder;
            Document original = null;
            try {
                dbFactory = DocumentBuilderFactory.newInstance();
                dBuilder = dbFactory.newDocumentBuilder();
                original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
            } catch (SAXException | IOException | ParserConfigurationException e) {
                e.printStackTrace();
            }
            StringWriter stringWriter = new StringWriter();
            StreamResult xmlOutput = new StreamResult(stringWriter);
            TransformerFactory tf = TransformerFactory.newInstance();
            //tf.setAttribute("indent-number", 2);
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.transform(new DOMSource(original), xmlOutput);
            java.lang.System.out.println(xmlOutput.getWriter().toString());
        } catch (Exception ex) {
            throw new RuntimeException("Error converting to String", ex);
        }
    }

    public static void main(String[] args) {
        new Test();
    }
}
admin changed status to publish May 23, 2023
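I suspect the problem is related to blank text nodes (text nodes that contain only whitespace) in the original file. Try removing them programmatically right after parsing, using the code below. If you don't remove them, the Transformer will preserve them.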
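// Requires imports from javax.xml.xpath.* and org.w3c.dom.NodeList
original.getDocumentElement().normalize();
// Select every text node that is empty once its whitespace is normalized
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);

// Detach each blank text node from its parent
for (int i = 0; i < blankTextNodes.getLength(); i++) {
    blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}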
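If you would rather not use XPath, the same clean-up can be done by walking the DOM directly. The sketch below swaps the XPath query for a recursive traversal (a different technique from the one in this answer) and then serializes the Document with the same indent-number and INDENT settings used elsewhere in this answer. The class BlankTextStripper and its methods stripBlankText, parse and prettyPrint are hypothetical names for illustration. It assumes that trim() is an acceptable stand-in for XPath's normalize-space, that only plain Node.TEXT_NODE children need removing (CDATA sections are left alone), and that the JDK's built-in transformer is in use, since other TransformerFactory implementations may reject the indent-number attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class BlankTextStripper {

    // Removes, in place, every descendant text node that is empty after trim().
    // The previous sibling is read before any removal, so the walk stays valid.
    static void stripBlankText(Node parent) {
        Node child = parent.getLastChild();
        while (child != null) {
            Node previous = child.getPreviousSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                parent.removeChild(child);
            } else if (child.hasChildNodes()) {
                stripBlankText(child);
            }
            child = previous;
        }
    }

    // Parses an XML string into a DOM Document (helper for the demo below).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Strips blank text nodes (this mutates doc) and serializes it with indentation.
    static String prettyPrint(Document doc, int indent) {
        try {
            stripBlankText(doc);
            TransformerFactory tf = TransformerFactory.newInstance();
            // Accepted by the JDK's built-in XSLTC factory; other factories may throw
            tf.setAttribute("indent-number", indent);
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n   <a>1</a>  <b/>\n</root>");
        System.out.println(prettyPrint(doc, 2));
    }
}
```

In the question's code this would amount to calling prettyPrint(original, 4) right after parsing and printing the returned string. Note that the Document passed in is modified.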
In response to Espinosa's comment, here is a solution for the case where "the original XML file is not indented or contains no line breaks".
Background
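Here is an excerpt from the article that inspired this solution (see References below):
According to the DOM specification, whitespace outside the tags is perfectly valid, and it is properly preserved. To remove it, we can use XPath's normalize-space function to locate all the whitespace-only nodes and remove them first.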
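To see the preservation described in this excerpt, the following sketch parses a small indented document and counts the whitespace-only text children of its root element. The class WhitespaceNodes and the method countBlankTextChildren are hypothetical names for illustration. It assumes the default, non-validating DocumentBuilderFactory with no DTD, which matches the situation in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceNodes {

    // Counts the root element's direct text children that contain only whitespace.
    static int countBlankTextChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int blank = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.TEXT_NODE && n.getTextContent().trim().isEmpty()) {
                    blank++;
                }
            }
            return blank;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The newline and indent before <a/> and the newline after it
        // are kept as two separate text nodes.
        System.out.println(countBlankTextChildren("<root>\n  <a/>\n</root>")); // prints 2
    }
}
```

These whitespace-only text nodes are what the Transformer copies to its output unchanged, which is why they have to be removed before indenting.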
Java code
// Required imports; toPrettyString must be placed inside a class
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static String toPrettyString(String xml, int indent) {
    try {
        // Turn xml string into a document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

        // Remove whitespaces outside tags
        document.normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                      document,
                                                      XPathConstants.NODESET);

        for (int i = 0; i < nodeList.getLength(); ++i) {
            Node node = nodeList.item(i);
            node.getParentNode().removeChild(node);
        }

        // Setup pretty print options
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Return pretty print xml string
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Example usage
String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

System.out.println(toPrettyString(xml, 4));
Output
<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>
References
- Java: Properly Indenting XML String (published on MyShittyCode)
- Save new XML node to file