Jsoup - How to Extract Every Element
I'm trying to get font information using Jsoup. For example, below is my code: result = rtfToHtml(new StringReader(streamToString((InputStream)contents.getTransferData(dfRTF
Solution 1:
If you only need to extract the text from a document, plus any <b> or <i> tags (as per your example), consider using the Whitelist class (see docs; newer jsoup releases rename it to Safelist):
String html = "<body><p class='default'><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'><b>Hello World</b></span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> , Testing </span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'><i><b>Font </b></i></span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> Style </span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'><i>Check</i></span><span style='color: #000000; font-size: 10pt; font-family: MyriadPro-Bold;'></span></p></body>";
Whitelist wl = Whitelist.simpleText();
wl.addTags("b", "i"); // add additional tags here as necessary
String clean = Jsoup.clean(html, wl);
System.out.println(clean);
Which will output (as per your example):
11-07 19:04:45.738: I/System.out(318): <b>Hello World</b> , Testing
11-07 19:04:45.738: I/System.out(318): <i><b>Font </b></i>Style
11-07 19:04:45.738: I/System.out(318): <i>Check</i>
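Update: to collect the inner HTML of each <span> in a list instead:
Document doc = Jsoup.parse(html);
ArrayList<String> elements = new ArrayList<String>();
Elements e = doc.select("span");
for (int i = 0; i < e.size(); i++) {
    // html() returns the span's inner HTML, keeping any nested <b>/<i> tags
    elements.add(e.get(i).html());
}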
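Since the question asks for font information, the style attribute of each span is the other piece worth reading; in the loop above it should be available as e.get(i).attr("style"). Below is a minimal sketch, in plain Java, of splitting such a string into its properties. StyleParser and parse are hypothetical names made up for this illustration and are not part of Jsoup. The naive split on ';' assumes simple values like the ones in the example and would break on values that themselves contain a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jsoup: parses an inline style string.
class StyleParser {
    // Splits a declaration list such as "color: #000000; font-size: 21pt;"
    // into property/value pairs, preserving their original order.
    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        if (style == null) {
            return props;
        }
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed fragments
            }
            String name = declaration.substring(0, colon).trim().toLowerCase();
            String value = declaration.substring(colon + 1).trim();
            if (!name.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // In the Jsoup loop this string would come from the span's style attribute.
        String style = "color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;";
        Map<String, String> font = parse(style);
        // prints "MyriadPro-Bold at 21pt"
        System.out.println(font.get("font-family") + " at " + font.get("font-size"));
    }
}
```

Combined with the span's html(), this gives both the bold/italic markup and the font family, size and color for each run of text.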
Solution 2:
You need to change your selector to target the <p> tag, like so:
Element all = doc.select("p").first();
Then you need to get all the children of that element:
String myString = "";
for (Element item : all.children()) {
    // text() returns the element's text with all tags stripped
    myString += item.text();
}
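I am assuming you want the text inside the tags, and not the tags themselves.
Alternatively, you could select the tags directly. Note that text nested in both a <span> and a <b> or <i> will then appear more than once in the result: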
Elements all = doc.select("b");
all.addAll(doc.select("i"));
all.addAll(doc.select("span"));
String myString = all.text();