Removing elements from the collection returned by SelectNodes causes siblings of the nodes on either side of the removed one to go missing, or to change which parent they belong to. Either that or I'm really misunderstanding something.
Fiddle or Project
// @nuget: HtmlAgilityPack
using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var html =
            @"<html>
                <head>
                    <title>Document</title>
                </head>
                <body>
                    <div class=""divClass"">
                        <h3 class=""h3Class"">First Header</h3>
                        <p class=""pClass"">
                            Hello
                        </p>
                    </div>
                    <div class=""divClass"">
                        <h3 class=""h3Class"">Second Header</h3>
                        <p class=""pClass"">
                            World!
                        </p>
                    </div>
                    <div class=""divClass"">
                        <h3 class=""h3Class"">Third Header</h3>
                        <p class=""pClass"">
                            Nonsense
                        </p>
                    </div>
                </body>
            </html>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        HtmlNode root = htmlDoc.DocumentNode;
        HtmlNodeCollection headers = root.SelectNodes("//h3[contains(@class, 'h3')]");

        // don't want the last one
        headers.RemoveAt(headers.Count - 1); // without this line, it does what I expect. Both h3's and p's are displayed

        foreach (HtmlNode node in headers)
        {
            Console.WriteLine("Found header: {0}", node.InnerText);
        }
        Console.WriteLine();

        DisplayAllSiblings(headers[0]); // 'p = Hello' should be displayed
        DisplayAllSiblings(headers[1]); // 'p = World!' has gone missing
    }

    static void DisplayAllSiblings(HtmlNode node)
    {
        HtmlNode parent = node.ParentNode;
        HtmlNodeCollection coll = parent.SelectNodes("./*");
        Console.WriteLine("Siblings of {0}:", node.InnerText);
        foreach (HtmlNode brother in coll)
        {
            Console.WriteLine("Node: {0} = {1}", brother.Name, brother.InnerText.Trim());
        }
        Console.WriteLine();
    }
}
Output of the above when removing the last node:
Found header: First Header
Found header: Second Header
Siblings of First Header:
Node: h3 = First Header
Node: p = Hello
Siblings of Second Header:
Node: h3 = Second Header
Output when changing the RemoveAt to headers.RemoveAt(1);
Found header: First Header
Found header: Third Header
Siblings of First Header:
Node: h3 = First Header
Node: h3 = Third Header
Node: p = Nonsense
Siblings of Third Header:
Node: h3 = Third Header
Node: p = Nonsense
Further technical details
HAP version: whichever version dotnetfiddle uses; also found in 1.8.10
.NET version: net472
The problem with calling RemoveAt directly is that you are using the method from List<T>, which doesn't raise the HasChanges method. To make it work, we would need to create our own List<T> class, which I don't think is a good long-term solution.
Make sure to use the methods provided by the library instead.
Let me know if that answers your question.
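For anyone landing here with the same symptom, one possible workaround is sketched below. It is not necessarily what the maintainer had in mind: it simply avoids mutating the HtmlNodeCollection at all by copying the SelectNodes result into a plain List<HtmlNode> (via LINQ's ToList) and trimming that copy. The shortened HTML and the variable names `selected` and `headers` are assumptions made for this illustration.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // Shortened version of the HTML from the report (assumption for brevity).
        var html = @"<body>
            <div><h3 class=""h3Class"">First Header</h3><p>Hello</p></div>
            <div><h3 class=""h3Class"">Second Header</h3><p>World!</p></div>
            <div><h3 class=""h3Class"">Third Header</h3><p>Nonsense</p></div>
        </body>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        HtmlNodeCollection selected =
            htmlDoc.DocumentNode.SelectNodes("//h3[contains(@class, 'h3')]");
        if (selected == null)
        {
            // SelectNodes returns null when nothing matches.
            return;
        }

        // Copy the query result into an ordinary list. Removing an item from
        // this copy only changes the list; the document nodes are not touched.
        List<HtmlNode> headers = selected.ToList();
        headers.RemoveAt(headers.Count - 1); // don't want the last one

        foreach (HtmlNode header in headers)
        {
            Console.WriteLine("Siblings of {0}:", header.InnerText);
            // Same sibling lookup as DisplayAllSiblings in the report.
            foreach (HtmlNode sibling in header.ParentNode.SelectNodes("./*"))
            {
                Console.WriteLine("Node: {0} = {1}", sibling.Name, sibling.InnerText.Trim());
            }
            Console.WriteLine();
        }
    }
}
```

The copy keeps "which nodes am I interested in" separate from "what does the document contain". If the intent is instead to delete a node from the document itself, calling HtmlNode.Remove() on that node is the library-provided route.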