c# regular expression for finding links in <a> with specific ending

c# regular expression for finding links in <a> with specific ending - c#

I need a regex pattern for finding links in a string (with HTML code) to get the links with file endings like .gif or .png
Example String:
picture.png
For now I get everything between the " " and the text between the <a> and </a>.
I want to get this:
Href = //site.com/folder/picture.png
String = picture.png
My code so far:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.Drawing;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace downloader
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
string url = textBox1.Text;
string s = gethtmlcode(url);
foreach (LinkItem i in LinkFinder.Find(s))
{
richTextBox1.Text += Convert.ToString(i);
}
}
static string gethtmlcode(string url)
{
using (WebClient client = new WebClient())
{
string htmlCode = client.DownloadString(url);
return htmlCode;
}
}
public struct LinkItem
{
public string Href;
public string Text;
public override string ToString()
{
return Href + "\n\t" + Text + "\n\t";
}
}
static class LinkFinder
{
public static List<LinkItem> Find(string file)
{
List<LinkItem> list = new List<LinkItem>();
// 1.
// Find all matches in file.
MatchCollection m1 = Regex.Matches(file, #"(<a.*?>.*?</a>)",
RegexOptions.Singleline);
// 2.
// Loop over each match.
foreach (Match m in m1)
{
string value = m.Groups[1].Value;
LinkItem i = new LinkItem();
// 3.
// Get href attribute.
Match m2 = Regex.Match(value, #"href=\""(.*?)\""",
RegexOptions.Singleline);
if (m2.Success)
{
i.Href = m2.Groups[1].Value;
}
// 4.
// Remove inner tags from text.
string t = Regex.Replace(value, #"\s*<.*?>\s*", "",
RegexOptions.Singleline);
i.Text = t;
list.Add(i);
}
return list;
}
}
}
}

I can suggest using HtmlAgilityPack for this task. Install using Manage NuGet Packages for Solution menu, and add the following method:
/// <summary>
/// Collects a href attribute values and a node values if image extension is jpg or png
/// </summary>
/// <param name="html">HTML string or an URL</param>
/// <returns>A key-value pair list of href values and a node values</returns>
private List<KeyValuePair<string, string>> GetLinksWithHtmlAgilityPack(string html)
{
var result = new List<KeyValuePair<string, string>>();
HtmlAgilityPack.HtmlDocument hap;
Uri uriResult;
if (Uri.TryCreate(html, UriKind.Absolute, out uriResult) && uriResult.Scheme == Uri.UriSchemeHttp)
{ // html is a URL
var doc = new HtmlAgilityPack.HtmlWeb();
hap = doc.Load(uriResult.AbsoluteUri);
}
else
{ // html is a string
hap = new HtmlAgilityPack.HtmlDocument();
hap.LoadHtml(html);
}
var nodes = hap.DocumentNode.SelectNodes("//a");
if (nodes != null)
foreach (var node in nodes)
if (Path.GetExtension(node.InnerText.Trim()).ToLower() == ".png" ||
Path.GetExtension(node.InnerText.Trim()).ToLower() == ".jpg")
result.Add(new KeyValuePair<string,string>(node.GetAttributeValue("href", null), node.InnerText));
return result;
}
Then, use it as (I am using a dummy string, just for demo)
var result = GetLinksWithHtmlAgilityPack("picture.pngpicture.bmp");
Output:
Or, with a URL, something like:
var result = GetLinksWithHtmlAgilityPack("http://www.google.com");

Related

link extraction using HtmlAgilityPack and c#

i want to extract google result links
My code works it does extract links, but these links are not what i expected to be extracted.
My program would extract links inside the "a href" tag but all links in search result are not Appropriate links , ads link , googles link are also included
what should i do?
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml;
namespace Search
{
public partial class Form1 : Form
{
// load snippet
HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();
public Form1()
{
InitializeComponent();
}
private void btn1_Click(object sender, EventArgs e)
{
listBox1.Items.Clear();
StringBuilder sb = new StringBuilder();
byte[] ResultsBuffer = new byte[8192];
string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream resStream = response.GetResponseStream();
string tempString = null;
int count = 0;
do
{
count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
if (count != 0)
{
tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
sb.Append(tempString);
}
}
while (count > 0);
string sbb = sb.ToString();
HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
html.OptionOutputAsXml = true;
html.LoadHtml(sbb);
HtmlNode doc = html.DocumentNode;
foreach (HtmlNode link in doc.SelectNodes("//a[#href]"))
{
//HtmlAttribute att = link.Attributes["href"];
string hrefValue = link.GetAttributeValue("href", string.Empty);
// if ()
{
int index = hrefValue.IndexOf("&");
if (index > 0)
{
hrefValue = hrefValue.Substring(0, index);
listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
}
}
}
}
}
}
if i want to work with "a href" tag i have to add some condition in If
but i dont know what condition i should use here:
if ()
someplace i read about extracting cite tag not ahref tag anybody can help?

To get the links that are contained in the cite elements, simply access their inner text, like:
HtmlWeb w = new HtmlWeb();
var hd = w.Load("http://www.google.com/search?q=veverke");
var cites = hd.DocumentNode.SelectNodes("//cite");
foreach (var cite in cites)
Console.WriteLine(cite.InnerText);

get title tag by html agility pack

i'm trying to use htmlagility pack to gain links and tites of results
i have this code
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml;
namespace Search
{
public partial class Form1 : Form
{
// load snippet
HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();
public Form1()
{
InitializeComponent();
}
private void btn1_Click(object sender, EventArgs e)
{
listBox1.Items.Clear();
StringBuilder sb = new StringBuilder();
byte[] ResultsBuffer = new byte[8192];
string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream resStream = response.GetResponseStream();
string tempString = null;
int count = 0;
do
{
count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
if (count != 0)
{
tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
sb.Append(tempString);
}
}
while (count > 0);
string sbb = sb.ToString();
HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
html.OptionOutputAsXml = true;
html.LoadHtml(sbb);
HtmlNode doc = html.DocumentNode;
foreach (HtmlNode link in doc.SelectNodes("//a[#href]"))
{
//HtmlAttribute att = link.Attributes["href"];
string hrefValue = link.GetAttributeValue("href", string.Empty);
if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
{
int index = hrefValue.IndexOf("&");
if (index > 0)
{
hrefValue = hrefValue.Substring(0, index);
listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
}
}
}
}
}
}
this code returns result links for a query i want to get title tag for each link too how can i get title for each links?
anybody can help?

If, by 'title', you mean the displayed text of the link, then you can get it from InnerText property of each HtmlNode link :
foreach (HtmlNode link in doc.SelectNodes("//a[#href]"))
{
.....
var title = link.InnerText.Trim();
}

This will give output: links and titles of the links. This will not return the anchor text of links. You will get only the title of each link.
foreach (HtmlNode link in doc.SelectNodes("//a[#href]"))
{
HtmlWeb htmlWeb = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDocument = htmlWeb.Load(link);
var title = htmlDocument.DocumentNode.SelectSingleNode("html/head/title").InnerText;
}

How to create an array and fill from tree node variable

I'm trying to transfer data from a treenode (at least I think that's what it is) which contains much more data than I need. It would be very difficult for me to manipulate the data within the treenode. I would much rather have an array which provides me with only the necessary data for data manipulation.
I would like higher rates have following variables:
1. BookmarkNumber (integer)
2. Date (string)
3. DocumentType (string)
4. BookmarkPageNumberString (string)
5. BookmarkPageNumberInteger (integer)
I would like to the above defined rate from the data from variable book_mark (as can be seen in my code).
I've been wrestling with this for two days. Any help would be much appreciated. I'm probably sure that the question wasn't phrased correctly so please ask questions so that I may explain further if needed.
Thanks so much
BTW what I'm trying to do is create a Windows Form program which parses a PDF file which has multiple bookmarks into discrete PDF files for each bookmark/chapter while saving the bookmark in the correct folder with the correct naming convention, the folder and naming convention dependent upon the PDF name and title name of the bookmark/chapter being parsed.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using itextsharp.pdfa;
using iTextSharp.awt;
using iTextSharp.testutils;
using iTextSharp.text;
using iTextSharp.xmp;
using iTextSharp.xtra;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void ChooseImageFileWrapper_Click(object sender, EventArgs e)
{
OpenFileDialog openFileDialog1 = new OpenFileDialog();
openFileDialog1.InitialDirectory = GlobalVariables.InitialDirectory;
openFileDialog1.Filter = "Pdf Files|*.pdf";
openFileDialog1.RestoreDirectory = true;
openFileDialog1.Title = "Image File Wrapper Chooser";
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
try
{
GlobalVariables.ImageFileWrapperPath = openFileDialog1.FileName;
}
catch (Exception ex)
{
MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
}
}
ImageFileWrapperPath.Text = GlobalVariables.ImageFileWrapperPath;
}
private void ImageFileWrapperPath_TextChanged(object sender, EventArgs e)
{
}
private void button2_Click(object sender, EventArgs e)
{
iTextSharp.text.pdf.PdfReader pdfReader = new iTextSharp.text.pdf.PdfReader(GlobalVariables.ImageFileWrapperPath);
IList<Dictionary<string, object>> book_mark = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(pdfReader);
List<ImageFileWrapperBookmarks> IFWBookmarks = new List<ImageFileWrapperBookmarks>();
foreach (Dictionary<string, object> bk in book_mark) // bk is a single instance of book_mark
{
ImageFileWrapperBookmarks.BookmarkNumber = ImageFileWrapperBookmarks.BookmarkNumber + 1;
foreach (KeyValuePair<string, object> kvr in bk) // kvr is the key/value in bk
{
if (kvr.Key == "Kids" || kvr.Key == "kids")
{
//create recursive program for children
}
else if (kvr.Key == "Title" || kvr.Key == "title")
{
}
else if (kvr.Key == "Page" || kvr.Key == "page")
{
}
}
}
MessageBox.Show(GlobalVariables.ImageFileWrapperPath);
}
}
}

Here's one way to parse a PDF and create a data structure similar to what you describe. First the data structure:
public class BookMark
{
static int _number;
public BookMark() { Number = ++_number; }
public int Number { get; private set; }
public string Title { get; set; }
public string PageNumberString { get; set; }
public int PageNumberInteger { get; set; }
public static void ResetNumber() { _number = 0; }
// bookmarks title may have illegal filename character(s)
public string GetFileName()
{
var fileTitle = Regex.Replace(
Regex.Replace(Title, #"\s+", "-"),
#"[^-\w]", ""
);
return string.Format("{0:D4}-{1}.pdf", Number, fileTitle);
}
}
A method to create a list of Bookmark (above):
List<BookMark> ParseBookMarks(IList<Dictionary<string, object>> bookmarks)
{
int page;
var result = new List<BookMark>();
foreach (var bookmark in bookmarks)
{
// add top-level bookmarks
var stringPage = bookmark["Page"].ToString();
if (Int32.TryParse(stringPage.Split()[0], out page))
{
result.Add(new BookMark() {
Title = bookmark["Title"].ToString(),
PageNumberString = stringPage,
PageNumberInteger = page
});
}
// recurse
if (bookmark.ContainsKey("Kids"))
{
var kids = bookmark["Kids"] as IList<Dictionary<string, object>>;
if (kids != null && kids.Count > 0)
{
result.AddRange(ParseBookMarks(kids));
}
}
}
return result;
}
Call method above like this to dump the results to a text file:
void DumpResults(string path)
{
using (var reader = new PdfReader(path))
{
// need this call to parse page numbers
reader.ConsolidateNamedDestinations();
var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
var sb = new StringBuilder();
foreach (var bookmark in bookmarks)
{
sb.AppendLine(string.Format(
"{0, -4}{1, -100}{2, -25}{3}",
bookmark.Number, bookmark.Title,
bookmark.PageNumberString, bookmark.PageNumberInteger
));
}
File.WriteAllText(outputTextFile, sb.ToString());
}
}
The bigger problem is how to extract each Bookmark into a separate file. If every Bookmark starts a new page it's easy:
Iterate over the return value of ParseBookMarks()
Select a page range that begins with the current BookMark.Number, and ends with the next BookMark.Number - 1
Use that page range to create separate files.
Something like this:
void ProcessPdf(string path)
{
using (var reader = new PdfReader(path))
{
// need this call to parse page numbers
reader.ConsolidateNamedDestinations();
var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
for (int i = 0; i < bookmarks.Count; ++i)
{
int page = bookmarks[i].PageNumberInteger;
int nextPage = i + 1 < bookmarks.Count
// if not top of page will be missing content
? bookmarks[i + 1].PageNumberInteger - 1
/* alternative is to potentially add redundant content:
? bookmarks[i + 1].PageNumberInteger
*/
: reader.NumberOfPages;
string range = string.Format("{0}-{1}", page, nextPage);
// DEMO!
if (i < 10)
{
var outputPath = Path.Combine(OUTPUT_DIR, bookmarks[i].GetFileName());
using (var readerCopy = new PdfReader(reader))
{
var number = bookmarks[i].Number;
readerCopy.SelectPages(range);
using (FileStream stream = new FileStream(outputPath, FileMode.Create))
{
using (var document = new Document())
{
using (var copy = new PdfCopy(document, stream))
{
document.Open();
int n = readerCopy.NumberOfPages;
for (int j = 0; j < n; )
{
copy.AddPage(copy.GetImportedPage(readerCopy, ++j));
}
}
}
}
}
}
}
}
}
The problem is that it's highly unlikely all bookmarks are going to be at the top of every page of the PDF. To see what I mean, experiment with commenting / uncommenting the bookmarks[i + 1].PageNumberInteger lines.

Build HtmlGenericControl from a string of full html

I want to be able to add attributes to a string of html without having to build a parser to handle the html. In one specific case, I want to be able to extract the id of the html or insert an id to the html server side.
Say I have:
string stringofhtml = "<img src=\"someimage.png\" alt=\"the image\" />";
I would like to be able to do something like:
HtmlGenericControl htmlcontrol = new HtmlGenericControl(stringofhtml);
htmlcontrol.Attributes["id'] = "newid";
OR
int theid = htmlcontrol.Attributes["id"];
This is just a way that I can access/add attributes of the html strings that I have.

You can do this:
HtmlGenericControl ctrl = new HtmlGenericControl();
ctrl.InnerHtml = "<img src=\"someimage.png\" alt=\"the image\" />";
You could always use a LiteralControl too, instead of an HtmlGenericControl:
LiteralControl lit = new LiteralControl(stringOfHtml);

I do not think there is a control available which will provide you with the functionality you are looking for.
Below I have made use of the HtmlAgility pack to parse/query the HTML and created a new control subclassing the Literal control.
This control accepts an HTML string, checks to ensure it contains at least a single element and provides access to get/set that elements attributes.
Example usage
string image = "<img src=\"someimage.png\" alt=\"the image\" />";
HtmlControlFromString htmlControlFromString = new HtmlControlFromString(image);
htmlControlFromString.Attributes["id"] = "image2";
string id = htmlControlFromString.Attributes["id"];
Control
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI.WebControls;
using HtmlAgilityPack;
public class HtmlControlFromString : Literal
{
private HtmlDocument _document = new HtmlDocument();
private HtmlNode _htmlElement;
public AttributesCollection Attributes { get; set; }
public HtmlControlFromString(string html)
{
_document.LoadHtml(html);
if (_document.DocumentNode.ChildNodes.Count > 0)
{
_htmlElement = _document.DocumentNode.ChildNodes[0];
Attributes = new AttributesCollection(_htmlElement);
Attributes.AttributeChanged += new EventHandler(Attributes_AttributeChanged);
SetHtml();
}
else
{
throw new InvalidOperationException("Argument does not contain a valid html element.");
}
}
void Attributes_AttributeChanged(object sender, EventArgs e)
{
SetHtml();
}
void SetHtml()
{
Text = _htmlElement.OuterHtml;
}
}
public class AttributesCollection
{
public event EventHandler AttributeChanged;
private HtmlNode _htmlElement;
public string this[string attribute]
{
get
{
HtmlAttribute htmlAttribute = _htmlElement.Attributes[attribute];
return htmlAttribute == null ? null : htmlAttribute.Value;
}
set
{
HtmlAttribute htmlAttribute = _htmlElement.Attributes[attribute];
if (htmlAttribute == null)
{
htmlAttribute = _htmlElement.OwnerDocument.CreateAttribute(attribute);
htmlAttribute.Value = value;
_htmlElement.Attributes.Add(htmlAttribute);
}
else
{
htmlAttribute.Value = value;
}
EventHandler attributeChangedHandler = AttributeChanged;
if (attributeChangedHandler != null)
attributeChangedHandler(this, new EventArgs());
}
}
public AttributesCollection(HtmlNode htmlElement)
{
_htmlElement = htmlElement;
}
}
Hope this helps.

Searching Specific Data From a File

I have a File having text and few numbers.I just want to extract numbers from it.How do I go about it ???
I tried using all that split thing but no luck so far.
My File is like this:
AT+CMGL="ALL"
+CMGL: 5566,"REC READ","Ufone"
Dear customer, your DAY_BUCKET subscription will expire on 02/05/09
+CMGL: 5565,"REC READ","+923466666666"
KINDLY TELL ME THE WAY TO EXTRACT NUMBERS LIKE +923466666666 from this File so I can put them into another File or textbox.
Thanks

Here's an example using the String.Split. The "number" contains a '+', so really it should be treated as a string not a number. I'm presuming it's a telephone number with the '+' potentially used for international calls? If it is a telephone number, you need to be careful of dashes, spaces in the number as well as extension numbers added to the end eg "+9234 666-66666 ext 235" and so on...
Anyway - hopefully the example is useful in getting to grips with Split.
The code include unit tests using NUnit v2.4.8
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NUnit.Framework;
using System.Text.RegularExpressions;
namespace SO.NumberExtractor.Test
{
public class NumberExtracter
{
public List<string> ExtractNumbers(string lines)
{
List<string> numbers = new List<string>();
string[] seperator = { System.Environment.NewLine };
string[] seperatedLines = lines.Split(seperator, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in seperatedLines)
{
string s = ExtractNumber(line);
numbers.Add(s);
}
return numbers;
}
public string ExtractNumber(string line)
{
string s = line.Split(',').Last<string>().Trim('"');
return s;
}
public string ExtractNumberWithoutLinq(string line)
{
string[] fields = line.Split(',');
string s = fields[fields.Length - 1];
s = s.Trim('"');
return s;
}
}
[TestFixture]
public class NumberExtracterTest
{
private readonly string LINE1 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666666\"";
private readonly string LINE2 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666667\"";
private readonly string LINE3 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666668\"";
[Test]
public void ExtractOneLineWithoutLinq()
{
string expected = "+923466666666";
NumberExtracter c = new NumberExtracter();
string result = c.ExtractNumberWithoutLinq(LINE1);
Assert.AreEqual(expected, result);
}
[Test]
public void ExtractOneLineUsingLinq()
{
string expected = "+923466666666";
NumberExtracter c = new NumberExtracter();
string result = c.ExtractNumber(LINE1);
Assert.AreEqual(expected, result);
}
[Test]
public void ExtractMultipleLines()
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(LINE1);
sb.AppendLine(LINE2);
sb.AppendLine(LINE3);
NumberExtracter ne = new NumberExtracter();
List<string> extractedNumbers = ne.ExtractNumbers(sb.ToString());
string expectedFirst = "+923466666666";
string expectedSecond = "+923466666667";
string expectedThird = "+923466666668";
Assert.AreEqual(expectedFirst, extractedNumbers[0]);
Assert.AreEqual(expectedSecond, extractedNumbers[1]);
Assert.AreEqual(expectedThird, extractedNumbers[2]);
}
}
}

If the numbers are all at the end of the lines then you can use code like the following
foreach ( string line in File.ReadAllLines(#"c:\path\to\file.txt") ) {
Match result = Regex.Match(line, #"\+(\d+)""$");
if ( result.Success ) {
var number = result.Groups[1].Value;
// do what you want with the number
}
}

How large is the file? If the file is under a few megabytes in size I would recommend loading the file contents into a string and using a compiled regular expression to extract matches.
Here's a quick example:
Regex NumberExtractor = new Regex("[0-9]{7,16}",RegexOptions.Compiled);
/// <summary>
/// Extracts numbers between seven and sixteen digits long from the target file.
/// Example number to be extracted: +923466666666
/// </summary>
/// <param name="TargetFilePath"></param>
/// <returns>List of the matching numbers</returns>
private IEnumerable<ulong> ExtractLongNumbersFromFile(string TargetFilePath)
{
if (String.IsNullOrEmpty(TargetFilePath))
throw new ArgumentException("TargetFilePath is null or empty.", "TargetFilePath");
if (File.Exists(TargetFilePath) == false)
throw new Exception("Target file does not exist!");
FileStream TargetFileStream = null;
StreamReader TargetFileStreamReader = null;
string FileContents = "";
List<ulong> ReturnList = new List<ulong>();
try
{
TargetFileStream = new FileStream(TargetFilePath, FileMode.Open);
TargetFileStreamReader = new StreamReader(TargetFileStream);
FileContents = TargetFileStreamReader.ReadToEnd();
MatchCollection Matches = NumberExtractor.Matches(FileContents);
foreach (Match CurrentMatch in Matches) {
ReturnList.Add(System.Convert.ToUInt64(CurrentMatch.Value));
}
}
catch (Exception ex)
{
//Your logging, etc...
}
finally
{
if (TargetFileStream != null) {
TargetFileStream.Close();
TargetFileStream.Dispose();
}
if (TargetFileStreamReader != null)
{
TargetFileStreamReader.Dispose();
}
}
return (IEnumerable<ulong>)ReturnList;
}
Sample Usage:
List<ulong> Numbers = (List<ulong>)ExtractLongNumbersFromFile(#"v:\TestExtract.txt");

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

c# regular expression for finding links in <a> with specific ending - c#

Related

link extraction using HtmlAgilityPack and c#

get title tag by html agility pack

How to create an array and fill from tree node variable

Build HtmlGenericControl from a string of full html

Searching Specific Data From a File

Categories

Resources