C# - Efficient way of searching and replacing strings from a file

I have a text file in which I want to check whether a set of lines exists. If it does, I want to update/overwrite those lines; if it does not, I want to add them.
Here is the text file.
#
# Virtual Hosts
#
# If you want to maintain multiple domains/hostnames on your
# machine you can setup VirtualHost containers for them. Most configurations
# use only name-based virtual hosts so the server doesn't need to worry about
# IP addresses. This is indicated by the asterisks in the directives below.
#
# Please see the documentation at
# <URL:http://httpd.apache.org/docs/2.2/vhosts/>
# for further details before you try to setup virtual hosts.
#
# You may use the command line option '-S' to verify your virtual host
# configuration.
#
# Use name-based virtual hosting.
#
##NameVirtualHost *:80
#
# VirtualHost example:
# Almost any Apache directive may go into a VirtualHost container.
# The first VirtualHost section is used for all requests that do not
# match a ServerName or ServerAlias in any <VirtualHost> block.
#
##<VirtualHost *:80>
##ServerAdmin postmaster@dummy-host.localhost
##DocumentRoot "C:/xampp/htdocs/dummy-host.localhost"
##ServerName dummy-host.localhost
##ServerAlias www.dummy-host.localhost
##ErrorLog "logs/dummy-host.localhost-error.log"
##CustomLog "logs/dummy-host.localhost-access.log" combined
##</VirtualHost>
##<VirtualHost *:80>
##ServerAdmin postmaster@dummy-host2.localhost
##DocumentRoot "C:/xampp/htdocs/dummy-host2.localhost"
##ServerName dummy-host2.localhost
##ServerAlias www.dummy-host2.localhost
##ErrorLog "logs/dummy-host2.localhost-error.log"
##CustomLog "logs/dummy-host2.localhost-access.log" combined
##</VirtualHost>
<VirtualHost *:80>
ServerAdmin postmaster@dummy-host2.localhost
DocumentRoot "C:/xampp/htdocs/dummy-host2.localhost"
ServerName dummy-host2.localhost
ServerAlias www.dummy-host2.localhost
#ErrorLog "logs/dummy-host2.localhost-error.log"
CustomLog "logs/dummy-host2.localhost-access.log" combined
</VirtualHost>
Here is the skeleton code. I first open the text file and read it to the end so that I can use the Regex class. (I chose this because the code looks cleaner and more concise than doing it the C way, by looping.) But it isn't that simple, because I first need to check for a set of lines.
StreamReader reader = new StreamReader(path);
string content = reader.ReadToEnd();
reader.Close();
// Replace the strings using Regex replace method
StreamWriter writer = new StreamWriter(path);
writer.Write(content);
writer.Close();
Given a port number, I appended it to this pattern:
string virtualHost = "<VirtualHost *:" + cls_globalvariables.portNumber + ">";
And I used
Match match = Regex.Match(content, virtualHost);
to find the index of the search pattern. I also had to find the index of its closing tag and replace the lines in between with an updated version. I have no problem finding the closing line, but I do have a problem distinguishing commented lines from uncommented ones.
Regex.Match returns the first occurrence of the search pattern, which is the commented line. What I want is to match only uncommented lines, but how do I do that? I began thinking in C terms, such as looping character by character backwards and forwards from match.Index until I hit a "\r\n" delimiter. Is there an efficient C# way to solve this?

You can either do it the way Jesse suggests, or if you want to stick to Regex, just put ^ at the start of your regex and set RegexOptions.Multiline.
Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string. For more information, see the "Multiline Mode" section in the Regular Expression Options topic.
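A minimal sketch of the ^ plus RegexOptions.Multiline approach, applied to the VirtualHost case from the question. The class name VirtualHostEditor, the method Upsert, and its parameters are hypothetical names chosen for illustration. It assumes the uncommented tags start at the very beginning of a line (no indentation). Regex.Escape is used so that the * in the tag is matched literally rather than treated as a quantifier.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VirtualHostEditor
{
    // Replaces the first uncommented <VirtualHost *:port> ... </VirtualHost> block
    // with newBlock, or appends newBlock when no such block exists.
    public static string Upsert(string content, int port, string newBlock)
    {
        // Escape the tags so '*' and similar characters are matched literally.
        string open = Regex.Escape("<VirtualHost *:" + port + ">");
        string close = Regex.Escape("</VirtualHost>");

        // With Multiline, ^ matches at the start of every line, so lines that
        // begin with "##<VirtualHost" are skipped. [\s\S]*? lazily spans lines
        // up to the first closing tag that also starts a line.
        var block = new Regex("^" + open + @"[\s\S]*?^" + close, RegexOptions.Multiline);

        if (block.IsMatch(content))
        {
            // A MatchEvaluator keeps any '$' in newBlock from being read as a substitution.
            return block.Replace(content, m => newBlock, 1);
        }

        // No uncommented block found: append, adding a line break first if needed.
        bool needsBreak = content.Length > 0 && !content.EndsWith("\n");
        return content + (needsBreak ? Environment.NewLine : "") + newBlock + Environment.NewLine;
    }
}
```

In the skeleton from the question, this would be called between reading and writing the file, for example content = VirtualHostEditor.Upsert(content, cls_globalvariables.portNumber, updatedBlock), where updatedBlock is a hypothetical string holding the new lines.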

Related

Regex - Extract also URLs with www

I use this regex to find URLs:
(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
The problem is that it doesn't find URLs that start with www.
How can I solve this?
Here is my data source that I need to extract urls from.

This answer is based on the XML file you provided in your comment.
There are a couple of issues with your file. Besides URLs starting with https, http and www, it contains URLs that start with download.somedomain.com and marketplace.somedomain.com, so it is inconsistent. Another issue is how the URLs end: a URL can be followed directly by . or </ with no space after it, and the file has no pattern that lets you go through it line by line or chunk by chunk.
Lastly, it contains duplicates.
I chose to solve this by splitting the regex into two parts:
The first part takes everything that starts like a valid URL, without looking at how it ends.
The second part extracts the valid URL from what the first part matched.
To handle duplicates, I used a HashSet.
The solution does not consider specific tags or specific content in the XML; it only cares about URLs in the content.
Here is the solution:
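// Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
// input is assumed to hold the content of the XML file.
HashSet<string> urls = new HashSet<string>();
// First pass: anything that starts like a URL (scheme or www.), up to the next whitespace.
var beginWith = new Regex(@"\b(?:(http|ftp|https)?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
// Second pass: trims each candidate down to host and path. Created once, outside the loop.
var endWith = new Regex(@"([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?");
foreach (Match item in beginWith.Matches(input))
{
    foreach (Match url in endWith.Matches(item.Value))
    {
        urls.Add(url.Value); // the HashSet drops duplicates
    }
}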
The code here can indeed be reduced and improved. I leave that to your imagination.
Here are the first 5 URLs of the final output for the file:
www.w3.org/2005/Atom
marketplace.xboxlive.com/resource/product/v1
www.xbox.com/live/accounts
download.xbox.com/content/images/66acd000-77fe-1000-9115-d802534307d4/1033/boxartlg.jpg
download.xbox.com/content/images/66acd000-77fe-1000-9115-d802534307d4/1033/boxartsm.jpg
etc.....

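Well, just check if your string contains "https://" or "http://"; if not, add https:// at the beginning ^^
string url = "";
// Both checks must fail before the scheme is added, hence && rather than ||.
if (!url.Contains("https://") && !url.Contains("http://"))
{
    // Strings are immutable: Insert returns a new string, so assign the result.
    url = url.Insert(0, "https://");
}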
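A small sketch of the same prefix check as a reusable helper, in case it is useful. The names UrlNormalizer and EnsureScheme are hypothetical. It uses StartsWith rather than Contains, on the assumption that only a scheme at the very start of the string should count, and it returns the result because string.Insert and concatenation produce a new string rather than modifying the original.

```csharp
using System;

public static class UrlNormalizer
{
    // Returns url unchanged when it already starts with http:// or https://,
    // otherwise returns it with "https://" prepended.
    public static string EnsureScheme(string url)
    {
        bool hasScheme =
            url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);

        return hasScheme ? url : "https://" + url;
    }
}
```

For example, UrlNormalizer.EnsureScheme("www.xbox.com/live/accounts") yields "https://www.xbox.com/live/accounts", while a value that already starts with http:// is returned as it is.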

Regex matching URLs by sub-folder

I am essentially trying to write an outbound URL matcher so I can rewrite the URLs in a stream of HTML to point to my CDN. I can't use the IIS URL Rewrite module because I am using compression. I currently have a regex that matches on a sub-folder for specific file types, i.e.
Regex ASSET_PATH = new Regex(@"(?i)assets/([A-Za-z0-9\-_/.]+)\.(jpg|jpeg|bmp|tiff|png|gif|js|css|mov|mp4|ogg|avi|mp3)", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase );
This works great and allows me to manipulate anything in the string from that point onwards ( i.e. from "assets/" onwards to the right ). What I need to achieve is to manipulate the string to the left of the "assets/" sub-folder, without necessarily knowing the format? Here are some examples :
<img src="./assets/123/pig.jpg" />
<img src="http://mysite.blah/assets/123/pig.jpg" />
<img src="http://www.mysite.blah/assets/123/pig.jpg" />
<img src='assets/123/pig.jpg' />
In CSS / inline styles:
background-image : URL('assets/123/pig.jpg')
background-image : URL(http://www.mysite.blah/assets/123/pig.jpg)
Anyway, I think you get the picture. I essentially want to be able to look to the "left" of the word "assets" until I find the logical start of the URL, and then manipulate it from there to point to my CDN.
I'm not sure this is possible in regex, so any suggestions using a combination of regex / C# / HTML Agility Pack are welcome.

Is this what you're after?
(?<BeforeAssets>.*?(?:\/|^))assets\/(?<AfterAssets>[A-Za-z0-9\-_\/.]+)\.(?<FileExtension>jpg|jpeg|bmp|tiff|png|gif|js|css|mov|mp4|ogg|avi|mp3)
You can try this out here: http://regexstorm.net/tester
Or here: https://regex101.com/r/b8XxcF/1
NB: In the above regex I escaped the forward slash characters. .NET doesn't require this, but doesn't complain either, and doing so makes the pattern compatible with other regex engines, which means it can be tested on Regex101.
When testing with those tools you'll need to specify the Multiline option for the example where assets/ has nothing preceding it, since otherwise the ^ character only matches the start of the whole input and not the start of that line. This option may not be required in your code, i.e. if you're only matching one string at a time rather than a whole block of text.
Update
Apologies for misreading; you're parsing the full HTML page; not just the URIs returned from that page. To do this you could use something like:
["'\(](?<BeforeAssets>[^"'\(\)]*?)assets\/(?<AfterAssets>[A-Za-z0-9\-_\/.]+)\.(?<FileExtension>jpg|jpeg|bmp|tiff|png|gif|js|css|mov|mp4|ogg|avi|mp3)
(Thankfully, " is not a legal character in a URL, and ' and ( are reserved characters that rarely appear unencoded in one, so they should be OK for detecting the start of the value: https://www.rfc-editor.org/rfc/rfc3986#section-2.2.)
This isn't fool-proof; it's better to use an HTML parsing tool, then pull out the URIs from that; but if you are doing everything with regex, hopefully this will help.
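A hedged sketch of how the second pattern might be used from C# to point matched URLs at a CDN. The class name CdnRewriter, the method Rewrite, the extra Open group that captures the opening delimiter, and the cdnBase value are all assumptions added for illustration; only the rest of the pattern comes from the answer above. Everything captured in BeforeAssets (relative prefix or original host) is dropped and replaced with the CDN base.

```csharp
using System.Text.RegularExpressions;

public static class CdnRewriter
{
    // The answer's pattern, with the leading delimiter captured in a
    // hypothetical "Open" group so it can be written back unchanged.
    private static readonly Regex AssetUrl = new Regex(
        @"(?<Open>[""'\(])(?<BeforeAssets>[^""'\(\)]*?)assets/(?<AfterAssets>[A-Za-z0-9\-_/.]+)\.(?<FileExtension>jpg|jpeg|bmp|tiff|png|gif|js|css|mov|mp4|ogg|avi|mp3)",
        RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

    // cdnBase is expected to end with a slash, e.g. "https://cdn.example.com/".
    public static string Rewrite(string html, string cdnBase)
    {
        return AssetUrl.Replace(html, m =>
            m.Groups["Open"].Value
            + cdnBase
            + "assets/"
            + m.Groups["AfterAssets"].Value
            + "."
            + m.Groups["FileExtension"].Value);
    }
}
```

With a cdnBase of "https://cdn.example.com/", both src="./assets/123/pig.jpg" and src="http://www.mysite.blah/assets/123/pig.jpg" come out as src="https://cdn.example.com/assets/123/pig.jpg". The same caveat applies as above: this is not fool-proof, and an HTML parser is the safer route.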

How do network paths work in C#?

I have a question about this.
I tried to save an XmlDocument to a folder on a network device that is not mapped as a drive.
Where:
config.plc.Path ="\\IpAdress\\folder\\";
doc.Save(config.plc.Path + "file.xml");
It was throwing an exception, and I just fixed it using the '@':
doc.Save(@config.plc.Path + "file.xml");
When I pass the parameter as a verbatim string with @, it looks like this:
config.plc.Path ="\\\\IpAdress\\\\folder\\\\";
This is the first time I've seen a path like this, with
\\\\
Can someone help me understand this?

It's simple: \\ is just the escape sequence for \ in a regular string literal. A verbatim string (@"...") has to be used to avoid this.
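To make the escaping concrete, here is a small sketch comparing the literals. The class and field names are hypothetical, and IpAdress and folder are the placeholders from the question. A UNC path has to begin with two backslashes at runtime, which takes four in a regular literal or two in a verbatim one. Note also that @ placed before an identifier, as in @config, only affects how the identifier is parsed and does not change the string it holds.

```csharp
public static class PathLiterals
{
    // Regular literal: every \\ in the source becomes one \ at runtime.
    public static readonly string Escaped = "\\\\IpAdress\\folder\\";

    // Verbatim literal: backslashes are taken as written.
    public static readonly string Verbatim = @"\\IpAdress\folder\";

    // The literal from the question: only ONE leading backslash at runtime,
    // so it is not a UNC path.
    public static readonly string FromQuestion = "\\IpAdress\\folder\\";
}
```

Escaped and Verbatim hold the same characters, while FromQuestion is one backslash shorter at the front.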

string path inside a directory of WPF c# solution

I created a directory called Sounds inside my WPF solution, and it holds sound files (for example: mySound.wav).
In my code I use a List, to which I have to add strings that refer to the sound files. At first I used @"C:..." but I want something like a "UNIVERSAL" path. I tried using "\Sounds\mySound.wav", but it generates an error.
The lines where I use this directory are:
myList.Add("\Sounds\11000_0.2s.wav");//Error
using (WaveFileReader reader = new WaveFileReader(sourceFile))
where sourceFile is a string that holds the path of the file.

Make sure that you set Copy to Output Directory in the properties of the sound file; that ensures the file is copied to the location your program runs from.
Also, don't use single backslashes in the path, since the backslash is an escape character.
Instead, do one of the following things:
Use a verbatim string:
@"Sounds\11000_0.2s.wav"
Escape the escape char:
"Sounds\\11000_0.2s.wav"
Use forward slashes:
"Sounds/11000_0.2s.wav"
For more information on string literals, see MSDN.
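One way to avoid hand-written separators altogether is to build the path from the folder the program runs from. This is only a sketch: the class name SoundPaths and the method Resolve are hypothetical, and it assumes the Sounds folder has been copied next to the executable as described above.

```csharp
using System;
using System.IO;

public static class SoundPaths
{
    // Builds an absolute path to a file in the Sounds folder that sits
    // next to the running executable.
    public static string Resolve(string fileName)
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Sounds", fileName);
    }
}
```

The list entry from the question could then be added as myList.Add(SoundPaths.Resolve("11000_0.2s.wav")).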

You either need to escape the \ in the string or add the verbatim string indicator @ at the beginning of the string.
Escape example:
var myFilePath = "c:\\Temp\\MyFile.txt";
Verbatim string example:
var myFilePath = @"c:\Temp\MyFile.txt";

Whitespace terminators & MakePlusRule in Irony

I'm trying to create a fairly simple parser using Irony, but am coming to the conclusion that Irony may not be suitable in this particular case.
This is an example of what I'm trying to parse:
server_name example.com *.example.com www.example.*;
server_name www.example.com ~^www\d+\.example\.com$;
server_name ~^(?<subdomain>.+?)\.(?<domain>.+)$;
I'm using FreeTextLiterals with either a space or a semi-colon as a terminator:
var serverNamevalue = new FreeTextLiteral("serverNameValue", FreeTextOptions.None, " ", ";");
I'm then using the MakePlusRule to pick up one or more server_name values:
httpCoreServerName.Rule = "server_name" + httpCoreServerNameItems + semicolon;
httpCoreServerNameItems.Rule = MakePlusRule(httpCoreServerNameItems, serverNamevalue);
However, I think there's a problem with having whitespace as a terminator for the FreeTextLiteral in this case. When I run this, I get a parser error. If I substitute another specific character for the whitespace as the terminator (and also add it as a delimiter in the call to MakePlusRule), it works fine.
Does anyone have any ideas as to how I could deal with this in Irony?

I posted this question over at the Irony project on Codeplex, where Roman Ivantsov, the developer of Irony, confirmed there was an issue with the parser when using semi-colons with FreeTextLiterals.
Roman has helpfully patched this issue. I've downloaded the latest source and can confirm that it fixes the problem.
