Multi Substring from long string - c#

I have a long string, and I need to extract only the substrings that lie between { and } and turn each one into a JSON object.
This is the string:
sys=t85,fggh{"Name":"5038.zip","Folder":"Root",,"Download":"services/DownloadFile.ashx?"} dsdfg x=565,dfg
{"Name":"5038.zip","Folder":"Root",,"Download":"services/DownloadFile.ashx?"}dfsdfg567
{"Name":"5038.zip","Folder":"Root",,"Download":"services/DownloadFile.ashx?"}sdfs
There is junk between the objects, so I need to extract only the data between { and }.
My code is below, but I'm stuck: I can't remove the data that I have already taken.
List<JsonTypeFile> AllFiles = new List<JsonTypeFile>();
int lenght = -1;
while (temp.Length>3)
{
lenght = temp.IndexOf("}") - temp.IndexOf("{");
temp=temp.Substring(temp.IndexOf("{"), lenght+1);
temp.Remove(temp.IndexOf("{"), lenght + 1);
var result = JsonConvert.DeserializeObject<SnSafe.JsonTypeFile>(temp);
AllFiles.Add(result);
}

Or using regex you can get the strings like this:
var regex = new Regex("{([^}]*)}");
var matches = regex.Matches(str);
var list = (from object m in matches select m.ToString().Replace("{",string.Empty).Replace("}",string.Empty)).ToList();
var jsonList = JsonConvert.SerializeObject(list);
Here, str is the variable containing the string you provided in your question.

You can use a regex for this, but what I would do is use .Split('{') to split the string into sections, skip the first section, and then use .Split('}') to take the first portion of each section. Note that the split drops the braces themselves, so add them back before deserializing.
You can do this using LINQ:
var data = temp
.Split('{')
.Skip(1)
.Select(v => v.Split('}').FirstOrDefault());

If I understand correctly, you just want to extract anything in-between the braces and ignore anything else.
The following regular expression should allow you to extract that info:
{[^}]*} (a brace, followed by anything that isn't a brace, followed by a brace)
You can extract all instances and then deserialize them using something along the lines of:
using System.Text.RegularExpressions;
...
List<JsonTypeFile> AllFiles = new List<JsonTypeFile>();
foreach(Match match in Regex.Matches(temp, "{[^}]*}"))
{
var result = JsonConvert.DeserializeObject<SnSafe.JsonTypeFile>(match.Value);
AllFiles.Add(result);
}
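The extraction step on its own can be sketched as below. This is a minimal sketch, assuming the objects are flat (no nested braces and no } inside a string value); the class and method names (BraceExtractor, Extract) are made up for illustration. Each returned chunk still has its braces, so it can be handed to JsonConvert.DeserializeObject as in the loop above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class BraceExtractor
{
    // '{', then any run of characters other than '}', then '}'.
    // Assumption: objects are flat, so the first '}' closes the object.
    private static readonly Regex Braced = new Regex(@"\{[^}]*\}");

    // Returns every {...} chunk, braces included, in order of appearance.
    public static List<string> Extract(string text)
    {
        var chunks = new List<string>();
        foreach (Match m in Braced.Matches(text))
        {
            chunks.Add(m.Value);
        }
        return chunks;
    }
}
```

Because Regex.Matches walks forward through the input, there is no need to remove already-processed text from the string, which is where the loop in the question got stuck.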

Related

Replacing first part of string by another

I need to replace multiple file names in a folder. Here is one of the files:
Abc.CDE.EFG
I need to replace the first part of the string, before the first dot ("Abc"), with "zef".
Any ideas? I found this but it takes out the dot and not sure how to add the "zef".
var input = _FileInfo.ToString();
var output = input.Substring(input.IndexOf(".").Trim())
Since the question is tagged with regex, you can use a regular expression like so:
var input = "abc.def.efg";
var pattern = "^[^\\.]+";
var replacement = "zef";
var rgx = new Regex(pattern);
var output = rgx.Replace(input, replacement);
Source: https://msdn.microsoft.com/en-us/library/xwewhkd1(v=vs.110).aspx
You are almost there, try:
string myString = "Abc.CDE.EFG";
//This splits your string into an array with 3 items
//"Abc", "CDE" and "EFG"
var stringArray = myString.Split('.');
//Now modify the first item by changing it to "zef"
stringArray[0] = "zef";
//Then we rebuild the string by joining the array together
//delimiting each group by a period
string newString = string.Join(".", stringArray);
With this solution you can independently access any of the "blocks" just by referencing the array by index.
Try this:
var input = _FileInfo.ToString();
var output = "zef" + input.Substring(input.IndexOf("."));
If you know the length of the first part, you can replace that many characters from the start of the string. Otherwise, split on the dot and replace the first element:
string s = "Abc.CDE.EFG";
string[] n = s.Split('.');
n[0] = "ZEF";
string p = string.Join(".", n);
Console.WriteLine(p);
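The IndexOf/Substring variant from the answers can be wrapped in a small helper that also copes with a name that has no dot. This is only a sketch; FileNameHelper and ReplaceFirstSegment are hypothetical names, and treating a dot-less name as a single segment is an assumption.

```csharp
// Hypothetical helper, not part of any library.
public static class FileNameHelper
{
    // Replaces everything before the first '.' with the given text.
    public static string ReplaceFirstSegment(string name, string replacement)
    {
        int dot = name.IndexOf('.');
        // Assumption: without a dot, the whole name counts as the first segment.
        if (dot < 0)
        {
            return replacement;
        }
        // Substring(dot) keeps the dot and everything after it.
        return replacement + name.Substring(dot);
    }
}
```

For example, ReplaceFirstSegment("Abc.CDE.EFG", "zef") yields "zef.CDE.EFG".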

Regular Expression - Get partial string

I have a list of project names that I need to do some matching on. The list of projects could look something like this:
suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada
etc
If the searched for project is suzu, I'd like to have the following result from the list:
suzu
suzu-domestic
suzu-international
but not anything containing suzuran. I'd also like to get the following matches if the searched-for project is suzuran:
suzuran
suzuran-international
but not anything containing suzu.
In C# code I have something that looks similar to this:
String searchForProject = "suzu";
String regStr = @"THE_REGEX_GOES_HERE"; // The regStr will be in a config file
List<Project> projects = DataWrapper.GetAllProjects();
Regex regEx = new Regex(String.Format(regStr, searchForProject));
result = new List<Project>();
foreach (Project proj in projects)
{
if (regEx.IsMatch(proj.ProjectName))
{
result.Add(proj);
}
}
The question is: can I have a regex that matches all the exact project names, but not the extra ones that a StartsWith equivalent would return?
(Today I have a regStr = @"^({0})#", but this does not satisfy the above scenario since it gives more hits than it should.)
I'd appreciate if someone can give me a hint in the right direction. Thanks !
Magnus
All you need is actually
var regStr = @"^{0}\b";
The ^ anchor asserts the position at the beginning of the string.
The \b pattern matches a word boundary: a position between a word character and a non-word character, or the start or end of the string next to a word character. You do not need to match the rest of the string with .* because Regex.IsMatch only checks whether a match exists, so the extra pattern is redundant overhead.
C# test code:
var projects = new List<string>() { "suzu", "suzu-domestic", "suzu-international", "suzuran", "suzuran-international", "scorpion", "scorpion-default", "yada", "yada-yada" };
var searchForProject = "suzu";
var regStr = @"^{0}\b"; // The regStr will be in a config file
var regEx = new Regex(String.Format(regStr, searchForProject));
var result = new List<string>();
foreach (var proj in projects)
{
if (regEx.IsMatch(proj))
{
result.Add(proj);
}
}
The foreach may be replaced with a shorter LINQ:
var result = projects.Where(s => regEx.IsMatch(s)).ToList();
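If the search term can contain regex metacharacters (an assumption; the sample names only use letters and hyphens), it is safer to escape it before building the pattern. The sketch below hardcodes the ^...\b template instead of reading it from a config file, and the names ProjectFilter and Filter are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ProjectFilter
{
    // Keeps the names that start with the search term followed by a word boundary.
    public static List<string> Filter(IEnumerable<string> names, string search)
    {
        // Regex.Escape makes characters such as '.' or '+' in the term match literally.
        var regex = new Regex("^" + Regex.Escape(search) + @"\b");
        return names.Where(n => regex.IsMatch(n)).ToList();
    }
}
```

Note that \b only behaves as described when the search term ends in a word character (a letter, digit, or underscore).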
You can use a regex like this:
^suzu\b.*
If you want suzuran just use:
^suzuran\b.*
You can use "\b{0}\b.*" if you want the match anywhere in the string (but not in the middle of a word), or "^{0}\b.*" if you only want it at the start.
If you want a one-line solution with LINQ and without regex, you can use this working solution:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
    public static void Main()
    {
        string input = "suzu";
        string s = @"suzu
suzu-domestic
suzu-international
suzuran
suzuran-international
scorpion
scorpion-default
yada
yada-yada";
        foreach (var line in ExtractLines(s, input))
            Console.WriteLine(line);
    }
    // Works if "-" is your delimiter.
    static IEnumerable<string> ExtractLines(string lines, string input)
    {
        return from line in lines.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries) // split the string into lines
               let cleanLine = line.Contains("-") ? line.Split('-')[0] : line // keep only the part before the first "-"
               where cleanLine.Equals(input) // keep lines whose first part equals the input
               select line; // return the matching line
    }
}
With negative lookahead:
suzu(?!.*ran).*\b
This also uses \b for a word break

Position of and length in regular expression

I have text like this:
This is a sample {text}. I want to inform my {Dada} that I have some
data which is {not useful}. So I need data to start by { and ends with
}. This data needs to {find out}.
The text contains some substrings enclosed in curly braces {}. How can I find the starting position and length of each substring that starts with { and ends with }? After that, I will replace each substring with a processed string.
With Regex.Match, you can check the index of each match by accessing the Index property, and the length of each match by checking the Length property.
If you want to count the curly braces in, you can use \{(.*?)\} regex, like this:
var txt = "This is a sample {text}. I want to inform my {Dada} that I have some data which is {not useful}. So I need data to start by { and ends with }. This data needs to {find out}.";
var rgx1 = new Regex(@"\{(.*?)\}");
var matchees = rgx1.Matches(txt);
// Get the 1st capture groups
var all_matches = matchees.Cast<Match>().Select(p => p.Groups[1].Value).ToList();
// Get the indexes of the matches
var idxs = matchees.Cast<Match>().Select(p => p.Index).ToList();
// Get the lengths of the matches
var lens = matchees.Cast<Match>().Select(p => p.Length).ToList();
Perhaps you will want to use a dictionary of search and replace terms, which will be more efficient:
var dic = new Dictionary<string, string>();
dic.Add("old", "new");
var ttxt = "My {old} car";
// And then use the keys to replace with the values
var output = rgx1.Replace(ttxt, match => dic[match.Groups[1].Value]);
Output: My new car
If you know you will not have nested curly braces, you can use the following:
var input = @"This is a sample {text}. I want to inform my {Dada} that I have some data which is {not useful}. So I need data to start by { and ends with }. This data needs to {find out}.";
var pattern = @"{([^}]*)}";
foreach (Match match in Regex.Matches(input, pattern)) {
    string subString = match.Groups[1].Value; // text between the braces
    int start = match.Groups[1].Index;        // position of that text in the input
    int length = match.Groups[1].Length;      // length of that text
}
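Finding the spans and replacing them can be kept as two small functions. The following is a sketch under the same no-nested-braces assumption; BraceSpans, Find and ReplaceEach are hypothetical names, and the processing step is passed in as a delegate because the question does not say what it is. It needs C# 7 or later for the tuple syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class BraceSpans
{
    // Group 1 captures the text between the braces; nested braces are not supported.
    private static readonly Regex Braced = new Regex(@"\{([^{}]*)\}");

    // Start and Length describe the whole match, braces included.
    public static List<(int Start, int Length, string Inner)> Find(string text)
    {
        var spans = new List<(int Start, int Length, string Inner)>();
        foreach (Match m in Braced.Matches(text))
        {
            spans.Add((m.Index, m.Length, m.Groups[1].Value));
        }
        return spans;
    }

    // Replaces each {...} with the result of process(inner text).
    public static string ReplaceEach(string text, Func<string, string> process)
    {
        return Braced.Replace(text, m => process(m.Groups[1].Value));
    }
}
```

Letting Regex.Replace do the substitution avoids having to adjust the recorded positions by hand as the string changes length.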

Omit unnecessary parts in string array

In C#, I have a string that comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
and so on. Now I want to extract only the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?
This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];
What you can do is use a very simple regular expression to parse out the bits that you want. In your case you want something that looks like this, and then grab the two groups that interest you:
(Type)="(\w+)"
This returns, in groups 1 and 2, the value Type and the word characters contained between the double quotes.
Instead of doing many splits, why don't you just use Regex:
output = Regex.Match(pair[1], "\"(\\w*)\"").Groups[1].Value;
Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // i.e. Data or Program
But you will need some error handling as well.
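One way the error handling might look is a Try-style method that reports failure instead of throwing. This is a sketch, assuming the input always begins with Type=" when it is valid; TypeAttribute and TryGetType are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of any library.
public static class TypeAttribute
{
    // Returns true and sets type when input looks like Type="Value"...; false otherwise.
    public static bool TryGetType(string input, out string type)
    {
        type = null;
        const string prefix = "Type=\"";
        if (input == null || !input.StartsWith(prefix, StringComparison.Ordinal))
        {
            return false;
        }
        // Look for the closing quote after the prefix.
        int end = input.IndexOf('"', prefix.Length);
        if (end < 0)
        {
            return false;
        }
        type = input.Substring(prefix.Length, end - prefix.Length);
        return true;
    }
}
```

A caller can then write if (TypeAttribute.TryGetType(line, out var type)) { ... } and skip lines that do not match.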
How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explanation of the regex:
(?<=^Type=\")  This is a prefix match (lookbehind). It's not included in the result, but the regex only matches if the string starts with Type="
.*?  Non-greedy match. Matches as few characters as possible until
(?=\")  This is a suffix match (lookahead). It's not included in the result, but the regex only matches if the next character is "
Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the string; then you just have to find the closing quote mark and extract the contents. You can use LINQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

Remove String After Determinate String

I need to remove certain strings after another string within a piece of text.
I have a text file with some URLs and after the URL there is the RESULT of an operation. I need to remove the RESULT of the operation and leave only the URL.
Example of text:
http://website1.com/something Result: OK(registering only mode is on)
http://website2.com/something Result: Problems registered 100% (SOMETHING ELSE) Other Strings;
http://website3.com/something Result: error: "Âíèìàíèå, îáíàðóæåíà îøèáêà - Ìåñòî æèòåëüñòâà ñîäåðæèò íåäîïóñòèìûå ê
I need to remove all strings starting from Result: so the remaining strings have to be:
http://website1.com/something
http://website2.com/something
http://website3.com/something
Without Result: ........
The results are generated randomly so I don't know exactly what there is after RESULT:
One option is to use regular expressions as per some other answers. Another is just IndexOf followed by Substring:
int resultIndex = text.IndexOf("Result:");
if (resultIndex != -1)
{
text = text.Substring(0, resultIndex);
}
Personally, I find that if I can get away with just a couple of very simple, easy-to-understand string operations, that is easier to get right than a regex. Once you start getting into real patterns (at least 3 of these, then one of those), regexes become a lot more useful, of course.
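Applied to every line of the file, the IndexOf/Substring idea could look like the sketch below. ResultStripper and its methods are hypothetical names, and trimming the space left before "Result:" is an assumption about the desired output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class ResultStripper
{
    // Cuts the line at "Result:" and trims the trailing space; other lines pass through unchanged.
    public static string StripResult(string line)
    {
        int i = line.IndexOf("Result:", StringComparison.Ordinal);
        return i < 0 ? line : line.Substring(0, i).TrimEnd();
    }

    // Applies StripResult to each line, e.g. the lines returned by File.ReadLines.
    public static List<string> StripAll(IEnumerable<string> lines)
    {
        return lines.Select(StripResult).ToList();
    }
}
```

The ordinal comparison keeps the search culture-independent, which matters here because the text after Result: can contain arbitrary characters.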
string input = "Action2 Result: Problems registered 100% (SOMETHING ELSE) Other Strings; ";
string pattern = "^(Action[0-9]*) (.*)$";
string replacement = "$1";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
You use $1 to keep the match ActionXX.
Use Regex for this.
Example:
var r = new System.Text.RegularExpressions.Regex("Result:(.)*");
var result = r.Replace("Action Result:1231231", "");
Then you will have "Action " (with a trailing space) in the result.
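You can try this code, which uses string.Replace:
var pattern = "Result:";
var lineContainYourValue = "jdfhkjsdfhsdf Result:ljksdfljh"; // the line to clean up
// Strings are immutable, so assign the returned value; this removes only the "Result:" marker itself
lineContainYourValue = lineContainYourValue.Replace(pattern, "");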
Something along the lines of this perhaps?
string line;
using ( var reader = new StreamReader ( File.Open ( @"C:\temp\test.txt", FileMode.Open ) ) )
using ( var sw = new StreamWriter(File.Open( @"C:\Temp\test.edited.txt", FileMode.CreateNew ) ))
while ( (line = reader.ReadLine()) != null )
if(!line.StartsWith("Result:")) sw.WriteLine(line);
You can use RegEx for this kind of processing.
using System.Text.RegularExpressions;
private string ParseString(string originalString)
{
string pattern = ".*(?=Result:.*)";
Match match = Regex.Match(originalString, pattern);
return match.Value;
}
A Linq approach:
IEnumerable<String> result = System.IO.File
.ReadLines(path)
.Where(l => l.StartsWith("Action") && l.Contains("Result"))
.Select(l => l.Substring(0, l.IndexOf("Result")));
Given your current example, where you want only the website, regex match the spaces.
var fileLine = "http://example.com/sub/ random text";
Regex regexPattern = new Regex("(.*?)\\s");
var websiteMatch = regexPattern.Match(fileLine).Groups[1].ToString();
Debug.Print("!" + websiteMatch + "!");
Repeat this for each line in your text file. Regex explained: .* matches anything, ? makes the match non-greedy, the (parentheses) put the match into a group, and \\s matches whitespace.
