How can I rearrange string with SQL? - c#

Declare @CustTotalCount as int
Declare @CustMatchCount as int
select @CustTotalCount = count(*) from ENG_CUSTOMERTALLY
select @CustMatchCount = count(*) from Task where MPDReference in(
select ENG_CUSTOMERTALLY_CUSTOMERTASKNUMBER from dbo.ENG_CUSTOMERTALLY)
if(@CustTotalCount>@CustMatchCount)
select distinct
substring(ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO, charindex('-', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO)
+ 1, 1000)
from dbo.ENG_CUSTOMERMYCROSS where
ENG_CUSTOMERMYCROSS_CUSTOMER_NUMBER in(
select ENG_CUSTOMERTALLY_CUSTOMERTASKNUMBER from ENG_CUSTOMERTALLY1
except
select MPDReference from Task )
I can convert
- A320-200001-01-1(1)
- A320-200001-02-1(2)
- A320-200001-01-1(2)
- A320-200001-01-1(1)
- A320-200001-01-1(2)
- A320-200001-02-1(1)
TO
- 200001-01-1(1)
- 200001-02-1(2)
- 200001-01-1(2)
- 200001-01-1(1)
- 200001-01-1(2)
- 200001-02-1(1)
But I need:
- 200001-01-1
- 200001-02-1
- 200001-01-1
- 200001-01-1
- 200001-01-1
- 200001-02-1
How can I do that in SQL and C#?

Is the pattern always the same? If so, you could just use SUBSTRING to pull out the part you want.
EDIT: To cover the additional point asked in "How can I use substring in SQL?":
You could
SELECT DISTINCT SUBSTRING(....) FROM ...

As answered above, use SUBSTRING as you already do, but with a length of 11 instead of 1000, as long as the data is always in the format you show above.
In C# it would be:
string s = "A320-200001-01-1(1)";
string result = s.Substring(s.IndexOf('-') + 1, 11); // "200001-01-1"
Again, this assumes the part you want is always 11 characters long. Otherwise, if you always want to stop before the first '(', use IndexOf again to find the end index and subtract the start index to get the length.
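As a language-neutral illustration of the IndexOf-based variant described above, here is a minimal Python sketch. The function name `strip_prefix_and_suffix` is hypothetical, and the sketch assumes every value contains a '-' before the part you want.

```python
def strip_prefix_and_suffix(task_no: str) -> str:
    """Return the text between the first '-' and the first '(' after it."""
    start = task_no.index("-") + 1      # first character after the dash
    end = task_no.find("(", start)      # -1 when there is no parenthesis
    if end == -1:
        end = len(task_no)              # keep everything after the dash
    return task_no[start:end]


print(strip_prefix_and_suffix("A320-200001-01-1(1)"))  # 200001-01-1
```

Because the end position is searched for rather than hard-coded, this does not depend on the wanted part being exactly 11 characters long.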

Try SUBSTRING and LEN. This sample skips the first 5 characters and drops the last 3 (8 = 5 + 3):
declare @var varchar(50)
set @var = 'A320-200001-01-1(1)'
select substring(@var, 6, len(@var) - 8)
output: 200001-01-1
In C# the functions are similar, except that the index is zero-based:
string task = "A320-200001-01-1(1)";
task = task.Substring(5, task.Length - 8);
Console.WriteLine(task);

Here's a technique that uses PATINDEX, which can use wild cards.
SUBSTRING(ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO,
PATINDEX('%[0-9]%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO),
PATINDEX('%(%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO)
- PATINDEX('%[0-9]%', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO)
)
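The start for your substring is the position of the first numeric value (%[0-9]%). The length value is the position of the first parenthesis ('%(%') less the starting position. Note that for values such as A320-200001-01-1(1) the first digit sits inside the A320 prefix, so for that data the start should instead be CHARINDEX('-', ENG_CUSTOMERMYCROSS_MYTECHNIC_TASK_NO) + 1.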
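The same "start after a marker, stop before the parenthesis" idea can also be written as a single pattern. The following Python sketch is only an illustration under the assumption that the wanted part starts after the first '-'. The names `TASK_PATTERN` and `distinct_task_numbers` are hypothetical, and the de-duplication mirrors the SELECT DISTINCT in the question.

```python
import re

# Skip everything up to the first '-', then capture up to (not including) '('.
TASK_PATTERN = re.compile(r"^[^-]*-([^(]*)")


def distinct_task_numbers(values):
    """Return the extracted task numbers without duplicates, in input order."""
    seen = []
    for value in values:
        match = TASK_PATTERN.match(value)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


rows = [
    "A320-200001-01-1(1)",
    "A320-200001-02-1(2)",
    "A320-200001-01-1(2)",
    "A320-200001-01-1(1)",
    "A320-200001-01-1(2)",
    "A320-200001-02-1(1)",
]
print(distinct_task_numbers(rows))  # ['200001-01-1', '200001-02-1']
```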

Related

Get the value using substr and instr in oracle

I have following value in the table.
aaaaaa 26G 2.0G 23G 8 tmp
tmpfs 506M 0 506M 0 /dev/shm
I need to store the first value ('aaaaaa' and 'tmpfs') and the second value (26 and 506) in another table. I got the first value with
CAST(substr(COL_1,1,InStr(COL_1,' ')-1) AS VARCHAR2(10)) col
How do I get the second value, such as 26 and 506, using SUBSTR and INSTR?
I would recommend regexp_substr():
select regexp_substr(col1, '[^ ]+ ', 1, 1) as first,
regexp_substr(col1, '[^ ]+ ', 1, 2) as second
This returns the value with a space at the end. I think the pattern works without the space, because regular expression matching is greedy in Oracle:
select regexp_substr(col1, '[^ ]+', 1, 1) as first,
regexp_substr(col1, '[^ ]+', 1, 2) as second
There is an optional argument to instr where you can specify the nth occurrence of a specific string being searched.
CAST(substr(COL_1,InStr(COL_1,' ',1,1)+1,InStr(COL_1,' ',1,2)-InStr(COL_1,' ',1,1)-1) AS VARCHAR2(10))
To only extract the number from this substring, use regexp_substr. This assumes letters always follow one or more numeric characters.
regexp_substr(CAST(substr(COL_1,InStr(COL_1,' ',1,1)+1,InStr(COL_1,' ',1,2)-InStr(COL_1,' ',1,1)-1) AS VARCHAR2(10)),'\d+')
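To make the intent of the two answers concrete, here is a small Python sketch of the same extraction done outside the database. It assumes fields are separated by whitespace and that the second field starts with digits. The function name `first_and_second` is hypothetical.

```python
import re


def first_and_second(line: str):
    """Return (first field, leading number of the second field)."""
    fields = line.split()                    # split on runs of whitespace
    name = fields[0]
    digits = re.match(r"\d+", fields[1])     # leading digits of the 2nd field
    size = int(digits.group()) if digits else None
    return name, size


print(first_and_second("aaaaaa 26G 2.0G 23G 8 tmp"))     # ('aaaaaa', 26)
print(first_and_second("tmpfs 506M 0 506M 0 /dev/shm"))  # ('tmpfs', 506)
```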

Efficient SQL Bucket Sort based on Length of Substring Match

Given a SQL database table containing strings indexed alphabetically, how might I perform a search query that orders by substring match?
For example, given the data set:
bad
banana
bandana
banker
bed
brother
And the search string band, I would expect the results ordered as follows
bandana (index 0-3 matched)
banana (index 0-2 matched)
banker
bad (index 0-1 matched)
bed (index 0 matched)
brother
Note that we only care about the length of the substring matched. The matches that fall into each bucket don't have to be sorted alphabetically, I only care about the bucket they fall into.
So I guess naively the problem involves:
Seeing the length of substring match against my input for each row
Putting each row into the appropriate bucket based on the match length
Ordering the buckets in a descending order, ie (4 chars matched, 3 chars matched, 2..)
But this sounds expensive, so how could I implement this in SQL or C#, and do it efficiently?
Is there a similar problem/pattern I could benefit from here?
Many thanks
Not sure if it is the most efficient way, but:
Using a numbers table, split each string into characters, join this to a split of the search string on both character and position, then order by the match count and the string.
DECLARE @t TABLE ( string VARCHAR(50) )
INSERT INTO @t (string)
VALUES
('bad'),
('banana'),
('bandana'),
('banker'),
('bed'),
('brother')
DECLARE @search VARCHAR(50) = 'band'
;WITH numbers AS
(
SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY t1.number) AS n
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
)
SELECT string
FROM @t t
CROSS APPLY (
SELECT SUBSTRING(t.string, numbers.n, 1) c, n
FROM numbers
WHERE numbers.n <= LEN(string)
) s1
JOIN (
SELECT SUBSTRING(@search, numbers.n, 1) c, n
FROM numbers
WHERE numbers.n <= LEN(@search)
) s2 ON s2.c = s1.c
AND s2.n = s1.n
GROUP BY string
ORDER BY COUNT(1) DESC, string
String operations and SQL Server are not the best match, as far as I know.
My best bet would be to try a modified version of the Boyer-Moore-Horspool algorithm to find the number of matching characters. However, on a miss you wouldn't skip the full word length, only the length of the maximum match. Then simply insert the row into the appropriate bucket.
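The three naive steps listed in the question (measure the match, bucket, order the buckets) can be sketched in a few lines of Python. This is only an illustration that measures the common prefix with the search string, which is one reading of "length of substring match". The function names are hypothetical.

```python
def common_prefix_length(word: str, search: str) -> int:
    """Count how many leading characters of word and search agree."""
    length = 0
    for a, b in zip(word, search):
        if a != b:
            break
        length += 1
    return length


def bucket_sort_by_prefix(words, search):
    # sorted() is stable, so words keep their input order within a bucket.
    return sorted(words, key=lambda w: -common_prefix_length(w, search))


words = ["bad", "banana", "bandana", "banker", "bed", "brother"]
print(bucket_sort_by_prefix(words, "band"))
# ['bandana', 'banana', 'banker', 'bad', 'bed', 'brother']
```

Each word is compared against at most len(search) characters, so the cost is dominated by the sort rather than by the matching.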

How can I get the IndexOf() method to return the correct values?

I have been working with Google Maps and I am now looking to format coordinates.
I get the coordinates in the following format:
Address(coordinates)zoomlevel.
I use the IndexOf method to get the index of "(" + 1, so that I get the first digit of the coordinates, and store this value in a variable called "start".
I then do the same thing, but this time I get the index of ")" - 2 to get the last digit of the last coordinate, and store this value in a variable called "end".
I get the following error:
"Index and length must refer to a location within the string.Parameter name: length"
I get the following string as an input parameter:
"Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5"
By my calculations I should get the value 36 in the start variable and the value 65 in the end variable,
but for some reason I get the values 41 in start and 71 in end.
Why?
public string RemoveParantheses(string coord)
{
int start = coord.IndexOf("(")+1;
int end = coord.IndexOf(")")-2;
string formated = coord.Substring(start,end);
return formated;
}
I then tried hardcoding the correct values:
string Test = coord.Substring(36,65);
I then get the following error:
startIndex cannot be larger than length of string. Parameter name: startIndex
I understand what both errors mean, but in this case they seem incorrect, since I'm not going beyond the string's length.
Thanks!
The second parameter of Substring is a length (MSDN source). Since you are passing in 65 for the second parameter, your call is trying to get the characters between 36 and 101 (36+65). Your string does not have 101 characters in it, so that error is thrown. To get the data between the parentheses, use this:
public string RemoveParantheses(string coord)
{
int start = coord.IndexOf("(")+1;
int end = coord.IndexOf(")");
string formated = coord.Substring(start, end - start);
return formated;
}
Edit: The reason it worked with only the coordinates was that the total string was shorter, and since the coordinates started at the first position, start plus end still fell inside the string. For example...
//using "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5" (75 characters)
int start = coord.IndexOf("(") + 1; // 43
int end = coord.IndexOf(")")-2; // 71
coord.Substring(start, end); //asks for characters 43 through 113, past the end of the string
//using (61.9593214318303,14.0585965625)5 (33 characters)
int start = coord.IndexOf("(") + 1; // 1
int end = coord.IndexOf(")")-2; // 29
coord.Substring(start, end); //asks for characters 1 through 29
The second instance was valid because position 29 actually exists in your string. Once you added the address to the beginning of the string, your code no longer worked.
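The difference between an end index and a length can be checked outside C# as well. The Python sketch below is only an illustration: Python slices take (start, end), so the C# call Substring(start, length) corresponds to `coord[start:start + length]`. The variable names are chosen for this example.

```python
coord = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5"

open_paren = coord.index("(")    # 42
close_paren = coord.index(")")   # 73

start = open_paren + 1
length = close_paren - start     # what C# Substring expects as its 2nd argument

# Equivalent of the C# call coord.Substring(start, length)
inside = coord[start:start + length]
print(inside)  # 61.9593214318303,14.0585965625
```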
Extracting parts of a string is a good use for regular expressions:
var match = Regex.Match(locationString, @"\((?<lat>[\d\.]+),(?<long>[\d\.]+)\)");
string latitude = match.Groups["lat"].Value;
string longitude = match.Groups["long"].Value;
You probably forgot to count newlines and other whitespace; a \r\n newline is 2 "invisible" characters. The other mistake is that you are calling Substring with (Start, End), while it's (Start, Count), i.e. (Start, End - Start).
by my calculations i should get the value 36 in the start variable and the value 65 in the end variable
Then your calculations are wrong. With the string above I also see (and LinqPad confirms) that the open paren is at position 42 and the close paren is at index 73.
The error you're getting when using Substring is because the parameters to Substring are a beginning position and the length, not the ending position, so you should be using:
string formated = coord.Substring(start,(end-start+1));
That overload of Substring() takes two parameters: a start index and a length. You've provided the index of ) as the second value, when what you really want is the length of the substring to extract. In this case you can get it by subtracting the index of ( from the index of ). For example:
string foo = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
int start = foo.IndexOf("(") + 1;
int end = foo.IndexOf(")");
Console.Write(foo.Substring(start, end - start));
Console.Read();
Alternatively, you could parse the string using a regular expression, for example:
Match r = Regex.Match(foo, @"\(([^)]*)\)");
Console.Write(r.Groups[1].Value);
Which will probably perform a little better than the previous example
string input =
"Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5";
var groups = Regex.Match(input,
@"\(([\d\.]+),([\d\.]+)\)(\d{1,2})").Groups;
var lat = groups[1].Value;
var lon = groups[2].Value;
var zoom = groups[3].Value;
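For comparison, the regular-expression answers above translate almost directly to Python. This is a sketch rather than a definitive parser: the names `COORD_PATTERN` and `parse_location` are hypothetical, and like the original patterns it does not handle negative coordinates.

```python
import re

# Latitude and longitude inside parentheses, followed by a 1-2 digit zoom level.
COORD_PATTERN = re.compile(r"\((?P<lat>[\d.]+),(?P<lon>[\d.]+)\)(?P<zoom>\d{1,2})")


def parse_location(text: str):
    """Return (latitude, longitude, zoom), or None if the pattern is absent."""
    match = COORD_PATTERN.search(text)
    if match is None:
        return None
    return float(match["lat"]), float(match["lon"]), int(match["zoom"])


location = "Loddvägen 155, 840 80 Lillhärdal, Sverige (61.9593214318303,14.0585965625)5"
print(parse_location(location))  # (61.9593214318303, 14.0585965625, 5)
```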

LINQ-to-SQL orderby question

I have a LINQ-to-SQL query, and I order on an nvarchar field called CustomerReference. The problem is that references starting with a capital letter seem to come after ones without capitals, when I need it the other way around. For example, if I have the following rows:
d93838
D98484
They are currently ordered in that sequence; however, I need it reversed, like this:
D98484
d93838
Any ideas guys? Thanks
This assumes the Format [A-Za-z]\d+ and will put b3432 before C1234 but after B9999
list.OrderBy (l => l.CustomerReference.Substring(0,1).ToLower())
.ThenByDescending(l =>l.CustomerReference.Substring(0,1).ToUpper()==l.CustomerReference.Substring(0,1))
.ThenBy (l =>l.CustomerReference )
EDIT: I was asked for the SQL too so this is what LINQPad does
-- Region Parameters
DECLARE @p0 Int SET @p0 = 0
DECLARE @p1 Int SET @p1 = 1
DECLARE @p2 Int SET @p2 = 0
DECLARE @p3 Int SET @p3 = 1
DECLARE @p4 Int SET @p4 = 0
DECLARE @p5 Int SET @p5 = 1
-- EndRegion
SELECT [t0].[CustomerReference] FROM [dbo].[test] AS [t0]
ORDER BY LOWER(SUBSTRING([t0].[CustomerReference], @p0 + 1, @p1)),
(CASE
WHEN UPPER(SUBSTRING([t0].[CustomerReference], @p2 + 1, @p3)) = SUBSTRING([t0].[CustomerReference], @p4 + 1, @p5) THEN 1
WHEN NOT (UPPER(SUBSTRING([t0].[CustomerReference], @p2 + 1, @p3)) = SUBSTRING([t0].[CustomerReference], @p4 + 1, @p5)) THEN 0
ELSE NULL
END) DESC, [t0].[CustomerReference]
In most implementations, lower-case comes first (note that this is the opposite of ordinal code-point order, where 'D' precedes 'd'). You won't be able to get SQL Server to change that, so the next best thing is to bring the data back unsorted and write a custom comparer. Note that the inbuilt .NET comparers will also treat lower-case as either first or equal (compared to its upper-case equivalent), depending on the comparer.
However! Unless you limit yourself to very simple examples (ASCII etc.), ordering "alike" characters is a very non-trivial exercise. Even if we ignore the Turkish I / İ / ı / i, accented characters are going to cause you problems.
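The ordering described in the answers above (group by first letter regardless of case, upper-case before lower-case within a letter, then by the full reference) can be sketched as a sort key. This Python version is only an illustration for simple ASCII references. The function name `reference_sort_key` is hypothetical.

```python
def reference_sort_key(reference: str):
    first = reference[:1]
    return (
        first.lower(),           # group 'D...' and 'd...' together
        first.upper() != first,  # False (upper-case) sorts before True
        reference,               # finally order by the full reference
    )


refs = ["d93838", "D98484", "b3432", "C1234", "B9999"]
print(sorted(refs, key=reference_sort_key))
# ['B9999', 'b3432', 'C1234', 'D98484', 'd93838']
```

As the last answer warns, a key like this says nothing sensible about accented or locale-specific characters.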

How does this regex find triangular numbers?

Part of a series of educational regex articles, this is a gentle introduction to the concept of nested references.
The first few triangular numbers are:
1 = 1
3 = 1 + 2
6 = 1 + 2 + 3
10 = 1 + 2 + 3 + 4
15 = 1 + 2 + 3 + 4 + 5
There are many ways to check if a number is triangular. There's this interesting technique that uses regular expressions as follows:
Given n, we first create a string of length n filled with the same character
We then match this string against the pattern ^(\1.|^.)+$
n is triangular if and only if this pattern matches the string
Here are some snippets to show that this works in several languages:
PHP (on ideone.com)
$r = '/^(\1.|^.)+$/';
foreach (range(0,50) as $n) {
if (preg_match($r, str_repeat('o', $n))) {
print("$n ");
}
}
Java (on ideone.com)
for (int n = 0; n <= 50; n++) {
String s = new String(new char[n]);
if (s.matches("(\\1.|^.)+")) {
System.out.print(n + " ");
}
}
C# (on ideone.com)
Regex r = new Regex(@"^(\1.|^.)+$");
for (int n = 0; n <= 50; n++) {
if (r.IsMatch("".PadLeft(n))) {
Console.Write("{0} ", n);
}
}
So this regex seems to work, but can someone explain how?
Similar questions
How to determine if a number is a prime with regex?
Explanation
Here's a schematic breakdown of the pattern:
from beginning…
|         …to end
|         |
^(\1.|^.)+$
 \______/|___match
  group 1    one-or-more times
The (…) brackets define capturing group 1, and this group is matched repeatedly with +. This subpattern is anchored with ^ and $ to see if it can match the entire string.
Group 1 tries to match this|that alternates:
\1., that is, what group 1 matched (self reference!), plus one of "any" character,
or ^., that is, just "any" one character at the beginning
Note that in group 1, we have a reference to what group 1 matched! This is a nested/self reference, and is the main idea introduced in this example. Keep in mind that when a capturing group is repeated, generally it only keeps the last capture, so the self reference in this case essentially says:
"Try to match what I matched last time, plus one more. That's what I'll match this time."
Similar to a recursion, there has to be a "base case" with self references. At the first iteration of the +, group 1 had not captured anything yet (which is NOT the same as saying that it starts off with an empty string). Hence the second alternation is introduced, as a way to "initialize" group 1, which is that it's allowed to capture one character when it's at the beginning of the string.
So as it is repeated with +, group 1 first tries to match 1 character, then 2, then 3, then 4, etc. The sum of these numbers is a triangular number.
Further explorations
Note that for simplification, we used strings that consist of the same repeating character as our input. Now that we know how this pattern works, we can see that this pattern can also match strings like "1121231234", "aababc", etc.
Note also that if we find that n is a triangular number, i.e. n = 1 + 2 + … + k, the length of the string captured by group 1 at the end will be k.
Both of these points are shown in the following C# snippet (also seen on ideone.com):
Regex r = new Regex(@"^(\1.|^.)+$");
Console.WriteLine(r.IsMatch("aababc")); // True
Console.WriteLine(r.IsMatch("1121231234")); // True
Console.WriteLine(r.IsMatch("iLoveRegEx")); // False
for (int n = 0; n <= 50; n++) {
Match m = r.Match("".PadLeft(n));
if (m.Success) {
Console.WriteLine("{0} = sum(1..{1})", n, m.Groups[1].Length);
}
}
// 1 = sum(1..1)
// 3 = sum(1..2)
// 6 = sum(1..3)
// 10 = sum(1..4)
// 15 = sum(1..5)
// 21 = sum(1..6)
// 28 = sum(1..7)
// 36 = sum(1..8)
// 45 = sum(1..9)
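The matching process explained above can also be simulated by hand. As far as I know, Python's standard re module rejects a backreference to a group that is still open, so the sketch below does not run the regex itself. It imitates what each iteration of group 1 has to match. The function name `final_capture` is hypothetical, and the sketch ignores the detail that `.` normally does not match a newline.

```python
def final_capture(text: str):
    """Imitate ^(\\1.|^.)+$ and return group 1's last capture, or None."""
    capture = ""
    pos = 0
    while pos < len(text):
        # Each iteration must match the previous capture plus one more
        # character. On the first iteration the capture is empty, which
        # plays the role of the "^." base case.
        end = pos + len(capture) + 1
        if end > len(text) or text[pos:pos + len(capture)] != capture:
            return None
        capture = text[pos:end]
        pos = end
    return capture or None


print(final_capture("aababc"))      # abc
print(final_capture("iLoveRegEx"))  # None
print([n for n in range(51) if final_capture("o" * n)])
# [1, 3, 6, 10, 15, 21, 28, 36, 45]
```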
Flavor notes
Not all flavors support nested references. Always familiarize yourself with the quirks of the flavor that you're working with (and consequently, it almost always helps to provide this information whenever you're asking regex-related questions).
In most flavors, the standard regex matching mechanism tries to see if a pattern can match any part of the input string (possibly, but not necessarily, the entire input). This means that you should remember to always anchor your pattern with ^ and $ whenever necessary.
Java is slightly different in that String.matches, Pattern.matches and Matcher.matches attempt to match a pattern against the entire input string. This is why the anchors can be omitted in the above snippet.
Note that in other contexts, you may need to use \A and \Z anchors instead. For example, in multiline mode, ^ and $ match the beginning and end of each line in the input.
One last thing is that in .NET regex, you CAN actually get all the intermediate captures made by a repeated capturing group. In most flavors, you can't: all intermediate captures are lost and you only get to keep the last.
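The anchoring notes above can be tried in any flavor that offers both partial and whole-string matching. The following Python sketch is offered only as an illustration of the general point: `re.search` looks for a match anywhere, `re.fullmatch` requires the whole input to match, and in multiline mode `^` and `$` match at line boundaries while `\A` and `\Z` do not.

```python
import re

digits = r"\d+"
partial = bool(re.search(digits, "abc123"))      # matches part of the input
whole = bool(re.fullmatch(digits, "abc123"))     # must match the entire input
whole_ok = bool(re.fullmatch(digits, "123"))

text = "a\nb\nc"
line_anchored = bool(re.search(r"^b$", text, re.MULTILINE))      # per-line anchors
string_anchored = bool(re.search(r"\Ab\Z", text, re.MULTILINE))  # whole-string anchors

print(partial, whole, whole_ok)         # True False True
print(line_anchored, string_anchored)   # True False
```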
Related questions
(Java) method matches not work well - with examples on how to do prefix/suffix/infix matching
Is there a regex flavor that allows me to count the number of repetitions matched by * and + (.NET!)
Bonus material: Using regex to find powers of two!!!
With very slight modification, you can use the same techniques presented here to find powers of two.
Here's the basic mathematical property that you want to take advantage of:
1 = 1
2 = (1) + 1
4 = (1+2) + 1
8 = (1+2+4) + 1
16 = (1+2+4+8) + 1
32 = (1+2+4+8+16) + 1
The solution is given below (but do try to solve it yourself first!!!!)
(see on ideone.com in PHP, Java, and C#):
^(\1\1|^.)*.$
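Spoiler: the arithmetic behind this pattern can be checked with a short simulation. In the Python sketch below (the function name `is_power_of_two_by_captures` is hypothetical), group 1 captures 1 character and then doubles on every iteration, and the pattern's final `.` needs exactly one character to be left over.

```python
def is_power_of_two_by_captures(n: int) -> bool:
    """Imitate ^(\\1\\1|^.)*.$ on a string of n identical characters."""
    consumed = 0   # characters matched by all iterations of group 1 so far
    capture = 0    # length of group 1's most recent capture
    while consumed < n:
        if consumed + 1 == n:
            return True            # exactly one character left for the final '.'
        capture = 1 if capture == 0 else 2 * capture
        consumed += capture
    return False


print([n for n in range(70) if is_power_of_two_by_captures(n)])
# [1, 2, 4, 8, 16, 32, 64]
```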
