I'm using SQL Server 2014 Enterprise and Visual Studio 2013.
I have hundreds of TSV files containing foreign characters that I'm importing into SQL Server. I have an SSIS package that automates this (just a script task I found online that uses C#). Tables are created with the NVARCHAR(MAX) datatype for all columns; the script then reads each file line by line and inserts the values into the tables.
The source TSV files are exported as Unicode, but SQL Server doesn't seem to care - it imports the data as if it were VARCHAR (i.e., Chinese characters come over as "?????"). If you manually import a file into SQL Server, the code page shows "65001 (UTF-8)", so I'm not sure why the datatypes default to VARCHAR.
Now, I suppose I could configure a Data Conversion transformation for each of the files, but there are too many of them, and I'm thinking this can be done on the fly within the script task's insert:
SCRIPT TASK:
Some variables for encoding:
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
Encoding utf32 = Encoding.UTF32;
The following part of the script task code is where I try to convert the encoding (the first part of the IF statement (not shown) creates the receiving table). It errors out where indicated:
else
{
    // ADJUST FOR SINGLE QUOTES:
    line = line.Replace("'", "''");

    byte[] unicodeBYTES = unicode.GetBytes(line);
    byte[] unicodeCONVERT = Encoding.Convert(unicode, utf8, unicodeBYTES); // <--- ERRORS OUT
    char[] unicodeCHARS = new char[unicode.GetCharCount(unicodeCONVERT, 0, unicodeCONVERT.Length)];
    unicode.GetChars(unicodeCONVERT, 0, unicodeCONVERT.Length, unicodeCHARS, 0);
    string NEWline = new string(unicodeCHARS);

    string query = "Insert into " + SchemaName + ".[" + TableName + "] (" + ColumnList + ") ";
    query += "VALUES('" + NEWline + "')";
    // MessageBox.Show(query.ToString());

    SqlCommand myCommand1 = new SqlCommand(query, myADONETConnection);
    myCommand1.ExecuteNonQuery();
}
However, if I change the line:
byte[] unicodeCONVERT = Encoding.Convert(unicode, utf8, unicodeBYTES);
to the following:
byte[] unicodeCONVERT = Encoding.Convert(unicode, unicode, unicodeBYTES);
It loads the data, but the text still comes over mangled (the "?????" characters remain).
Any help would be appreciated.
Thank you.
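For what it's worth, the byte-level conversion shouldn't be necessary at all: .NET strings are already UTF-16, and the characters typically turn into "?????" because the INSERT embeds them as a non-Unicode literal. Below is a minimal sketch of a parameterized NVARCHAR insert, reusing the SchemaName, TableName, ColumnList and myADONETConnection names from the snippet above (a single-column insert is assumed for brevity; the real script would need one parameter per field). If the literal-building approach has to stay, the minimal change is prefixing the literal with N, i.e. VALUES(N'...'), so SQL Server treats it as NVARCHAR.

// Sketch: pass the line as an NVARCHAR parameter instead of splicing it into
// the SQL text. Requires: using System.Data; using System.Data.SqlClient;
// Parameters also make the Replace("'", "''") escaping unnecessary.
string query = "INSERT INTO " + SchemaName + ".[" + TableName + "] (" + ColumnList + ") VALUES (@line)";
using (SqlCommand myCommand1 = new SqlCommand(query, myADONETConnection))
{
    myCommand1.Parameters.Add("@line", SqlDbType.NVarChar).Value = line;
    myCommand1.ExecuteNonQuery();
}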
I'm having two problems reading my .csv file with StreamReader. What I'm trying to do is get the values and put them into variables, which I'll use later on to input the values into a browser via Selenium.
Here's my code (the Console.WriteLine at the end is just for debugging):
string[] read;
char[] seperators = { ';' };

StreamReader sr = new StreamReader(@"C:\filename.csv", Encoding.Default, true);
string data = sr.ReadLine(); // reads and discards the first line (the header)

while ((data = sr.ReadLine()) != null)
{
    read = data.Split(seperators);

    string cpr = read[0];
    string ydelsesKode = read[1];
    string startDato = read[3];
    string stopDato = read[4];
    string leverandoer = read[5];
    string leverandoerAdd = read[6];

    Console.WriteLine(cpr + " " + ydelsesKode + " " + startDato + " " + stopDato + " " + leverandoer + " " + leverandoerAdd);
}
The code in and of itself works just fine - but I have two problems:
The file has values in Danish, which means I get åøæ, but they're showing up as '?' in the console. In Notepad those characters look fine.
Blank values also show up as '?'. Is there any way I can turn them into a blank space so Selenium won't get "confused"?
Sample output:
1372 1.1 01-10-2013 01-10-2013 Bakkev?nget - dagcenter ?
Bakkev?nget should be Bakkevænget, and the final '?' should be blank (or rather, a blank space).
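For what it's worth, two things may be worth checking before switching file formats (both are assumptions about the environment, not verified against this file): the console's output encoding is often what turns æøå into '?', and the file may be saved in an ANSI codepage rather than the one StreamReader picks up. A small sketch:

// Sketch: tell the console to render Unicode, and read the file with an
// explicit ANSI codepage (1252 covers Danish). Both encodings are guesses.
Console.OutputEncoding = System.Text.Encoding.UTF8;

using (var sr = new StreamReader(@"C:\filename.csv", System.Text.Encoding.GetEncoding(1252)))
{
    string data = sr.ReadLine(); // header
    while ((data = sr.ReadLine()) != null)
    {
        Console.WriteLine(data);
    }
}

As for the trailing '?', read[6] may contain a character the console can't render rather than being truly empty; dumping its character codes would confirm that.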
"Fixed" it by going with tab delimited unicode .txt file instead of .csv. For some reason my version of excel doesn't have the option to save in unicode .csv...
I don't quite understand the problem with "rolling my own" parser, but maybe someday someone will take the time to explain it to me better. Still new-ish at this C# stuff...
I have CKEditor in an ASP.NET application, integrated through jQuery. It works fine locally: it inserts the data into the database no matter how long the text is. But after deploying the build to the live server, it does not save the data; when I reduce the text's length, it saves. I am inserting this text using a WCF service.
Please help in this context. Looking forward to hearing back about this.
Oracle accepts at most 4,000 characters for an input string (this is a limit on the value you send, NOT a datatype). That is, you are expected to send at most 4,000 characters as a single value.
The solution:
If you are using ASP.NET, try using the following function after changing your column type to CLOB, so that it can hold up to 4 GB:
Public Shared Function AssignStringToCLOB(ByVal targetString As String, ByVal myConnection As OracleConnection) As OracleLob
    Dim _tempCommand As New OracleCommand()
    _tempCommand.Connection = myConnection
    _tempCommand.Transaction = _tempCommand.Connection.BeginTransaction()

    ' Create a temporary CLOB on the server and hand its locator back through :LOC
    _tempCommand.CommandText = "DECLARE A " + OracleType.Clob.ToString() + "; " + "BEGIN " + "DBMS_LOB.CREATETEMPORARY(A, FALSE); " + ":LOC := A; " + "END;"
    Dim p As OracleParameter = _tempCommand.Parameters.Add("LOC", OracleType.Clob)
    p.Direction = ParameterDirection.Output
    _tempCommand.ExecuteNonQuery()

    Dim _tempCLOB As OracleLob = CType(p.Value, OracleLob)

    ' Write the string into the temporary CLOB as Unicode bytes
    If targetString <> String.Empty Then
        Dim _bytesArray As Byte() = Text.Encoding.Unicode.GetBytes(targetString)
        _tempCLOB.BeginBatch(OracleLobOpenMode.ReadWrite)
        _tempCLOB.Write(_bytesArray, 0, _bytesArray.Length)
        _tempCLOB.EndBatch()
    End If

    _tempCommand.Transaction.Commit()
    Return _tempCLOB
End Function
Call it after opening the connection to the Oracle DB to set the value of your CLOB parameter; this should work perfectly.
I am trying to convert an old application that has some strings stored in the database as ASCII.
For example, the string: ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð is stored in the database.
Now, if I copy that string into a text editor, save it as ASCII, then open the file in a web browser and let it auto-detect the encoding, I get the correct string in Japanese: チャネルパートナーの選択, and the page says the detected encoding is Japanese (Shift_JIS).
When I try to do the conversion in C# with something like this:
var asciiBytes = Encoding.ASCII.GetBytes(text);
var japaneseEncoding = Encoding.GetEncoding(932);
var convertedBytes = Encoding.Convert(japaneseEncoding, Encoding.ASCII, asciiBytes);
var japaneseString = japaneseEncoding.GetString(convertedBytes);
I get ?`???l???p?[?g?i?[???I?? as the Japanese string, and thus I cannot show it on the webpage.
Any light would be appreciated.
Thanks
some strings stored in the database as ASCII
It isn't ASCII - practically none of the characters in ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð are ASCII. Encoding.ASCII.GetBytes(text) is going to produce a lot of "huh?" characters; that's why you got all those question marks.
The core issue is that the bytes in the database column were read with the wrong encoding. You used code page 1252:
var badstringFromDatabase = "ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð";
var hopefullyRecovered = Encoding.GetEncoding(1252).GetBytes(badstringFromDatabase);
var oughtToBeJapanese = Encoding.GetEncoding(932).GetString(hopefullyRecovered);
Which produces "チャネルパートナーの選択"
This is not going to be completely reliable: code page 1252 has a few unassigned codes that are used in 932, so you'll end up with a garbled string from which you cannot recover the original byte values anymore. You'll need to focus on getting the data provider to use the correct encoding.
As per the other answer, I'm pretty sure you're using ANSI/Default encoding, not ASCII.
The following examples seem to get you what you're after.
var japaneseEncoding = Encoding.GetEncoding(932);
// From file bytes
var fileBytes = File.ReadAllBytes(@"C:\temp\test.html");
var japaneseTextFromFile = japaneseEncoding.GetString(fileBytes);
japaneseTextFromFile.Dump();
// From string bytes
var textString = "ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð";
var textBytes = Encoding.Default.GetBytes(textString);
var japaneseTextFromString = japaneseEncoding.GetString(textBytes);
japaneseTextFromString.Dump();
Interestingly, I think I need to read up on Encoding.Convert, as it did not produce the behaviour I expected. The GetString methods only seem to work if I pass in bytes read in the Encoding.Default format; if I convert to the Japanese encoding beforehand, they do not work as expected.
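For what it's worth, Encoding.Convert(src, dst, bytes) is effectively dst.GetBytes(src.GetChars(bytes)): it changes how characters are represented as bytes, never what the characters mean, which would explain the behaviour above. A short illustration, reusing textString and japaneseEncoding as defined in the snippet:

// Convert() decodes the bytes as Default, then re-encodes those same
// characters as Shift-JIS - so it preserves the mojibake, it doesn't fix it.
byte[] textBytes = Encoding.Default.GetBytes(textString);
byte[] viaConvert = Encoding.Convert(Encoding.Default, japaneseEncoding, textBytes);

// For recovery, no Convert is needed: just decode the recovered bytes
// with the true encoding, as in the snippet above.
string japanese = japaneseEncoding.GetString(textBytes);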
This code will dump a bunch of different options out so you can see what's close.
I use this a lot for comments in old applications that don't have any encoding awareness.
You can copy-paste to run it online here:
https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding.getencodings?view=netframework-4.8#System_Text_Encoding_GetEncodings
using System;

public class Program
{
    public static void Main()
    {
        var badstringFromDatabase = "ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð";

        var recovered1 = System.Text.Encoding.GetEncoding(932).GetBytes(badstringFromDatabase); //Shift JIS
        var recovered2 = System.Text.Encoding.GetEncoding(20932).GetBytes(badstringFromDatabase); //EUC
        var recovered3 = System.Text.Encoding.GetEncoding(51932).GetBytes(badstringFromDatabase); //EUC
        var recovered4 = System.Text.Encoding.GetEncoding(50220).GetBytes(badstringFromDatabase); //ISO-2022-JP
        var recovered5 = System.Text.Encoding.GetEncoding(50221).GetBytes(badstringFromDatabase); //ISO-2022-JP
        var recovered6 = System.Text.Encoding.GetEncoding(50222).GetBytes(badstringFromDatabase); //ISO-2022-JP
        var recovered7 = System.Text.Encoding.GetEncoding(65001).GetBytes(badstringFromDatabase); //UTF-8
        var recovered8 = System.Text.Encoding.GetEncoding(1200).GetBytes(badstringFromDatabase); //UTF-16
        var recovered9 = System.Text.Encoding.GetEncoding(12000).GetBytes(badstringFromDatabase); //UTF-32
        var recovered10 = System.Text.Encoding.GetEncoding(12001).GetBytes(badstringFromDatabase); //UTF-32BE
        var recovered11 = System.Text.Encoding.GetEncoding(65000).GetBytes(badstringFromDatabase); //UTF-7

        Console.WriteLine("Shift JIS: " + System.Text.Encoding.GetEncoding(932).GetString(recovered1)); //Shift JIS
        Console.WriteLine("EUC: " + System.Text.Encoding.GetEncoding(932).GetString(recovered2)); //EUC
        Console.WriteLine("EUC: " + System.Text.Encoding.GetEncoding(932).GetString(recovered3)); //EUC
        Console.WriteLine("ISO-2022-JP: " + System.Text.Encoding.GetEncoding(932).GetString(recovered4)); //ISO-2022-JP
        Console.WriteLine("ISO-2022-JP: " + System.Text.Encoding.GetEncoding(932).GetString(recovered5)); //ISO-2022-JP
        Console.WriteLine("ISO-2022-JP: " + System.Text.Encoding.GetEncoding(932).GetString(recovered6)); //ISO-2022-JP
        Console.WriteLine("UTF-8: " + System.Text.Encoding.GetEncoding(932).GetString(recovered7)); //UTF-8
        Console.WriteLine("UTF-16: " + System.Text.Encoding.GetEncoding(932).GetString(recovered8)); //UTF-16
        Console.WriteLine("UTF-32: " + System.Text.Encoding.GetEncoding(932).GetString(recovered9)); //UTF-32
        Console.WriteLine("UTF-32BE: " + System.Text.Encoding.GetEncoding(932).GetString(recovered10)); //UTF-32BE
        Console.WriteLine("UTF-7: " + System.Text.Encoding.GetEncoding(932).GetString(recovered11)); //UTF-7
    }
}
I have a c# function that needs to write to a SQL image column. I have the image in a byte array. The standard seems to be the use of SqlCommand while passing the byte array as a parameter of type System.Data.SqlDbType.Image.
Unfortunately, my function can only use text queries (don't ask why), so I have to find a way to use T-SQL commands only. What I have so far can write to the column, but I don't know what string format to use for the image blob.
sql = "DECLARE #ptrval binary(16)" +
"SELECT #ptrval = textptr(Photo) FROM EMPhoto WHERE Employee='" + employeeID + "'" +
"WRITETEXT EMPhoto.Photo #ptrval " + imageByteArrayAsString;
I've tried converting imageByteArray to a hex string and a binary string, but it doesn't seem to end up correct in SQL or in the application that reads it.
A T-SQL binary constant is an unquoted hexadecimal string prefixed with 0x, e.g. 0xFFD8FFE0...
string imageByteArrayAsString = "0x" + BitConverter.ToString(image).Replace("-", string.Empty);
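Putting that together with the batch from the question (a sketch; EMPhoto, Photo and employeeID come from the question, and the statement is untested here):

// The hex literal is spliced in unquoted: WRITETEXT ... 0xFFD8FFE0...
string imageByteArrayAsString = "0x" + BitConverter.ToString(image).Replace("-", string.Empty);

string sql = "DECLARE @ptrval binary(16) " +
             "SELECT @ptrval = textptr(Photo) FROM EMPhoto WHERE Employee='" + employeeID + "' " +
             "WRITETEXT EMPhoto.Photo @ptrval " + imageByteArrayAsString;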
I am writing long text (1K to 2K characters of plain XML data) into a cell in an Excel workbook.
The statement below throws the COM error Exception from HRESULT: 0x800A03EC:
range.set_Value(Type.Missing, data);
If I copy-paste the same XML into Excel manually, it works just fine, but the same does not work programmatically.
If I strip the text down to something like 100-300 characters, it works fine.
There is a limit (somewhere between 800 and 900 characters, if I remember correctly) that is nearly impossible to get around like this.
Try using an OLE DB connection and inserting the data with a SQL command; that might work better for you. You can then use interop to do any formatting if necessary.
The following KB article explains that the maximum is 911 characters. I checked this against my own code; it does indeed work for strings up to 911 characters:
http://support.microsoft.com/kb/818808
The workaround mentioned in this article boils down to making sure no cell holds more than 911 characters. That's lame!
A good article on OLE DB and Excel: http://support.microsoft.com/kb/316934
The following code updates a private variable that holds the number of successfully written rows, and returns a string with the path to the Excel file.
Remember to use Path from System.IO!
string tempXlsFilePathName;
string result;
string sheetName;
string queryString;
int successCounter;

// set sheetName and queryString
sheetName = "sheetName";
queryString = "CREATE TABLE " + sheetName + "([columnTitle] char(255))";

// Write .xls
successCounter = 0;
tempXlsFilePathName = (_tempXlsFilePath + @"\literalFilename.xls");

using (OleDbConnection connection = new OleDbConnection(GetConnectionString(tempXlsFilePathName)))
{
    OleDbCommand command = new OleDbCommand(queryString, connection);
    connection.Open();
    command.ExecuteNonQuery();

    yourCollection.ForEach(dataItem =>
    {
        string SQL = "INSERT INTO [" + sheetName + "$] VALUES ('" + dataItem.ToString() + "')";
        OleDbCommand updateCommand = new OleDbCommand(SQL, connection);
        updateCommand.ExecuteNonQuery();
        successCounter++;
    });

    // update result with the successfully written file path
    result = tempXlsFilePathName;
}

_successfulRowsCount = successCounter;
return result;
N.B. This was edited in a hurry, so may contain some mistakes.
To work around this limitation, write/update only one cell at a time and dispose of the Excel COM object immediately; then recreate the object to write/update the next cell.
I can confirm this solution works in VS2010 (a VB.NET project) with the Microsoft Excel 10.0 Object Library (Microsoft Office XP).
This limitation is supposed to have been removed in Excel 2007/2010. Using VBA, the following works:
Sub longstr()
    Dim str1 As String
    Dim str2 As String
    Dim j As Long
    For j = 1 To 2000
        str1 = str1 & "a"
    Next j
    Range("a1:a5").Value2 = str1
    str2 = Range("a5").Value2
    MsgBox Len(str2)
End Sub
I'll start by saying I haven't tried this myself, but my research says that you can use QueryTables to overcome the 911-character limitation.
This is the primary post I found; it talks about using a recordset as the data source for a QueryTable and adding it to a spreadsheet: http://www.excelforum.com/showthread.php?t=556493&p=1695670&viewfull=1#post1695670.
Here is some sample C# code that uses QueryTables: import txt files using excel interop in C# (QueryTables.Add).
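For reference, a rough interop sketch of the QueryTables approach (untested, per the caveat above; the file path is a placeholder, and worksheet is assumed to be an open Microsoft.Office.Interop.Excel.Worksheet):

// Sketch: pull a delimited text file into the sheet via a QueryTable,
// bypassing the cell-by-cell set_Value path and its length limit.
var queryTable = worksheet.QueryTables.Add(
    "TEXT;" + @"C:\temp\data.txt",   // connection string: TEXT;<file path>
    worksheet.get_Range("A1"));      // destination cell
queryTable.TextFileParseType = Microsoft.Office.Interop.Excel.XlTextParsingType.xlDelimited;
queryTable.TextFileTabDelimiter = true; // tab-delimited input
queryTable.Refresh(false);              // false = do the import synchronously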