Reading a large CSV file and processing in C#. Any suggestions?

I have a large CSV file of around 25 GB. I need to parse each line, which has around 10 columns, do some processing, and finally save the parsed data to a new file.
I am using a dictionary as my data structure. To avoid running out of memory, I write the file after every 500,000 records and clear the dictionary.
Can anyone tell me whether this is a good way of doing it? If not, is there a better way? Right now it takes 30 minutes to process the 25 GB file.
Here is the code:
private static void ReadData(string filename, FEnum fileType)
{
var resultData = new ResultsData
{
DataColumns = new List<string>(),
DataRows = new List<Dictionary<string, Results>>()
};
resultData.DataColumns.Add("count");
resultData.DataColumns.Add("userid");
Console.WriteLine("Start Processing : " + DateTime.Now);
const long processLimit = 100000;
//ProcessLimit : 500000, TimeElapsed : 30 Mins;
//ProcessLimit : 100000, TimeElaspsed - Overflow
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
Dictionary<string, Results> parsedData = new Dictionary<string, Results>();
FileStream fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read);
using (StreamReader streamReader = new StreamReader(fileStream))
{
string charsRead = streamReader.ReadLine();
int count = 0;
long linesProcessed = 0;
while (!String.IsNullOrEmpty(charsRead))
{
string[] columns = charsRead.Split(',');
string eventsList = columns[0] + ";" + columns[1] + ";" + columns[2] + ";" + columns[3] + ";" +
columns[4] + ";" + columns[5] + ";" + columns[6] + ";" + columns[7];
if (parsedData.ContainsKey(columns[0]))
{
Results results = parsedData[columns[0]];
results.Count = results.Count + 1;
results.Conversion = results.Count;
results.EventList.Add(eventsList);
parsedData[columns[0]] = results;
}
else
{
Results results = new Results {
Count = 1, Hash_Person_Id = columns[0], Tag_Id = columns[1], Conversion = 1,
Campaign_Id = columns[2], Inventory_Placement = columns[3], Action_Id = columns[4],
Creative_Group_Id = columns[5], Creative_Id = columns[6], Record_Time = columns[7]
};
results.EventList = new List<string> {eventsList};
parsedData.Add(columns[0], results);
}
charsRead = streamReader.ReadLine();
linesProcessed++;
if (linesProcessed == processLimit)
{
linesProcessed = 0;
SaveParsedValues(filename, fileType, parsedData);
//Clear Dictionary
parsedData.Clear();
}
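}
// Flush the final partial batch, which the limit check above never reaches.
if (parsedData.Count > 0)
{
SaveParsedValues(filename, fileType, parsedData);
parsedData.Clear();
}
}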
stopwatch.Stop();
Console.WriteLine(@"File : {0} Batch Limit : {1} Time elapsed : {2} ", filename + Environment.NewLine, processLimit + Environment.NewLine, stopwatch.Elapsed + Environment.NewLine);
}
Thank you

The Microsoft.VisualBasic.FileIO.TextFieldParser class looks like it could do the job. Try it, it may speed things up.
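As a rough illustration of that suggestion, here is a minimal sketch that streams records through TextFieldParser and keeps only a per-key count in memory. It is written as C# 9 top-level statements; the helper name CountByFirstColumn and the sample data are made up for the example, and it assumes the first column is the key, as in the question's code. Whether it is actually faster than Split(',') on a 25 GB file would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: counts rows per value of the first column,
// reading one record at a time so the whole file is never held in memory.
static Dictionary<string, int> CountByFirstColumn(TextReader reader)
{
    var counts = new Dictionary<string, int>();
    using (var parser = new TextFieldParser(reader))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true; // handles "a,b" inside quotes

        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            if (fields == null || fields.Length == 0) continue;

            counts.TryGetValue(fields[0], out int current); // current is 0 when the key is new
            counts[fields[0]] = current + 1;
        }
    }
    return counts;
}

// Usage with in-memory sample data; for a real file pass new StreamReader(path).
var sample = new StringReader("u1,tagA,c1\nu2,tagB,c2\nu1,\"tag,C\",c3\n");
foreach (var pair in CountByFirstColumn(sample))
    Console.WriteLine(pair.Key + ": " + pair.Value);
```

Unlike a plain Split(','), the parser keeps a quoted field such as "tag,C" together as one column.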

Related

C# compare id from text file in filestream

I need to fill a text file with information about workers. Then I need to read from the file and search for an ID that the user enters. For example, my file contains the IDs 1, 2 and 3; if I search for ID 3 and it matches, all of that worker's information is written to the console. Otherwise it writes the text "A worker cannot be found."
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
string file = "C:\\Temp\\registery.txt";
FileStream fOutStream = File.Open(file, FileMode.Append, FileAccess.Write);
StreamWriter sWriter = new StreamWriter(fOutStream);
int[] id = { 1, 2, 3 };
string[] name = { "John", "Carl", "Thomas" };
float[] salary = { 3500, 4800, 2100 };
for (int i = 0; i < id.Length; i++)
{
sWriter.WriteLine(id[i] + " " + name[i] + " " + salary[i]);
}
sWriter.Flush();
sWriter.Close();
FileStream fInStream = File.OpenRead(file);
StreamReader sReader = new StreamReader(fInStream);
int id2;
Console.WriteLine("Type worker's id");
id2 = int.Parse(Console.ReadLine());
bool a;
a = sReader.ReadToEnd().Contains(id2);
Console.WriteLine(a);
sReader.Close();
}
}
If you want the text file to be searchable, its fields should be delimited by a separator such as a comma or a tab,
so modify your code:
sWriter.WriteLine(id[i] + "," + name[i] + "," + salary[i]);
To search your text file by id, name, or any other field, and to combine conditions with AND/OR, you can use the method described here:
How would I convert data in a .txt file into xml? c#
BTW: Refactor your code so that creating the file happens in one method and searching in another.
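A minimal sketch of searching such a comma-delimited file by id, assuming the id is the first field on each line. The helper name FindWorker and the sample rows are hypothetical (the third id is changed to 13 here to show why comparing the whole field is safer than a substring test), and the code is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: returns the first line whose first comma-separated
// field equals the id, or null when no worker matches.
static string FindWorker(IEnumerable<string> lines, int id)
{
    string wanted = id.ToString(CultureInfo.InvariantCulture);
    foreach (string line in lines)
    {
        string[] fields = line.Split(',');
        // Compare the whole field, so id 3 does not match "13"
        if (fields[0].Trim() == wanted)
            return line;
    }
    return null;
}

// Sample rows in the "id,name,salary" layout suggested above.
var workers = new[] { "1,John,3500", "2,Carl,4800", "13,Thomas,2100" };
Console.WriteLine(FindWorker(workers, 3) ?? "A worker cannot be found.");
Console.WriteLine(FindWorker(workers, 13) ?? "A worker cannot be found.");
```

For a real file, File.ReadLines(path) can be passed as the lines argument; it reads lazily, so the search stops at the first match.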
I found a solution to my problem myself, and it works well enough, though it might not be the best solution. I removed the bool parts and replaced them with this:
string line;
while ((line = sReader.ReadLine()) != null)
{
// Match the exact "id: N " prefix so that id 1 does not also match id 13
if (line.StartsWith("id: " + id2 + " "))
{
Console.WriteLine(line);
break;
}
}
// line is null only when the loop reached the end of the file without a match
if (line == null)
{
Console.WriteLine("Worker not found with id " + id2);
}
And I changed the WriteLine call in the for loop above to look like this:
sWriter.WriteLine("id: " + id[i] + " name: " + name[i] + " salary: " + salary[i]);

csv modify file

I am in a bit of a pickle regarding a consolidation application we use in our company. We create a CSV file from a Progress database; this CSV file has 14 columns and no header.
The CSV file contains payments (around 173 thousand rows). Most of these rows are the same except for the amount column (the last column).
Example:
2014;MONTH;;SC;10110;;;;;;;;EUR;-6500000
2014;01;;SC;10110;;;;;;;;EUR;-1010665
2014;01;;LLC;11110;;;;;;;;EUR;-6567000
2014;01;;SC;10110;;;;;;;;EUR;-1110665
2014;01;;LLC;11110;;;;;;;;EUR;65670.00
2014;01;;SC;10110;;;;;;;;EUR;-11146.65
(around 174000 rows)
As you can see, some of these lines are the same except for the amount column. What I need is to sort all rows, add up the amounts, and save one unique row instead of 1,100 rows with different amounts.
My coding skills are not enough to get the job done in time; maybe one of you can push me in the right direction.
Example code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input = File.ReadAllText(@"c:\temp\test.txt");
string inputLine = "";
StringReader reader = new StringReader(input);
List<List<string>> data = new List<List<string>>();
while ((inputLine = reader.ReadLine()) != null)
{
if (inputLine.Trim().Length > 0)
{
string[] inputArray = inputLine.Split(new char[] { ';' });
data.Add(inputArray.ToList());
}
}
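//sort data by every column; OrderBy returns a new sequence, so assign the result back
for (int sortCol = data[0].Count() - 1; sortCol >= 0; sortCol--)
{
int col = sortCol; //copy for the lambda
data = data.OrderBy(x => x[col]).ToList();
}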
//delete duplicate rows
for (int rowCount = data.Count - 1; rowCount >= 1; rowCount--)
{
Boolean match = true;
//compare every column except the last one (the amount)
for (int colCount = 0; colCount < data[rowCount].Count - 1; colCount++)
{
if(data[rowCount][colCount] != data[rowCount - 1][colCount])
{
match = false;
break;
}
}
if (match == true)
{
decimal previousValue = decimal.Parse(data[rowCount - 1][data[rowCount].Count - 1]);
decimal currentValue = decimal.Parse(data[rowCount][data[rowCount].Count - 1]);
string newStrValue = (previousValue + currentValue).ToString();
data[rowCount - 1][data[rowCount].Count - 1] = newStrValue;
data.RemoveAt(rowCount);
}
}
string output = string.Join("\r\n",data.AsEnumerable()
.Select(x => string.Join(";",x.Select(y => y).ToArray())).ToArray());
File.WriteAllText(@"c:\temp\test1.txt",output);
}
}
}
Read the CSV file line by line, and build an in-memory dictionary in which you keep the totals (and other information you require). As most of the lines belong to the same key, it will probably not cause out of memory issues. Afterwards, generate a new CSV based on the information in the dictionary.
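A minimal sketch of that dictionary approach, assuming the key is everything before the last semicolon and the amount uses a dot as the decimal separator, as in the sample rows. The helper name SumAmounts is made up for the example, and the code is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: groups rows by all columns except the last and
// sums the last column (the amount) per group.
static List<string> SumAmounts(IEnumerable<string> lines)
{
    var totals = new Dictionary<string, decimal>();
    foreach (string line in lines)
    {
        int lastSep = line.LastIndexOf(';');
        if (lastSep < 0) continue; // skip blank or malformed lines

        string key = line.Substring(0, lastSep);
        decimal amount = decimal.Parse(line.Substring(lastSep + 1),
            NumberStyles.Number, CultureInfo.InvariantCulture);

        totals.TryGetValue(key, out decimal running); // running is 0 for a new key
        totals[key] = running + amount;
    }

    var result = new List<string>();
    foreach (var pair in totals)
        result.Add(pair.Key + ";" + pair.Value.ToString(CultureInfo.InvariantCulture));
    return result;
}

// Usage with rows taken from the question; for a real file pass File.ReadLines(path).
var rows = new[]
{
    "2014;01;;SC;10110;;;;;;;;EUR;-1010665",
    "2014;01;;LLC;11110;;;;;;;;EUR;-6567000",
    "2014;01;;SC;10110;;;;;;;;EUR;-1110665",
    "2014;01;;LLC;11110;;;;;;;;EUR;65670.00",
};
foreach (string row in SumAmounts(rows))
    Console.WriteLine(row);
```

Because only one entry per distinct key is kept, memory use depends on the number of unique rows rather than on the size of the input, and no sorting is needed.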
As I interpret your question, you are asking how to take input in the form of
@"2014;MONTH;;SC;10110;;;;;;;;EUR;-6500000
2014;01;;SC;10110;;;;;;;;EUR;-1010665
2014;01;;LLC;11110;;;;;;;;EUR;-6567000
2014;01;;SC;10110;;;;;;;;EUR;-1110665
2014;01;;LLC;11110;;;;;;;;EUR;65670.00
2014;01;;SC;10110;;;;;;;;EUR;-11146.65"
get the last column, and then sum it up? If so, this is actually very easy to do with something like this:
public static void Main()
{
string input = @"2014;MONTH;;SC;10110;;;;;;;;EUR;-6500000
2014;01;;SC;10110;;;;;;;;EUR;-1010665
2014;01;;LLC;11110;;;;;;;;EUR;-6567000
2014;01;;SC;10110;;;;;;;;EUR;-1110665
2014;01;;LLC;11110;;;;;;;;EUR;65670.00
2014;01;;SC;10110;;;;;;;;EUR;-11146.65";
var rows = input.Split('\n');
decimal totalValue = 0m;
foreach(var row in rows)
{
var transaction = row.Substring(row.LastIndexOf(';') +1);
decimal val = 0m;
if(decimal.TryParse(transaction, out val))
totalValue += val;
}
Console.WriteLine(totalValue);
}
But maybe I have misunderstood what you were asking for?
Sorry for answering my own post so late, but this is my final solution:
I replace all " characters and write the output with a StreamWriter (going from a 25 MB to a 15 MB file). Then I copy my CSV file to the SQL Server so I can bulk insert it. After the insert I query the table and write the result set to a new file. My new file is only about 700 KB!
The FillData() method fills a DataGridView in my application so you can review the result instead of opening the file in Excel.
I am new to C#; I am currently writing a new solution to query the CSV file directly or in memory and write it back to a new file.
Method1:
string line;
using (StreamReader sr = new StreamReader(sourcePath))
using (StreamWriter sw = new StreamWriter(insertFile))
{
while ((line = sr.ReadLine()) != null)
{
sw.WriteLine(line.Replace("\"", ""));
}
}
// Copy only after both files are closed, so the output is fully flushed
File.Copy(insertFile, @"\\SQLSERVER\C$\insert.csv");
Method2:
var destinationFile = @"c:\insert.csv";
var querieImportCSV = "BULK INSERT dbo.TABLE FROM '" + destinationFile + "' WITH ( FIELDTERMINATOR = ';', ROWTERMINATOR = '\n', FIRSTROW = 1)";
var truncate = @"TRUNCATE TABLE dbo.TABLE";
string queryResult =
@"SELECT [Year]
,[Month]
,[Week]
,[Entity]
,[Account]
,[C11]
,[C12]
,[C21]
,[C22]
,[C3]
,[C4]
,[CTP]
,[VALUTA]
,SUM(AMOUNT) as AMOUNT
,[CURRENCY_ORIG]
,[AMOUNTEXCH]
,[AGENTCODE]
FROM dbo.TABLE
GROUP BY YEAR, MONTH, WEEK, Entity, Account, C11, C12, C21, C22, C3, C4, CTP, VALUTA, CURRENCY_ORIG, AMOUNTEXCH, AGENTCODE
ORDER BY Account";
var conn = new SqlConnection(connectionString);
conn.Open();
SqlCommand commandTruncate = new SqlCommand(truncate, conn);
commandTruncate.ExecuteNonQuery();
SqlCommand commandInsert = new SqlCommand(querieImportCSV, conn);
SqlDataReader readerInsert = commandInsert.ExecuteReader();
readerInsert.Close();
FillData();
SqlCommand commandResult = new SqlCommand(queryResult, conn);
SqlDataReader readerResult = commandResult.ExecuteReader();
StringBuilder sb = new StringBuilder();
while (readerResult.Read())
{
sb.AppendLine(readerResult["Year"] + ";" + readerResult["Month"] + ";" + readerResult["Week"] + ";" + readerResult["Entity"] + ";" + readerResult["Account"] + ";" +
readerResult["C11"] + ";" + readerResult["C12"] + ";" + readerResult["C21"] + ";" + readerResult["C22"] + ";" + readerResult["C3"] + ";" + readerResult["C4"] + ";" +
readerResult["CTP"] + ";" + readerResult["Valuta"] + ";" + readerResult["Amount"] + ";" + readerResult["CURRENCY_ORIG"] + ";" + readerResult["AMOUNTEXCH"] + ";" + readerResult["AGENTCODE"]);
}
sb.Replace("\"","");
StreamWriter sw = new StreamWriter(homedrive);
sw.WriteLine(sb);
readerResult.Close();
conn.Close();
sw.Close();
sw.Dispose();

Union of million line urls in 2 files

Files A and B each contain a million URLs.
1. Go through the URLs in file A one by one.
2. Extract the host, e.g. subdomain.com from http://subdomain.com/path/file.
3. If subdomain.com exists in file B, save it to file C.
What is the quickest way to produce file C with C#?
Thanks.
When I use ReadLine instead, it does not make much difference.
// stat
DateTime start = DateTime.Now;
int totalcount = 0;
int n1;
if (!int.TryParse(num1.Text, out n1))
n1 = 0;
// memory
dZLinklist = new Dictionary<string, string>();
// read file
string fileName = openFileDialog1.FileName; // get file name
textBox1.Text = fileName;
StreamReader sr = new StreamReader(textBox1.Text);
string fullfile = File.ReadAllText(textBox1.Text);
string[] sArray = fullfile.Split( '\n');
//IEnumerable<string> sArray = tool.GetSplit(fullfile, '\n');
//string sLine = "";
//while (sLine != null)
foreach ( string sLine in sArray)
{
totalcount++;
//sLine = sr.ReadLine();
if (sLine != null)
{
//string reg = "http[s]*://.*?/";
//Regex R = new Regex(reg, RegexOptions.Compiled);
//Match m = R.Match(sLine);
//if(m.Success)
int length = sLine.IndexOf(' ', n1); // default http://
if(length > 0)
{
//string urls = sLine.Substring(0, length);
dZLinklist[sLine.Substring(0,length)] = sLine;
}
}
}
TimeSpan time = DateTime.Now - start;
int count = dZLinklist.Count;
double sec = Math.Round(time.TotalSeconds,2);
label1.Text = "(" + totalcount + ")" + count.ToString() + " / " + sec + " = " + (Math.Round(count / sec,2)).ToString();
sr.Close();
I would use Microsoft LogParser for processing big files. Are you limited to implementing it in the way you described?
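If it has to stay in C#, the three steps from the question can be sketched with a HashSet, which is a different technique from LogParser. The sketch assumes file B also contains full URLs, that "exists in file B" means the same host appears in some line of B, and that the matching lines of A are what goes into file C. The helper names HostOf and UrlsWithHostInB are made up for the example, and the code is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the host part of a URL, or null if the line is not an absolute URL.
static string HostOf(string line)
{
    return Uri.TryCreate(line.Trim(), UriKind.Absolute, out Uri uri) ? uri.Host : null;
}

// Hypothetical helper: keeps the lines of A whose host also occurs in B.
static List<string> UrlsWithHostInB(IEnumerable<string> fileA, IEnumerable<string> fileB)
{
    // Build the set of hosts in B once; each lookup is then O(1) on average.
    var hostsInB = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    foreach (string line in fileB)
    {
        string host = HostOf(line);
        if (host != null) hostsInB.Add(host);
    }

    var matches = new List<string>();
    foreach (string line in fileA)
    {
        string host = HostOf(line);
        if (host != null && hostsInB.Contains(host)) matches.Add(line);
    }
    return matches;
}

// Usage with in-memory samples; for real files pass File.ReadLines(pathA) and
// File.ReadLines(pathB), then write the result with File.WriteAllLines(pathC, result).
var a = new[] { "http://subdomain.com/path/file", "http://other.org/index" };
var b = new[] { "https://subdomain.com/elsewhere" };
foreach (string url in UrlsWithHostInB(a, b))
    Console.WriteLine(url);
```

Only the hosts of B are held in memory, and A is streamed once, so the work is roughly one pass over each file.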

Alternative to ReadLine?

I'm trying to read some files with ReadLine, but my files contain some line breaks that I need to capture (not all of them), and I don't know how to get them into the same array, or into any other array along with these separators, because ReadLine reads lines and strips the line breaks.
I can't replace them because I need to check them after processing, so I need to get the line breaks AND the content after them. That's the problem. How can I do that?
Here's my code:
public class ReadFile
{
string extension;
string filename;
System.IO.StreamReader sr;
public ReadFile(string arquivo, System.IO.StreamReader sr)
{
string ext = Path.GetExtension(arquivo);
sr = new StreamReader(arquivo, System.Text.Encoding.Default);
this.sr = sr;
this.extension = ext;
this.filename = Path.GetFileNameWithoutExtension(arquivo);
if (ext.Equals(".EXP", StringComparison.OrdinalIgnoreCase))
{
ReadEXP(arquivo);
}
else MessageBox.Show("Unsupported file extension: "+ext);
}
public void ReadEXP(string arquivo)
{
string line = sr.ReadLine();
string[] words;
string[] Separators = new string[] { "<Segment>", "</Segment>", "<Source>", "</Source>", "<Target>", "</Target>" };
string ID = null;
string Source = null;
string Target = null;
DataBase db = new DataBase();
//db.CreateTable_EXP(filename);
db.CreateTable_EXP();
while ((line = sr.ReadLine()) != null)
{
try
{
if (line.Contains("<Segment>"))
{
ID = "";
words = line.Split(Separators, StringSplitOptions.None);
ID = words[0];
for (int i = 1; i < words.Length; i++ )
ID += words[i];
MessageBox.Show("Segment[" + words.Length + "]: " + ID);
}
if (line.Contains("<Source>"))
{
Source = "";
words = line.Split(Separators, StringSplitOptions.None);
Source = words[0];
for (int i = 1; i < words.Length; i++)
Source += words[i];
MessageBox.Show("Source[" + words.Length + "]: " + Source);
}
if (line.Contains("<Target>"))
{
Target = "";
words = line.Split(Separators, StringSplitOptions.None);
Target = words[0];
for (int i = 1; i < words.Length; i++)
Target += words[i];
MessageBox.Show("Target[" + words.Length + "]: " + Target);
db.PopulateTable_EXP(ID, Source, Target);
MessageBox.Show("ID: " + ID + "\nSource: " + Source + "\nTarget: " + Target);
}
}
catch (IndexOutOfRangeException e)
{
MessageBox.Show(e.Message.ToString());
MessageBox.Show("ID: " + ID + "\nSource: " + Source + "\nTarget: " + Target);
}
}
return;
}
}
If you are trying to read XML, try using the built-in libraries. Here is a simple example of loading a section of XML with <TopLevelTag> in it:
var xmlData = XDocument.Load(@"C:\folder\file.xml").Element("TopLevelTag");
if (xmlData == null) throw new Exception("Failed To Load XML");
Here is a tidy way to get content without it throwing an exception if missing from the XML.
var xmlBit = (string)xmlData.Element("SomeSubTag") ?? "";
If you really have to roll your own, then look at examples for CSV parsers,
where ReadBlock can be used to get the raw data including line breaks.
private char[] chunkBuffer = new char[4096];
var fileStream = new System.IO.StreamReader(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite));
var chunkLength = fileStream.ReadBlock(chunkBuffer, 0, chunkBuffer.Length);
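The ReadBlock call above reads only one chunk. A minimal sketch of looping until the end of the input, so that the text comes back with its original line breaks intact, might look like this. The helper name ReadAllPreservingLineBreaks and the sample text are made up for the example, and the code is written as C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads the input in 4096-character chunks and keeps
// every character, including \r and \n, which ReadLine would strip.
static string ReadAllPreservingLineBreaks(TextReader reader)
{
    var buffer = new char[4096];
    var text = new StringBuilder();
    int read;
    // ReadBlock returns 0 once the end of the input is reached
    while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        text.Append(buffer, 0, read);
    return text.ToString();
}

// Usage with in-memory sample data; for a real file pass new StreamReader(path).
string raw = ReadAllPreservingLineBreaks(new StringReader("<Source>first\r\nsecond</Source>\r\n"));
Console.WriteLine(raw.Contains("\r\n"));
```

Once the raw text is available, the line breaks inside a <Source> or <Target> element can be inspected or split on separately from the tags.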

JPG to PDF Convertor in C#

I would like to convert from an image (like jpg or png) to PDF.
I've checked out ImageMagickNET, but it is far too complex for my needs.
What other .NET solutions or code are there for converting an image to a PDF?
Easy with iTextSharp:
class Program
{
static void Main(string[] args)
{
Document document = new Document();
using (var stream = new FileStream("test.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
PdfWriter.GetInstance(document, stream);
document.Open();
using (var imageStream = new FileStream("test.jpg", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
var image = Image.GetInstance(imageStream);
document.Add(image);
}
document.Close();
}
}
}
iTextSharp does it pretty cleanly and is open source. Also, it has a very good accompanying book by the author, which I recommend if you end up doing more interesting things like managing forms. For normal usage, there are plenty of resources on mailing lists and newsgroups with samples of how to do common things.
EDIT: as alluded to in @Chirag's comment, @Darin's answer has code that definitely compiles with current versions.
Example usage:
public static void ImagesToPdf(string[] imagepaths, string pdfpath)
{
using(var doc = new iTextSharp.text.Document())
{
iTextSharp.text.pdf.PdfWriter.GetInstance(doc, new FileStream(pdfpath, FileMode.Create));
doc.Open();
foreach (var item in imagepaths)
{
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(item);
doc.Add(image);
}
}
}
Here is another working example you can try:
public void ImagesToPdf(string[] imagepaths, string pdfpath)
{
iTextSharp.text.Rectangle pageSize = null;
using (var srcImage = new Bitmap(imagepaths[0].ToString()))
{
pageSize = new iTextSharp.text.Rectangle(0, 0, srcImage.Width, srcImage.Height);
}
using (var ms = new MemoryStream())
{
var document = new iTextSharp.text.Document(pageSize, 0, 0, 0, 0);
iTextSharp.text.pdf.PdfWriter.GetInstance(document, ms).SetFullCompression();
document.Open();
var image = iTextSharp.text.Image.GetInstance(imagepaths[0].ToString());
document.Add(image);
document.Close();
File.WriteAllBytes(pdfpath+"cheque.pdf", ms.ToArray());
}
}
One we've had great luck with is PDFSharp (we use it for TIFF and Text to PDF conversion for hundreds of medical claims every day).
http://pdfsharp.com/PDFsharp/
Such a task can easily be done with the help of the Docotic.Pdf library.
Here is a sample that creates PDF from given images (not only JPGs, actually):
public static void imagesToPdf(string[] images, string pdfName)
{
using (PdfDocument pdf = new PdfDocument())
{
for (int i = 0; i < images.Length; i++)
{
if (i > 0)
pdf.AddPage();
PdfPage page = pdf.Pages[i];
string imagePath = images[i];
PdfImage pdfImage = pdf.AddImage(imagePath);
page.Width = pdfImage.Width;
page.Height = pdfImage.Height;
page.Canvas.DrawImage(pdfImage, 0, 0);
}
pdf.Save(pdfName);
}
}
Disclaimer: I work for the vendor of the library.
If you want to do it in a cross-platform way, without any third-party library
or paying for a license, you can use this code.
It takes a list of pictures (I think it only works with JPG) with their sizes and returns a PDF file with one picture per page.
You have to create two files:
File Picture:
using System;
using System.Collections.Generic;
using System.Text;
namespace PDF
{
public class Picture
{
private byte[] data;
private int width;
private int height;
public byte[] Data { get => data; set => data = value; }
public int Width { get => width; set => width = value; }
public int Height { get => height; set => height = value; }
}
}
File PDFExport:
using System;
using System.Collections.Generic;
namespace PDF
{
public class PDFExport
{
private string company = "Your Company Here";
public sbyte[] createFile(List<Picture> pictures)
{
int N = (pictures.Count + 1) * 3;
string dateTimeStr = DateTime.Now.ToString("yyyyMMddhhmmss");
string file1 =
"%PDF-1.4\n";
string file2 =
"2 0 obj\n" +
"<<\n" +
"/Type /Pages\n" +
getKids(pictures) +
"/Count " + pictures.Count + "\n" +
">>\n" +
"endobj\n" +
"1 0 obj\n" +
"<<\n" +
"/Type /Catalog\n" +
"/Pages 2 0 R\n" +
"/PageMode /UseNone\n" +
"/PageLayout /SinglePage\n" +
"/Metadata 7 0 R\n" +
">>\n" +
"endobj\n" +
N + " 0 obj\n" +
"<<\n" +
"/Creator(" + company + ")\n" +
"/Producer(" + company + ")\n" +
"/CreationDate (D:" + dateTimeStr + ")\n" +
"/ModDate (D:" + dateTimeStr + ")\n" +
">>\n" +
"endobj\n" +
"xref\n" +
"0 " + (N + 1) + "\n" +
"0000000000 65535 f\n" +
"0000224088 00000 n\n" +
"0000224031 00000 n\n" +
"0000000015 00000 n\n" +
"0000222920 00000 n\n" +
"0000222815 00000 n\n" +
"0000224153 00000 n\n" +
"0000223050 00000 n\n" +
"trailer\n" +
"<<\n" +
"/Size " + (N + 1) + "\n" +
"/Root 1 0 R\n" +
"/Info 6 0 R\n" +
">>\n" +
"startxref\n" +
"0\n" +
"%% EOF";
sbyte[] part1 = file1.GetBytes();
sbyte[] part2 = file2.GetBytes();
List<sbyte[]> fileContents = new List<sbyte[]>();
fileContents.Add(part1);
for (int i = 0; i < pictures.Count; i++)
{
fileContents.Add(getPageFromImage(pictures[i], i));
}
fileContents.Add(part2);
return getFileContent(fileContents);
}
private string getKids(List<Picture> pictures)
{
string kids = "/Kids[";
for (int i = 0; i < pictures.Count; i++)
{
kids += (3 * (i + 1) + 1) + " 0 R ";
}
kids += "]\n";
return kids;
}
private sbyte[] getPageFromImage(Picture picture, int P)
{
int N = (P + 1) * 3;
string imageStart =
N + " 0 obj\n" +
"<<\n" +
"/Type /XObject\n" +
"/Subtype /Image\n" +
"/Width " + picture.Width + "\n" +
"/Height " + picture.Height + "\n" +
"/BitsPerComponent 8\n" +
"/ColorSpace /DeviceRGB\n" +
"/Filter /DCTDecode\n" +
"/Length " + picture.Data.Length + "\n" +
">>\n" +
"stream\n";
string dimentions = "q\n" +
picture.Width + " 0 0 " + picture.Height + " 0 0 cm\n" +
"/X0 Do\n" +
"Q\n";
string imageEnd =
"\nendstream\n" +
"endobj\n" +
(N + 2) + " 0 obj\n" +
"<<\n" +
"/Filter []\n" +
"/Length " + dimentions.Length + "\n" +
">>\n" +
"stream\n";
string page =
"\nendstream\n" +
"endobj\n" +
(N + 1) + " 0 obj\n" +
"<<\n" +
"/Type /Page\n" +
"/MediaBox[0 0 " + picture.Width + " " + picture.Height + "]\n" +
"/Resources <<\n" +
"/XObject <<\n" +
"/X0 " + N + " 0 R\n" +
">>\n" +
">>\n" +
"/Contents 5 0 R\n" +
"/Parent 2 0 R\n" +
">>\n" +
"endobj\n";
List<sbyte[]> fileContents = new List<sbyte[]>();
fileContents.Add(imageStart.GetBytes());
fileContents.Add(byteArrayToSbyteArray(picture.Data));
fileContents.Add(imageEnd.GetBytes());
fileContents.Add(dimentions.GetBytes());
fileContents.Add(page.GetBytes());
return getFileContent(fileContents);
}
private sbyte[] byteArrayToSbyteArray(byte[] data)
{
sbyte[] data2 = new sbyte[data.Length];
for (int i = 0; i < data2.Length; i++)
{
data2[i] = (sbyte)data[i];
}
return data2;
}
private sbyte[] getFileContent(List<sbyte[]> fileContents)
{
int fileSize = 0;
foreach (sbyte[] content in fileContents)
{
fileSize += content.Length;
}
sbyte[] finaleFile = new sbyte[fileSize];
int index = 0;
foreach (sbyte[] content in fileContents)
{
for (int i = 0; i < content.Length; i++)
{
finaleFile[index + i] = content[i];
}
index += content.Length;
}
return finaleFile;
}
}
}
You can use the code in this easy way
///////////////////////////////////////Export PDF//////////////////////////////////////
private sbyte[] exportPDF(List<Picture> images)
{
if (images.Count > 0)
{
PDFExport pdfExport = new PDFExport();
sbyte[] fileData = pdfExport.createFile(images);
return fileData;
}
return null;
}
You need Acrobat to be installed. Tested on Acrobat DC. This is VB.NET code. Because these objects are COM objects, you must release them explicitly, not just set them to Nothing. You can convert this code to C# here: https://converter.telerik.com/
Private Function ImageToPDF(ByVal FilePath As String, ByVal DestinationFolder As String) As String
Const PDSaveCollectGarbage As Integer = 32
Const PDSaveLinearized As Integer = 4
Const PDSaveFull As Integer = 1
Dim PDFAVDoc As Object = Nothing
Dim PDFDoc As Object = Nothing
Try
'Check destination requirements
If Not DestinationFolder.EndsWith("\") Then DestinationFolder += "\"
If Not System.IO.Directory.Exists(DestinationFolder) Then Throw New Exception("Destination directory does not exist: " & DestinationFolder)
Dim CreatedFile As String = DestinationFolder & System.IO.Path.GetFileNameWithoutExtension(FilePath) & ".pdf"
'Avoid conflicts, therefore previous file there will be deleted
If File.Exists(CreatedFile) Then File.Delete(CreatedFile)
'Get PDF document
PDFAVDoc = GetPDFAVDoc(FilePath)
PDFDoc = PDFAVDoc.GetPDDoc
If Not PDFDoc.Save(PDSaveCollectGarbage Or PDSaveLinearized Or PDSaveFull, CreatedFile) Then Throw New Exception("PDF file cannot be saved: " & PDFDoc.GetFileName())
If Not PDFDoc.Close() Then Throw New Exception("PDF file could not be closed: " & PDFDoc.GetFileName())
PDFAVDoc.Close(1)
Return CreatedFile
Catch Ex As Exception
Throw 'rethrow without resetting the stack trace
Finally
'Release only the objects that were actually created
If PDFDoc IsNot Nothing Then System.Runtime.InteropServices.Marshal.FinalReleaseComObject(PDFDoc)
PDFDoc = Nothing
If PDFAVDoc IsNot Nothing Then System.Runtime.InteropServices.Marshal.FinalReleaseComObject(PDFAVDoc)
PDFAVDoc = Nothing
GC.Collect()
GC.WaitForPendingFinalizers()
GC.Collect()
End Try
End Function
Not sure if you're looking only for free / open source solutions or are considering commercial ones as well. If you're including commercial solutions, there's a toolkit called EasyPDF SDK that offers an API for converting images (plus a number of other file types) to PDF. It supports C# and can be found here:
http://www.pdfonline.com/
The C# code would look as follows:
Printer oPrinter = new Printer();
ImagePrintJob oPrintJob = oPrinter.ImagePrintJob;
oPrintJob.PrintOut(imageFile, pdfFile);
To be fully transparent, I should disclaim that I do work for the makers of EasyPDF SDK (hence my handle), so this suggestion is not without some personal bias :) But feel free to check out the eval version if you're interested. Cheers!
I use SautinSoft; it's very simple:
SautinSoft.PdfMetamorphosis p = new SautinSoft.PdfMetamorphosis();
p.Serial="xxx";
p.HtmlToPdfConvertStringToFile("<html><body><img src=\""+filename+"\"></img></body></html>","output.pdf");
You can try converting any image to PDF using this code sample:
PdfVision v = new PdfVision();
ImageToPdfOptions options = new ImageToPdfOptions();
options.JpegQuality = 95;
try
{
v.ConvertImageToPdf(new string[] {inpFile}, outFile, options);
System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(outFile) { UseShellExecute = true });
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
Console.ReadLine();
}
Or, if you need to convert an instance of the Image class to PDF:
System.Drawing.Image image = Image.FromFile(@"..\..\image-jpeg.jpg");
string outFile = new FileInfo(@"Result.pdf").FullName;
PdfVision v = new PdfVision();
ImageToPdfOptions options = new ImageToPdfOptions();
options.PageSetup.PaperType = PaperType.Auto;
byte[] imgBytes = null;
using (MemoryStream ms = new System.IO.MemoryStream())
{
image.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
imgBytes = ms.ToArray();
}
try
{
v.ConvertImageToPdf(imgBytes, outFile, options);
System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(outFile) { UseShellExecute = true });
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
Console.ReadLine();
}
There are many different tools out there. One I use is PrimoPDF (free): http://www.primopdf.com/. You print the file and choose PDF as the output format, saving it to your drive. It works on Windows.
