I have an SSIS package with a transformation script component. It loads about 460 rows and then starts the script component over again (I don't know why it does this). Of course that recreates my C# class variables, so the component "forgets" where it was the first time it ran and starts producing nulls for those variables.
Is there any way to make the script component not restart itself after 460 rows? The batch size I am pulling is 10000, so it can't be that.
And the weirdest thing of all is that after running the package 3 times (without changing anything) it does everything right...
public class ScriptMain : UserComponent
{
    string MarkToRem;
    string TypeToRem;
    string SerToRem;
    int IDCnt;

    public override void PreExecute()
    {
        base.PreExecute();
    }

    public override void PostExecute()
    {
        base.PostExecute();
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        MyOutputBuffer.AddRow();

        if (Row.IncomingPrice == "Mark")
        {
            MarkToRem = Row.IncomingCode; // Setting var to remember the mark we are in
            MyOutputBuffer.ID = Row.IncomingID.ToString();
            MyOutputBuffer.Mark = MarkToRem;
            MyOutputBuffer.Type = "";
            MyOutputBuffer.Series = "";
            MyOutputBuffer.Code = "";
            MyOutputBuffer.Price = "";
            MyOutputBuffer.Description = "Mark Verander";
        }
        else if (Row.IncomingPrice == "Sub")
        {
            TypeToRem = Row.IncomingCode; // Save our current Type
            SerToRem = Row.IncomingCode;  // Save our current Series
            // ============ Output ========================
            MyOutputBuffer.ID = Row.IncomingID.ToString();
            MyOutputBuffer.Mark = MarkToRem;
            MyOutputBuffer.Type = "";
            MyOutputBuffer.Series = "";
            MyOutputBuffer.Code = "";
            MyOutputBuffer.Price = "";
            MyOutputBuffer.Description = "Sub en series verander";
        }
        else if (Row.IncomingPrice == "Series")
        {
            SerToRem = Row.IncomingCode; // Save our current Series
            // ============ Output ========================
            MyOutputBuffer.ID = Row.IncomingID.ToString();
            MyOutputBuffer.Mark = MarkToRem;
            MyOutputBuffer.Type = "";
            MyOutputBuffer.Series = SerToRem;
            MyOutputBuffer.Code = "";
            MyOutputBuffer.Price = "";
            MyOutputBuffer.Description = "Series verander";
        }
        else
        {
            MyOutputBuffer.ID = Row.IncomingID.ToString();
            MyOutputBuffer.Mark = MarkToRem;
            MyOutputBuffer.Type = TypeToRem;
            MyOutputBuffer.Series = SerToRem;
            MyOutputBuffer.Code = Row.IncomingCode;
            MyOutputBuffer.Price = Row.IncomingPrice;
            MyOutputBuffer.Description = Row.IncomingDiscription;
        }

        IDCnt = IDCnt + 1;
    }
}
The first 9 rows of the incoming data look like this:
ID Code Price Discription
1 184pin DDR Mark
2 DDR - Non-ECC Sub
3 ME-A1GDV4 388 Adata AD1U400A1G3-R 1Gb ddr-400 ( pc3200 ) , CL3 - 184pin - lifetime warranty
4 ME-C512DV4 199 Corsair Valueselect VS512MB400 512mb ddr-400 ( pc3200 ) , CL2.5 - 184pin - lifetime warranty
5 ME-C1GDV4 399 Corsair Valueselect VS1GB400C3 1Gb ddr-400 ( pc3200 ) , CL3 - 184pin - lifetime warranty
6 240pin DDR2 Mark
7 DDR2 - Non-ECC Sub
8 Adata - lifetime warranty Series
9 ME-A2VD26C5 345 Adata AD2U667B2G5 Valuselect , 2Gb ddr2-667 ( pc2-5400 ) , CL5 , 1.8v - 240pin - lifetime warranty
Solved it.
Avoid Asynchronous Transformation wherever possible
The SSIS runtime executes every task other than the data flow task in the defined sequence. Whenever the runtime engine encounters a data flow task, it hands over execution of that task to the data flow pipeline engine.
The data flow pipeline engine breaks the execution of a data flow task into one or more execution trees, and may execute two or more execution trees in parallel to achieve high performance.
Synchronous transformations get a record, process it, and pass it to the next transformation or destination in the sequence. The processing of a record does not depend on the other incoming rows.
An asynchronous transformation, on the other hand, requires additional buffers for its output and does not reuse the incoming input buffers. It also waits for all incoming rows to arrive before processing, which is why asynchronous transformations perform more slowly and should be avoided wherever possible. For example, instead of using the Sort Transformation you can get sorted results from the source itself by using an ORDER BY clause.
Currently we are using clustered Redis.
In some instances we have to delete hundreds of thousands of cached objects, and I am wondering what the best approach to do so currently is.
At the moment we are executing a Lua script that does a SCAN and UNLINK:
private const string ClearCacheLuaScript = @"
    local cursor = 0
    local calls = 0
    local dels = 0
    repeat
        local result = redis.call('SCAN', cursor, 'MATCH', ARGV[1])
        calls = calls + 1
        for _, key in ipairs(result[2]) do
            redis.call('UNLINK', key)
            dels = dels + 1
        end
        cursor = tonumber(result[1])
    until cursor == 0";

public async Task FlushAllAsync(string section, string group, string prefix)
{
    var cacheKey = GetCacheKey(section, group, prefix);
    await _cache.ScriptEvaluateAsync(ClearCacheLuaScript,
        values: new RedisValue[]
        {
            cacheKey + "*",
        });
}
I'm wondering whether the next best alternative would be to grab the keys and delete them iteratively. Something like:
public async Task FlushAllKeys()
{
    foreach (var key in keys)
    {
        await _cache.KeyDeleteAsync(key);
    }
}
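For reference, here is a rough sketch of what I mean by that, using StackExchange.Redis's IServer.KeysAsync (which pages through SCAN server-side) to enumerate the keys instead of the undefined keys collection above. The _multiplexer field, the method name and the page size are made up for illustration:
// Sketch only: assumes _multiplexer (ConnectionMultiplexer) and _cache (IDatabase) fields exist.
public async Task FlushAllKeysAsync(string pattern)
{
    foreach (var endpoint in _multiplexer.GetEndPoints())
    {
        var server = _multiplexer.GetServer(endpoint);
        if (server.IsReplica) continue; // only scan primaries

        // KeysAsync wraps SCAN, so it pages through matching keys without blocking the server.
        await foreach (var key in server.KeysAsync(pattern: pattern + "*", pageSize: 500))
        {
            // Fire-and-forget so we don't wait on a round trip per key.
            await _cache.KeyDeleteAsync(key, CommandFlags.FireAndForget);
        }
    }
}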
My main question is: should I be deleting via a pattern using a Lua script, or should I grab all the keys and delete them iteratively without Lua?
I'm trying to show pins on the map, but only the ones that fit on the screen around your current location, based on the zoom level I've set for when the map appears. I have tens of pins, and when I open the map it loads every pin, which takes a long time.
Any idea how to do this?
Method to load pins:
async Task ExecuteLoadPinsCommand()
{
    IsBusy = true;

    try
    {
        Map.Pins.Clear();
        Map.MapElements.Clear();
        Map.CustomPins.Clear();

        var contents = await placeRepository.GetAllPlacesWithoutRelatedDataAsync();

        if (contents == null || contents.Count < 1)
        {
            await App.Current.MainPage.DisplayAlert("No places found", "No places have been found for that category, please try again later", "Ok");
            await ExecuteLoadPinsCommand();
        }

        if (contents != null)
        {
            places.Clear();
            var customPins = this.Map.CustomPins;
            places = contents;

            foreach (var item in places)
            {
                CustomPin devicePin = new CustomPin
                {
                    Type = PinType.Place,
                    PlaceId = item.PlaceId.ToString(),
                    Position = new Position(item.Latitude, item.Longitude),
                    Label = $"{item.Name}",
                    Address = $"{item.Name}"
                };

                Map.CustomPins.Add(devicePin);
                Map.Pins.Add(devicePin);
            }
        }
    }
    catch (Exception ex)
    {
        Debug.WriteLine(ex);
    }
    finally
    {
        IsBusy = false;
    }
}
CustomMapRenderer:
protected override MarkerOptions CreateMarker(Pin pin)
{
    CustomPin customPin = (CustomPin)pin;

    var thePlace = Task.Run(async () => await placeRepository.GetPlaceByIdWithMoodAndEventsAsync(Guid.Parse(customPin.PlaceId)));
    var place = thePlace.ConfigureAwait(true)
        .GetAwaiter()
        .GetResult();

    var marker = new MarkerOptions();
    marker.SetPosition(new LatLng(place.Position.Latitude, place.Position.Longitude));

    if (place.Category == "" && place.SomethingCount == 0)
    {
        marker.SetIcon(BitmapDescriptorFactory.FromResource(Resource.Drawable.icon));
    }
    //else if ...

    return marker;
}
A major part of programming is learning to debug well.
When facing a performance problem, it is important to isolate the time delay to the smallest bit of code that you can.
Here's the thought process I would go through:
YOUR OBSERVATION: When there is one pin, it takes less than a second. When there are 20 pins, it takes maybe 10 seconds. (9 second difference.)
YOUR HYPOTHESIS (Given the question you posted): Maybe adding 20 pins to map takes much or most of the 9 seconds.
TESTS: How can we test EXACTLY the code that "adds pins to map"?
A: "Divide and conquer":
Let all the other code run 20 times. That is, have 20 pins as data. BUT suppress the code that adds those pins.
Test #1: Have 20 pins returned by GetAllPlacesWithoutRelatedDataAsync. So all that work is done 20 times.
Comment out JUST the code that ADDS the pins. Make this change:
//Map.CustomPins.Add(devicePin);
//Map.Pins.Add(devicePin);
Result #1: _____ seconds
It's possible that having NO pins allows the map to skip loading some pin-related code. Let's find out how quick it is when we only ADD ONE of the 20 pins.
Test #2: Have 20 pins in the data. BUT only ADD one of them. Make this ONE-LINE change:
foreach (var item in places)
{
    CustomPin devicePin = new CustomPin
    {
        Type = PinType.Place,
        PlaceId = item.PlaceId.ToString(),
        Position = new Position(item.Latitude, item.Longitude),
        Label = $"{item.Name}",
        Address = $"{item.Name}"
    };

    Map.CustomPins.Add(devicePin);
    Map.Pins.Add(devicePin);

    break; // <--- ADD THIS LINE.
}
Result #2: _____ seconds
(Test #2 is what I was trying to ask you to do, in one of my comments on the question. I've deleted those comments.)
Now we have enough information to determine how much of the ~9 extra seconds are due to going through that foreach loop 20 times, to ADD 20 pins.
This will determine whether there is any point in trying to speed up the ADDS, or whether there is a problem elsewhere.
If most time is spent elsewhere, then you need to add the suspect code to the question. Then do similar tests there. Until you can report exactly what code takes most of the time.
IF 20x map.Pins.Add(..) takes a significant amount of time, THEN here are two techniques, either of which should be faster, imho.
FASTER ADD #1:
Use Map.ItemsSource.
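For example, a minimal sketch in code-behind. It assumes your view model exposes a Places collection whose items have Position and Name properties (those names are made up here); in recent Xamarin.Forms versions Pin's Position, Label and Address are bindable, so the map creates the pins for you:
// Sketch only: viewModel.Places, Position and Name are assumed view-model members.
map.ItemsSource = viewModel.Places;
map.ItemTemplate = new DataTemplate(() =>
{
    var pin = new Pin { Type = PinType.Place };
    pin.SetBinding(Pin.PositionProperty, "Position");
    pin.SetBinding(Pin.LabelProperty, "Name");
    pin.SetBinding(Pin.AddressProperty, "Name");
    return pin;
});
Setting ItemsSource once lets the map manage pin creation itself, instead of your loop adding pins one by one.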
FASTER ADD #2:
Create the map WITH its pins, BEFORE displaying it.
using Xamarin.Forms;
using Xamarin.Forms.Maps;

namespace XFSOAnswers
{
    // Based on https://github.com/xamarin/xamarin-forms-samples/blob/main/WorkingWithMaps/WorkingWithMaps/WorkingWithMaps/PinPageCode.cs
    public class PinPageCode : ContentPage
    {
        public PinPageCode()
        {
            Title = "Pins demo";

            Position position = new Position(36.9628066, -122.0194722);
            MapSpan mapSpan = new MapSpan(position, 0.01, 0.01);
            Map map = new Map(mapSpan);

            Pin pin = new Pin
            {
                Label = "Santa Cruz",
                Address = "The city with a boardwalk",
                Type = PinType.Place,
                Position = position
            };
            map.Pins.Add(pin);
            // ... more pins.

            Content = new StackLayout
            {
                Margin = new Thickness(10),
                Children =
                {
                    map
                }
            };
        }
    }
}
I'm building a candle recorder (Binance crypto). I'm interested in 1-minute candles, including intra-candle data, for market study purposes (though eventually I could use this same code to actually be my eyes on what's happening in the market).
To avoid possible lag from EF / SQL performance etc., I decided to accomplish this using two threads.
One receives the subscribed (async) tokens from Binance and puts them in a ConcurrentQueue, while the other keeps trying to dequeue and save the data to MSSQL.
My question is about the second thread, a while(true) loop. What's the best approach to save 200+ records/sec to SQL using EF when the records come in individually (sometimes 300 records in a matter of 300 ms, sometimes fewer)?
Should I open the SQL connection each time I want to save? (Performance.)
What's the best approach to accomplish this?
-- EDITED --
At one point I had 600k+ items in the queue, so I'm having problems inserting into SQL.
I changed from LINQ to SQL to EF.
Here's my actual code:
//Initialize
public void getCoinsMoves()
{
    Thread THTransferDatatoSQL = new Thread(TransferDatatoSQL);
    THTransferDatatoSQL.Name = "THTransferDatatoSQL";
    THTransferDatatoSQL.SetApartmentState(ApartmentState.STA);
    THTransferDatatoSQL.IsBackground = true;
    THTransferDatatoSQL.Start();

    List<string> SymbolsMap;
    using (DBBINANCEEntities lSQLBINANCE = new DBBINANCEEntities())
    {
        SymbolsMap = lSQLBINANCE.TB_SYMBOLS_MAP.Select(h => h.SYMBOL).ToList();
    }

    socketClient.Spot.SubscribeToKlineUpdatesAsync(SymbolsMap, Binance.Net.Enums.KlineInterval.OneMinute, h =>
    {
        RecordCandles(h);
    });
}
//Enqueue Data
public void RecordCandles(Binance.Net.Interfaces.IBinanceStreamKlineData Candle)
{
    FRACTIONED_CANDLES.Enqueue(new TB_FRACTIONED_CANDLES_DATA()
    {
        BASE_VOLUME = Candle.Data.BaseVolume,
        CLOSE_TIME = Candle.Data.CloseTime.AddHours(-3),
        MONEY_VOLUME = Candle.Data.QuoteVolume,
        PCLOSE = Candle.Data.Close,
        PHIGH = Candle.Data.High,
        PLOW = Candle.Data.Low,
        POPEN = Candle.Data.Open,
        SYMBOL = Candle.Symbol,
        TAKER_BUY_BASE_VOLUME = Candle.Data.TakerBuyBaseVolume,
        TAKER_BUY_MONEY_VOLUME = Candle.Data.TakerBuyQuoteVolume,
        TRADES = Candle.Data.TradeCount,
        IS_LAST_CANDLE = Candle.Data.Final
    });
}
//Transfer Data to SQL
public void TransferDatatoSQL()
{
    while (true)
    {
        TB_FRACTIONED_CANDLES_DATA NewData;
        if (FRACTIONED_CANDLES.TryDequeue(out NewData))
        {
            using (DBBINANCEEntities LSQLBINANCE = new DBBINANCEEntities())
            {
                LSQLBINANCE.TB_FRACTIONED_CANDLES_DATA.Add(NewData);

                if (NewData.IS_LAST_CANDLE)
                    LSQLBINANCE.TB_CANDLES_DATA.Add(new TB_CANDLES_DATA()
                    {
                        BASE_VOLUME = NewData.BASE_VOLUME,
                        CLOSE_TIME = NewData.CLOSE_TIME,
                        IS_LAST_CANDLE = NewData.IS_LAST_CANDLE,
                        MONEY_VOLUME = NewData.MONEY_VOLUME,
                        PCLOSE = NewData.PCLOSE,
                        PHIGH = NewData.PHIGH,
                        PLOW = NewData.PLOW,
                        POPEN = NewData.POPEN,
                        SYMBOL = NewData.SYMBOL,
                        TAKER_BUY_BASE_VOLUME = NewData.TAKER_BUY_BASE_VOLUME,
                        TAKER_BUY_MONEY_VOLUME = NewData.TAKER_BUY_MONEY_VOLUME,
                        TRADES = NewData.TRADES
                    });

                LSQLBINANCE.SaveChanges();
            }
        }

        Thread.Sleep(1);
    }
}
Thanks in advance,
Rafael
I see one error in your code: you're sleeping the background thread after every insert. Don't sleep if there's more data. Instead of:
if (FRACTIONED_CANDLES.TryDequeue(out NewData))
{
    using (DBBINANCEEntities LSQLBINANCE = new DBBINANCEEntities())
    {
        LSQLBINANCE.TB_FRACTIONED_CANDLES_DATA.Add(NewData);

        if (NewData.IS_LAST_CANDLE)
            LSQLBINANCE.TB_CANDLES_DATA.Add(new TB_CANDLES_DATA()
            {
                BASE_VOLUME = NewData.BASE_VOLUME,
                CLOSE_TIME = NewData.CLOSE_TIME,
                IS_LAST_CANDLE = NewData.IS_LAST_CANDLE,
                MONEY_VOLUME = NewData.MONEY_VOLUME,
                PCLOSE = NewData.PCLOSE,
                PHIGH = NewData.PHIGH,
                PLOW = NewData.PLOW,
                POPEN = NewData.POPEN,
                SYMBOL = NewData.SYMBOL,
                TAKER_BUY_BASE_VOLUME = NewData.TAKER_BUY_BASE_VOLUME,
                TAKER_BUY_MONEY_VOLUME = NewData.TAKER_BUY_MONEY_VOLUME,
                TRADES = NewData.TRADES
            });

        LSQLBINANCE.SaveChanges();
    }
}

Thread.Sleep(1);
Change the last line to:
else
    Thread.Sleep(1);
This may resolve your problem.
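In other words, the loop ends up shaped like this (a condensed sketch, with the candle-copying block elided):
while (true)
{
    TB_FRACTIONED_CANDLES_DATA NewData;
    if (FRACTIONED_CANDLES.TryDequeue(out NewData))
    {
        using (DBBINANCEEntities LSQLBINANCE = new DBBINANCEEntities())
        {
            LSQLBINANCE.TB_FRACTIONED_CANDLES_DATA.Add(NewData);
            // ... same IS_LAST_CANDLE handling as above ...
            LSQLBINANCE.SaveChanges();
        }
    }
    else
    {
        // Only back off when the queue is empty; drain it as fast as possible otherwise.
        Thread.Sleep(1);
    }
}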
I have a Stock class which loads lots of stock data history from a file (about 100 MB). I have a Pair class that takes two Stock objects and calculates some statistical relations between the two then writes the results to file.
In my main method I have a loop going through a list of pairs of stocks (about 500). It creates 2 stock objects and then a pair object out of the two. At this point the pair calculations are written to file and I'm done with the objects. I need to free the memory so I can go on with the next calculation.
In addition to setting the 3 objects to null, I have added the following two lines at the end of the loop:
GC.Collect(GC.MaxGeneration);
GC.WaitForPendingFinalizers();
Stepping over these two lines only seems to free up 50 MB out of the 200-300 MB that is allocated on every loop iteration (viewing it in Task Manager).
The program gets through about eight or ten pairs before it gives me an OutOfMemoryException. The memory usage steadily increases until it crashes at about 1.5 GB. (This is an 8 GB machine running Win7 Ultimate.)
I don't have much experience with garbage collection. Am I doing something wrong?
Here's my code, since you asked. (Note: the program has two modes: (1) add mode, in which new pairs are added to the system, and (2) regular mode, which updates the pair files in real time based on FileSystemWatcher events. The stock data is updated by an external app called QCollector.)
This is the segment in MainForm which runs in Add Mode:
foreach (string line in PairList)
{
    string[] tokens = line.Split(',');

    stockA = new Stock(QCollectorPath, tokens[0].ToUpper());
    stockB = new Stock(QCollectorPath, tokens[1].ToUpper());

    double ratio = double.Parse(tokens[2]);

    Pair p = new Pair(QCollectorPath, stockA, stockB, ratio);
    // at this point the pair is written to file (constructor handles this)

    // commenting out the following lines of code since they don't fix the problem
    // stockA = null;
    // stockB = null;
    // p = null;

    // refraining from forced collection since that's not the problem
    // GC.Collect(GC.MaxGeneration);
    // GC.WaitForPendingFinalizers();

    // so far this is the only way i can fix the problem by setting the pair classes
    // references to StockA and StockB to null
    p.Kill();
}
I am adding more code as requested. Stock and Pair are subclasses of TimeSeries, which holds the common functionality:
public abstract class TimeSeries
{
    protected List<string> data;

    // the following Create method must be implemented by subclasses (Stock, Pair, etc...)
    // as each class is created differently, although their data formatting is identical
    protected abstract List<string> Create();

    // . . .

    public void LoadFromFile()
    {
        data = new List<string>();
        List<StreamReader> srs = GetAllFiles();
        foreach (StreamReader sr in srs)
        {
            List<string> temp = new List<string>();
            temp = TurnFileIntoListString(sr);
            data = new List<string>(temp.Concat(data));
            sr.Close();
        }
    }

    // uses directory naming scheme (according to data month/year) to find files of a symbol
    protected List<StreamReader> GetAllFiles()...

    public static List<string> TurnFileIntoListString(StreamReader sr)
    {
        List<string> list = new List<string>();
        string line;
        while ((line = sr.ReadLine()) != null)
            list.Add(line);
        return list;
    }

    // this is the only means to access a TimeSeries object's data
    // this is to prevent deadlocks by time consuming methods such as Pair's Create
    public string[] GetListCopy()
    {
        lock (data)
        {
            string[] listCopy = new string[data.Count];
            data.CopyTo(listCopy);
            return listCopy;
        }
    }
}
public class Stock : TimeSeries
{
    public Stock(string dataFilePath, string symbol, FileSystemWatcher fsw = null)
    {
        DataFilePath = dataFilePath;
        Name = symbol.ToUpper();
        LoadFromFile();

        // to update stock data when external app updates the files
        if (fsw != null) fsw.Changed += fsw_Changed;
    }

    protected override List<string> Create()
    {
        // stock files created by external application
    }

    // . . .
}
public class Pair : TimeSeries
{
    public Pair(string dataFilePath, Stock stockA, Stock stockB, double ratio)
    {
        // assign parameters to local members
        // ...

        if (FileExists())
            LoadFromFile();
        else
            Create();
    }

    protected override List<string> Create()
    {
        // since stock can get updated by fileSystemWatcher's event handler
        // a copy is obtained from the stock object's data
        string[] listA = StockA.GetListCopy();
        string[] listB = StockB.GetListCopy();

        List<string> listP = new List<string>();

        int i, j;
        i = GetFirstValidBar(listA);
        j = GetFirstValidBar(listB);

        DateTime dtA, dtB;
        dtA = GetDateTime(listA[i]);
        dtB = GetDateTime(listB[j]);

        // this hidden segment adjusts i and j until they are starting at same datetime
        // since stocks can have different amount of data

        while (i < listA.Length && j < listB.Length)
        {
            double priceA = GetPrice(listA[i]);
            double priceB = GetPrice(listB[j]);
            double priceP = priceA * ratio - priceB;

            listP.Add(String.Format("{0},{1:0.00},{2:0.00},{3:0.00}"
                , dtA
                , priceP
                , priceA
                , priceB
                ));

            if (i < j)
                i++;
            else if (j < i)
                j++;
            else
            {
                i++;
                j++;
            }
        }

        return listP;
    }

    public void Kill()
    {
        data = null;
        stockA = null;
        stockB = null;
    }
}
Your memory leak is here:
if (fsw != null) fsw.Changed += fsw_Changed;
The Stock instance will be kept in memory for as long as the FileSystemWatcher is alive, because it is subscribed to one of the FileSystemWatcher's events.
I think that you want to either implement that event somewhere else, or at some other point in your code add a:
if (fsw != null) fsw.Changed -= fsw_Changed;
Given the way the code is written, it might be that the Stock object is intended to be constructed without a FileSystemWatcher in cases where bulk processing is done.
In the original code that you posted, the Stock constructors were being called with a FileSystemWatcher. You have changed that now. I think you will find that, with a null FileSystemWatcher, you can remove your Kill call and you will not have a leak, since you are no longer listening to fsw.Changed.
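For example, one way to make the unsubscription explicit (a sketch only; it assumes Stock keeps the watcher it subscribed with in a field, so the -= shown above has somewhere to live):
public class Stock : TimeSeries, IDisposable
{
    private FileSystemWatcher _fsw; // assumed field holding the watcher passed to the constructor

    public Stock(string dataFilePath, string symbol, FileSystemWatcher fsw = null)
    {
        DataFilePath = dataFilePath;
        Name = symbol.ToUpper();
        LoadFromFile();

        _fsw = fsw;
        if (_fsw != null) _fsw.Changed += fsw_Changed;
    }

    public void Dispose()
    {
        // Detach from the long-lived watcher so this instance can be collected.
        if (_fsw != null)
        {
            _fsw.Changed -= fsw_Changed;
            _fsw = null;
        }
    }

    // . . .
}
Then whatever code is done with a Stock calls Dispose() on it, and nothing reachable from the watcher keeps the instance alive.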
I'm building a console application that has to process a bunch of documents.
To keep it simple, the process is:
for each year between X and Y, query the DB to get a list of document references to process
for each of these references, process a local file
The process method is, I think, independent and can be parallelized as long as the input args are different:
private static bool ProcessDocument(
    DocumentsDataset.DocumentsRow d,
    string langCode
)
{
    try
    {
        var htmFileName = d.UniqueDocRef.Trim() + langCode + ".htm";
        var htmFullPath = Path.Combine(@"x:\path", htmFileName);

        var missingHtmlFile = !File.Exists(htmFullPath);
        if (!missingHtmlFile)
        {
            var html = File.ReadAllText(htmFullPath);

            // ProcessHtml is quite long: it uses a regex search for a list of references
            // which are other documents, then sends the result to a custom WS
            ProcessHtml(ref html);

            File.WriteAllText(htmFullPath, html);
        }

        return true;
    }
    catch (Exception exc)
    {
        Trace.TraceError("{0,8}Fail processing {1} : {2}", "[FATAL]", d.UniqueDocRef, exc.ToString());
        return false;
    }
}
In order to enumerate my documents, I have this method:
private static IEnumerable<DocumentsDataset.DocumentsRow> EnumerateDocuments()
{
    return Enumerable.Range(1990, 2020 - 1990).AsParallel().SelectMany(year =>
    {
        return Document.FindAll((short)year).Documents;
    });
}
Document is a business class that wraps the retrieval of documents. The output of this method is a typed dataset (I'm returning the Documents table). The method takes a year, and I'm sure a document can't be returned by more than one year (the year is actually part of the key).
Note the use of AsParallel() here; I never had an issue with this part.
Now, my main method is:
var documents = EnumerateDocuments();

var result = documents.Select(d =>
{
    bool success = true;
    foreach (var langCode in new string[] { "-e", "-f" })
    {
        success &= ProcessDocument(d, langCode);
    }
    return new
    {
        d.UniqueDocRef,
        success
    };
});

using (var sw = File.CreateText("summary.csv"))
{
    sw.WriteLine("Level;UniqueDocRef");
    foreach (var item in result)
    {
        string level;
        if (!item.success) level = "[ERROR]";
        else level = "[OK]";

        sw.WriteLine(
            "{0};{1}",
            level,
            item.UniqueDocRef
        );
        //sw.WriteLine(item);
    }
}
This method works as expected in this form. However, if I replace
var documents = EnumerateDocuments();
with
var documents = EnumerateDocuments().AsParallel();
it stops working, and I don't understand why.
The error appears exactly here (in my process method):
File.WriteAllText(htmFullPath, html);
It tells me that the file is already opened by another program.
I don't understand what could cause my program not to work as expected. Since my documents variable is an IEnumerable returning unique values, why is my process method breaking?
Thanks for any advice.
[Edit] Code for retrieving documents:
/// <summary>
/// Get all documents in data store
/// </summary>
public static DocumentsDS FindAll(short? year)
{
    Database db = DatabaseFactory.CreateDatabase(connStringName); // MS Entlib
    DbCommand cm = db.GetStoredProcCommand("Document_Select");

    if (year.HasValue) db.AddInParameter(cm, "Year", DbType.Int16, year.Value);

    string[] tableNames = { "Documents", "Years" };

    DocumentsDS ds = new DocumentsDS();
    db.LoadDataSet(cm, ds, tableNames);

    return ds;
}
[Edit2] Possible source of my issue, thanks to mquander. If I write:
var test = EnumerateDocuments().AsParallel().Select(d => d.UniqueDocRef);
var testGr = test.GroupBy(d => d).Select(d => new { d.Key, Count = d.Count() }).Where(c=>c.Count>1);
var testLst = testGr.ToList();
Console.WriteLine(testLst.Where(x => x.Count == 1).Count());
Console.WriteLine(testLst.Where(x => x.Count > 1).Count());
I get this result:
0
1758
Removing the AsParallel returns the same output.
Conclusion: something is wrong with my EnumerateDocuments; it returns each document twice.
I'll have to dig into this, I think.
The source enumeration is probably the cause.
I suggest having each task put the file data into a global queue and having a single dedicated thread take write requests from the queue and do the actual writing.
In any case, the performance of writing in parallel to a single disk is much worse than writing sequentially, because the disk needs to seek to the next write location, so you are just bouncing the disk around between seeks. It's better to do the writes sequentially.
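A sketch of that idea (assuming it's acceptable to buffer the processed HTML in memory; the queue name and tuple shape here are made up for illustration):
// using System.Collections.Concurrent; using System.IO; using System.Threading.Tasks;

// Producers (the parallel ProcessDocument calls) add to this instead of calling File.WriteAllText.
var writeQueue = new BlockingCollection<(string Path, string Html)>();

// A single consumer does all the disk writes sequentially.
var writerTask = Task.Run(() =>
{
    foreach (var item in writeQueue.GetConsumingEnumerable())
    {
        File.WriteAllText(item.Path, item.Html);
    }
});

// ... run the parallel processing, calling writeQueue.Add((htmFullPath, html)) ...

writeQueue.CompleteAdding(); // signal that no more items will arrive
writerTask.Wait();           // let the writer drain the queue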
Is Document.FindAll((short)year).Documents threadsafe? Because the difference between the first and the second version is that in the second (broken) version, this call is running multiple times concurrently. That could plausibly be the cause of the issue.
Sounds like you're trying to write to the same file. Only one thread/program can write to a file at a given time, so you can't use Parallel.
If you're reading from the same file, then you need to open the file with read-only access so as not to put a write lock on it.
The simplest way to fix the issue is to place a lock around your File.WriteAllText, assuming the writing is fast and it's worth parallelizing the rest of the code.
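For example (a sketch of that last suggestion; the _writeLock field is an assumed addition to the class that holds ProcessDocument):
private static readonly object _writeLock = new object();

// Inside ProcessDocument, serialize only the disk write:
lock (_writeLock)
{
    File.WriteAllText(htmFullPath, html);
}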