For a work project called DbSalsa, I parse log files in real time according to regular expressions defined in a configuration file and store the matches in a database. I’ve abstracted the database using the Entity Framework. Pretty straightforward, right? Not so much.
Basically, everything that happens in the log file is stored in an Event object (with a corresponding table), and events can be aggregated into Job objects. A Job has a start event, intermediate events, and an end event.
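To make the shape of the data concrete, a stripped-down version of that model might look something like this (class and property names here are illustrative, not DbSalsa’s actual schema):

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the entity model described above; the names are
// placeholders rather than DbSalsa's real classes.
public class Event
{
    public int EventId { get; set; }
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

public class Job
{
    public int JobId { get; set; }
    // The start event, any intermediate events, and the end event all live here.
    public virtual ICollection<Event> Events { get; set; }
}
```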
Let’s say that the log file instantaneously grows by 10 megabytes. (This happens when, for testing purposes, I start with no log and dump in a previously created log file.) It sits and churns for a while… longer… longer… and then gives me an out-of-memory exception.
What did I do wrong? Well, let’s pretend that the EF didn’t just find a way to bloat 10 MB worth of data into over 2 GB of working RAM. [Edit: Turns out I had an infinite loop generating an infinite number of records, but this is still valid for anyone working with extremely high volumes of data.] What I didn’t do was account for the fact that, unlike regular database record creation, I’m actually creating objects in memory here. So every time I create an Event object and stick it into the JobInstance.Events collection, I’m using more RAM. The same problem would occur if you lazy-loaded huge amounts of data into these collections using .Load().
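In other words, the problem loop had roughly this shape (placeholder names again; ParseEvent stands in for the real regex-driven parsing):

```csharp
// Roughly the leaky shape of the original loop (placeholder names).
// Each Event stays referenced by both the Job's Events collection and the
// EF context's change tracker, so none of them is ever eligible for GC.
foreach (var line in logLines)
{
    Event evt = ParseEvent(line);    // hypothetical regex-based parser
    jobInstance.Events.Add(evt);     // the Job now holds a reference...
    db.SaveChanges();                // ...and so does the context, indefinitely
}
// Memory use grows with every iteration until the process runs out of RAM.
```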
Since I never have any expectation of needing the Event object again once it is created, I simply do the following three steps after every pass through my loop that generates an Event:
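A minimal sketch of what those three steps can look like, assuming the older ObjectContext-style API and placeholder names (db, jobInstance, newEvent), is:

```csharp
// Sketch only: "db" is the ObjectContext, "jobInstance" the current Job,
// and "newEvent" the Event created in this iteration.
db.SaveChanges();                     // 1. flush the new Event to the database
jobInstance.Events.Remove(newEvent);  // 2. drop it from the in-memory Job collection
db.Detach(newEvent);                  // 3. stop tracking it so the context releases
                                      //    its reference and the GC can reclaim it
```

With the newer DbContext API, the detach step would instead be expressed as db.Entry(newEvent).State = EntityState.Detached.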
The third line is the key: it unbinds the removed object from the context, so the removal is never treated as a change to save, and it drops the last reference to an Event object that my code has long since forgotten about, letting the garbage collector do its thing.