So today I solved one major lab issue (I think).
I have always used “personal” file servers to archive our data: essentially simple hard drive arrays in RAID5 configuration. These have been accessible to everyone in the lab, from undergraduate project students to postdocs. They are great, but they do fail.
Recently two of our drives failed.
Had these two drives been in the same array, we would have lost a lot of our imaging data, since RAID5 can only survive the loss of a single drive. So I felt the need to do something about this. My University has fortunately just implemented 5 TB (yes, TB not GB) of network storage for each “data steward” (essentially each academic). This is automatically backed up and can (in future) be archived to tape. This seems to solve the problem for the first 5 TB at least, and for more if we pay. Local IT have also provided a possible option for local file storage (i.e. within the building), which could be very useful for our core research facilities such as Bioimaging and Proteomics that generate large data volumes.
The large network data store required me to write a data management plan, which I did using this online Data Management Planning Tool by the UK Digital Curation Centre, developed through a JISC-funded project. This was surprisingly useful in the way it prompted me to think about how I organize lab data and to consider its future use, access, etc.
So, now to the question: HOW do I organize these data sets?
We take a lot of microscopy data from a lot of different imaging systems. Thus formats are complex and not always future-proofed. One plan here is to export as OME-TIFF, the Open Microscopy format that includes the metadata. This is simple and fine. In fact we acquire most of our data using Volocity from Improvision/Perkin Elmer. This includes good metadata and (at least for now) the core software is free from Perkin Elmer on registration. The OME-TIFF option would allow us to take everything, including the metadata, into ImageJ or equivalent very easily.
It's more the folder structure that requires better organization. Currently I list everything in date order (yy-mm-dd) followed by a brief experiment name. This date format means I can sort by date acquired (which to a large extent is how my brain remembers experiments). I am wondering whether including a Word/Text doc alongside, detailing methods, labelling etc., is enough. We can usually get this annotated directly on the microscope. The other thing I plan to include is a short description of the experimental goal and outcomes where possible. This might in fact sit within a higher-level directory where we organize by project/sub-project. I need to get the lab to properly commit to this. Progressive creep away from such rigid systems is also a concern, but that is another issue…
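As a rough sketch of how the naming scheme above could be scripted rather than typed by hand, something like the following would do (Python, standard library only). The underscore between date and name, the README.txt file name, the template fields and the function names are all assumptions made for illustration; none of them is a settled lab convention.

```python
from datetime import date
from pathlib import Path

# Hypothetical template for the notes file that sits alongside the raw data.
README_TEMPLATE = """\
Experiment: {name}
Date acquired: {acquired}
Acquired by: {person}
Goal:
Methods / labelling:
Outcome:
"""


def experiment_dir_name(acquired: date, name: str) -> str:
    """Build a folder name that sorts by acquisition date: yy-mm-dd_name."""
    slug = "-".join(name.lower().split())
    return f"{acquired:%y-%m-%d}_{slug}"


def create_experiment_dir(root: Path, project: str, acquired: date,
                          name: str, person: str) -> Path:
    """Create root/project/yy-mm-dd_name and add a notes file if none exists."""
    folder = root / project / experiment_dir_name(acquired, name)
    folder.mkdir(parents=True, exist_ok=True)
    readme = folder / "README.txt"
    if not readme.exists():  # never overwrite notes somebody has already written
        readme.write_text(
            README_TEMPLATE.format(
                name=name, acquired=acquired.isoformat(), person=person
            ),
            encoding="utf-8",
        )
    return folder
```

Calling create_experiment_dir(Path("archive"), "some-project", date.today(), "pilot imaging", "initials") would then give a dated folder under the project directory with an empty notes template waiting to be filled in at the microscope.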
So… does anyone have a better way? I want something that is simple but highly effective. I want to look through our data archives and know exactly what everything is without referring to the person who acquired the data. I should already be there, but I am not.
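One simple aid towards that goal, and a guard against the creep away from the system, might be a small script that walks a project directory and flags anything that cannot be identified on its own. The sketch below (Python, standard library only) assumes folders are named yy-mm-dd followed by an underscore and a description, and that the notes live in a file called README.txt; both are illustrative assumptions, as is the function name.

```python
import re
from pathlib import Path

# Assumed convention: yy-mm-dd, an underscore, then a brief experiment name.
NAME_PATTERN = re.compile(r"^\d{2}-\d{2}-\d{2}_.+")


def audit_project(project_dir: Path, notes_name: str = "README.txt") -> list[str]:
    """List experiment folders that break the naming rule or lack notes."""
    problems = []
    for folder in sorted(p for p in project_dir.iterdir() if p.is_dir()):
        if not NAME_PATTERN.match(folder.name):
            problems.append(f"{folder.name}: name is not yy-mm-dd_description")
        notes = folder / notes_name
        if not notes.is_file():
            problems.append(f"{folder.name}: missing {notes_name}")
        elif not notes.read_text(encoding="utf-8", errors="replace").strip():
            problems.append(f"{folder.name}: empty {notes_name}")
    return problems
```

Run once a month over each project directory, the returned list is effectively a to-do list of data sets that only the person who acquired them can currently explain.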
Before someone suggests OME, we don’t have the wherewithal to move to OMERO – the server setup is beyond me and not something that we can implement easily through our IT support. This is one for the future…
The good news is that I am taking data management more seriously as required by Research Councils (in the UK at least but I guess also internationally by all funders). The bad news is that I clearly have a long way to go with this.