A Database to organise digitised archive information

Feb 9, 2012

I am attempting to create a database to use as a tool for searching through and organising a large number of images of documents as part of an archive digitisation project. The physical artefacts, the folders stored in the repository, are being scanned. The contents of each physical folder, once scanned, are placed in a folder on the computer which matches their class-mark in the real world. However, there are two problems with the physical organisation of the archive which are at this stage replicated in the digital version:
1) The organisation of the archive is often incoherent. Correspondence is out of order, or spread across a variety of files. This is in part down to the archive having been through a whole series of different filing systems, and in part due to neglect.
2) The inventory which exists is frequently wildly inaccurate.
The archive contains a range of information, which can be broken into the following categories:
1) Correspondence – between individuals and officials/departments.
2) Minutes of various sorts – often relating to correspondence.
3) Memoranda/notebooks – relating to a diverse range of topics.
The database will need to perform several key functions, from the very simple query to much more complicated processes. However, before getting to that, I will explain the intended mode of data collection, and outline which categories of data, relevant to a potential restructuring of the archive, can be culled from the images.
Due to the frequent occurrence of handwritten items, OCR would not be a suitable method of capturing the material. Therefore, each image will have to be consulted in turn. Whilst time-consuming, this is the only way to ensure accuracy.
The categories of data which will form the basis of the columns in the database’s table are as follows:
- Date sent
- Date received
- Reference (in the form of a code frequently appearing on items)
- Refers to
- Date of items referred to
- Author’s Name
- Author’s Position
- Recipient’s Name
- Recipient’s Position
- Title
- Subject
- Notes
- Class-mark
Included in the table will be a hyperlink to each file. Before the database is complete it simply will not be possible to determine whether a physical reorganisation of the archive will be useful/worthwhile. The digital files will therefore also remain in their current file architecture.
It should also be obvious that many of the above fields will not be relevant for each material type, but in the case of a memorandum on goods prices for instance, date sent simply becomes date on memo and so on. The data entry will endeavour to provide as complete a record as possible, but this will not be possible in every case, leaving blanks in the table.
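To make the intended layout concrete, here is a rough sketch of a single table holding the fields listed above. It assumes SQLite through Python's built-in sqlite3 module, which is only one possible choice; the table name `items`, the column names, and the sample row are all placeholders of my own rather than anything fixed by the project. Blanks are simply stored as NULL.

```python
import sqlite3

# Hypothetical table and column names, derived from the field list above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    item_id            INTEGER PRIMARY KEY,  -- assigned in order of data entry
    class_mark         TEXT NOT NULL,        -- matches the physical folder
    file_link          TEXT NOT NULL,        -- path or hyperlink to the image
    date_sent          TEXT,                 -- assumed ISO format YYYY-MM-DD; NULL if unknown
    date_received      TEXT,
    reference          TEXT,
    refers_to          TEXT,
    date_referred_to   TEXT,
    author_name        TEXT,
    author_position    TEXT,
    recipient_name     TEXT,
    recipient_position TEXT,
    title              TEXT,
    subject            TEXT,
    notes              TEXT
);
"""


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_db()

# A memorandum: 'date sent' stands in for the date on the memo,
# and the fields that do not apply are left blank (NULL).
conn.execute(
    "INSERT INTO items (class_mark, file_link, date_sent, author_name, title) "
    "VALUES (?, ?, ?, ?, ?)",
    ("A/1/3", "A_1_3/0001.jpg", "1923-04-02", "John Smith",
     "Memorandum on goods prices"),
)
row = conn.execute(
    "SELECT item_id, date_received, title FROM items"
).fetchone()
```

Storing dates as YYYY-MM-DD text is an assumption made so that they sort chronologically as plain strings; partial dates would need a convention of their own.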
The database is intended to fulfil several functions:
1) By a simple class-mark query, provide information on all items with that particular class-mark to facilitate the creation of a new inventory.
2) Permit the retrieval of items through a query on one of the other fields, or a combination of them, for example, all correspondence sent or received by John Smith over the course of a decade, or between John Smith and Peter Brown over the course of a year.
3) Make it possible to use the ‘Reference’/‘Refers to’ and if possible ‘Date Sent’/‘Date Received’ or ‘Date of items referred to’ fields to reconstruct series of correspondence chronologically. This may not be the same as the process in point two, as two authors may be conversing on a wide range of different topics over any given period, and often simultaneously. It is hoped that the ‘Title’/‘Subject’ fields may be put to use here.
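Functions 1 and 2 above might look something like the following against a single table. Again this is only a sketch assuming SQLite via Python's sqlite3; the table, the cut-down set of columns, the function names, and the sample rows are all invented for illustration, and dates are assumed to be stored as YYYY-MM-DD text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items (
           item_id INTEGER PRIMARY KEY, class_mark TEXT, date_sent TEXT,
           author_name TEXT, recipient_name TEXT, title TEXT)"""
)
# Invented sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO items (class_mark, date_sent, author_name, recipient_name, title) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("A/1", "1920-03-01", "John Smith", "Peter Brown", "Letter 1"),
        ("A/2", "1920-06-15", "Peter Brown", "John Smith", "Reply"),
        ("A/1", "1925-01-10", "John Smith", "Mary Jones", "Letter 2"),
        ("B/7", "1931-02-02", "John Smith", "Peter Brown", "Late letter"),
    ],
)


def by_class_mark(conn, class_mark):
    """Function 1: everything filed under one class-mark, in entry order."""
    return conn.execute(
        "SELECT item_id, title FROM items WHERE class_mark = ? ORDER BY item_id",
        (class_mark,),
    ).fetchall()


def involving(conn, person, start, end):
    """Function 2: items sent or received by one person within a date range."""
    return conn.execute(
        """SELECT item_id, title FROM items
           WHERE (author_name = :p OR recipient_name = :p)
             AND date_sent BETWEEN :s AND :e
           ORDER BY date_sent""",
        {"p": person, "s": start, "e": end},
    ).fetchall()


def between(conn, a, b, start, end):
    """Function 2: items exchanged between two people, in either direction."""
    return conn.execute(
        """SELECT item_id, title FROM items
           WHERE ((author_name = :a AND recipient_name = :b)
               OR (author_name = :b AND recipient_name = :a))
             AND date_sent BETWEEN :s AND :e
           ORDER BY date_sent""",
        {"a": a, "b": b, "s": start, "e": end},
    ).fetchall()


inventory = by_class_mark(conn, "A/1")
decade = involving(conn, "John Smith", "1920-01-01", "1929-12-31")
year = between(conn, "John Smith", "Peter Brown", "1920-01-01", "1920-12-31")
```

Items with a blank date are left out of the date-range queries here, since a NULL never falls between two values; whether that is the desired behaviour is an open question.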

It is highly likely that not all items in a correspondence series are extant, but coming as close as possible to the original order is highly important. It may also be the case that a combination of steps 2 and 3 will allow blanks to be filled in a series of correspondence, where the correct references do not appear on the images themselves. There may be cases where minutes and memoranda also have a place in the chain, and through a similar process it is hoped that tables can be reorganised to reflect this, although this may have to be a manual process.
The primary key which can be assigned to each entry would presumably permit the entire table to be returned to its original state through a sort. By assigning a similar number to each item once the reorganisation is complete, it will presumably also be possible to switch between the two.
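As I understand it, switching between the two orderings would amount to sorting on one numbering column or the other. A minimal sketch, assuming SQLite via Python's sqlite3, with a hypothetical `new_order` column added alongside the primary key and invented sample titles:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "item_id INTEGER PRIMARY KEY, "   # original order of entry
    "title TEXT, "
    "new_order INTEGER)"              # hypothetical column, filled in later
)
# Items entered in the (disordered) sequence found in the folder.
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("Reply",), ("First letter",), ("Follow-up",)],
)

# After reorganisation, each item is given its position in the new sequence.
conn.executemany(
    "UPDATE items SET new_order = ? WHERE item_id = ?",
    [(1, 2), (2, 1), (3, 3)],
)

original = [t for (t,) in conn.execute("SELECT title FROM items ORDER BY item_id")]
reorganised = [t for (t,) in conn.execute("SELECT title FROM items ORDER BY new_order")]
```

Nothing is moved or overwritten in this arrangement: the rows stay where they are and only the sort column changes.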
The desired end result is to have an ordered and searchable database of the material in the archive, which shows chains of correspondence and related materials. Taking a particular item, one should be able to not only link to following and preceding items, but get an overview of the entire series.
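One way the chain-following might work, sketched under some strong assumptions: that each item carries a unique ‘Reference’ code, that ‘Refers to’ names at most one earlier item, and that the database is SQLite accessed through Python's sqlite3. The function names `find_root` and `series` and the sample codes are invented. The idea is to walk back along ‘Refers to’ to the earliest known item, then collect everything that descends from it and sort by date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_id INTEGER PRIMARY KEY, "
    "reference TEXT, refers_to TEXT, date_sent TEXT)"
)
# Invented sample data, entered out of chronological order.
conn.executemany(
    "INSERT INTO items (reference, refers_to, date_sent) VALUES (?, ?, ?)",
    [
        ("C/12", "C/10", "1921-05-20"),
        ("C/10", None, "1921-05-01"),
        ("X/1", None, "1921-05-03"),   # unrelated item
        ("C/15", "C/12", "1921-06-02"),
    ],
)


def find_root(conn, reference):
    """Follow 'refers_to' backwards until it runs out (or loops)."""
    seen = set()
    while reference not in seen:
        seen.add(reference)
        row = conn.execute(
            "SELECT refers_to FROM items WHERE reference = ?", (reference,)
        ).fetchone()
        if row is None or row[0] is None:
            break  # item not extant, or it refers to nothing earlier
        reference = row[0]
    return reference


def series(conn, reference):
    """Return the whole series containing the given item, in date order."""
    root = find_root(conn, reference)
    return conn.execute(
        """WITH RECURSIVE chain(reference) AS (
               SELECT :root
               UNION               -- UNION (not UNION ALL) guards against loops
               SELECT i.reference FROM items i
               JOIN chain c ON i.refers_to = c.reference
           )
           SELECT i.item_id, i.reference FROM items i
           JOIN chain c ON i.reference = c.reference
           ORDER BY i.date_sent, i.item_id""",
        {"root": root},
    ).fetchall()


overview = series(conn, "C/15")
```

With the overview in hand, the preceding and following items are just the neighbours of the chosen item in the returned list. Items with no reference code at all would be missed by this approach and would still need linking by hand, perhaps via the ‘Title’/‘Subject’ fields.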
At this stage there are only two questions which truly need to be asked. Is all this possible, and further, how complex would it be to achieve?
Secondly, as far as data entry goes, which is ready to commence, will it be better to use separate tables for each folder, or simply one master table? The former approach would make creating the inventory much easier, but would it not make constructing the necessary queries more difficult?
Any guidance you can provide will be much appreciated.
