Mark's Musings

A miscellany of thoughts and opinions from an unimportant small town politician and bit-part web developer

Why you can’t have too much information in your database

| 0 comments

This question was posted in uk.net.web.authoring:

I’ve got a site on the way, where I need to add products to a sort of simple shopping basket. The plan is to add a pair of values (UserID, ProductID) to a MySQL database for every product the user adds to the basket, and then get rid of them when the user checks out. Problem is if the user doesn’t chek out, then the cookie can expire and leave me with a load of useless records in my database. Any suggestions?

I think the question is making a false assumption. Here’s my answer:

They’re not useless records.

I have a very similar application – the database entry is generated when the first item is added to the cart, but doesn’t become visible to the admin system until the checkout is complete. (Well, actually semi-complete – it’s visible when they’ve entered a delivery address, but doesn’t become fully live until payment is complete). That does, of course, mean that the order table always has a lot of “dead” entries where people started the shopping process but never completed it. But that’s not a problem – the staff never see these incomplete entries, because the admin system doesn’t display them (and they only see the semi-complete entries if they select a specific option on the admin system), so it doesn’t affect the functioning of the shop at all.

Once a day, I have a batch job that backs up the order table and cleans out the dead entries from the live one. This keeps the live table from becoming unwieldy. Eventually, I’ll have to archive completed entries as well, and delete them from the live table, but not for several years (unless sales go up massively in the meantime). But the backup table contains all the dead entries as well as completed ones – this doesn’t matter from a performance angle, as the backup table is never read by any customer facing (or staff facing, for that matter) scripts, but it does mean I’ve got all the data there if I ever need or want to look at it.

And the incomplete orders can contain a lot of useful information – it’s interesting to see which products are more likely to be added to a cart but never checked out, and which types of customers (broken down by various stuff such as IP address, Referer, ISP, browser, time of day, etc) are more likely to abandon things before completion. That can provide a lot of very important information from a marketing perspective, and can also help highlight possible problems in the checkout process. The other really helpful thing is to know how far the customer got before bailing out – is there a particular sticking point anywhere? Consider this in the light of the other thread about requiring istuff like date of birth – do a lot of people drop the cart when asked for information that they may not want to give? Is there a section which requires Javascript to be enabled, and are people dropping out there? Do you have an image link in the process which can’t be navigated with image loading turned off? A lot of online shops won’t give a definite price until you go to checkout, so the only way to compare between different suppliers is to create a cart on both and then only complete the one with the lowest price after P&P has been added at the end. If you’re seeing a lot of drop-outs at the point where you add this on, then maybe you need to look at the pricing structure.

Your shopping cart database is much more than simply a list of orders – it’s a goldmine of information about customer behaviour, much of which can be key to the success of the site. You can never have too much information – ideally, you want to be able to track every click from the point at which a customer enters the site to the point at which they either buy or leave. That level of detail may be impractical for many people, particularly if you’re using an off-the-shelf package, but anything you do have is useful. There’s no such thing as useless data.