Hi, this post will be an intro to quite a large series in which I’d like to dive deep into the topic of importing data into an eCommerce system.
I’ve built several import mechanisms already, and learned from rebuilding a few that initially had “not-so-ideal” 😀 designs. What strikes me is that the heavily promoted ETL (Extract-Transform-Load) pattern, which I’ll talk more about in the next article, doesn’t really meet all the needs of a full-scale data import into an eCommerce system. In my opinion, it’s oversimplified. I even came up with a joke: why not replace the ETL pattern with JII (Just Import It)? Of course, that’s a bit of an exaggeration, but I think most people who have experience building such systems will agree that in a production solution, we end up adding steps that don’t quite fit into any of the Extract, Transform, or Load categories.
Another reason that motivated me to start this series was a survey I ran in the OroCommerce developer community. When I asked what topics might be interesting, “Reliable product import” landed in first place. On top of that, I have plenty of personal experience building such mechanisms and later fixing my own mistakes in them.
I’d like this series to be both comprehensive and based on real-life examples of what works and what causes problems. That’s why I’m also collecting experiences from other developers. I already have a few people supporting me, but if you’d like to contribute, I’d be extremely happy. Write to me in the comments, via the contact form, or on LinkedIn.
There are many business reasons (I’ll list a few later) why correctness and speed in processing product data matter so much. But I want to focus on an aspect that often gets overlooked: every failure, every bug in the import system pulls the development team away from building the product. That means errors don’t just affect the business today; they also affect the business tomorrow.
So our task is to build an import system once, properly, so it doesn’t drain the development team’s capacity to deliver great new features.
Okay, we’ve made reliability our banner goal, and that’s great. But as practice shows, development budgets are not elastic. Designing safeguards for every theoretically possible case and writing recovery plans for recovery plans is simply not affordable.
And this is where things get interesting. What I want to create is a guide on how to design imports that fit the scale of the system we’re working on. That’s why I suggest breaking the problem down along a few criteria:
From the intersection of these criteria, we’ll get a table of cases.
As you can see, there are quite a few variants, and that’s exactly the point. There is no single magic solution that fits all scenarios. That’s why the idea is to create a whole series of articles to address this problem.
So you know what to expect from this series, here’s a draft plan of the articles. Of course, it’s not set in stone and may evolve. If you think something should be added, let me know.
Finally, two notes:
That said, I’ll do my best to keep the guide as universal as possible, so that anyone, regardless of which eCommerce platform they run, can take away something useful.
