Case Study

The Migration Engine: Moving 20,000 Assets to a New CMS

Rescuing 20,000 product assets during a website migration when the backend was locked down.

The Bottleneck

Moving the Grohe website to a new CMS should have been a standard data transfer. Instead, it was a fight. The home office in Germany controlled the legacy system and severely restricted access to it. Meanwhile, we were staring at 4,000 live product detail pages that held the images and spec sheets we needed for the new site.

Manual downloads could have taken weeks, if not months. That’s a project-killer.

The Hack

I decided to treat the live website like a public library. Since I couldn’t get into the backend, I wrote a Python script to “harvest” the assets from the frontend.

The Web Scraper

The script crawled all 4,000 product pages. It didn’t just grab files; it read the SKU off each page and renamed every image and sell-sheet as it downloaded it. Everything was organized and ready for the new system before it even hit my computer.
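A minimal sketch of that harvesting pass, using only the standard library. The SKU attribute and the asset-link patterns here are assumptions for illustration; the real page markup would dictate its own selectors, and a production crawl would want a proper HTML parser, retries, and rate limiting.

```python
import re
import urllib.request
from pathlib import Path

# Hypothetical patterns -- the real product pages dictate these.
SKU_RE = re.compile(r'data-sku="([A-Z0-9.-]+)"')
ASSET_RE = re.compile(r'href="([^"]+\.(?:jpg|png|pdf))"')

def harvest(html: str) -> dict:
    """Pull the SKU and asset URLs out of one product page,
    keyed by the SKU-based filename each asset should get."""
    sku_match = SKU_RE.search(html)
    if not sku_match:
        return {}
    sku = sku_match.group(1)
    # Rename each asset after the SKU as it is collected.
    return {
        f"{sku}_{i}{Path(url).suffix}": url
        for i, url in enumerate(ASSET_RE.findall(html), start=1)
    }

def download_all(page_urls, dest="assets"):
    """Crawl each page, then fetch its assets under their new names."""
    Path(dest).mkdir(exist_ok=True)
    for page in page_urls:
        with urllib.request.urlopen(page) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        for name, url in harvest(html).items():
            urllib.request.urlretrieve(url, Path(dest) / name)
```

Splitting the parse (`harvest`) from the fetch loop keeps the fragile part, matching the page markup, easy to adjust page-by-page without touching the download logic.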

The Box API Slog

Moving the files was only half the battle. We used Box.com to host the assets for the new website, but every single file needed a public URL. Clicking through 20,000 files to set permissions manually would have taken months.

The Automated Fix

I used the Box API to set a public shared link on all 20,000 assets. The script then grabbed those new URLs and dumped them into a master spreadsheet. This became the backbone for our PIM (Product Information Management) upload.
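A sketch of that pass against Box’s REST API, which creates a shared link by PUT-ing a `shared_link` body to the file endpoint. The token and file IDs are placeholders; a real run would also page through the folder listing to collect the IDs first.

```python
import csv
import json
import urllib.request

API = "https://api.box.com/2.0/files/{}?fields=shared_link"

def link_body(access: str = "open") -> bytes:
    """JSON body asking Box to create a public ('open') shared link."""
    return json.dumps({"shared_link": {"access": access}}).encode()

def extract_url(response_json: str) -> str:
    """Pull the shared-link URL out of Box's file-object response."""
    return json.loads(response_json)["shared_link"]["url"]

def publish(file_ids, token, out_csv="box_links.csv"):
    """Set a public link on each file and log the URLs to a CSV,
    which becomes the master spreadsheet for the PIM upload."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_id", "public_url"])
        for fid in file_ids:
            req = urllib.request.Request(
                API.format(fid), data=link_body(), method="PUT",
                headers={"Authorization": f"Bearer {token}",
                         "Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                writer.writerow([fid, extract_url(resp.read().decode())])
```

Writing the CSV row-by-row as each link comes back means a crash partway through 20,000 files loses nothing already processed.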

The Result

We turned a manual nightmare into a predictable process. When launch day came, every product had its documents and images exactly where they belonged. We didn’t need permission from the home office, and we didn’t have to miss our deadline.