Everything is broken: why you shouldn’t beat yourself up when troubleshooting

Glenn Fleishman
19 May, 2015

I’ve made a decent to large part of my living for more than 20 years learning about how to fix problems and then trying to tell others how to follow suit. And this last week has been among my highest in terms of frustration in using computers in my entire life. But, per my modus operandi, I have truth borne from a bloody fight to share with you.

A few weeks ago, I tried to deal with the mystery of my 2011 Mac mini taking forever to start up and be ready to use by switching to an external SSD drive with both FireWire 800 and USB 3.0 built in. I documented that here, and people have a lot of good opinions about my choice. Some thought I should have cracked open the Mac mini and put in a new drive; others thought that I should’ve used Thunderbolt; and others that I should have bought a new computer.

(My answers: the USB 3.0 was futureproofing, so I could use this drive as a boot volume or backup. Thunderbolt was too expensive relative to the remaining value in the mini. A new computer seemed too expensive. The mini is four years old and has a 40-step procedure for swapping an internal drive.)

Nonetheless, I had more than three weeks of blissful performance with a computer that felt newly rejuvenated. I’m still unable to explain why an SSD using FireWire 800 has led to less memory usage, too, but it has.

Then the honeymoon was over this week. This story should help those of you who think that these things only happen to you. Have a little schadenfreude with my permission. (Spoiler: it was the butler.)

Crashy with a chance of meathead

Monday morning, and I’m ready to tape this week’s Macworld podcast, and the machine is sluggish. Eventually, I have to reboot. We wind up rescheduling with the redoubtable Kyle Wiens of iFixit because I can’t keep the system working reliably. This is partly because, after a restart and despite the settings in Photos, iCloud Photo Library began uploading again, flooding my network. Kyle and I taped our segment later, but our excellent audio engineer, Jim Metzendorf, discovered that my side of the conversation with Susie and separately with Kyle was cutting in and out. He made do and patched it up (thanks, Jim! sorry, listeners!), but something was terribly wrong.

I tried my usual array of troubleshooting:

  • What did top -u show in Terminal and Activity Monitor under CPU reveal when slowdowns occurred?
  • Was there unusual disk activity and was memory under pressure, both viewable through Activity Monitor?
  • Did the Console app (Applications > Console) reveal log entries that showed something was churning like mad? I’ve had that happen before with outdated scanner drivers that remained loaded.
  • What does Disk Utility think? Better yet, restart, hold down Command-R, and run Disk Utility from the Recovery HD so I can check the startup partition.
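The first check in that list, seeing which processes are eating the CPU when a slowdown hits, can also be scripted so that snapshots are easy to compare over time. What follows is only a rough sketch, not anything I ran during this saga: it assumes the standard `ps -A -o pid,pcpu,comm` invocation is available, and the helper names `parse_ps`, `top_cpu` and `snapshot` are made up for illustration.

```python
import subprocess


def parse_ps(text):
    """Parse the output of `ps -A -o pid,pcpu,comm` into (pid, cpu, command) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        parts = line.split(None, 2)  # the command may itself contain spaces
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        try:
            rows.append((int(pid), float(cpu), command.strip()))
        except ValueError:
            # Skip anything that does not look like a process row.
            continue
    return rows


def top_cpu(rows, n=5):
    """Return the n rows with the highest CPU percentage, busiest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def snapshot(n=5):
    """Run ps once and return the n busiest processes (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return top_cpu(parse_ps(result.stdout), n)
```

Calling `snapshot()` a few times while the machine is misbehaving gives a short list to compare against what Activity Monitor shows. If nothing stands out, as in my case, that is itself a hint that the trouble lies below the level of any one process.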

Hey, zapping the NVRAM (non-volatile RAM) that holds some system settings can’t hurt! Restart, Command-Option-P-R, bing, bing, bing.

None of my usual routine helped.

These techniques revealed nothing wrong or unusual. According to all measures I could take, something was undefinably out of whack, but invisible. There were additional tests I could run, but these hadn’t helped in years.

I was using third-party RAM to bring the mini to an unsupported 16GB (for the model I have anyway; new Mac minis do support that). The last time RAM was a culprit, it was with a titanium PowerBook and, I believe, Panther. In some cases, non-Apple RAM would fail. You can test RAM with Apple’s built-in diagnostic tools. However, the particular symptoms didn’t match, and RAM is unlikely to go bad – it happens, but it’s rare for RAM to fail this long after it’s put into use.

I could have turned to third-party disk diagnostic tools that promise (and, in some cases, deliver) to check various parameters that Disk Utility doesn’t, of which there are many, and provide a report and potentially a repair. But given that Disk Utility passed the drives repeatedly and the problems didn’t seem familiar from my history with failures over decades, it seemed unlikely.

Instead, I bought a new Mac.

Hey, big spender, I hear you saying. Buying a new computer is an extreme step, and not a way for everyone to solve their problems! This is true. However, as a freelance writer, not having a functional machine for my primary work means not making money. Having devoted many hours already of non-billable time, I ordered a new lower-end Mac mini from Apple, and picked it up Wednesday morning from a nearby Apple Store.

Here is where my strategy seemed to pay off in using the external SSD drive. I pulled out the old Mac mini, swapped cables, and booted holding down the Option key so I could pick the external drive. I was back in business!

Except not. I saw even worse problems with a machine four years newer and substantially better in performance than the computer it was replacing. Some tears may have ensued. After one restart into Recovery mode and more Disk Utility checking, it turned out an external FireWire 800 drive was, in fact, unrepairable. It was also four years old, and while drives can last much longer, it’s not unreasonable for a drive to fail after that period of time. It’s another reason that SSDs will ultimately overtake spinning drives, for the same reasons LEDs are replacing incandescent bulbs: reliability and long life.

(When I shopped for a replacement for that 2TB FireWire 800 drive, which I used for backups, I wanted to get a higher capacity. However, I knew that some drive models had experienced high failure rates. I consulted Backblaze’s ‘What Is the Best Hard Drive?’ blog post from January. Backblaze uses off-the-shelf drives for its cloud storage, and has exceedingly useful information about outliers. I avoided the 3TB Seagate that they found problematic.)

Pulling that drive off the FireWire chain and swapping the original Mac mini back in brought me back to functional, speedy bliss.

For a day at least.

Defaulting to the wrong version

During my transition on Wednesday morning, I discovered I was missing some email. I chalked it up to transient crashes, and contacted people I knew had been in touch. I didn’t give it enough thought. Then on Thursday morning, I found other anomalies when trying to restart cloning my SSD to the internal mini drive.

SuperDuper had greyed out the internal drive. With some tech support help from SuperDuper’s Dave Nanian, I ran Disk Utility and tried to repair the internal drive. It told me that it couldn’t be repaired because the current user’s Home directory was on the drive.

I nearly lost it at this point. What had I done?! Why has my computer forsaken me?

I checked in Disk Utility and via System Profiler that the SSD was, indeed, the boot drive. Then I used Users & Groups to find out where OS X thought the home directory was located. When you unlock the padlock for that preference pane, you can then Control-click a user name and select the Advanced option. This reveals Unix details that can be changed at your peril.

In the Home Directory field it did, indeed, say /Users/glenn, the path that I thought it should have. I clicked Choose, and the directory the folder homed in on was the internal drive’s path. Not good. (Dave hadn’t seen this before, either.)
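A path like /Users/glenn says nothing by itself about which physical drive it lands on, which is what tripped me up. The sketch below shows one way to reason about it: resolve the home directory, then find the longest mount point that contains the resolved path. It is an illustration under assumptions, not what OS X does internally. The function names and the example volume names (`/Volumes/Internal`, `/Volumes/SSD`) are hypothetical, and the list of mount points has to be supplied by you, for instance from what Disk Utility shows.

```python
import os
import posixpath


def volume_for(path, mount_points):
    """Return the longest mount point that contains path, or None if none does."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mount_points:
        mp = posixpath.normpath(mount_point)
        prefix = mp.rstrip("/") + "/"  # "/" stays "/", "/Volumes/SSD" becomes "/Volumes/SSD/"
        if path == mp or path.startswith(prefix):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def resolved_home():
    """Return the current user's home directory with symlinks resolved."""
    return os.path.realpath(os.path.expanduser("~"))
```

With the hypothetical volumes above, `volume_for(resolved_home(), ["/", "/Volumes/Internal", "/Volumes/SSD"])` would report which of the three the home directory really sits on. If the answer is not the drive you booted from, that is the mismatch to chase.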

I switched the directory to the SSD’s path, restarted, confirmed the right path was in place (and painstakingly merged a day’s worth of out-of-sync mail), but the system remained funky. Disk Utility now could run a repair operation on the internal drive, and while it checked out, it took a long, long time with a number of pauses. Perhaps the internal drive is also about to join the choir invisible? I unmounted it.

Finally, I thought, I can get back to business. But it wasn’t to be. While the system remained overall zippy, I was having pauses while typing in any program every 10 to 30 seconds and a rainbow spinning cursor. Here we go again.

After much more investigation and hoping that I wasn’t facing a defect in my new SSD, I looked into third-party software I have installed that might relate to input. Yosemite had caused some truly inexplicable problems for some people with outdated software, including our own Ted Landau, who had horrible problems because of an outdated audio component. (I had the same issue and debugged it with help from Twitter colleagues before finding Ted’s article!)

TextExpander was up-to-date, but Default Folder was not: I had 4.7.0, not the latest 4.7.1 installed. While the release notes didn’t mention any of the problems I had, I’m always an outlier, and the interaction of many different systems can cause trouble. I installed 4.7.1. And the problems went away for several hours.

Then they returned.

Let’s shine a light on it

I finally brought out the big guns and ran the system diagnostics. After churning away with the ‘extended’ checkbox selected, the report was – everything was fine! Of course it was: swapping in a different computer didn’t seem to improve matters.

I did some more testing with ejecting drives and swapping cables, and finally went with a distant hope. Perhaps Spotlight was the culprit. In earlier versions of OS X, I’d often seen Spotlight go nuts and drive all the cores in the CPU, spin up the fan, and wreak havoc. But according to everything I’d checked, including simply watching the disk activity lights, mdworker and other associated background processes weren’t to blame.

You can disable Spotlight for specific folders or volumes by exposing the Privacy tab in the Spotlight preference pane and either dragging items or using the + to add them through a navigation dialog. I dragged my drives in, including the startup volume, and clicked OK to acknowledge that some programs’ search features might be disabled.

And everything got better for a while, but the hiccups continued. Everything worked fine, but I still had inexplicable blurps for a moment or seconds at a time during which the rainbow spun. I purchased and tried out a Thunderbolt 2 dock which would create USB 3.0 and passthrough DisplayPort and HDMI output, thinking perhaps an internal bus or speed was an issue, but didn’t realise that older Macs with USB 2.0 support won’t correctly boot USB 3.0 drives over a USB 3.0 adapter! So back that went to the store.

However, it was during that final swap that something snapped: FireWire stopped working entirely. This finally provided the missing clue. Moving entirely to USB 2.0 produced a temporarily completely working system with no hiccups and no other trouble – it’s just slower than I’d like. My suspicion is that the SATA controller used for the internal hard drive and the FireWire controller for two external ports are linked in some way, and some component went bad. This would explain why the internal drive stopped working correctly and FireWire external devices have also gone wonky. (It’s unclear from iFixit’s teardown of this model what’s in use for SATA drives.)

I decided that the Mac had to be replaced, because goodness knows what else was rattling around in there. I purchased a build-to-order replacement with 16GB – the 2014 Mac mini has soldered-on-the-board RAM, preventing user upgrades from 8GB – and when it arrived, I shut down my current system, swapped all the cables and powered up.

At which point, the external SSD failed. Disk Utility was unable to resuscitate it.

Dear readers, you may think at this juncture, I gave up entirely and moved to the country to raise flowers and sob. But, instead, I was able to restore a clone of the SSD that had been made automatically the night before, reboot and finally get back to business. (The drive is under warranty, and being sent back for replacement, after which point I hope to clone to it and resume my speedy external drive usage.)

Restarts will continue until morale improves

Do I have a moral of the story? You can see one: I should definitely get and be using more advanced disk diagnostic software, having had one to three drives fail on a system I use every day and then having what appears to be two entire interface types failing. Testing for faults also involves removing variables. When I first started having weirdness that I couldn’t pin down on the system or software, I should have isolated drives, even though they exhibited no specific symptoms.

I should also be keeping my software up-to-date, but the 4.7.1 update of Default Folder came out only a day before I installed it, and I’ve been using 4.7.0 in Yosemite since it came out without problems. That wasn’t the culprit, but it’s a good thing to test for.

Unfortunately, I appear to have hit a blind spot in troubleshooting: without having any sign that a peripheral controller was failing, just odd pauses, there was no clear direction to move until an actual total failure occurred. All I know is that I’m deeply exhausted, mildly relieved, and throwing myself back into the breach again on your behalf.

One Comment

One person was compelled to have their say. We encourage you to do the same.

  1. Shane says:

    That was the most boring story ever.
    My 6yr old mac is still running fine cause I do weekly checks and if anything is wrong, there are usually diagnostics/fixers which sort the problem or give you a heads up. If your equipment is failing to that extent, replace that shit….. Many hours NOT spent on fixing and more on productive “money making” work is so much better.
