A peer group I'm a member of recently had a small discussion about monitoring the SMART status of hard drives. We all agreed that SMART monitoring through RMM systems is often unreliable, because those systems only use the Windows SMART output, which lacks some critical values you should monitor. SMART itself can be a pretty decent early-warning system when you use all the values it supplies.
To resolve this, I've created a script that uses CrystalDiskInfo, a tool from Crystal Dew World that presents the SMART values in a clear overview. We've used it in the past to troubleshoot disks or to check them manually for predicted failures, and figured we should try to do the same automatically. This piece of PowerShell makes SMART monitoring more reliable, because we alert on more information than just the predicted-failure status.
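The script relies on Invoke-WebRequest and Expand-Archive. Expand-Archive requires PowerShell 5.0 or later, which ships with Windows 10 and can be installed on Windows 8.1 through the Windows Management Framework (WMF) 5.1.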
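A minimal pre-flight check can make that requirement explicit. The sketch below is not part of the original script; the function name `Test-SmartScriptPrerequisite` is hypothetical, and it only checks that the cmdlets the script calls exist and that PowerShell is version 5.0 or later.

```powershell
# Hypothetical helper, not part of the original script.
# Returns $true when the cmdlets the SMART script depends on are available.
function Test-SmartScriptPrerequisite {
    $requiredCommands = 'Invoke-WebRequest', 'Expand-Archive', 'Start-Process'
    foreach ($command in $requiredCommands) {
        if (-not (Get-Command -Name $command -ErrorAction SilentlyContinue)) {
            Write-Warning "Missing required command: $command"
            return $false
        }
    }
    # Expand-Archive ships with PowerShell 5.0 and later.
    return ($PSVersionTable.PSVersion.Major -ge 5)
}
```

You could call it at the top of the script and `exit` with a non-zero code when it returns `$false`, so the RMM system flags the machine instead of collecting empty values.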
The script
As always, the script is self-explanatory. Please upload the ZIP file to your own web server, or point the download URL at wherever the latest version of CrystalDiskInfo is hosted. The script creates a CrystalDiskInfo folder in the Program Files directory and unzips the download there.
#Replace the Download URL to where you've uploaded the ZIP file yourself. We will only download this file once.
$DownloadURL = "http://rwthaachen.dl.osdn.jp/crystaldiskinfo/71535/CrystalDiskInfo8_3_0.zip"
$DownloadLocation = "$($Env:ProgramFiles)\CrystalDiskInfo"
#Script:
$TestDownloadLocation = Test-Path $DownloadLocation
if(!$TestDownloadLocation){
    new-item $DownloadLocation -ItemType Directory -force
    Invoke-WebRequest -Uri $DownloadURL -OutFile "$($DownloadLocation)\CrystalDiskInfo.zip"
    Expand-Archive "$($DownloadLocation)\CrystalDiskInfo.zip" -DestinationPath $DownloadLocation -Force
}
#We start CrystalDiskInfo with the COPYEXIT parameter. This just collects the SMART information in DiskInfo.txt
Start-Process "$($Env:ProgramFiles)\CrystalDiskInfo\DiskInfo64.exe" -ArgumentList "/CopyExit" -wait
$DiskInfoRaw = get-content "$($Env:ProgramFiles)\CrystalDiskInfo\DiskInfo.txt" | select-string "-- S.M.A.R.T. --------------------------------------------------------------" -Context 0,16
$diskinfo = $DiskInfoRaw -split "`n" | select -skip 2 | Out-String | convertfrom-csv -Delimiter " " -Header "NOTUSED1","NOTUSED2","ID","RawValue" | Select-Object ID,RawValue
[int64]$CriticalWarnings = "0x" + ($diskinfo | Where-Object { $_.ID -eq "01"}).rawvalue
[int64]$CompositeTemp = "0x" + ($diskinfo | Where-Object { $_.ID -eq "02"}).rawvalue -273.15
[int64]$AvailableSpare = "0x" +($diskinfo | Where-Object { $_.ID -eq "03"}).rawvalue
[int64]$ControllerBusyTime ="0x" + ($diskinfo | Where-Object { $_.ID -eq "0A"}).rawvalue
[int64]$PowerCycles ="0x" + ($diskinfo | Where-Object { $_.ID -eq "0B"}).rawvalue
[int64]$PowerOnHours = "0x" + ($diskinfo | Where-Object { $_.ID -eq "0C"}).rawvalue
[int64]$UnsafeShutdowns = "0x" +($diskinfo | Where-Object { $_.ID -eq "0D"}).rawvalue
[int64]$IntegrityErrors ="0x" + ($diskinfo | Where-Object { $_.ID -eq "0E"}).rawvalue
[int64]$InformationLogEntries ="0x" + ($diskinfo | Where-Object { $_.ID -eq "0F"}).rawvalue
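The `-273.15` on the `$CompositeTemp` line converts from Kelvin, the unit NVMe drives use for the composite temperature, to degrees Celsius. A small stand-alone version of that conversion makes the arithmetic easy to check: a raw value of `0136` is 310 K, or about 36.85 °C. The function name is hypothetical, and it rounds to two decimals instead of casting to `[int64]` as the script does.

```powershell
# Hypothetical helper: converts a hex raw value reported in Kelvin to degrees Celsius.
function ConvertFrom-KelvinHex {
    param(
        [Parameter(Mandatory)]
        [string]$HexValue
    )
    # Parse the hex string explicitly instead of relying on a "0x" prefix cast.
    $kelvin = [Convert]::ToInt64($HexValue, 16)
    return [math]::Round($kelvin - 273.15, 2)
}

ConvertFrom-KelvinHex -HexValue '0136'   # 310 K, about 36.85 degrees Celsius
```

Note that a raw value of zero comes out as -273.15, so an absolute-zero reading usually means the value was never found rather than a real temperature.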
The output variables contain data that you can threshold against in your RMM system. The thresholds I would use are:
- $CriticalWarnings = 0
- $CompositeTemp = 55 (this is 55 degrees Celsius)
- $AvailableSpare = 50 (Available Spare is the percentage of spare capacity the drive has left, so this alerts when less than 50% remains. This is extremely conservative, so you might want to tune it to your personal preference.)
- $ControllerBusyTime = Not monitored, currently only log this for reporting purposes
- $PowerCycles = Not monitored, currently only log this for reporting purposes
- $PowerOnHours = 40000 (This is roughly four and a half years of constant runtime.)
- $UnsafeShutdowns = 365 (I like to know if users are not shutting down their computers normally. This could also point to other software-related problems.)
- $IntegrityErrors = 1 (This is what Windows normally reports on. We want to know as soon as these issues arise.)
- $InformationLogEntries = 1 (The number of error information log entries the drive has recorded over its lifetime.)
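If your RMM system can't compare values itself, the thresholds above can also be evaluated in the script. This is a sketch under a few assumptions: the collected values are passed in as a hashtable keyed by the script's variable names, `$AvailableSpare` alerts when it drops below its threshold, `$CriticalWarnings` and `$CompositeTemp` alert when they go above theirs, and the remaining counters alert once they reach theirs. The function name `Get-SmartThresholdBreach` is hypothetical.

```powershell
# Hypothetical helper: compares collected SMART values against the thresholds listed above.
function Get-SmartThresholdBreach {
    param(
        [Parameter(Mandatory)]
        [hashtable]$Values
    )
    $rules = @(
        @{ Name = 'CriticalWarnings';      Threshold = 0;     AlertWhen = 'Above' }
        @{ Name = 'CompositeTemp';         Threshold = 55;    AlertWhen = 'Above' }
        @{ Name = 'AvailableSpare';        Threshold = 50;    AlertWhen = 'Below' }
        @{ Name = 'PowerOnHours';          Threshold = 40000; AlertWhen = 'AtOrAbove' }
        @{ Name = 'UnsafeShutdowns';       Threshold = 365;   AlertWhen = 'AtOrAbove' }
        @{ Name = 'IntegrityErrors';       Threshold = 1;     AlertWhen = 'AtOrAbove' }
        @{ Name = 'InformationLogEntries'; Threshold = 1;     AlertWhen = 'AtOrAbove' }
    )
    foreach ($rule in $rules) {
        # Skip values that were not collected, so a missing value never raises a false alert.
        if (-not $Values.ContainsKey($rule.Name) -or $null -eq $Values[$rule.Name]) { continue }
        $value = $Values[$rule.Name]
        $breached = switch ($rule.AlertWhen) {
            'Above'     { $value -gt $rule.Threshold }
            'AtOrAbove' { $value -ge $rule.Threshold }
            'Below'     { $value -lt $rule.Threshold }
        }
        if ($breached) {
            # Emit one object per breached threshold.
            [pscustomobject]@{ Name = $rule.Name; Value = $value; Threshold = $rule.Threshold }
        }
    }
}
```

For example, `Get-SmartThresholdBreach -Values @{ CompositeTemp = $CompositeTemp; AvailableSpare = $AvailableSpare }` returns nothing while both values are within their thresholds, which makes "any output means alert" an easy rule to configure.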
I hope this helps MSPs who are having issues with SMART monitoring in their RMM systems. As always, Happy PowerShelling!
FYI the output values all choke for me…
-----
Directory: C:\Program Files
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 2019-09-11 13:10 CrystalDiskInfo
Cannot convert value "0x" to type "System.Int64". Error: "Index was out of range. Must be non-negative and less than
the size of the collection.
Parameter name: startIndex"
At C:\test.ps1:19 char:1
+ [int64]$CriticalWarnings = "0x" + ($diskinfo | Where-Object { $_.ID - …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
Cannot convert value "0x" to type "System.Double". Error: "Input string was not in a correct format."
At C:\test.ps1:20 char:1
+ [int64]$CompositeTemp = "0x" + ($diskinfo | Where-Object { $_.ID -eq …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToDoubleOrSingle
Cannot convert value "0x" to type "System.Int64". Error: "Index was out of range. Must be non-negative and less than
the size of the collection.
Parameter name: startIndex"
At C:\test.ps1:21 char:1
+ [int64]$AvailableSpare = "0x" +($diskinfo | Where-Object { $_.ID -eq …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
Cannot convert value "0x" to type "System.Int64". Error: "Index was out of range. Must be non-negative and less than
the size of the collection.
Parameter name: startIndex"
At C:\test.ps1:22 char:1
+ [int64]$ControllerBusyTime ="0x" + ($diskinfo | Where-Object { $_.ID …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
Cannot convert value "0x" to type "System.Int64". Error: "Index was out of range. Must be non-negative and less than
the size of the collection.
Parameter name: startIndex"
At C:\test.ps1:23 char:1
+ [int64]$PowerCycles ="0x" + ($diskinfo | Where-Object { $_.ID -eq "0B …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
Cannot convert value "0x" to type "System.Int64". Error: "Index was out of range. Must be non-negative and less than
the size of the collection.
Parameter name: startIndex"
At C:\test.ps1:25 char:1
+ [int64]$UnsafeShutdowns = "0x" +($diskinfo | Where-Object { $_.ID -eq …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
Cannot convert value "0x" to type "System.Int64". Error: "Index was out of range. Must be non-negative and less than
the size of the collection.
Parameter name: startIndex"
At C:\test.ps1:26 char:1
+ [int64]$IntegrityErrors ="0x" + ($diskinfo | Where-Object { $_.ID -eq …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
Cannot convert value "0x" to type "System.Int64". Error: "Index was out of range. Must be non-negative and less than
the size of the collection.
Parameter name: startIndex"
At C:\test.ps1:27 char:1
+ [int64]$InformationLogEntries ="0x" + ($diskinfo | Where-Object { $_. …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
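These errors appear when an ID isn't present in `$diskinfo`: the lookup returns nothing, `"0x" + $null` leaves just `"0x"`, and that string can't be cast to a number. A more defensive lookup, sketched below under the assumption that `$diskinfo` has the `ID` and `RawValue` properties the script creates, returns `$null` for a missing ID, refuses ambiguous matches (several rows with the same ID, as happens when more than one disk ends up in the output), and rejects raw values that aren't hexadecimal. `Get-SmartRawValue` is a hypothetical name.

```powershell
# Hypothetical helper: safely converts one attribute's raw hex value to an integer.
function Get-SmartRawValue {
    param(
        [Parameter(Mandatory)]
        [object[]]$DiskInfo,
        [Parameter(Mandatory)]
        [string]$Id
    )
    $rows = @($DiskInfo | Where-Object { $_.ID -eq $Id })
    if ($rows.Count -eq 0) {
        # ID not reported by this drive: return $null instead of casting "0x".
        return $null
    }
    if ($rows.Count -gt 1) {
        Write-Warning "ID $Id appears $($rows.Count) times; is more than one disk in the output?"
        return $null
    }
    $raw = $rows[0].RawValue
    if ($raw -notmatch '^[0-9A-Fa-f]+$') {
        Write-Warning "Raw value '$raw' for ID $Id is not hexadecimal."
        return $null
    }
    return [Convert]::ToInt64($raw, 16)
}
```

Assign the result to an untyped variable, for example `$CriticalWarnings = Get-SmartRawValue -DiskInfo $diskinfo -Id '01'`; a `[int64]` variable would silently turn `$null` back into 0.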
Hi Rob,
That's strange. I haven't seen that happen yet. Can you run the script and pass me the following values?
$DiskInfoRaw
$diskinfo | format-table
($diskinfo | Where-Object { $_.ID -eq "01"}).rawvalue
That'll help me see the issue, and possibly resolve it for you. 🙂
I got the output variables working by removing the prepended "0x".
Bigger problem – your hard-coded IDs don’t match what CrystalDiskInfo is using on my system. For example, here’s $DiskInfoRaw showing the ID for Temperature is C2.
> -- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6) Attribute Name
05 100 100 _10 000000000000 Reallocated NAND Blocks
09 100 100 __0 00000000037B Power On Hours
0C 100 100 __1 000000000369 Power Cycle Count
B5 100 100 __1 000000000000 Unaligned Access Count
B6 100 100 __1 000000000000 Vendor Specific
B1 _98 _98 _10 000000000027 Vendor Specific
BB 100 100 __1 000000000000 Reported Uncorrectable Errors
C2 _70 _52 __0 0030000C001E Temperature
C7 100 100 __0 000000000000 Ultra DMA CRC Error Rate
EE _98 _98 __0 000000000002 Vendor Specific
AF 100 100 __0 000000000000 Vendor Specific
B0 100 100 __0 000000000000 Vendor Specific
B2 100 100 __0 000000000000 Vendor Specific
B4 __0 __0 __0 0000000007FB Unused Reserve NAND Blocks
C3 100 __0 _50 000000000000 Cumulative ECC Bit Correction Count
But your code looks for an ID of "02", doesn't find it, and then subtracts 273 (so your script shows my system at absolute zero, which is cool! 🙂)
[int64]$CompositeTemp = "0x" + ($diskinfo | Where-Object { $_.ID -eq "02"}).rawvalue -273.15
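The mismatch comes from the two table layouts CrystalDiskInfo writes. Judging by the columns the script reads, each row on the NVMe drive it was built against is `ID RawValue Name`, while on this SATA drive each row is `ID Cur Wor Thr RawValue Name`. The fixed column positions therefore pick up the `Cur` column instead of the raw value, and the IDs refer to different attributes (C2 is Temperature here). The sketch below, with a hypothetical function name, parses either layout with a regular expression instead of fixed columns. It only fixes the parsing; SATA drives would still need their own set of IDs and thresholds.

```powershell
# Hypothetical helper: parses CrystalDiskInfo S.M.A.R.T. table rows in either layout:
#   NVMe: "ID RawValues Attribute Name"
#   ATA:  "ID Cur Wor Thr RawValues Attribute Name"
# Header lines, the "> --" match marker and other text are skipped because they don't match.
function ConvertFrom-DiskInfoSmartTable {
    param(
        [string[]]$Line
    )
    # The optional group swallows the three Cur/Wor/Thr columns (e.g. "100", "_98", "__0") when present.
    $pattern = '^\s*>?\s*(?<Id>[0-9A-F]{2})\s+(?:[_0-9]{3}\s+[_0-9]{3}\s+[_0-9]{3}\s+)?(?<Raw>[0-9A-F]+)\s+(?<Name>.+?)\s*$'
    foreach ($text in $Line) {
        if ($text -match $pattern) {
            [pscustomobject]@{
                ID       = $Matches['Id'].ToUpper()
                RawValue = $Matches['Raw']      # still a hex string, as in the original script
                Name     = $Matches['Name']
            }
        }
    }
}
```

Fed the rows above, this returns `00000000037B` for ID 09 (Power On Hours), which is 891 in decimal and matches the 891 hours CrystalDiskInfo reports for this drive.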
Comments crossed paths; here's your requested output:
> -- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6) Attribute Name
05 100 100 _10 000000000000 Reallocated NAND Blocks
09 100 100 __0 00000000037B Power On Hours
0C 100 100 __1 000000000369 Power Cycle Count
B5 100 100 __1 000000000000 Unaligned Access Count
B6 100 100 __1 000000000000 Vendor Specific
B1 _98 _98 _10 000000000027 Vendor Specific
BB 100 100 __1 000000000000 Reported Uncorrectable Errors
C2 _70 _52 __0 0030000C001E Temperature
C7 100 100 __0 000000000000 Ultra DMA CRC Error Rate
EE _98 _98 __0 000000000002 Vendor Specific
AF 100 100 __0 000000000000 Vendor Specific
B0 100 100 __0 000000000000 Vendor Specific
B2 100 100 __0 000000000000 Vendor Specific
B4 __0 __0 __0 0000000007FB Unused Reserve NAND Blocks
C3 100 __0 _50 000000000000 Cumulative ECC Bit Correction Count
ID RawValue
-- --------
05 100
09 100
0C 100
B5 100
B6 100
B1 _98
BB 100
C2 _70
C7 100
EE _98
AF 100
B0 100
B2 100
B4 __0
C3 100
I think this is due to it being an NVMe drive. I'll have to research this more and get back to you later. Can you send me your drive type? It'll help me a lot. 🙂
Sure thing:
Model : Micron 1100 SATA 256GB
Firmware : M0DL022
Serial Number : xxxxxx
Disk Size : 256.0 GB (8.4/137.4/256.0/----)
Buffer Size : Unknown
Queue Depth : 32
# of Sectors : 500118192
Rotation Rate : ---- (SSD)
Interface : Serial ATA
Major Version : ACS-3
Minor Version : ACS-3 Revision 4
Transfer Mode : SATA/600 | SATA/600
Power On Hours : 891 hours
Power On Count : 873 count
Temperature : 32 C (89 F)
Health Status : Good (100 %)
Features : S.M.A.R.T., 48bit LBA, NCQ, TRIM, DevSleep
APM Level : ----
AAM Level : ----
Drive Letter : C:
I saw this post's screenshot again on https://www.cyberdrain.com/functional-powershell-for-msps-webinar/. Any luck troubleshooting this script?
Pingback: Monitoring with PowerShell: Monitoring SMART status using SmartCTL. - CyberDrain
Great script, but it doesn't run well on devices with more than one HDD.
Any pointers on making that work?
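With more than one disk, `Select-String` returns one match per `-- S.M.A.R.T. --` header, and the script flattens all of them into a single `$diskinfo`, so every ID lookup returns several values. One way to keep the disks apart is to handle each match separately through its `Context.PostContext` lines. In this sketch the function name is hypothetical, disks are identified only by their order in DiskInfo.txt, and the row pattern assumes the NVMe layout (`ID RawValue Name`) the original script targets.

```powershell
# Hypothetical helper: splits CrystalDiskInfo output into one SMART table per disk.
# Assumes the NVMe row layout "ID RawValue Attribute Name" used by the original script.
function Get-DiskInfoSmartPerDisk {
    param(
        [string[]]$DiskInfoText
    )
    $headerPattern = '-- S\.M\.A\.R\.T\. ---'
    # One MatchInfo object per disk; PostContext holds the 16 lines after each header.
    $smartHeaders = $DiskInfoText | Select-String -Pattern $headerPattern -Context 0, 16
    $diskNumber = 0
    foreach ($smartMatch in $smartHeaders) {
        $diskNumber++
        foreach ($row in $smartMatch.Context.PostContext) {
            # Stop if the context window runs into the next disk's S.M.A.R.T. header.
            if ($row -match $headerPattern) { break }
            if ($row -match '^\s*(?<Id>[0-9A-F]{2})\s+(?<Raw>[0-9A-F]+)\s') {
                [pscustomobject]@{
                    Disk     = $diskNumber
                    ID       = $Matches['Id'].ToUpper()
                    RawValue = $Matches['Raw']
                }
            }
        }
    }
}
```

Usage could look like `Get-DiskInfoSmartPerDisk -DiskInfoText (Get-Content "$($Env:ProgramFiles)\CrystalDiskInfo\DiskInfo.txt") | Where-Object { $_.Disk -eq 1 }`, after which the per-ID lookups work one disk at a time.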