SHA-256 Explained: Hashes, Integrity Checks, Salts, and Misuse
A practical guide to SHA-256 hashing, file integrity checks, password caveats, collision resistance, and why hashes are not encryption.
Workflow visual: Hash input, Compare digest, Know limits.
A cryptographic hash function maps data of any size to a fixed-size output. SHA-256 outputs 256 bits, usually displayed as 64 hexadecimal characters. It is designed so that reconstructing an input from its digest (preimage resistance) and finding two different inputs with the same digest (collision resistance) are both computationally infeasible. In practice, you use the digest as a fingerprint of the input, not as a compressed copy.
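As a minimal sketch of the fixed-size property, the snippet below uses Python's standard `hashlib` module to hash a three-byte input and a three-megabyte input. The helper name `sha256_hex` is an illustrative choice, not part of any library.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hexadecimal."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"abc")              # 3 bytes of input
long_digest = sha256_hex(b"abc" * 1_000_000)   # 3,000,000 bytes of input

print(short_digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(len(short_digest), len(long_digest))     # 64 64
```

Both digests are 64 hexadecimal characters regardless of input size, which is why a digest can identify an input but cannot stand in for it.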
Software downloads often publish a SHA-256 checksum. After downloading the file, you hash your local copy and compare the digest. If the values match, your copy is almost certainly byte-for-byte identical to the version the publisher hashed. This does not prove the publisher is trustworthy; it proves consistency with the published digest. If the checksum and file come from the same compromised page, an attacker who replaced the file could have replaced the checksum too, so a match can still be misleading.
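The download check described above can be sketched in Python as follows. The function names `sha256_file` and `matches_published` are hypothetical helpers for illustration; the file is read in chunks so large downloads do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the raw bytes of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Compare a local file's digest with a published checksum.

    Surrounding whitespace and letter case are ignored, because
    checksum files differ in how they format the hex string.
    """
    return sha256_file(path) == published_hex.strip().lower()
```

A call such as `matches_published(Path("installer.bin"), checksum_from_trusted_channel)` returns `True` only when the local bytes hash to the published value; both the file name and the variable here are placeholders.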
The same input always produces the same SHA-256 digest. That is useful for detecting duplicates and verifying content, but it also means low-entropy inputs can be guessed. If someone hashes common words, short codes, or predictable IDs, another person can hash guesses and compare results. A hash hides the original from casual viewing, but it does not magically make weak input strong.
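To illustrate why hashing a low-entropy value does not protect it, the sketch below recovers a four-digit code from its SHA-256 digest by trying all 10,000 candidates. The four-digit PIN scenario and the helper names are assumptions chosen for the example.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """Hash the UTF-8 encoding of text and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def recover_pin(target_digest: str) -> Optional[str]:
    """Try every 4-digit code and return the one whose digest matches."""
    for number in range(10_000):
        guess = f"{number:04d}"
        if sha256_hex(guess) == target_digest:
            return guess
    return None


leaked_digest = sha256_hex("4821")   # what an attacker might see in a log
print(recover_pin(leaked_digest))    # prints 4821
```

The loop finishes almost instantly on ordinary hardware, because the search space is tiny and SHA-256 is fast.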
Plain SHA-256 is fast, which is good for integrity checks and bad for password storage: an attacker who obtains the digests can try many guesses quickly. Password systems need salts and password-hashing algorithms designed to be expensive to brute force, such as bcrypt, scrypt, or Argon2, depending on platform guidance. A unique random salt per password prevents identical passwords from sharing the same digest and makes precomputed attacks less useful.
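A minimal sketch of salted password hashing, using scrypt only because it ships in Python's standard library as `hashlib.scrypt` (available when Python is built against a suitable OpenSSL). The cost parameters below are illustrative assumptions, not a recommendation; the function names are hypothetical.

```python
import hashlib
import hmac
import os

# Illustrative cost settings only; follow current platform guidance.
SCRYPT_PARAMS = {
    "n": 2**14,                  # CPU/memory cost
    "r": 8,                      # block size
    "p": 1,                      # parallelism
    "maxmem": 64 * 1024 * 1024,  # allow up to 64 MiB of working memory
    "dklen": 32,                 # length of the derived key in bytes
}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using a fresh random 16-byte salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)
```

Because each call to `hash_password` draws a new salt, two users with the same password end up with different stored keys, and the salt must be stored alongside the key so verification can repeat the derivation.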
Encryption is reversible with a key. Hashing is not meant to be reversible. If you need to recover the original data, hashing is the wrong tool. If you need to prove data was not changed, hashing may be right. If you need authenticity, a bare hash is not enough, because anyone can recompute it; use a digital signature or a keyed message authentication code such as HMAC. The question to ask is simple: should someone be able to get the original data back?
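One way to add authenticity is a keyed message authentication code. The sketch below uses HMAC-SHA256 from Python's standard `hmac` module; the key and messages are made-up values, and the function names are illustrative.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over message using a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"example-shared-secret"   # hypothetical key for the demo
message = b"amount=100&to=alice"
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))                  # True
print(verify_tag(shared_key, b"amount=900&to=alice", tag))   # False
```

Unlike a plain digest, the tag cannot be recomputed by someone who lacks the key, so a valid tag indicates the message came from a key holder and was not altered.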
Two inputs that look identical on screen can hash differently. Extra spaces, line endings, Unicode normalization, JSON key order, file metadata, and encoding choices all change the bytes. A text area that normalizes line endings may produce a different digest from a hash of the original file. When comparing digests, confirm exactly which bytes were hashed. For JSON or structured data, define a canonical representation if hashes need to match across systems.
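The following sketch shows how line endings and Unicode normalization change the hashed bytes, and one possible way to normalize text before hashing. The normalization policy in `canonical_text_bytes` (NFC, LF line endings, UTF-8) is an assumption for illustration; any fixed policy works as long as every system applies the same one.

```python
import hashlib
import unicodedata


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


unix_text = "line one\nline two\n"
windows_text = "line one\r\nline two\r\n"

composed = unicodedata.normalize("NFC", "caf\u00e9")     # e-acute as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")   # "e" plus a combining accent


def canonical_text_bytes(text: str) -> bytes:
    """Assumed policy: NFC normalization, LF line endings, UTF-8 encoding."""
    normalized = unicodedata.normalize("NFC", text).replace("\r\n", "\n")
    return normalized.encode("utf-8")


# Raw encodings differ, so the digests differ.
print(sha256_hex(unix_text.encode()) == sha256_hex(windows_text.encode()))
print(sha256_hex(composed.encode()) == sha256_hex(decomposed.encode()))

# After applying the same policy to both sides, the digests agree.
print(sha256_hex(canonical_text_bytes(unix_text))
      == sha256_hex(canonical_text_bytes(windows_text)))
print(sha256_hex(canonical_text_bytes(composed))
      == sha256_hex(canonical_text_bytes(decomposed)))
```

The first two comparisons print `False` and the last two print `True`, even though each pair of strings renders the same way in most editors.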
For a file, hash the file bytes directly and compare with a checksum from a trusted channel. For text, decide the encoding and line endings before comparing. For API signatures, follow the provider's canonicalization rules exactly. For passwords, do not use raw SHA-256. For secrets in logs, remember that hashing predictable values may still allow guessing. Hashes are reliable when the input and purpose are precise.
Common Questions
SHA-256 cannot be decrypted or reversed. It is a one-way hash, not encryption. The only way to recover an input is to hash guesses or known inputs and compare the results.
Two texts that look the same can produce different hashes because they may differ in spaces, line endings, Unicode normalization, or encoding. Hashes operate on bytes, not on what is displayed.
SHA-256 is not suitable for storing passwords by itself. Use a password-hashing algorithm with salts and appropriate cost settings.
Hash the exact downloaded file bytes and compare the digest with a checksum from a trusted source. Do not copy text out of a checksum file by hand if avoidable, and do not rely on a checksum hosted only beside the file if the entire page could have been modified.
JSON can be serialized with different whitespace, key order, escaping, and number formatting while representing the same data. A hash sees bytes, not intent. Signing or hashing structured data requires a canonical byte representation so independent systems hash the same thing.
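As a rough sketch of a canonical representation, the snippet below re-serializes parsed JSON with sorted keys, no optional whitespace, and UTF-8 encoding before hashing. This is a simple convention assumed for the example, not a formal standard; schemes such as RFC 8785 (JSON Canonicalization Scheme) also pin down number formatting and escaping, which this helper does not attempt across languages.

```python
import hashlib
import json


def canonical_json_bytes(value) -> bytes:
    """Assumed convention: sorted keys, compact separators, UTF-8 output."""
    text = json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


raw_a = '{"b": 1, "a": [1, 2]}'
raw_b = '{ "a":[1,2],\n  "b":1 }'

# The raw strings differ byte for byte, so their digests differ.
print(sha256_hex(raw_a.encode()) == sha256_hex(raw_b.encode()))   # False

# Parsing and re-serializing both with the same rules yields equal digests.
digest_a = sha256_hex(canonical_json_bytes(json.loads(raw_a)))
digest_b = sha256_hex(canonical_json_bytes(json.loads(raw_b)))
print(digest_a == digest_b)                                       # True
```

Both systems must apply the identical serialization rules; agreeing only on "it is JSON" is not enough for digests to match.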
A SHA-256 digest can identify an exact byte sequence if someone already has the same file or can guess it. It does not reveal arbitrary unknown content by itself, but hashes of predictable inputs can be looked up or brute-forced. Treat hashes of sensitive low-entropy data carefully.