Python programmers use hashing to transform input data into a fixed-size value. This value represents the data uniquely, and the hashing technique makes it easy to transmit and store various forms of data securely.
Hashing protects data from unauthorized access and tampering. It’s an essential ingredient in data integrity and security use cases.
This article explores everything you need to know about hashing in Python. It dives into hashing uses and highlights various hashing algorithms that make your code more efficient, secure, and reliable.
What Is Hashing in Python?
Hashing converts input data, such as a string, file, or object, into a fixed-size string of bytes. The hash or digest represents the input in a unique and reproducible way.
Hashing plays a significant role in detecting data manipulation and enhancing security. It can compute a hash value for a file, message, or other piece of data. An application stores the hash securely to verify later that the data has not been tampered with.
One of the most common uses of hashing in security is password storage. Hashing is a viable alternative to storing plain text passwords in a database. When a user enters their password, the system hashes it before storing it in the database. If a hacker accesses the database, they’ll find that the password is hard to steal.
Python hashing functions make all this possible. These mathematical functions let an application manipulate data into hash values.
How To Make an Effective Hashing Function
A hashing function should meet the following criteria to be effective and safe:
- Deterministic — Given the same input, the function should always return the same output.
- Efficient — It should be computationally efficient when calculating the hash value of any given input.
- Collision resistant — The function should minimize the chance of two inputs making the same hash value.
- Uniform — The function’s outputs should be uniformly distributed across the range of possible hash values.
- Non-invertible — It should be unlikely for a computer to calculate the function’s input value based on the hash value.
- Non-predictable — Predicting the function’s outputs should be challenging, given a set of inputs.
- Sensitive to input changes — The function should be sensitive to minor differences in input. Slight changes should cause a big difference in the resulting hash value.
Hashing Use Cases
Once you have an adequate hashing function with all these characteristics, you can apply it to various use cases. Hashing functions work well for:
- Password storage — Hashing is one of the best ways to store user passwords in modern systems. Python combines various modules to hash and secure passwords before storing them in a database.
- Caching — Hashing stores a function’s output to save time when calling it later.
- Data retrieval — Python uses a hash table with a built-in dictionary data structure to quickly retrieve values by key.
- Digital signatures — Hashing can verify the authenticity of messages that have digital signatures.
- File integrity checks — Hashing can check a file’s integrity during its transfer and download.
Python’s Built-In Hashing Function
Python’s built-in hashing function, hash()
, returns an integer value representing the input object. The code then uses the resulting hash value to determine the object’s location in the hash table. This hash table is a data structure that implements dictionaries and sets.
The code below demonstrates how the hash()
function works:
my_string = "hello world"
# Calculate the hash value of the string
hash_value = hash(my_string)
# Print the string and its hash value
print("String: ", my_string)
print("Hash value: ", hash_value)
If we save that code in a file named hash.py, we can execute it (and see the output) like this:
% python3 hash.py
String: hello world
Hash value: 2213812294562653681
Let’s run that again:
% python3 hash.py
String: hello world
Hash value: -631897764808734609
The hash value is different when invoked a second time because recent releases of Python (versions 3.3 and up), by default, apply a random hash seed for this function. The seed changes on each invocation of Python. Within a single instance, the results will be identical.
For example, let’s put this code in our hash.py file:
my_string = "hello world"
# Calculate 2 hash values of the string
hash_value1 = hash(my_string)
hash_value2 = hash(my_string)
# Print the string and its hash values
print("String: ", my_string)
print("Hash value 1: ", hash_value1)
print("Hash value 2: ", hash_value2)
When executed, we see something like this:
String: hello world
Hash value 1: -7779434013116951864
Hash value 2: -7779434013116951864
Limitations of Hashing
Although Python’s hash function is promising for various use cases, its limitations make it unsuitable for security purposes. Here’s how:
- Collision attacks — A collision occurs when two different inputs produce the same hash value. An attacker could use the same input-making method to bypass security measures that rely on hash values for authentication or data integrity checks.
- Limited input size — Since hash functions produce a fixed-sized output regardless of the input’s size, an input larger in size than the hash function’s output can cause a collision.
- Predictability — A hash function should be deterministic, giving the same output every time you provide the same input. Attackers might take advantage of this weakness by precompiling hash values for many inputs, and then comparing them to target value hashes to find a match. This process is called a rainbow table attack.
To prevent attacks and keep your data safe, use secure hashing algorithms designed to resist such vulnerabilities.
Using hashlib for Secure Hashing in Python
Instead of using the built-in Python hash()
, use hashlib for more secure hashing. This Python module offers a variety of hash algorithms to hash data securely. These algorithms include MD5, SHA-1, and the more secure SHA-2 family, including SHA-256, SHA-384, SHA-512, and others.
MD5
The widely used cryptographic algorithm MD5 reveals a 128-bit hash value. Use the code like that below to generate an MD5 hash using hashlib‘s md5
constructor:
import hashlib
text = "Hello World"
hash_object = hashlib.md5(text.encode())
print(hash_object.hexdigest())
The output of the above (in our hash.py file) will be consistent across invocations:
b10a8db164e0754105b7a99be72e3fe5
Note: The hexdigest()
method in the code above returns the hash in a hexadecimal format safe for any non-binary presentation (such as email).
SHA-1
The SHA-1 hash function secures data by making a 160-bit hash value. Use the code below with the sha1
constructor for the hashlib module’s SHA-1 hash:
import hashlib
text = "Hello World"
hash_object = hashlib.sha1(text.encode())
print(hash_object.hexdigest())
The output of the above:
0a4d55a8d778e5022fab701977c5d840bbc486d0
SHA-256
There are various hash options in the SHA-2 family. The hashlib SHA-256 constructor generates a more secure version in that family with a 256-bit hash value.
Programmers often use SHA-256 for cryptography, like digital signatures or message authentication codes. The code below demonstrates how to generate a SHA-256 hash:
import hashlib
text = "Hello World"
hash_object = hashlib.sha256(text.encode())
print(hash_object.hexdigest())
The output of the above:
a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
SHA-384
SHA-384 is a 384-bit hash value. Programmers often use the SHA-384 function in applications needing more data security.
Based on the previous examples, you can probably guess that this is a statement that will generate a SHA-384 hash:
hash_object = hashlib.sha384(text.encode())
SHA-512
SHA-512 is the most secure member of the SHA-2 family. It makes a 512-bit hash value. Programmers use it for high-throughput applications, such as checking data integrity. The code below shows how to generate a SHA-512 hash with the hashlib module in Python:
hash_object = hashlib.sha512(text.encode())
How To Choose a Hashing Algorithm
Since these algorithms differ, select your hashing algorithm based on your use case and its security requirements. Here are some steps to follow:
- Understand the use case — Your use case determines what kind of algorithm to use. For example, when storing sensitive data such as passwords, your hashing algorithm should protect against brute-force attacks.
- Consider your security requirements — Your use case’s security requirements depend on the type of data you intend to store, and they determine what kind of algorithm to pick. For example, a robust hashing algorithm is best for storing highly sensitive information.
- Research the available hashing algorithms — Explore each hashing type to understand its strengths and weaknesses. This information helps you select the best option for your use case.
- Evaluate the selected hashing algorithm — Once you choose a hashing algorithm, evaluate whether it meets your security requirements. This process may involve testing it against known attacks or vulnerabilities.
- Implement and test the hashing algorithm — Finally, implement and test the algorithm thoroughly to ensure it functions correctly and securely.
How To Use Hashing for Password Storage
Hashing has excellent potential for storing passwords, a critical component of cybersecurity.
Ideally, the application hashes and stores passwords in a secure database to prevent unauthorized access and data breaches. However, hashing alone might not be enough to protect the information. Hashed passwords are still susceptible to brute force and dictionary attacks. Hackers commonly use these practices to guess passwords and gain unauthorized access to accounts.
A more secure way to use hashing for password storage involves the salting technique. Salting adds unique, random strings or characters to each password before hashing it. The salt is unique to each password, and the application stores it alongside the hashed password in the database.
Every time a user logs in, the application retrieves the salt from the database, adds it to the entered password, and then hashes the combined salt and password.
If an attacker gains access to the database, they must compute the hash for each password and each possible salt value. Salting makes these attacks more complex, so it’s a helpful technique to deter dictionary attacks.
Python’s secrets module makes salting easy. This module generates random salts, securely storing passwords and managing tokens and cryptographic keys.
The code below uses the hashlib library and secrets module to secure user passwords further:
import hashlib
import secrets
# Generate a random salt using the secrets module
salt = secrets.token_hex(16)
# Get the user's password from input
password = input("Enter your password: ")
# Hash the password using the salt and the SHA-256 algorithm
hash_object = hashlib.sha256((password + salt).encode())
# Get the hexadecimal representation of the hash
hash_hex = hash_object.hexdigest()
# Store the salt and hash_hex in your database
How To Use Hashing for Data Integrity Checks
Hashing also helps check data integrity and protect transmitted data from modification and tampering. This four-step technique uses a cryptographic hash function to give the file a unique hash value.
First, select the appropriate hash function and use it to generate a hash value for the input data. Store that hash value, then use it for comparison when needed. Whenever you need to verify the data’s integrity, the application generates the hash value of the current data using the same hash function. Then, the application compares the new hash value with the stored value to ensure they are identical. If so, the data is uncorrupted.
The hashed value is unique, and even a tiny change in the input data triggers a significantly different hash value. This makes it easy to detect any unauthorized changes or modifications to the transmitted data.
The steps below demonstrate using a hash function for data integrity checks.
Step 1: Import the hashlib Module
import hashlib
Step 2: Use a hashlib Hash Algorithm
def generate_hash(file_path):
# Open the file in binary mode
with open(file_path, "rb") as f:
# Read the contents of the file
contents = f.read()
# Generate the SHA-256 hash of the contents
hash_object = hashlib.sha256(contents)
# Return the hexadecimal representation of the hash
return hash_object.hexdigest()
Step 3: Call the Function and Pass in the File Path
file_path = "path/to/my/file.txt"
hash_value = generate_hash(file_path)
print(hash_value)
Step 4: Generate Hashes for the Original File and Transmitted or Modified File
# Generate the hash of the original file
original_file_path = "path/to/my/file.txt"
original_file_hash = generate_hash(original_file_path)
# Transmit or modify the file (for example, by copying it to a different location)
transmitted_file_path = "path/to/transmitted/file.txt"
# Generate the hash of the transmitted file
transmitted_file_hash = generate_hash(transmitted_file_path)
Step 5: Compare the Two Hashes
if original_file_hash == transmitted_file_hash:
print("The file has not been tampered with")
else:
print("The file has been tampered with")
Summary
Hashing is invaluable for data integrity and password security. You get the most out of a hashing function when you implement secure hashing techniques, such as using the hashlib module and salting.
These techniques help prevent rainbow attacks, collision attacks, and other security vulnerabilities that affect hashing. Programmers often use these techniques with hashing functions in Python to ensure the data integrity of files and store passwords securely.
Now that you’ve learned more about hashing techniques in Python use them to improve your own application’s security. Explore more Python articles on the Kinsta blog to grow your expertise, and then consider deploying your next Python application on Kinsta’s Application Hosting platform.