Dev Tip: Optimizing Data Models in Big Data Workflows with __slots__
In big data and MLOps workflows, you often work with massive datasets where you create millions of objects to represent data points, features, or model predictions. Traditional Python classes can consume a significant amount of memory due to the hidden __dict__ attribute. This overhead can lead to memory bottlenecks and slow down processing, especially when you're working with large-scale data pipelines and machine learning models.
The __slots__ attribute is a simple yet powerful way to optimize your code by drastically reducing the memory footprint of your objects. By defining __slots__, you tell Python to reserve a fixed set of attribute slots for the class, bypassing the need for a dynamic per-instance __dict__. This makes your application more memory-efficient and can lead to faster attribute access, which is crucial for high-performance computing tasks in MLOps and big data.
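To make the mechanics concrete, here is a minimal sketch. The PlainPoint and SlottedPoint classes are hypothetical illustrations, not part of the pipeline discussed below:

```python
# A slotted class has no per-instance __dict__, and its attribute
# names are fixed at class-definition time.
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ('x', 'y')  # fixed attribute layout, no __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PlainPoint(1, 2)
s = SlottedPoint(1, 2)

print(hasattr(p, '__dict__'))  # True: regular instances carry a dict
print(hasattr(s, '__dict__'))  # False: slotted instances do not

# Slotted instances reject attributes outside __slots__:
try:
    s.z = 3
except AttributeError as e:
    print('AttributeError:', e)
```

The trade-off is flexibility: you can no longer attach arbitrary attributes at runtime, which is usually fine for fixed-schema data records.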
Real-World Scenario: A Data Pipeline for MLOps
Imagine you’re building a data pipeline for a machine learning model that predicts customer churn. Your pipeline processes millions of customer records, and each record is represented by a Python object.
Using a regular class, each customer object would have a memory overhead from its __dict__. When you process millions of these objects, the cumulative memory usage becomes immense, potentially crashing your application or requiring more expensive, higher-memory machines.
By using a class with __slots__, you can create a memory-efficient data model. This approach minimizes memory consumption, allowing you to process more data with the same resources and speeding up your pipeline. This is critical in MLOps, where efficient resource utilization is key to managing costs and scaling operations.
Benchmarking the Scenario: An MLOps Data Pipeline with __slots__
In an MLOps pipeline, data is often represented as objects for processing, feature engineering, and model training. When dealing with big data, memory efficiency is critical to avoid crashes and keep cloud computing costs low. Using __slots__ can drastically reduce the memory footprint of these data objects, making your pipeline more robust and scalable.
The following full code snippet simulates a big data MLOps workflow where we process a large number of customer records. We will create two versions of a Customer data class: one with the default Python behavior and one optimized with __slots__. The code will then compare the memory usage and time taken to create a million instances of each, demonstrating the practical benefits of __slots__.
```python
import sys
import time
import random
import pandas as pd
from typing import List

# --- Part 1: Data Model Definitions ---

class Customer:
    """
    A regular Python class to represent a customer record.
    This class uses a default __dict__ to store attributes.
    """
    def __init__(self, customer_id: int, age: int, monthly_spend: float, churned: bool):
        self.customer_id = customer_id
        self.age = age
        self.monthly_spend = monthly_spend
        self.churned = churned


class OptimizedCustomer:
    """
    An optimized customer class using __slots__ for memory efficiency.
    This class explicitly defines its attributes, eliminating the __dict__ overhead.
    """
    __slots__ = ['customer_id', 'age', 'monthly_spend', 'churned']

    def __init__(self, customer_id: int, age: int, monthly_spend: float, churned: bool):
        self.customer_id = customer_id
        self.age = age
        self.monthly_spend = monthly_spend
        self.churned = churned


# --- Part 2: Data Generation and Object Creation ---

def generate_customer_data(num_records: int) -> List[tuple]:
    """Generates a list of tuples representing raw customer data."""
    data = []
    for i in range(num_records):
        customer_id = i
        age = random.randint(20, 70)
        monthly_spend = round(random.uniform(25.0, 500.0), 2)
        churned = random.choice([True, False])
        data.append((customer_id, age, monthly_spend, churned))
    return data


def create_objects(data: List[tuple], class_type: type) -> List:
    """Creates a list of objects from raw data using the specified class."""
    return [class_type(*record) for record in data]


# --- Part 3: Performance Comparison ---

def run_performance_test(num_records: int):
    """Runs a performance test to compare memory and time for both classes."""
    print(f"--- Running performance test with {num_records:,} records ---")
    raw_data = generate_customer_data(num_records)

    # Test the regular class. Note that sys.getsizeof(c) does NOT include
    # the instance's __dict__, so we add it explicitly.
    start_time_regular = time.time()
    regular_customers = create_objects(raw_data, Customer)
    end_time_regular = time.time()
    memory_regular = (sum(sys.getsizeof(c) + sys.getsizeof(c.__dict__) for c in regular_customers)
                      + sys.getsizeof(regular_customers))

    # Test the optimized class. Slotted instances have no __dict__, so
    # sys.getsizeof(c) already covers the whole instance.
    start_time_slotted = time.time()
    slotted_customers = create_objects(raw_data, OptimizedCustomer)
    end_time_slotted = time.time()
    memory_slotted = (sum(sys.getsizeof(c) for c in slotted_customers)
                      + sys.getsizeof(slotted_customers))

    # Print results
    print("\n✅ Regular Class Performance:")
    print(f" - Total Memory: {memory_regular / (1024**2):.2f} MB")
    print(f" - Time Taken: {end_time_regular - start_time_regular:.4f} seconds")

    print("\n✅ Slotted Class Performance:")
    print(f" - Total Memory: {memory_slotted / (1024**2):.2f} MB")
    print(f" - Time Taken: {end_time_slotted - start_time_slotted:.4f} seconds")

    # Calculate and print the savings. The time saved is the difference
    # between the two creation durations, not a difference of raw timestamps.
    memory_saved_mb = (memory_regular - memory_slotted) / (1024**2)
    time_saved_s = ((end_time_regular - start_time_regular)
                    - (end_time_slotted - start_time_slotted))

    print("\n--- Summary of Savings ---")
    print(f"🚀 Memory Saved: {memory_saved_mb:.2f} MB "
          f"({(memory_regular - memory_slotted) / memory_regular * 100:.2f}%)")
    print(f"⏱️ Slotted class creation was faster by: {time_saved_s:.4f} seconds")

    # Optional: demonstrate a simple MLOps task like converting to a DataFrame.
    # This is just for illustration; slotted objects have no __dict__, so the
    # rows must be built explicitly instead of via c.__dict__.
    print("\n--- MLOps Task: Converting to a Pandas DataFrame ---")
    df_regular = pd.DataFrame([c.__dict__ for c in regular_customers])
    df_slotted = pd.DataFrame(
        [[c.customer_id, c.age, c.monthly_spend, c.churned] for c in slotted_customers],
        columns=['customer_id', 'age', 'monthly_spend', 'churned'])
    print("Successfully converted both object lists to Pandas DataFrames.")
    print(f"DataFrame head from Slotted Class:\n{df_slotted.head()}")


# --- Part 4: Execution ---
if __name__ == "__main__":
    NUM_RECORDS = 1_000_000  # 1 million records
    run_performance_test(NUM_RECORDS)
```
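A caveat about the measurement above: sys.getsizeof reports shallow sizes only, so it is at best an approximation. As a cross-check, the standard-library tracemalloc module tracks every allocation made while the objects are created. The sketch below uses slimmed-down stand-ins for the two classes and a hypothetical allocated_bytes helper; exact figures vary by Python version:

```python
import tracemalloc

class PlainCustomer:
    """Mirrors the regular Customer class, trimmed for the demo."""
    def __init__(self, customer_id, age, monthly_spend, churned):
        self.customer_id = customer_id
        self.age = age
        self.monthly_spend = monthly_spend
        self.churned = churned

class SlottedCustomer:
    """Mirrors OptimizedCustomer, trimmed for the demo."""
    __slots__ = ('customer_id', 'age', 'monthly_spend', 'churned')

    def __init__(self, customer_id, age, monthly_spend, churned):
        self.customer_id = customer_id
        self.age = age
        self.monthly_spend = monthly_spend
        self.churned = churned

def allocated_bytes(cls, n=100_000):
    """Bytes allocated while n instances of cls are alive."""
    tracemalloc.start()
    objs = [cls(i, 35, 99.5, False) for i in range(n)]
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objs
    return current

regular_bytes = allocated_bytes(PlainCustomer)
slotted_bytes = allocated_bytes(SlottedCustomer)
print(f"regular: {regular_bytes / 1024**2:.1f} MiB")
print(f"slotted: {slotted_bytes / 1024**2:.1f} MiB")
```

On CPython the slotted variant should come out smaller, since each plain instance drags a separately allocated attribute dictionary along with it.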
This code snippet demonstrates how to optimize memory usage in Python by using __slots__ in a data-heavy application, specifically in an MLOps context. It compares the memory footprint and creation time of a regular class against an optimized class with __slots__, showing significant memory savings and a modest speedup when handling large datasets.
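Before rolling __slots__ out across a codebase, two caveats are worth knowing: a subclass that does not declare __slots__ of its own silently reintroduces __dict__, and slotted instances are not weak-referenceable unless '__weakref__' is listed in __slots__. A short sketch (class names are illustrative):

```python
import weakref

class Base:
    __slots__ = ('x',)

class Child(Base):
    # No __slots__ here, so instances regain a __dict__ and the
    # memory savings of the base class are lost.
    pass

c = Child()
c.anything = 1                 # arbitrary attributes allowed again
print(hasattr(c, '__dict__'))  # True

class NoWeakref:
    __slots__ = ('x',)

try:
    weakref.ref(NoWeakref())
except TypeError:
    print("not weak-referenceable without '__weakref__' in __slots__")

class WithWeakref:
    __slots__ = ('x', '__weakref__')  # opt back in to weak references

w = WithWeakref()
print(weakref.ref(w)() is w)   # True
```

The safe pattern is to declare __slots__ (possibly empty) in every class of the hierarchy, and to add '__weakref__' once if any consumer, such as a cache, needs weak references.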