NumPy-Pandas-LP-Basics#
1.1 Objective#
This module introduces numerical computing and optimization with NumPy, Pandas, and Linear Programming. These three tools are widely used in data science, engineering, and operations research, where they support large-scale numerical analysis and decision-making. The primary goal is to equip learners to manipulate structured data efficiently, perform fast numerical computations, and model real-world optimization problems in Python.
NumPy: Introduces array-based computing for efficient numerical operations.
Pandas: Focuses on structured data manipulation and analysis.
Linear Programming (LP): Explores mathematical optimization techniques using GurobiPy.
This module integrates theoretical concepts with hands-on programming using Python, Jupyter Notebook, NumPy, Pandas, and GurobiPy, allowing learners to transition from fundamental numerical analysis to real-world optimization applications.
1.2 Key Components#
1. NumPy Fundamentals#
Creating NumPy Arrays
Creating a 1D Array
Creating a 2D Array
Creating Arrays with specific Values
Array Attributes
Array Shape
Number of Dimensions
Number of Elements
Data Type of Elements
Indexing and Slicing
Accessing Elements
Slicing
Mathematical Operations
element-wise operations
Reshaping and Transposing Arrays
Reshaping Arrays
Transposing Arrays
Statistical Functions
Basic statistics functions
Stacking and Concatenation
Horizontal Stacking
Vertical Stacking
Concatenation
2. Pandas for Data Analysis#
Creating Data Structures
Creating a Pandas Series
Creating a Pandas DataFrame
Reading and Writing Data
Reading from a CSV File
Writing to a CSV File
Data Inspection and Manipulation
Using head(), tail(), info(), and describe()
Selecting specific columns and rows, and filtering
Data Cleaning
Using dropna() to handle missing data
Changing Data Types
Modifying Data
Adding New Columns
Renaming Columns
Sorting Data
Merging and Joining Data
Merging DataFrames
Concatenating DataFrames
Pivot Tables and Crosstabs
Creating a Pivot Table
Creating a Crosstab
3. Linear Programming basics#
Introduction to LP Optimization Models
Understanding the objective functions in LP
Understanding the constraints in LP
Using Gurobi for Linear Programming
Installing Gurobi
Defining decision variables
Set objective function
Add constraints
Solving Optimization Models
Implementing an LP solver using GurobiPy
Extracting and interpreting optimal solutions
1.3 Module Impact#
This module provides a structured approach to mastering numerical computing, data handling, and optimization modeling, making it highly relevant for engineers.
Efficient Data Processing: Participants develop proficiency in handling structured datasets with NumPy and Pandas, leveraging optimized numerical operations.
Analytical Thinking: By working with structured data manipulation, learners enhance their ability to extract insights and analyze trends.
Optimization Modeling: The integration of Linear Programming equips learners with tools to tackle decision-making problems in industry applications.
Hands-on Coding: The practical implementations in Jupyter Notebook reinforce learning through interactive problem-solving.
2. NumPy basics#
NumPy (Numerical Python) is a fundamental library for numerical computing in Python. It is widely used in scientific computing, machine learning, and data analysis.
If you haven’t installed NumPy, you can do so using:
pip install numpy
Then, import it in your Python script:
import numpy as np
2.1 Creating NumPy Arrays#
Creating a 1D Array#
A NumPy array, called ndarray, can be created from a Python list using np.array().
# Creating a one-dimensional array
a = np.array([1, 2, 3, 4, 5])
print("One-dimensional array:\n", a)
One-dimensional array:
[1 2 3 4 5]
Creating a 2D Array#
A two-dimensional array (matrix) can be created by passing a list of lists.
# Creating a two-dimensional array
b = np.array([[1, 2, 3], [4, 5, 6]])
print("Two-dimensional array:\n", b)
Two-dimensional array:
[[1 2 3]
[4 5 6]]
Creating Arrays with specific Values#
NumPy provides functions to create special arrays.
A = np.zeros((2, 3)) # 2x3 array of zeros
print("Zeros:\n", A)
Zeros:
[[0. 0. 0.]
[0. 0. 0.]]
B = np.ones((3, 3)) # 3x3 array of ones
print("Ones:\n", B)
Ones:
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
C = np.eye(3) # 3x3 identity matrix
print("Identity Matrix:\n", C)
Identity Matrix:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
D = np.random.rand(2, 2) # 2x2 array of random numbers
print("Random Array:\n", D)
Random Array:
[[0.24884241 0.33598379]
[0.41051721 0.3514549 ]]
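The values produced by np.random.rand change on every run. When reproducible output is wanted, one option is a seeded generator. The sketch below uses np.random.default_rng with an arbitrary seed of 42 and also shows a few other constructors; the seed and the variable names are illustrative assumptions, not part of the original example.

```python
import numpy as np

# Seeded generator: the same seed yields the same numbers on every run.
rng = np.random.default_rng(42)    # 42 is an arbitrary, illustrative seed
D_seeded = rng.random((2, 2))      # 2x2 array of floats in [0, 1)

# Re-creating the generator with the same seed reproduces the array.
D_again = np.random.default_rng(42).random((2, 2))
print(np.array_equal(D_seeded, D_again))  # True

# Other common constructors for arrays with specific values.
sevens = np.full((2, 3), 7)           # 2x3 array filled with 7
steps = np.arange(0, 10, 2)           # [0 2 4 6 8]
grid = np.linspace(0.0, 1.0, 5)       # 5 evenly spaced points from 0 to 1
print(sevens, steps, grid, sep="\n")
```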
2.2 Array Attributes#
NumPy arrays have some important attributes that can help you understand their structure.
print("Array Shape:", D.shape)
Array Shape: (2, 2)
print("Number of Dimensions:", D.ndim)
Number of Dimensions: 2
print("Number of Elements:", D.size)
Number of Elements: 4
print("Data Type of Elements:",D.dtype)
Data Type of Elements: float64
2.3 Indexing and Slicing#
NumPy allows you to access elements and slices of arrays easily.
### Accessing Elements
E = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Element at (2,2):", E[2, 2]) # Access the third row, third column (indices start at 0)
Element at (2,2): 9
### Slicing
print("Second column:", E[:, 1]) # Select second column
Second column: [2 5 8]
print("Sub-matrix:\n", E[0:2, 2:3]) # Select sub-matrix
Sub-matrix:
[[3]
[6]]
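Two further indexing behaviours are worth knowing when working with slices: boolean masks select elements by condition, and basic slices are views onto the original data rather than copies. The following is a minimal sketch using the same 3x3 array; the variable names evens, first_row, and safe are illustrative.

```python
import numpy as np

E = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Boolean indexing: keep only the elements that satisfy a condition.
evens = E[E % 2 == 0]
print(evens)  # [2 4 6 8]

# Basic slices are views: writing to the slice changes the original array.
first_row = E[0, :]
first_row[0] = 100
print(E[0, 0])  # 100

# Use .copy() when an independent array is needed.
safe = E[1, :].copy()
safe[0] = -1
print(E[1, 0])  # 4 (unchanged)
```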
2.4 Mathematical Operations#
NumPy supports element-wise operations.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Addition:", a + b)
Addition: [5 7 9]
print("Multiplication:", a * b)
Multiplication: [ 4 10 18]
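Because * is element-wise, it is easy to confuse with the dot product. The short sketch below contrasts the two and shows broadcasting, where a scalar or a shorter array is applied across a larger one. The matrix M is an illustrative addition.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise = a * b   # multiplies matching positions: [ 4 10 18]
dot = a @ b           # dot product: 1*4 + 2*5 + 3*6 = 32
print(elementwise, dot)

# Broadcasting: the scalar 2 is applied to every element.
doubled = a * 2       # [2 4 6]

# Broadcasting a 1D array across each row of a 2D array.
M = np.array([[1, 2, 3], [4, 5, 6]])
shifted = M + a       # [[2 4 6], [5 7 9]]
print(doubled, shifted, sep="\n")
```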
2.5 Reshaping and Transposing Arrays#
NumPy allows changing the shape of arrays without modifying data.
### Reshaping Arrays
F = np.arange(1, 10)
G = np.arange(1, 10).reshape(3, 3)
print("Original Array:\n", F)
print("Reshaped Array:\n", G)
Original Array:
[1 2 3 4 5 6 7 8 9]
Reshaped Array:
[[1 2 3]
[4 5 6]
[7 8 9]]
### Transposing Arrays
print("Transposed Array:\n", G.T)
Transposed Array:
[[1 4 7]
[2 5 8]
[3 6 9]]
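Reshaping only works when the total number of elements stays the same. As a small sketch, the example below lets NumPy infer one dimension with -1 and shows the error raised for an incompatible shape; the 12-element array is an illustrative choice.

```python
import numpy as np

F = np.arange(1, 13)       # 12 elements: 1..12

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
G = F.reshape(3, -1)
print(G.shape)             # (3, 4)

# Transposing swaps the axes, so rows become columns.
print(G.T.shape)           # (4, 3)

# An incompatible shape (5 * 3 = 15 != 12) raises ValueError.
reshape_failed = False
try:
    F.reshape(5, 3)
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```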
2.6 Statistical Functions#
NumPy provides functions for basic statistics.
print("Maximum Value:", G.max())
Maximum Value: 9
print("Minimum Value:", G.min())
Minimum Value: 1
print("Mean Value:", G.mean())
Mean Value: 5.0
print("Sum of Elements:", G.sum())
Sum of Elements: 45
print("Column-wise Sum:", G.sum(axis=0))
Column-wise Sum: [12 15 18]
print("Row-wise Sum:", G.sum(axis=1))
Row-wise Sum: [ 6 15 24]
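The axis argument works the same way for other reductions: axis=0 reduces down the rows (one result per column) and axis=1 reduces across the columns (one result per row). A minimal sketch using the same 3x3 array, with centering the columns as an illustrative use:

```python
import numpy as np

G = np.arange(1, 10).reshape(3, 3)

col_means = G.mean(axis=0)     # one mean per column: [4. 5. 6.]
row_max = G.max(axis=1)        # one maximum per row: [3 6 9]
print(col_means, row_max)

# Subtracting the column means centers each column around zero.
centered = G - col_means
print(centered.mean(axis=0))   # [0. 0. 0.]

# argmax returns the position of the maximum in the flattened array.
flat_pos = G.argmax()          # 8, i.e. the last element
print(flat_pos, np.median(G))  # 8 5.0
```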
2.7 Stacking and Concatenation#
NumPy allows combining multiple arrays in different ways.
### Horizontal Stacking
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
hstacked = np.hstack((a, b))
print("Horizontally Stacked:\n", hstacked)
Horizontally Stacked:
[[1 2 5 6]
[3 4 7 8]]
### Vertical Stacking
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
vstacked = np.vstack((a, b))
print("Vertically Stacked:\n", vstacked)
Vertically Stacked:
[[1 2]
[3 4]
[5 6]
[7 8]]
### Concatenation
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
concatenated = np.concatenate((a, b), axis=0) # Concatenate along rows
print("Concatenated:\n", concatenated)
Concatenated:
[[1 2]
[3 4]
[5 6]
[7 8]]
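The stacking helpers are closely related to np.concatenate: for 2D arrays, axis=0 matches np.vstack and axis=1 matches np.hstack. The sketch below checks this and also shows np.stack, which joins arrays along a new axis instead of an existing one.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Concatenating along columns gives the same result as hstack.
side_by_side = np.concatenate((a, b), axis=1)
print(np.array_equal(side_by_side, np.hstack((a, b))))  # True

# Concatenating along rows gives the same result as vstack.
on_top = np.concatenate((a, b), axis=0)
print(np.array_equal(on_top, np.vstack((a, b))))        # True

# np.stack creates a new leading axis: two 2x2 arrays become one 2x2x2 array.
stacked = np.stack((a, b))
print(stacked.shape)  # (2, 2, 2)
```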
Conclusion#
Mastering NumPy is essential for data science, machine learning, and numerical computing, providing an efficient way to handle large datasets.
3. Pandas basics#
Pandas is a powerful and flexible data analysis and manipulation library for Python. It is widely used in data science and any scenario where structured data processing is needed.
Installing Pandas#
If you haven’t installed Pandas yet, you can do so using:
pip install pandas
Then, import it in your Python script:
import pandas as pd
3.1 Creating Data Structures#
Creating a Pandas Series#
The two primary data structures in Pandas are:
Series: A one-dimensional labeled array.
DataFrame: A two-dimensional table-like structure with labeled rows and columns.
A Pandas Series is similar to a column in an Excel spreadsheet. It consists of an array of data with an associated index.
# Creating a pandas Series
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)
a 10
b 20
c 30
d 40
dtype: int64
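Once a Series has an index, values can be retrieved by label or by position, and arithmetic applies to every element. A brief sketch on the same data (the variable names are illustrative):

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = series['b']          # label-based access: 20
by_position = series.iloc[0]    # position-based access: 10
print(by_label, by_position)

# Boolean filtering keeps the matching labels.
large = series[series > 20]     # labels 'c' and 'd'
print(large)

# Arithmetic is applied element-wise and preserves the index.
doubled = series * 2
print(doubled['d'])             # 80
```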
Creating a Pandas DataFrame#
A DataFrame is a two-dimensional table with labeled rows and columns.
# Creating a pandas DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Salary': [50000, 60000, 70000]
}
df = pd.DataFrame(data)
print(df)
Name Age Salary
0 Alice 25 50000
1 Bob 30 60000
2 Charlie 35 70000
3.2 Reading and Writing Data#
Pandas can read from and write to various file formats such as CSV, Excel, etc.
### Reading from a CSV File
import os
current_dir = os.getcwd()
csv_path = os.path.join(current_dir, "numpy-pandas-lp-basics-data", "data.csv")
df = pd.read_csv(csv_path)
print(df)
Name Age Salary
0 A 25.0 50000.0
1 B 30.0 60000.0
2 C 25.0 70000.0
3 D 40.0 80000.0
4 E 30.0 55000.0
5 F 45.0 65000.0
6 G 45.0 90000.0
7 H 50.0 100000.0
8 I NaN NaN
### Writing to a CSV File
df = df.drop(columns=['Salary']) # Drop the Salary column
output_path = os.path.join(current_dir, "numpy-pandas-lp-basics-data", "output.csv")
df.to_csv(output_path, index=False)
df = pd.read_csv(output_path)
print(df)
Name Age
0 A 25.0
1 B 30.0
2 C 25.0
3 D 40.0
4 E 30.0
5 F 45.0
6 G 45.0
7 H 50.0
8 I NaN
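The example above depends on a data.csv file on disk. To try the same read and write round trip without that file, one option is an in-memory buffer. The sketch below uses io.StringIO with a small, made-up CSV string that mimics the structure of the data above; it is an illustration, not the actual file contents.

```python
import io
import pandas as pd

# Made-up CSV text with one row that has missing Age and Salary.
csv_text = "Name,Age,Salary\nA,25,50000\nB,30,60000\nI,,\n"

df = pd.read_csv(io.StringIO(csv_text))
# Missing values are read as NaN, which makes the numeric columns float64.
print(df.dtypes)

# Write to an in-memory buffer instead of a file, without the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the written text back gives an equal DataFrame.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(df))  # True
```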
3.3 Data Inspection and Manipulation#
Use head(), tail(), info(), and describe() to show the first 5 rows, the last 5 rows, a summary of the DataFrame, and summary statistics, respectively.
print(df.head()) # First 5 rows
Name Age
0 A 25.0
1 B 30.0
2 C 25.0
3 D 40.0
4 E 30.0
print(df.tail()) # Last 5 rows
Name Age
4 E 30.0
5 F 45.0
6 G 45.0
7 H 50.0
8 I NaN
df.info() # Summary of DataFrame; info() prints directly and returns None, so print() is not needed
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 9 non-null object
1 Age 8 non-null float64
dtypes: float64(1), object(1)
memory usage: 272.0+ bytes
print(df.describe()) # Statistical summary
Age
count 8.000000
mean 36.250000
std 9.910312
min 25.000000
25% 28.750000
50% 35.000000
75% 45.000000
max 50.000000
You can select specific data:
print(df['Name']) # Select a column
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
8 I
Name: Name, dtype: object
print(df[['Name', 'Age']]) # Select multiple columns
Name Age
0 A 25.0
1 B 30.0
2 C 25.0
3 D 40.0
4 E 30.0
5 F 45.0
6 G 45.0
7 H 50.0
8 I NaN
print(df.loc[0]) # Select a row by index label
Name A
Age 25.0
Name: 0, dtype: object
print(df.iloc[1]) # Select a row by numerical position
Name B
Age 30.0
Name: 1, dtype: object
You can also filter data easily:
filtered_df = df[df['Age'] > 30] # Select rows where Age > 30
print(filtered_df)
Name Age
3 D 40.0
5 F 45.0
6 G 45.0
7 H 50.0
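Filters can be combined with & (and) and | (or); each condition needs its own parentheses because of operator precedence. The sketch below rebuilds the Name and Age columns shown above (without the row that has a missing Age) so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": list("ABCDEFGH"),
    "Age": [25, 30, 25, 40, 30, 45, 45, 50],
})

# Both conditions must hold: Age strictly between 30 and 50.
both = df[(df["Age"] > 30) & (df["Age"] < 50)]
print(both["Name"].tolist())    # ['D', 'F', 'G']

# Either condition may hold.
either = df[(df["Age"] == 25) | (df["Age"] == 50)]
print(either["Name"].tolist())  # ['A', 'C', 'H']

# isin() is a compact alternative for matching a set of values.
in_set = df[df["Age"].isin([25, 50])]

# loc accepts a row filter and a column selection together.
names_over_30 = df.loc[df["Age"] > 30, "Name"]
print(names_over_30.tolist())   # ['D', 'F', 'G', 'H']
```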
3.4 Data Cleaning#
Pandas provides functions for cleaning data, such as handling missing values and changing data types.
### Handling Missing Values
df.dropna(inplace=True) # Remove rows with NaN values
print(df)
Name Age
0 A 25.0
1 B 30.0
2 C 25.0
3 D 40.0
4 E 30.0
5 F 45.0
6 G 45.0
7 H 50.0
### Changing Data Types
df['Age'] = df['Age'].astype(int) # Convert Age column to integer
print(df)
Name Age
0 A 25
1 B 30
2 C 25
3 D 40
4 E 30
5 F 45
6 G 45
7 H 50
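Dropping rows is not the only way to handle missing values, and the order of the two steps above matters: astype(int) raises an error if the column still contains NaN. As a hedged alternative, the sketch below fills the gap with the column mean and also shows the nullable "Int64" dtype, which can hold integers alongside missing values. The three-row DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "I"], "Age": [25.0, 30.0, np.nan]})

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: fill the missing Age with the mean of the known ages (27.5).
filled = df.fillna({"Age": df["Age"].mean()})
print(filled)

# Option 2: keep the gap but use the nullable integer dtype.
nullable_age = df["Age"].astype("Int64")
print(nullable_age)
```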
3.5 Modifying Data#
With Pandas, you can easily add new columns, rename columns, and sort data.
### Adding a New Column
import os
current_dir = os.getcwd()
csv_path = os.path.join(current_dir, "numpy-pandas-lp-basics-data", "data.csv")
df = pd.read_csv(csv_path)
df['Bonus'] = df['Salary'] * 0.10 # Add a new column
print(df)
Name Age Salary Bonus
0 A 25.0 50000.0 5000.0
1 B 30.0 60000.0 6000.0
2 C 25.0 70000.0 7000.0
3 D 40.0 80000.0 8000.0
4 E 30.0 55000.0 5500.0
5 F 45.0 65000.0 6500.0
6 G 45.0 90000.0 9000.0
7 H 50.0 100000.0 10000.0
8 I NaN NaN NaN
### Renaming Columns
df.rename(columns={'Name': 'Employee Name'}, inplace=True)
print(df)
Employee Name Age Salary Bonus
0 A 25.0 50000.0 5000.0
1 B 30.0 60000.0 6000.0
2 C 25.0 70000.0 7000.0
3 D 40.0 80000.0 8000.0
4 E 30.0 55000.0 5500.0
5 F 45.0 65000.0 6500.0
6 G 45.0 90000.0 9000.0
7 H 50.0 100000.0 10000.0
8 I NaN NaN NaN
### Sorting Data
df.sort_values(by='Age', ascending=False, inplace=True)
print(df)
Employee Name Age Salary Bonus
7 H 50.0 100000.0 10000.0
5 F 45.0 65000.0 6500.0
6 G 45.0 90000.0 9000.0
3 D 40.0 80000.0 8000.0
1 B 30.0 60000.0 6000.0
4 E 30.0 55000.0 5500.0
0 A 25.0 50000.0 5000.0
2 C 25.0 70000.0 7000.0
8 I NaN NaN NaN
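Note that sorting keeps the original index labels, so loc (label-based) and iloc (position-based) can return different rows afterwards. The sketch below illustrates this on a small made-up DataFrame and also shows a non-inplace style that returns new objects instead of modifying df.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": list("ABCD"),
    "Age": [25, 30, 25, 40],
    "Salary": [50000, 60000, 70000, 80000],
})

# Sort by Age, then by Salary, both descending; df itself is not modified.
sorted_df = df.sort_values(by=["Age", "Salary"], ascending=[False, False])
print(sorted_df["Name"].tolist())   # ['D', 'B', 'C', 'A']

# Index labels travel with the rows, so label 0 is still 'A'.
print(sorted_df.loc[0, "Name"])     # 'A' (row labelled 0)
print(sorted_df.iloc[0]["Name"])    # 'D' (first row after sorting)

# assign() adds a column and reset_index() renumbers the rows.
ranked = (
    sorted_df
    .assign(Bonus=lambda d: d["Salary"] * 0.10)
    .reset_index(drop=True)
)
print(ranked)
```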
3.6 Merging and Joining Data#
Pandas allows you to combine data from multiple DataFrames.
### Merging DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Salary': [50000, 60000, 70000]})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)
ID Name Salary
0 1 Alice 50000
1 2 Bob 60000
2 3 Charlie 70000
### Concatenating DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)
A B
0 1 3
1 2 4
0 5 7
1 6 8
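In the concatenated output above, the index labels 0 and 1 appear twice. Passing ignore_index=True renumbers the rows. For merges, the how parameter controls what happens to keys that appear in only one table. The sketch below uses made-up tables whose ID values overlap only partially.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# ignore_index=True replaces the repeated labels with 0..3.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3]

left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
right = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [60000, 70000, 80000]})

# Inner join (the default): only IDs present in both tables.
inner = pd.merge(left, right, on='ID')

# Left join: every row of `left`; Salary is NaN where no match exists.
left_join = pd.merge(left, right, on='ID', how='left')

# Outer join: IDs from either table.
outer = pd.merge(left, right, on='ID', how='outer')
print(inner, left_join, outer, sep="\n\n")
```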
3.7 Pivot Tables and Crosstabs#
Pandas can summarize data with pivot tables, which aggregate values by group, and crosstabs, which count how often combinations of values occur.
### Creating a Pivot Table
pivot = df.pivot_table(values='Salary', index='Age', aggfunc='mean')
print(pivot)
Salary
Age
25.0 60000.0
30.0 57500.0
40.0 80000.0
45.0 77500.0
50.0 100000.0
### Creating a Crosstab
crosstab = pd.crosstab(df['Age'], df['Salary'])
print(crosstab)
Salary 50000.0 55000.0 60000.0 65000.0 70000.0 80000.0 90000.0 \
Age
25.0 1 0 0 0 1 0 0
30.0 0 1 1 0 0 0 0
40.0 0 0 0 0 0 1 0
45.0 0 0 0 1 0 0 1
50.0 0 0 0 0 0 0 0
Salary 100000.0
Age
25.0 0
30.0 0
40.0 0
45.0 0
50.0 1
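Crosstabs are easiest to read when both variables are categorical; with a continuous column such as Salary, almost every value gets its own column, as in the output above. The sketch below uses a made-up DataFrame with hypothetical Dept and Level columns to show a more typical use of both functions.

```python
import pandas as pd

# Hypothetical data: the Dept and Level columns are illustrative only.
df = pd.DataFrame({
    "Dept": ["Ops", "Ops", "Eng", "Eng", "Eng"],
    "Level": ["Junior", "Senior", "Junior", "Senior", "Senior"],
    "Salary": [50000, 70000, 60000, 90000, 100000],
})

# Pivot table: mean Salary for each Dept (rows) and Level (columns).
pivot = df.pivot_table(values="Salary", index="Dept",
                       columns="Level", aggfunc="mean")
print(pivot)

# Crosstab: number of rows for each Dept and Level combination.
counts = pd.crosstab(df["Dept"], df["Level"])
print(counts)
```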
Conclusion#
This section introduced the fundamental concepts of Pandas, covering data structures, reading and writing files, data inspection, cleaning, and manipulation, as well as merging and summarizing data. Pandas is an essential tool for anyone working with structured data in Python, making data analysis more efficient and intuitive.
4. Linear Programming basics#
Linear Programming (LP) is a mathematical technique used to optimize a linear objective function, subject to linear equality and inequality constraints. It is widely used in fields such as engineering, logistics, and operations research.
Introduction to LP Optimization Models:
LP problems generally take the form:
The objective function (to be maximized or minimized):
\[f(x) = b_1 x_1 + b_2 x_2 + \dots + b_n x_n\]
Subject to constraints:
\[K x \le a\]
where:
\(f(x)\) is the objective function to be optimized,
\(x_1, x_2, ..., x_n\) are decision variables,
\(b_1, b_2, ..., b_n\) are coefficients in the objective function,
\(K\) is a matrix of constraint coefficients,
\(x\) is a vector of decision variables,
\(a\) is a vector of constraint bounds.
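To make the notation concrete, the following sketch stores \(b\), \(K\), and \(a\) as NumPy arrays, evaluates the objective, and tests whether a candidate point satisfies the constraints. It assumes constraints of the form \(K x \le a\) together with non-negative variables; the numbers, the function names objective and is_feasible, and the tolerance are illustrative choices.

```python
import numpy as np

# Illustrative data for a problem with two variables and two constraints.
b = np.array([2.0, 3.0])                  # objective coefficients
K = np.array([[3.0, 2.0], [1.0, 1.0]])    # constraint coefficients
a = np.array([11.0, 4.0])                 # constraint bounds

def objective(x):
    """Return f(x) = b_1*x_1 + ... + b_n*x_n."""
    return float(b @ x)

def is_feasible(x, tol=1e-9):
    """Check K x <= a and x >= 0 (both assumed here), up to a small tolerance."""
    return bool(np.all(K @ x <= a + tol) and np.all(x >= -tol))

x_candidate = np.array([3.0, 1.0])
print(objective(x_candidate))             # 2*3 + 3*1 = 9.0
print(is_feasible(x_candidate))           # True: 3*3 + 2*1 = 11, 3 + 1 = 4
print(is_feasible(np.array([4.0, 1.0])))  # False: 3*4 + 2*1 = 14 > 11
```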
Using Gurobi for Linear Programming#
Gurobi [1] is a commercial solver widely used for mathematical optimization problems. The GurobiPy [2] Python package (installed and imported as gurobipy) provides a user-friendly interface to define and solve optimization models. The package includes a trial license, which allows users to solve problems of limited size [2]. Students and faculty affiliated with academic institutions are eligible for a free, full-featured license [2].
Installing Gurobi#
If you haven’t installed Gurobipy yet, use:
pip install gurobipy
Implementing a Linear Program with Gurobi#
Problem Definition:#
Consider a simple LP problem:
Linear Programming Problem#
Maximize:
\[Z = 2x + 3y\]
Subject to:
\[3x + 2y \le 11\]
\[x + y \le 4\]
\[x, y \ge 0\]
We will solve this problem using Gurobi in Python.
First, import the gurobipy library (here under the alias grb) and the GRB class, which provides constants such as GRB.CONTINUOUS and GRB.MAXIMIZE.
import gurobipy as grb
from gurobipy import GRB
Then, you need to create a new optimization model instance, define the decision variables, and set up the objective function.
# Create a new optimization model
model = grb.Model("LP")
# Define decision variables
x = model.addVar(name="x", vtype=GRB.CONTINUOUS, lb=0)
y = model.addVar(name="y", vtype=GRB.CONTINUOUS, lb=0)
# Set objective function
model.setObjective(2*x + 3*y, GRB.MAXIMIZE)
Set parameter Username
Academic license - for non-commercial use only - expires 2026-02-10
Next, you can define the constraints:
# Add constraints
model.addConstr(3*x + 2*y <= 11, "Constraint 1")
model.addConstr(x + y <= 4, "Constraint 2")
<gurobi.Constr *Awaiting Model Update*>
Now, you can invoke Gurobi to solve this optimization problem:
# Optimize the model
model.optimize()
Gurobi Optimizer version 12.0.1 build v12.0.1rc0 (win64 - Windows 11.0 (26100.2))
CPU model: 12th Gen Intel(R) Core(TM) i5-12500H, instruction set [SSE2|AVX|AVX2]
Thread count: 12 physical cores, 16 logical processors, using up to 16 threads
Optimize a model with 2 rows, 2 columns and 4 nonzeros
Model fingerprint: 0x9496ece8
Coefficient statistics:
Matrix range [1e+00, 3e+00]
Objective range [2e+00, 3e+00]
Bounds range [0e+00, 0e+00]
RHS range [4e+00, 1e+01]
Presolve time: 0.01s
Presolved: 2 rows, 2 columns, 4 nonzeros
Iteration Objective Primal Inf. Dual Inf. Time
0 5.0000000e+30 3.250000e+30 5.000000e+00 0s
1 1.2000000e+01 0.000000e+00 0.000000e+00 0s
Solved in 1 iterations and 0.01 seconds (0.00 work units)
Optimal objective 1.200000000e+01
# Display results
if model.status == GRB.OPTIMAL:
print(f"Optimal solution: x = {x.x}, y = {y.x}")
print(f"Optimal objective value: Z = {model.objVal}")
else:
print("No optimal solution found.")
Optimal solution: x = 0.0, y = 4.0
Optimal objective value: Z = 12.0
After running the above script, Gurobi reports the optimal values of \(x\) and \(y\) together with the optimal objective value. Here the optimum is \(x = 0\), \(y = 4\), giving \(Z = 2 \cdot 0 + 3 \cdot 4 = 12\).
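For a problem this small, the result can be cross-checked without a solver. An LP that has an optimal solution attains it at a corner point of the feasible region, so one can intersect every pair of constraint boundaries, keep the feasible intersections, and compare objective values. The sketch below does this with NumPy only; it is a brute-force check for two variables, not how Gurobi solves the model, and the variable names are illustrative.

```python
import itertools
import numpy as np

# All constraints as rows of A @ [x, y] <= rhs, including x >= 0 and y >= 0
# rewritten as -x <= 0 and -y <= 0.
A = np.array([[3.0, 2.0], [1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
rhs = np.array([11.0, 4.0, 0.0, 0.0])
c = np.array([2.0, 3.0])  # objective: maximize 2x + 3y

best_value, best_point = -np.inf, None
for i, j in itertools.combinations(range(len(A)), 2):
    sub = A[[i, j]]
    if abs(np.linalg.det(sub)) < 1e-12:
        continue  # parallel boundaries never intersect
    point = np.linalg.solve(sub, rhs[[i, j]])  # intersection of two boundaries
    if np.all(A @ point <= rhs + 1e-9):        # keep only feasible corners
        value = float(c @ point)
        if value > best_value:
            best_value, best_point = value, point

# Expect a point close to (0, 4) with objective value close to 12.
print(best_point, best_value)
```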
Conclusion#
Linear Programming is a mathematical optimization technique used to find the best outcome in a given system. Gurobi is a powerful solver that can efficiently handle optimization problems. This section serves as a foundation for more advanced optimization problems, including real-world applications such as grid optimization.
References#
[1] Gurobi Optimization, LLC, “Gurobi Optimizer Reference Manual,” 2024.
[2] Gurobi Optimization, LLC, “GurobiPy: Python interface for the Gurobi Optimizer,” PyPI, [Online]. Available: https://pypi.org/project/gurobipy/