import pandas as pd
Exploring Set Function in Python
The set() function creates a collection of unique, unordered elements, automatically removing duplicates. It is useful for comparing data, grouping categories, and finding overlaps within a dataset. Once a column has been converted to a set, methods such as union, intersection, and difference let you analyze and compare its values against another set. Sets can be created with the set() function or by enclosing elements in curly braces, for example {1, 2, 3}.
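As a minimal sketch of both creation styles, the snippet below uses a small made-up list in place of a DataFrame column (the values are illustrative and not read from the dataset). It also shows one common pitfall: empty curly braces create a dict, not a set.

```python
# Hypothetical category values, standing in for a DataFrame column
categories = ["Technology", "Furniture", "Technology", "Office Supplies", "Furniture"]

# set() keeps one copy of each value; the original order is not preserved
unique_categories = set(categories)

# Curly braces build a set literal when they contain elements
literal_set = {"Technology", "Furniture", "Office Supplies"}

# Empty braces create a dict, so use set() for an empty set
empty_braces = {}
empty_set = set()

print(sorted(unique_categories))  # sorted() gives a stable display order
print(unique_categories == literal_set)
print(type(empty_braces).__name__, type(empty_set).__name__)
```

Because sets are unordered, printing one directly can show the elements in a different order from run to run; wrapping it in sorted() is a simple way to get repeatable output.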
Importing Library & Dataset
df = pd.read_csv('C:/Users/SANKHYA/Documents/dataset.csv')
df.head()
| | Order_ID | Order_Date | Customer_ID | Category | Sub_Category | Sales | Region |
|---|---|---|---|---|---|---|---|
| 0 | CA-2018-106103 | 10-06-18 | SC-20305 | Technology | Accessories | 132.52 | Central |
| 1 | CA-2018-102407 | 09-12-18 | AT-10435 | Office Supplies | Art | 11.16 | West |
| 2 | CA-2018-117947 | 18-08-18 | NG-18355 | Furniture | Furnishings | 40.48 | East |
| 3 | CA-2018-152485 | 04-09-18 | JD-15790 | Office Supplies | Art | 13.12 | Central |
| 4 | CA-2018-153339 | 03-11-18 | DJ-13510 | Furniture | Furnishings | 15.99 | South |
Let’s try a few set methods:
Using Union: Combines all unique elements from two sets.
# Define two sets of sub-categories, one per category
products_tech = set(df[df['Category'] == 'Technology']['Sub_Category'])
products_furniture = set(df[df['Category'] == 'Furniture']['Sub_Category'])
# Union of both sets
union_prod = products_tech.union(products_furniture)
print("Union of Sub-Categories (Technology and Furniture):", union_prod)
Union of Sub-Categories (Technology and Furniture): {'Furnishings', 'Machines', 'Chairs', 'Phones', 'Accessories', 'Copiers', 'Tables', 'Bookcases'}
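The same operation can be tried without the dataset. The sketch below uses two small hand-written sets (the repeated "Accessories" entry is an assumption made for illustration) to show that a value present in both inputs appears only once in the union, and that the | operator gives the same result as union().

```python
# Hypothetical sub-category values, standing in for two filtered columns
tech = {"Phones", "Accessories", "Machines"}
furniture = {"Chairs", "Tables", "Accessories"}  # "Accessories" repeated on purpose

# union() and the | operator both return a new set with every distinct element
combined = tech.union(furniture)
combined_op = tech | furniture

print(sorted(combined))         # ['Accessories', 'Chairs', 'Machines', 'Phones', 'Tables']
print(combined == combined_op)  # True
print(len(combined))            # 5, not 6: the shared value is counted once
```

One practical difference: union() accepts any iterable, such as a list, while the | operator requires both operands to be sets.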
Using Intersection: Finds elements common to both sets.
# Define two sets of customer IDs, one per region
cust_region1 = set(df[df['Region'] == 'East']['Customer_ID'])
cust_region2 = set(df[df['Region'] == 'South']['Customer_ID'])
# Intersection of customers from both regions
overlap_cust = cust_region1.intersection(cust_region2)
print("Customers in Both East and South Regions:", overlap_cust)
Customers in Both East and South Regions: {'MD-17860', 'AH-10075', 'CA-12265', 'RD-19810', 'BT-11680', 'AS-10090'}
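For a quick check of how intersection behaves, here is a small sketch with made-up customer IDs (none of them come from the dataset). It also shows the equivalent & operator and isdisjoint(), which reports whether two sets share no elements at all.

```python
# Hypothetical customer IDs for two regions
east = {"C-101", "C-102", "C-103"}
south = {"C-101", "C-102", "C-204"}

# intersection() and the & operator keep only elements found in both sets
shared = east.intersection(south)
shared_op = east & south

print(sorted(shared))           # ['C-101', 'C-102']
print(shared == shared_op)      # True
print(east.isdisjoint(south))   # False, because the sets overlap
```

If the two sets have nothing in common, intersection() returns an empty set rather than raising an error, so the result can be tested directly with len() or in an if statement.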
Using Difference: Identifies elements present in the first set but not in the second.
# Define two sets of sub-categories, one per category
products_office = set(df[df['Category'] == 'Office Supplies']['Sub_Category'])
products_tech = set(df[df['Category'] == 'Technology']['Sub_Category'])
# Difference of sub-categories: sub-categories in Office Supplies but not in Technology
difference_prod = products_office.difference(products_tech)
print("Sub-Categories in Office Supplies but not in Technology:", difference_prod)
Sub-Categories in Office Supplies but not in Technology: {'Appliances', 'Labels', 'Art', 'Storage', 'Fasteners', 'Envelopes', 'Supplies', 'Paper', 'Binders'}
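Unlike union and intersection, difference depends on which set comes first. The sketch below illustrates this with two hand-written sets (the shared "Accessories" value is an assumption added for the example) and shows the equivalent - operator.

```python
# Hypothetical sub-category values
office = {"Paper", "Binders", "Accessories"}
tech = {"Phones", "Accessories"}

# Elements of office that are not in tech
only_office = office.difference(tech)

# Swapping the operands answers a different question
only_tech = tech - office

print(sorted(only_office))  # ['Binders', 'Paper']
print(sorted(only_tech))    # ['Phones']
```

Neither call modifies the original sets; each returns a new set, so office and tech can be reused in later comparisons.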
Conclusion:
The set() function in Python is a versatile tool for managing collections of unique elements, which makes it well suited to data comparison and filtering. Methods such as union, intersection, and difference simplify comparing and merging data: union combines values, intersection finds overlaps, and difference isolates what is unique to one group. Adding these set operations to your data analysis toolkit helps you handle unique values and the relationships between groups in your datasets.