Sorting based on two fields in Python

Python has a stable sort, so provided that performance isn't an issue, the simplest way is to sort the list by field 2 and then sort it again by field 1.

That will give you the result you want. The only catch is that if it is a big list (or you want to sort it often), calling sort twice might be an unacceptable overhead.

import operator
list1 = sorted(csv1, key=operator.itemgetter(2))   # secondary key first
list1 = sorted(list1, key=operator.itemgetter(1))  # then the primary key

Doing it this way also makes it easy to handle the situation where you want some of the columns reverse sorted: just include the 'reverse=True' parameter when necessary.
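
As a rough illustration of the two-pass approach with one column reversed, here is a minimal sketch. The rows and their layout (id, department, salary) are made up for the example and are not part of the original question.

```python
import operator

# Hypothetical rows: (id, department, salary)
rows = [
    (1, "ops", 50),
    (2, "dev", 70),
    (3, "ops", 65),
    (4, "dev", 70),
    (5, "dev", 40),
]

# Sort on the secondary key first (salary, highest first) ...
by_salary = sorted(rows, key=operator.itemgetter(2), reverse=True)
# ... then on the primary key (department, ascending).
result = sorted(by_salary, key=operator.itemgetter(1))

for row in result:
    print(row)
```

Within each department the salaries come out highest first, and the two 'dev' rows with salary 70 keep their original relative order (id 2 before id 4).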

Otherwise you can pass multiple parameters to itemgetter or manually build a tuple. That is probably going to be faster, but has the problem that it doesn't generalise well if some of the columns need to be reverse sorted (numeric columns can still be reversed by negating them, but that trick is not available for strings and other non-numeric values).

So if you don't need any columns reverse sorted, go for multiple arguments to itemgetter. If you might, and the columns aren't numeric, go for multiple consecutive sorts.
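
The single-pass alternative might look like the sketch below. The sample rows (surname, first name, age) are invented for illustration, and the index order 0, 2, 1 is just one possible choice of keys.

```python
from operator import itemgetter

# Hypothetical rows: (surname, first name, age)
people = [
    ("Smith", "Fred", 30),
    ("Jones", "Anne", 30),
    ("Smith", "Anne", 30),
    ("Jones", "John", 60),
]

# All keys ascending: itemgetter with several indices returns a tuple,
# and tuples compare element by element.
ascending = sorted(people, key=itemgetter(0, 2, 1))

# Age descending: build the tuple by hand and negate the numeric field.
age_desc = sorted(people, key=lambda row: (row[0], -row[2], row[1]))

print(ascending)
print(age_desc)
```

The negation only works because age is a number; a string column that needs reversing would still call for a separate pass with reverse=True.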

Edit: For the commenters who have problems understanding how this answers the original question, here is an example that shows exactly how the stable nature of the sorting ensures we can do separate sorts on each key and end up with data sorted on multiple criteria:

DATA = [
    ('Jones', 'Jane', 58),
    ('Smith', 'Anne', 30),
    ('Jones', 'Fred', 30),
    ('Smith', 'John', 60),
    ('Smith', 'Fred', 30),
    ('Jones', 'Anne', 30),
    ('Smith', 'Jane', 58),
    ('Smith', 'Twin2', 3),
    ('Jones', 'John', 60),
    ('Smith', 'Twin1', 3),
    ('Jones', 'Twin1', 3),
    ('Jones', 'Twin2', 3)
]

# Sort by Surname, Age DESCENDING, Firstname
print("Initial data in random order")
for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

print('''
First we sort by first name, after this pass all
Twin1 come before Twin2 and Anne comes before Fred''')
DATA.sort(key=lambda row: row[1])

for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

print('''
Second pass: sort by age in descending order.
Note that after this pass rows are sorted by age but
Twin1/Twin2 and Anne/Fred pairs are still in correct
firstname order.''')
DATA.sort(key=lambda row: row[2], reverse=True)
for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

print('''
Final pass sorts the Jones from the Smiths.
Within each family members are sorted by age but equal
age members are sorted by first name.
''')
DATA.sort(key=lambda row: row[0])
for d in DATA:
    print("{:10s} {:10s} {}".format(*d))

This is a runnable example, but to save people running it the output is:

Initial data in random order
Jones      Jane       58
Smith      Anne       30
Jones      Fred       30
Smith      John       60
Smith      Fred       30
Jones      Anne       30
Smith      Jane       58
Smith      Twin2      3
Jones      John       60
Smith      Twin1      3
Jones      Twin1      3
Jones      Twin2      3

First we sort by first name, after this pass all
Twin1 come before Twin2 and Anne comes before Fred
Smith      Anne       30
Jones      Anne       30
Jones      Fred       30
Smith      Fred       30
Jones      Jane       58
Smith      Jane       58
Smith      John       60
Jones      John       60
Smith      Twin1      3
Jones      Twin1      3
Smith      Twin2      3
Jones      Twin2      3

Second pass: sort by age in descending order.
Note that after this pass rows are sorted by age but
Twin1/Twin2 and Anne/Fred pairs are still in correct
firstname order.
Smith      John       60
Jones      John       60
Jones      Jane       58
Smith      Jane       58
Smith      Anne       30
Jones      Anne       30
Jones      Fred       30
Smith      Fred       30
Smith      Twin1      3
Jones      Twin1      3
Smith      Twin2      3
Jones      Twin2      3

Final pass sorts the Jones from the Smiths.
Within each family members are sorted by age but equal
age members are sorted by first name.

Jones      John       60
Jones      Jane       58
Jones      Anne       30
Jones      Fred       30
Jones      Twin1      3
Jones      Twin2      3
Smith      John       60
Smith      Jane       58
Smith      Anne       30
Smith      Fred       30
Smith      Twin1      3
Smith      Twin2      3

Note in particular how in the second step the reverse=True parameter keeps the firstnames in order whereas simply sorting then reversing the list would lose the desired order for the third sort key.
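
To make that difference concrete, here is a small sketch comparing the two approaches on three made-up rows that are already in first-name order.

```python
# Hypothetical rows: (first name, age), already sorted by first name
rows = [
    ("Anne", 30),
    ("Fred", 30),
    ("John", 60),
]

# reverse=True keeps rows with equal ages in their existing order.
stable_desc = sorted(rows, key=lambda row: row[1], reverse=True)

# Sorting ascending and then reversing the list flips the equal-age rows too.
flipped = sorted(rows, key=lambda row: row[1])
flipped.reverse()

print(stable_desc)  # [('John', 60), ('Anne', 30), ('Fred', 30)]
print(flipped)      # [('John', 60), ('Fred', 30), ('Anne', 30)]
```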

You can sort a pandas DataFrame by one or more columns, in ascending or descending order, using the sort_values() method. To specify the order, use the boolean ascending parameter: True for ascending and False for descending. By default, it is set to True.

In this article, I will explain how to sort a pandas DataFrame by one or multiple columns. By default, sort_values() returns a new DataFrame with the sorted result. To sort the current DataFrame instead, use inplace=True.

If you are in a hurry, below are some quick examples of how to sort a pandas DataFrame by multiple columns.


# Quick examples
# Sort multiple columns
df2 = df.sort_values(['Fee', 'Duration'],
              ascending = [False, True])

# Sort by two columns 
df2 = df.sort_values(['Courses', 'Discount'],
              ascending = [True, True])

# Sort in place, putting NaN values first
df.sort_values(["Fee", "Courses"],
               axis = 0, ascending = True,
               inplace = True,
               na_position = "first")

Let’s create a DataFrame with a few rows and columns and execute some examples to learn how sort works.


import pandas as pd
technologies = ({
    'Courses':["Spark","Hadoop","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,20000],
    'Duration':['30days','35days','40days','50days','60days'],
    'Discount':[1000,2300,1500,1200,2500]
               })
df = pd.DataFrame(technologies, index = ['r1','r2','r3','r4','r0'])
print(df)

This yields the output below.


   Courses    Fee Duration  Discount
r1   Spark  20000   30days      1000
r2  Hadoop  25000   35days      2300
r3  pandas  26000   40days      1500
r4  Oracle  22000   50days      1200
r0    Java  20000   60days      2500

2. Sort Multiple Columns in pandas DataFrame

By using the sort_values() method, you can sort a DataFrame by multiple columns in ascending or descending order. When no order is specified, all of the given columns are sorted in ascending order.


# Sort multiple columns
df2 = df.sort_values(['Fee', 'Discount'])
print(df2)

This yields the output below.


   Courses    Fee Duration  Discount
r1   Spark  20000   30days      1000
r0    Java  20000   60days      2500
r4  Oracle  22000   50days      1200
r2  Hadoop  25000   35days      2300
r3  pandas  26000   40days      1500

To update the existing DataFrame instead, use inplace=True.


# Sort in place by Fee, then Discount
df.sort_values(by=['Fee','Discount'], inplace=True)
print(df)

This yields the same output as above.

3. Sort in Ascending Order

Use the ascending parameter to sort the DataFrame in ascending or descending order. When you sort by multiple columns, you can pass a list with one boolean per column. By default, every column is sorted in ascending order.


# Sort ascending order
df.sort_values(by=['Fee','Discount'], inplace=True,
               ascending = [True, True])
print(df)

This yields the output below.


   Courses    Fee Duration  Discount
r1   Spark  20000   30days      1000
r0    Java  20000   60days      2500
r4  Oracle  22000   50days      1200
r2  Hadoop  25000   35days      2300
r3  pandas  26000   40days      1500

4. Sort Multiple Columns in Descending Order

To sort in descending order, use ascending=False. You can also pass a list to specify a different sort order for each column.


# Fee ascending, Discount descending
df.sort_values(by=['Fee','Discount'], inplace=True,
               ascending = [True, False])
print(df)

This yields the output below.


   Courses    Fee Duration  Discount
r0    Java  20000   60days      2500
r1   Spark  20000   30days      1000
r4  Oracle  22000   50days      1200
r2  Hadoop  25000   35days      2300
r3  pandas  26000   40days      1500

Conclusion

In this article, you have learned how to sort a DataFrame by multiple columns, in ascending or descending order, using DataFrame.sort_values().

How do I sort by two criteria in Python?

Use sorted() and operator.itemgetter() to sort a list by two fields. Call sorted(a_list, key=k) with k as operator.itemgetter(i, j) to sort the list by the i-th element and then by the j-th element.
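
A minimal sketch of that call, with i = 0 and j = 1. The records ([city, year, count]) are hypothetical and only serve to show what the key function returns.

```python
from operator import itemgetter

# Hypothetical records: [city, year, count]
a_list = [
    ["Oslo", 2021, 7],
    ["Bergen", 2020, 3],
    ["Oslo", 2020, 5],
    ["Bergen", 2021, 1],
]

k = itemgetter(0, 1)
print(k(a_list[0]))  # ('Oslo', 2021): the key for each row is a tuple

# Rows are ordered by city first, then by year within each city.
result = sorted(a_list, key=k)
for row in result:
    print(row)
```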

How do you sort by two columns in Python?

You can sort a pandas DataFrame by one or more columns, in ascending or descending order, using the sort_values() method. To specify the order, use the boolean ascending parameter: True for ascending and False for descending.

How do you sort by two values?

Example problem (TypeScript): an array of arrays (finalArray) whose entries hold a folder path first and a file name second; sort it so that the array is arranged by folder first and, within identical folders, by file name. finalArray.sort((x: any, y: any): number => { const folder1: string = x[0].

How do you sort multiple values in Python?

Sort a list of objects by multiple attributes in Python.
Using the list.sort() function. A Pythonic way to sort a list of objects in place by multiple attributes is to use the list.sort() function. ...
Using the sorted() function. The list.sort() function modifies the list in place.
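
One way to sketch both options is with operator.attrgetter, which builds a tuple key from several attributes. The Employee class and its fields are hypothetical, chosen only for the example.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Employee:  # hypothetical class for illustration
    dept: str
    name: str


staff = [
    Employee("ops", "Zoe"),
    Employee("dev", "Max"),
    Employee("ops", "Amy"),
]

# sorted() returns a new list and leaves `staff` untouched.
ordered = sorted(staff, key=attrgetter("dept", "name"))
print([e.name for e in ordered])  # ['Max', 'Amy', 'Zoe']

# list.sort() reorders `staff` itself and returns None.
staff.sort(key=attrgetter("dept", "name"))
print([e.name for e in staff])    # ['Max', 'Amy', 'Zoe']
```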