Often you want to sort a pandas data frame in a specific way. Typically, you may want to sort it based on the values of one or more columns, or based on the values of the row index (the row names). A pandas data frame has two useful functions for this: sort_values(), which sorts by column values, and sort_index(), which sorts by the row index.
Each of these functions comes with numerous options, such as sorting in a specific order (ascending or descending), sorting in place, controlling where missing values are placed, and choosing the sorting algorithm.
Here is a quick pandas tutorial on multiple ways of using sort_values() and sort_index() to sort a pandas data frame, using a real data set (gapminder).
Let us first load the gapminder data from the Software Carpentry URL.
    import pandas as pd

    # data_url must hold the URL of the gapminder data (its value is not shown here)
    # read data from url as pandas dataframe
    gapminder = pd.read_csv(data_url)
    # print the first three rows
    print(gapminder.head(n=3))
1. How to Sort Pandas Dataframe based on the values of a column?
We can sort a pandas dataframe based on the values of a single column by passing the name of the column we want to sort by as the argument to sort_values(). For example, we can sort by the values of the “lifeExp” column in the gapminder data like this:
    sort_by_life = gapminder.sort_values('lifeExp')

    print(sort_by_life.head(n=3))
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659
Note that by default sort_values() returns a new, sorted data frame and leaves the original unchanged. The new data frame is in ascending order (small values first, large values last). With the head function we can see that the first rows have the smallest life expectancy. Using the tail function on the sorted data frame, we can see that the last rows have the highest life expectancy.
    print(sort_by_life.tail(n=3))
                 country  year          pop continent  lifeExp    gdpPercap
    802            Japan  2002  127065841.0      Asia   82.000  28604.59190
    671  Hong Kong China  2007    6980412.0      Asia   82.208  39724.97867
    803            Japan  2007  127467972.0      Asia   82.603  31656.06806
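The same behaviour can be checked without downloading anything. The sketch below uses a small made-up data frame (the name df and its values are hypothetical, not taken from gapminder) to show that sort_values() returns a new data frame in ascending order while the original keeps its row order.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative, not from gapminder.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "lifeExp": [70.5, 45.2, 82.1, 60.0],
})

# sort_values returns a new data frame; df itself is left untouched.
sorted_df = df.sort_values("lifeExp")

print(sorted_df.head(n=2))  # rows with the smallest lifeExp
print(sorted_df.tail(n=2))  # rows with the largest lifeExp
print(df)                   # still in the original row order
```

Each row keeps its original index label after sorting, which is why the index of the sorted data frame looks shuffled.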
2. How to Sort Pandas Dataframe based on the values of a column (Descending order)?
To sort a dataframe based on the values of a column but in descending order so that the largest values of the column are at the top, we can use the argument ascending=False.
    sort_by_life = gapminder.sort_values('lifeExp', ascending=False)
In this example, we can see that after sorting the dataframe by lifeExp with ascending=False, the countries with the largest life expectancy are at the top.
    print(sort_by_life.head(n=3))
                 country  year          pop continent  lifeExp    gdpPercap
    803            Japan  2007  127467972.0      Asia   82.603  31656.06806
    671  Hong Kong China  2007    6980412.0      Asia   82.208  39724.97867
    802            Japan  2002  127065841.0      Asia   82.000  28604.59190
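As a minimal sketch on made-up data (df and its values are hypothetical), the effect of ascending=False can be compared directly with the default ascending sort:

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative, not from gapminder.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "lifeExp": [70.5, 45.2, 82.1, 60.0],
})

ascending_df = df.sort_values("lifeExp")                    # smallest first
descending_df = df.sort_values("lifeExp", ascending=False)  # largest first

print(ascending_df["country"].tolist())
print(descending_df["country"].tolist())
```

With distinct values, as here, the descending order is simply the ascending order reversed.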
3. How to Sort Pandas Dataframe based on a column with missing values?
Often a data frame contains missing values, and when sorting on a column with missing values we might want the rows with missing values to come either first or last.
We can specify the position of missing values using the argument na_position. With na_position='first', the rows with missing values are placed first.
    sort_na_first = gapminder.sort_values('lifeExp', na_position='first')
In this example there are no missing values, which is why no NA values appear at the top when sorting with na_position='first'.
    sort_na_first.head()
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659
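Since the gapminder column has no missing values, the effect of na_position is easier to see on a made-up data frame that does contain one. The sketch below is only an illustration; df and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing lifeExp value (country "B").
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "lifeExp": [70.5, np.nan, 82.1, 60.0],
})

# Rows with a missing lifeExp come first.
na_first = df.sort_values("lifeExp", na_position="first")

# Default behaviour (na_position="last"): missing values go to the end.
na_last = df.sort_values("lifeExp")

# The missing value also stays at the end when sorting in descending order.
na_last_desc = df.sort_values("lifeExp", ascending=False)

print(na_first["country"].tolist())
print(na_last["country"].tolist())
print(na_last_desc["country"].tolist())
```

Note that na_position works independently of ascending: the missing rows are placed first or last regardless of the sort direction.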
4. How to Sort Pandas Dataframe in place?
By default, sorting a pandas data frame with sort_values() or sort_index() creates a new data frame. If you don't want to create a new data frame and just want to sort in place, you can use the argument inplace=True. Here is an example of sorting a pandas data frame in place without creating a new data frame.
    gapminder.sort_values('lifeExp', inplace=True)
We can see that the data frame is now sorted: the lifeExp values at the top are the smallest, and the row indices are no longer in order.
    print(gapminder.head(n=3))
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659
Note that the row index of the sorted data frame is in a different order from the data frame before sorting.
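One detail worth knowing when using inplace=True is that the call returns None, so its result should not be assigned back to the variable. A minimal sketch on made-up data (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative, not from gapminder.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "lifeExp": [70.5, 45.2, 82.1, 60.0],
})

# With inplace=True the data frame is modified and the call returns None.
result = df.sort_values("lifeExp", inplace=True)

print(result)              # None
print(df.index.tolist())   # row labels now follow the sorted order
```

Writing df = df.sort_values("lifeExp", inplace=True) would therefore replace the data frame with None.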
5. How to Sort Pandas Dataframe based on row index?
We can use sort_index() to sort a pandas dataframe by row index or row names. In this example the row index consists of numbers, and because we sorted the data frame by lifeExp in the earlier example, the row index is jumbled up. We can sort by row index (with the inplace=True option) and retrieve the original dataframe.
    gapminder.sort_index(inplace=True)
Now we can see that the row indices start from 0 and are sorted in ascending order. Compare this with the previous example, where the first row index is 1292 and the row indices are not sorted.
    print(gapminder.head(n=3))
           country  year         pop continent  lifeExp   gdpPercap
    0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
    1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
    2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
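The round trip of sorting by a column and then restoring the original order with sort_index() can be sketched on made-up data as follows (df and its values are hypothetical). The example also shows that sort_index() accepts ascending=False, just like sort_values().

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative, not from gapminder.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "lifeExp": [70.5, 45.2, 82.1, 60.0],
})

# Sorting by a column shuffles the row labels.
shuffled = df.sort_values("lifeExp")
print(shuffled.index.tolist())   # [1, 3, 0, 2]

# Sorting by the row index brings the rows back to the original order.
restored = shuffled.sort_index()
print(restored.index.tolist())   # [0, 1, 2, 3]

# The index can also be sorted in descending order.
reversed_df = shuffled.sort_index(ascending=False)
print(reversed_df.index.tolist())  # [3, 2, 1, 0]
```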
6. How to Sort Pandas Dataframe based on the values of multiple columns?
Often, you might want to sort a data frame based on the values of multiple columns. We can specify the columns we want to sort by as a list in the argument to sort_values(). For example, to sort by the values of two columns, we can do:
    sort_by_life_gdp = gapminder.sort_values(['lifeExp', 'gdpPercap'])
We can see that the lifeExp column is sorted in ascending order and that, for each value of lifeExp, the rows are sorted by gdpPercap.
    print(sort_by_life_gdp.head())
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659
Note that when sorting by multiple columns, pandas sort_values() sorts by the first column in the list first and uses the second column only to order rows that tie on the first. We can see the difference by switching the order of the column names in the list.
    sort_by_life_gdp = gapminder.sort_values(['gdpPercap', 'lifeExp'])

    print(sort_by_life_gdp.head())
                 country  year         pop continent  lifeExp   gdpPercap
    334  Congo Dem. Rep.  2002  55379852.0    Africa   44.966  241.165877
    335  Congo Dem. Rep.  2007  64606759.0    Africa   46.462  277.551859
    876          Lesotho  1952    748747.0    Africa   42.138  298.846212
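Because gapminder has few exact ties in lifeExp, the role of the second column is easier to see on made-up data with deliberate ties. The sketch below (df and its values are hypothetical) also shows that ascending accepts a list, one entry per column, so each column can have its own sort direction.

```python
import pandas as pd

# Hypothetical toy data with ties in lifeExp so the second key matters.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "lifeExp": [50.0, 50.0, 40.0, 40.0],
    "gdpPercap": [300.0, 100.0, 400.0, 200.0],
})

# lifeExp is the primary key; gdpPercap only orders rows with equal lifeExp.
by_life_gdp = df.sort_values(["lifeExp", "gdpPercap"])

# Switching the order makes gdpPercap the primary key.
by_gdp_life = df.sort_values(["gdpPercap", "lifeExp"])

# One sort direction per column: lifeExp ascending, gdpPercap descending.
mixed = df.sort_values(["lifeExp", "gdpPercap"], ascending=[True, False])

print(by_life_gdp["country"].tolist())
print(by_gdp_life["country"].tolist())
print(mixed["country"].tolist())
```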
6 ways to Sort Pandas Dataframe: Pandas Tutorial
Original: https://www.cnblogs.com/andy-0212/p/11450955.html