Understand Pandas Indexes

To efficiently use Pandas, ignore its documentation and learn the truth about indexes

[Carl M. Kadie](https://medium.com/@carlmkadie?source=post_page-----1b94f5c078c6-----------------------------------) · [Sep 30, 2020 · 4 min read](/understand-pandas-indexes-1b94f5c078c6?source=post_page-----1b94f5c078c6-----------------------------------)

![Panda at Beijing Zoo, China](https://miro.medium.com/max/1400/0*Di-YKBc5Ya_X0Y-N)
Photo by [Damian Patkowski](https://unsplash.com/@damianpatkowski?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)

The Python Pandas library is a great tool for data manipulation. However, it is only efficient if you understand Pandas indexing. Pandas indexing is the key to accessing and joining rows in seconds instead of minutes or hours.

Indexes

Like a Python dictionary (or a relational database’s index), Pandas indexing provides a fast way to turn a key into a value. For example, we can create a dataframe with index alpha:

![](https://miro.medium.com/max/1400/1*_5xKRFdyq1hKv-WncfnPdw.png)

and then turn the key b into the row of interest.

![](https://miro.medium.com/max/1400/1*LbY-k4aEW555mD4r1qRTXA.png)
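
For readers who cannot see the screenshots, here is a minimal sketch of the same two steps. The column name num and all data values are assumptions made for illustration; the original screenshots may use different ones.

```python
import pandas as pd

# Hypothetical data: only the column name "alpha" comes from the text.
df1 = pd.DataFrame({"alpha": ["a", "b", "c"], "num": [1, 2, 3]})

# Turn the "alpha" column into the index.
df1 = df1.set_index("alpha")

# Turn the key "b" into the row of interest.
# Because "b" appears once, the row comes back as a Series.
row_b = df1.loc["b"]
print(row_b)
```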

But what kind of thing is a Pandas index? The documentation says an index is an

Immutable ndarray implementing an ordered, sliceable set (emphasis added)

In other words, a kind of mathematical set. Recall that a mathematical set has these two important properties:

  • No repeated elements
  • Elements are unordered

But now, look at a second example:

![](https://miro.medium.com/max/1400/1*iuzXEmWbx8xM2s_MwzrVrw.png)
![](https://miro.medium.com/max/1400/1*Bcf07vU7KOsCUYGybme_bw.png)

We again turn the alpha column into an index. The element x, however, appears twice, and the retrieved rows respect the order of the two x’s. This illustrates that with a Pandas index:

  • Elements may be repeated
  • Elements are ordered
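
The two properties just listed can be checked with a short sketch. The data values and the num column are hypothetical; only the repeated element x and the alpha index come from the text.

```python
import pandas as pd

# Hypothetical data in which the element "x" appears twice.
df2 = pd.DataFrame(
    {"alpha": ["x", "y", "x", "z"], "num": [10, 20, 30, 40]}
).set_index("alpha")

# Looking up a repeated element returns every matching row,
# in the order the rows appear in the dataframe.
rows_x = df2.loc["x"]
print(rows_x)

# The index itself reports that it contains repeats.
print(df2.index.is_unique)
```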

So, contrary to the Pandas documentation, a Pandas index is not a mathematical set. Instead, it is a kind of list. Specifically, a Pandas index is

  • A (kind of) list of hashable elements, where
  • the position(s) of elements can be found quickly.

With this knowledge, we can easily understand the basics of indexes, starting with their creation, deletion, and manipulation.

Manipulating Indexes

The examples above showed how to turn a column into an index with .set_index(). We can turn the index back into a column with .reset_index():

![](https://miro.medium.com/max/1400/1*_VrLjtGMRYlnQvXAosOv1g.png)
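
A minimal sketch of the round trip between column and index follows. The dataframe contents are hypothetical.

```python
import pandas as pd

# Hypothetical starting dataframe with an ordinary "alpha" column.
df = pd.DataFrame({"alpha": ["a", "b", "c"], "num": [1, 2, 3]})

# Column -> index.
indexed = df.set_index("alpha")

# Index -> column again; the rows get a fresh 0, 1, 2, ... index.
restored = indexed.reset_index()
print(restored)
```

Both methods return a new dataframe by default, so the result has to be assigned to a name if it is to be kept.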

Let us put the index back and then look at all the elements inside the index. The property is .index.values. As expected, elements are a kind of list, specifically, a NumPy array.

![](https://miro.medium.com/max/1400/1*6tTXJK7WPcc6s4ZCGq3_mQ.png)
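
As a sketch with hypothetical data, the elements of the index can be inspected like this. The example also calls .index.to_numpy(), which the text does not mention; the Pandas documentation recommends it over .values when a NumPy array is specifically wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a repeated index element.
df = pd.DataFrame(
    {"alpha": ["x", "y", "x", "z"], "num": [10, 20, 30, 40]}
).set_index("alpha")

values = df.index.values        # the property named in the text
as_numpy = df.index.to_numpy()  # explicitly request a NumPy array

# The elements keep both their order and their repeats.
print(list(values))
print(type(as_numpy))
```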

We also expect to be able to quickly find the row number(s) corresponding to any index element. The method is .index.get_loc(). The result will be an integer when the element appears once; when it is repeated, the result is a slice if the index is sorted and a bool array otherwise.

![](https://miro.medium.com/max/1400/1*tS2tKSUiY-0TvmMM2n-mlw.png)
![](https://miro.medium.com/max/1400/1*Qw0FxBRnPrabXGAgWPXrUw.png)
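
The sketch below shows the kinds of result .get_loc() can return. The index contents are hypothetical, and the sorted copy is an addition made here to show what happens when repeated elements sit next to each other.

```python
import pandas as pd

# Hypothetical index in which "x" is repeated and the elements are unsorted.
idx = pd.Index(["x", "y", "x", "z"], name="alpha")

# An element that appears once gives a single integer position.
loc_y = idx.get_loc("y")

# A repeated element in an unsorted index gives a boolean mask,
# with True at each matching position.
loc_x = idx.get_loc("x")

# In a sorted index, the repeats are adjacent, so a slice is returned.
sorted_idx = idx.sort_values()
loc_x_sorted = sorted_idx.get_loc("x")

print(loc_y)
print(loc_x)
print(loc_x_sorted)
```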

Row Access

The main way to access rows with index elements is .loc[…] (note the square brackets), where the input can be a:

  • single element
  • list of elements
  • slice of elements

The rows will be output in the order they appear in the input. This example shows each kind of input.

![](https://miro.medium.com/max/1400/1*mS9GR-_QAYP6jQ__fknG0g.png)
![](https://miro.medium.com/max/1400/1*9DkcsrfrpUlJ0R2NM_5LPw.png)
![](https://miro.medium.com/max/1400/1*QGnOKMl_mihYRHdDf-9ACw.png)
![](https://miro.medium.com/max/1400/1*X0hfzEAE9eTfdktu9J4QQg.png)

Note that, unlike slices elsewhere in Python, a start:stop slice in .loc[…] includes the stop value.
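
Here is a minimal sketch of the three kinds of .loc[…] input, using a hypothetical dataframe with unique, sorted index elements so that the slice is well defined.

```python
import pandas as pd

# Hypothetical data with a unique, sorted index.
df = pd.DataFrame(
    {"alpha": ["a", "b", "c", "d"], "num": [1, 2, 3, 4]}
).set_index("alpha")

# Single element: one row, returned as a Series.
one = df.loc["b"]

# List of elements: rows come back in the order given in the list.
several = df.loc[["c", "a"]]

# Slice of elements: both the start and the stop label are included.
span = df.loc["b":"d"]

print(one)
print(several)
print(span)
```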

Joining Rows

Finally, let us look at joining two dataframes. The rules are:

  • The left dataframe need not be indexed, but the right one must be.
  • Pass the left column(s) of interest as the join’s on argument.

In this example, we will use join to add a “score” column to a dataframe. Here is the left dataframe. It isn't indexed.

![](https://miro.medium.com/max/1400/1*aDqMI2k_nwmvY9yYsbrPcQ.png)

The right dataframe needs an index, but it can be named anything. Here we call it alpha2.

![](https://miro.medium.com/max/1400/1*2JILf7SYslBC6jhvSyNXog.png)

We combine the two dataframes with a left join. We use column alpha from the first dataframe and whatever is indexed in the second dataframe. The result is a new dataframe with a score column.

![](https://miro.medium.com/max/1400/1*WvdFrHk8DnDY78y7p63RaA.png)
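
The join described above might look like the following sketch. The names alpha, alpha2, and score come from the text; the num column and every data value are assumptions made for illustration.

```python
import pandas as pd

# Left dataframe: not indexed, "alpha" is an ordinary column (hypothetical values).
left = pd.DataFrame({"alpha": ["x", "y", "x", "z"], "num": [10, 20, 30, 40]})

# Right dataframe: indexed by "alpha2" (hypothetical scores).
right = pd.DataFrame(
    {"alpha2": ["x", "y", "z"], "score": [0.5, 0.7, 0.9]}
).set_index("alpha2")

# Left join: match the left "alpha" column against the right index.
joined = left.join(right, on="alpha", how="left")
print(joined)
```

A left join keeps every row of the left dataframe, so the repeated x rows each receive the score for x.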

Conclusion

We have seen that contrary to the documentation, a Pandas index is not a mathematical set. Instead, it is a kind of list with a fast way to find the position(s) of any element.

Understanding this makes it easy to understand how to create, remove and manipulate indexes. We can then use indexes to quickly access and join rows.

What’s next? With this foundation, you should next learn to create indexes from multiple columns, to apply set-like operators to indexes, and to efficiently delete rows. (Surprisingly, Pandas grouping and sorting do not need or use indexes.)

[Carl M. Kadie](https://medium.com/@carlmkadie?source=post_sidebar--------------------------post_sidebar--------------): Ph.D. in CS and Machine Learning. Retired Microsoft & Microsoft Research. Volunteer, open-source projects related to ML and to Genomics.

Thanks to Linda Chen.
