Python

Python Basics - 02_pandas_Series

Black940514 2024. 8. 27. 11:08

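Pandas

Pandas is the main package we will use for handling data.
The API keeps changing, so filter search results to recent material when you look things up.
pandas data types: Series, DataFrame
- 1D: Series
- 2D: DataFrame
- 3D: Panel (deprecated in pandas 0.20 and removed in 0.25; current pandas uses a MultiIndex DataFrame or a library such as xarray instead)
- Ask first: what shape is the data I need to process?
The type for 1D data: Series
→ You will rarely use it directly on its own!
The pandas developers organized the data types around "dimension":
1D vector → Series type
2D matrix → DataFrame type (the one used most often)
3D tensor → Panel type (removed in current versions)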
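
To make the "dimension" idea concrete, here is a minimal sketch that builds one Series and one DataFrame and checks their `ndim` and `shape`. The prices and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical closing prices (1D) and a small price/volume table (2D).
prices = pd.Series([80000, 83000, 79000, 82000])
table = pd.DataFrame({"close": [80000, 83000], "volume": [120, 95]})

print(prices.ndim, prices.shape)  # 1 (4,)
print(table.ndim, table.shape)    # 2 (2, 2)

# Picking one column out of a DataFrame gives back a Series.
print(type(table["close"]).__name__)  # Series
```

Even if you rarely create a Series by hand, every DataFrame column you select is one, so the Series behavior covered here carries over.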

Video is 4D data → this is where traditional machine learning hits its limits

Dimensionality reduction is needed → it can smear the data → use it with care

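Key feature: you can build the index however you like!
- You can also access values through the information you care about
- Why bother???
pandas development history
- "I want to process financial data in Python"
- When, how much,
- The market is closed on some days → date-based data has holes
- Sometimes you need to look up a date directly
- I want control over the keys I use
- I want to access values directly.
There are drawbacks too → it handles times, numbers, text, and more, so it is slow
→ Very convenient for handling moderately sized data
It cannot cope with very large or real-time, high-volume data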

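ex) I want to process Samsung Electronics stock price data

Consider only the closing price:

80000, 83000, 79000, 82000,

→ A 1D container that just holds the prices (list, array, etc.)

What is today's price?? → index -1

What was the price last Tuesday??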
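
The two questions above can be sketched as follows. The prices and dates are hypothetical, and the index is built with `pd.to_datetime`, which the original notes do not use. The weekend of Aug 3-4, 2024 is left out on purpose to mimic days when the market is closed.

```python
import pandas as pd

# Hypothetical closes for Thu Aug 1, Fri Aug 2, Mon Aug 5, Tue Aug 6, Wed Aug 7.
close = pd.Series(
    [80000, 83000, 79000, 82000, 81000],
    index=pd.to_datetime(
        ["2024-08-01", "2024-08-02", "2024-08-05", "2024-08-06", "2024-08-07"]
    ),
)

# "Today's price": the last row, by position.
today = close.iloc[-1]

# "Last Tuesday's price": by date label, no counting of rows needed.
tuesday = close.loc[pd.Timestamp("2024-08-06")]

# A day the market was closed is simply absent from the index.
saturday_present = pd.Timestamp("2024-08-03") in close.index

print(today, tuesday, saturday_present)
```

With a plain list you would have to work out that Tuesday sits at position 3, and that count changes whenever a holiday falls in between.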

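The baseline in data preprocessing is not "did it run without errors" but "did it do what I intended"

A plain Python dict ↔ a pandas Series: they behave alike

⇒ They complement each other during data collection

So a common pattern is to collect data into a dict and then convert it to pandas in one go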
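
A minimal sketch of that collect-then-convert pattern, assuming the records arrive one at a time as (ticker, price) pairs. The `records` list and the `name="price"` argument are illustrative choices, not part of the original notes.

```python
import pandas as pd

# Hypothetical records, e.g. the result of a scraping loop.
records = [("APPL", 1000), ("MS", 2000), ("TSLA", 1500)]

collected = {}
for ticker, price in records:
    collected[ticker] = price  # cheap dict insert while collecting

# Convert once at the end: dict keys become the index, dict values the data.
prices = pd.Series(collected, name="price")
print(prices)
```

Building the dict first avoids growing a pandas object row by row, which tends to be slow.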

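It has the basic vectorized operations of numpy

→ but the operations are not aligned purely by position;

they are aligned by index

For the index, pandas in some cases effectively takes the union (OR) of both sides

→ rows you did not want can appear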
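
The index alignment described above can be sketched like this. The tickers and values are made up. `Series.add(..., fill_value=0)` and `Index.intersection` are shown as two possible ways to control the result, not as the method used in the original notes.

```python
import pandas as pd

a = pd.Series({"APPL": 1000, "MS": 2000, "TSLA": 1500})
b = pd.Series({"GOOGLE": 500, "APPL": 1000, "MS": 2000})

# Plain "+" aligns by label; the result index is the union of both indexes.
total = a + b
print(len(a), len(b), len(total))  # 3 3 4

# Labels present on only one side become NaN.
print(total)

# Option 1: treat a label missing on one side as 0.
filled = a.add(b, fill_value=0)

# Option 2: keep only labels present on both sides.
common = a.index.intersection(b.index)
strict = a[common] + b[common]
print(filled)
print(strict)
```

Comparing `len()` before and after the operation is a quick way to notice that extra rows were created.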

Note) Handling missing data!!

→ Frequently used methods

.isnull() : is the value missing? T/F
.notnull() : is the value present? T/F
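
A short sketch of how these two methods are typically used, on made-up data. `dropna()` and `fillna()` are standard pandas methods that the notes do not mention; they are included here only as common follow-up steps.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [2000.0, np.nan, 4000.0, np.nan],
    index=["APPL", "GOOGLE", "MS", "TSLA"],
)

n_missing = int(s.isnull().sum())  # True counts as 1, so this counts missing values
present = s[s.notnull()]           # boolean filter: keep the valid rows
dropped = s.dropna()               # same result as the filter above
filled = s.fillna(0)               # or replace missing values with a default

print(n_missing)
print(present)
print(filled)
```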

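pandas provides explicit, by-the-book accessors

Value access is viewed along two lines: a single value vs multiple values

Access to a single value
- at : one value by the label index I created!!
- iat : one value by the built-in integer position!!
Access to multiple values
- loc : multiple values by the label index I created
- iloc : multiple values by the built-in integer position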
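
The four accessors can be told apart most easily when the labels are themselves integers, because then label and position disagree. The sketch below uses a made-up Series with labels 10, 20, 30.

```python
import pandas as pd

s = pd.Series([10000, 10300, 9900], index=[10, 20, 30])

one_by_label = s.at[10]          # label 10
one_by_position = s.iat[0]       # position 0 (the same element here)

many_by_label = s.loc[[10, 30]]  # labels 10 and 30
many_by_position = s.iloc[0:2]   # positions 0 and 1 (end excluded)
label_slice = s.loc[10:20]       # labels 10 through 20 (end included)

print(one_by_label, one_by_position)
print(many_by_label.tolist(), many_by_position.tolist(), label_slice.tolist())
```

Spelling out `loc`/`iloc` removes the guesswork about whether `s[0]` means "label 0" or "position 0", which is what the FutureWarning in recent pandas versions is about.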

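Note) It builds on numpy for numeric work ⇒ simple computations come built in!!

Summary

Plain Python list → numpy array → pandas Series

1D, index (integer position and user-defined labels both work)

at/iat/loc/iloc

Vectorized operations built in (aligned by index)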

In [ ]:

# pandas is the main tool we will use for handling data from here on!!!
# It is an external package built on top of Python!!!!
# pandas has its own core data types: Series, DataFrame, Panel
# ==> type for 1D data: Series
#     type for 2D data: DataFrame
#     type for 3D data: Panel (removed in pandas 0.25)
# ==> Which dimension is the data I need to process?
#     The one used most in practice is 2D ==> DataFrame

In [ ]:

# The type for 1D data: Series
# ==> You will rarely use it directly on its own!!!

In [ ]:

import numpy as np
import pandas as pd

In [ ]:

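# Basically, it is one of the data types that hold multiple values!!!
# Plain Python: list / tuple / set, dict, str, ...
# ==> upgrade: numpy's array type (so that vectorized operations work!!!)
# ==> upgrade: pandas

# The pandas developers organized the data types around "dimension"!!!!
# 1D: 1D vector ==> Series type
# 2D: 2D matrix ==> DataFrame type *** the one used most often ***
# 3D: 3D tensor ==> Panel type (removed in pandas 0.25)
# pandas does not go beyond that!!!!
# Higher-dimensional data is tensor territory ==> TF/PyTorch: mainly the DL side!!!!!

# + Key feature: you can build the index however you like!!!
#          You can also access values through the information you care about!!!
#          ==> there is the "index" that I create,
#              and there is the built-in integer (positional) "index"!!!!
# pandas development history:
# "I want to process financial data in Python!!!!!"
# When, how much,
# The market is closed on some days... --> date-based data has holes..
# 2024-02-28
# 2024-03-02
# .....
# ? So what is the value for 2024-05-08?
# list.index(value) --> a list can only be queried through positions!!!
# data[date]        --> what we actually want to write
# ==> I want to access a value directly through the information I care about!!!
#     Let's access it by something other than position!!!
#     (codes, dates, times, ID numbers, and so on....)

# ==> There are drawbacks, of course: it handles times, numbers, text, ... every kind of data,
#                        so it is quite slow!!!!
#                       ==> Very convenient for handling moderately sized data!!!
# but) it cannot cope with very large or real-time, high-volume data!!!!!
#      ==> sql + gpu + spark, etc.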

In [ ]:

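# ex) I want to process Samsung Electronics stock price data!!!!
#     Consider only the closing price
#     80000, 83000, 79000, 82000, ....
# ==> A 1D container that just holds the prices!!! (list, array, etc.)

# What is today's price? ==> -1
# What was last Tuesday's price? ??? You have to work out which position it is!!!
# ===> weekends, holidays in between, ...
# ===> accessing data by "the n-th element" is quite inconvenient!!!!!!
# ===> instead of the integer index,
#      let's build a path to the closing prices through an index of my own choosing!!!
#      (the date is one candidate!!!)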

In [ ]:

# ex) Suppose we only have stock price data!!!
# values: 10000, 10300, 9900, 10500, 11000

In [ ]:

# Creation 1) plain Python list
stock_price_list = [ 10000, 10300, 9900, 10500, 11000 ]
# Creation 2) numpy --> array
stock_price_arr = np.array(stock_price_list)
# Creation 3) pandas --> 1D --> create a Series (note the capital S)
stock_price_Series = pd.Series( stock_price_list )

In [ ]:

stock_price_list

Out[ ]:

[10000, 10300, 9900, 10500, 11000]

In [ ]:

stock_price_arr

Out[ ]:

array([10000, 10300,  9900, 10500, 11000])

In [ ]:

stock_price_Series

Out[ ]:

	0
0	10000
1	10300
2	9900
3	10500
4	11000

dtype: int64

In [ ]:

# Access the first stock price (position 0)
print( stock_price_list[0])
print( stock_price_arr[0])
print( stock_price_Series[0])

10000
10000
10000

In [ ]:

# Access the prices at positions 0 to 3: slicing (the end point 4 is excluded)
print( stock_price_list[0:4])
print( stock_price_arr[0:4])
print( stock_price_Series[0:4])

[10000, 10300, 9900, 10500]
[10000 10300  9900 10500]
0    10000
1    10300
2     9900
3    10500
dtype: int64

In [ ]:

# Suppose I want only scattered items, e.g. just the prices at positions 0 and 4
stock_price_list[ [0,4]]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-4b39151459e7> in <cell line: 2>()
      1 # Suppose I want only scattered items, e.g. just the prices at positions 0 and 4
----> 2 stock_price_list[ [0,4]]

TypeError: list indices must be integers or slices, not list

In [ ]:

stock_price_arr[ [0,4]]

Out[ ]:

array([10000, 11000])

In [ ]:

stock_price_Series[ [0,4]]

Out[ ]:

	0
0	10000
4	11000

dtype: int64

In [ ]:

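# ==> Unless you specify an index of your own,
#     it is just a collection of values, so only the built-in integer index is available!!!!
# +++ Let's build the index ourselves!!!!
# **** a pandas object ALWAYS carries an index around *********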

In [ ]:

stock_price_Series_index = pd.Series(
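    # Contents: the actual data values,
    #           plus an index for reaching the values I want!!!
    # (note: the last label is written "2024-08-5", without zero padding)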
    data = [10000, 10300, 9900, 10500, 11000],
    index = ["2024-08-01","2024-08-02","2024-08-03",
             "2024-08-04", "2024-08-5"]
)
stock_price_Series_index

Out[ ]:

	0
2024-08-01	10000
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500
2024-08-5	11000

dtype: int64

In [ ]:

stock_price_Series_index[0]

<ipython-input-15-582ae4b44d1a>:1: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  stock_price_Series_index[0]

Out[ ]:

10000

In [ ]:

stock_price_Series_index[0:4]

Out[ ]:

	0
2024-08-01	10000
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500

dtype: int64

In [ ]:

stock_price_Series_index["2024-08-02"]

Out[ ]:

10300

In [ ]:

stock_price_Series_index["2024-08-02":"2024-08-04"]
# ==> slicing by integer position: the end point is excluded...
#     slicing by my own labels: the end point is included...

Out[ ]:

	0
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500

dtype: int64

In [ ]:

stock_price_Series_index[ [0,4]]

<ipython-input-19-868bbe721acb>:1: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  stock_price_Series_index[ [0,4]]

Out[ ]:

	0
2024-08-01	10000
2024-08-5	11000

dtype: int64

In [ ]:

stock_price_Series_index[ ["2024-08-02","2024-08-04"]]

Out[ ]:

	0
2024-08-02	10300
2024-08-04	10500

dtype: int64

In [ ]:

stock_price_Series_index["2024-08-05"]
# Note) like the keys of a plain Python dict, lookup is by exact match!!
# ==> watch out for whitespace, upper/lower case, typos, and so on!!!!!

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3790         try:
-> 3791             return self._engine.get_loc(casted_key)
   3792         except KeyError as err:

index.pyx in pandas._libs.index.IndexEngine.get_loc()

index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: '2024-08-05'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-21-bb2b38d63f10> in <cell line: 1>()
----> 1 stock_price_Series_index["2024-08-05"]

/usr/local/lib/python3.10/dist-packages/pandas/core/series.py in __getitem__(self, key)
   1038 
   1039         elif key_is_scalar:
-> 1040             return self._get_value(key)
   1041 
   1042         # Convert generator to list before going through hashable part

/usr/local/lib/python3.10/dist-packages/pandas/core/series.py in _get_value(self, label, takeable)
   1154 
   1155         # Similar to Index.get_value, but we do not fall back to positional
-> 1156         loc = self.index.get_loc(label)
   1157 
   1158         if is_integer(loc):

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3796             ):
   3797                 raise InvalidIndexError(key)
-> 3798             raise KeyError(key) from err
   3799         except TypeError:
   3800             # If we have a listlike key, _check_indexing_error will raise

KeyError: '2024-08-05'
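
One way to avoid this kind of KeyError is to normalize the labels before using them as an index. The sketch below is only an illustration: `normalize_date_label` is a hypothetical helper, and it assumes every label has the form year-month-day separated by hyphens.

```python
import pandas as pd


def normalize_date_label(label: str) -> str:
    """Zero-pad a 'YYYY-M-D' style label to 'YYYY-MM-DD' (hypothetical helper)."""
    year, month, day = label.strip().split("-")
    return f"{int(year):04d}-{int(month):02d}-{int(day):02d}"


s = pd.Series(
    [10000, 10300, 9900, 10500, 11000],
    index=["2024-08-01", "2024-08-02", "2024-08-03", "2024-08-04", "2024-08-5"],
)

# Rewrite every label once, up front.
s.index = [normalize_date_label(label) for label in s.index]

print(s["2024-08-05"])       # 11000
print("2024-08-5" in s)      # False: the unpadded label no longer exists
```

Cleaning the keys once at load time is usually easier than remembering each irregular spelling at lookup time.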

In [ ]:

stock_price_Series_index["2024-08-5"]

Out[ ]:

11000

In [ ]:

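# Everything you can do in numpy works here too!!!!!
# ==> boolean indexing!!!!! ==> filtering by condition!!!!!!!!
# Boolean indexing works here as well!!!
# ex) let's pick out only the data where the stock price is above 10000...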

In [ ]:

stock_price_Series_index[stock_price_Series_index >10000 ]

Out[ ]:

	0
2024-08-02	10300
2024-08-04	10500
2024-08-5	11000

dtype: int64

In [ ]:

stock_price_Series_index >10000

Out[ ]:

	0
2024-08-01	False
2024-08-02	True
2024-08-03	False
2024-08-04	True
2024-08-5	True

dtype: bool

In [ ]:

# Note) very similar to a dict!!!! It behaves alike!!!
#       dict   : key + value
#       pandas : index (integer position, or user-defined labels) + value

In [ ]:

stock_price_Series_index.index

Out[ ]:

Index(['2024-08-01', '2024-08-02', '2024-08-03', '2024-08-04', '2024-08-5'], dtype='object')

In [ ]:

stock_price_Series_index.values

Out[ ]:

array([10000, 10300,  9900, 10500, 11000])

In [ ]:

stock_price_Series_index + 100

Out[ ]:

	0
2024-08-01	10100
2024-08-02	10400
2024-08-03	10000
2024-08-04	10600
2024-08-5	11100

dtype: int64

In [ ]:

# Note) in : on a Python dict   --> checks against the keys!!!
#            on a Python list   --> checks against the values!!
#            on a pandas Series --> checks against the index!!!
#                                   (not against the values!!!)

In [ ]:

"2024-08-03" in stock_price_Series_index

Out[ ]:

True

In [ ]:

"2024-08-05" in stock_price_Series_index

Out[ ]:

False

In [ ]:

"2024-08-5" in stock_price_Series_index

Out[ ]:

True

In [ ]:

stock_price_Series_index

Out[ ]:

	0
2024-08-01	10000
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500
2024-08-5	11000

dtype: int64

In [ ]:

10300 in stock_price_Series_index

Out[ ]:

False
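
If the goal is to test whether a value (rather than a label) is present, the check has to be made against the values explicitly. A minimal sketch on made-up data, using `Series.values` and `Series.isin`:

```python
import pandas as pd

s = pd.Series(
    [10000, 10300, 9900],
    index=["2024-08-01", "2024-08-02", "2024-08-03"],
)

label_hit = "2024-08-02" in s           # checks the index
value_as_label = 10300 in s             # still checks the index, so no match
value_hit = 10300 in s.values           # checks the underlying numpy array
value_hit_isin = bool(s.isin([10300]).any())  # same idea with a pandas method

print(label_hit, value_as_label, value_hit, value_hit_isin)
```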

In [ ]:

# Worth knowing: a plain Python dict <---> a pandas Series are similar!!!
# ==> they convert into each other easily during data collection!!!!
# (we will see an example later, when we collect data!!!!)

In [ ]:

s_data = {"APPL":1000, "MS":2000, "TSLA": 1500}
s_data

Out[ ]:

{'APPL': 1000, 'MS': 2000, 'TSLA': 1500}

In [ ]:

s_data["APPL"]

Out[ ]:

1000

In [ ]:

s_data[0]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-37-3ee0bb4fb7d0> in <cell line: 1>()
----> 1 s_data[0]

KeyError: 0

In [ ]:

s_data_Series = pd.Series( s_data)
s_data_Series

Out[ ]:

	0
APPL	1000
MS	2000
TSLA	1500

dtype: int64

In [ ]:

s_data_Series["APPL"]

Out[ ]:

1000

In [ ]:

s_data_Series[0]

<ipython-input-41-5ed2c8b0f14b>:1: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  s_data_Series[0]

Out[ ]:

1000

In [ ]:

#
s_data = {"APPL":1000, "MS":2000, "TSLA": 1500}
ticker = ["GOOGLE", "APPL", "MS","META"]
s_2 = pd.Series(s_data, index = ticker )
s_2

Out[ ]:

	0
GOOGLE	NaN
APPL	1000.0
MS	2000.0
META	NaN

dtype: float64

In [ ]:

# ==> Always keep in mind that missing values can appear!!!
#     why??? no warning or error is raised when a missing value is created!!!!!
# ==> pandas : NaN
#     mysql  : null --> is null / is not null

In [ ]:

s_1 = pd.Series(s_data)
s_1

Out[ ]:

	0
APPL	1000
MS	2000
TSLA	1500

dtype: int64

In [ ]:

s_2

Out[ ]:

	0
GOOGLE	NaN
APPL	1000.0
MS	2000.0
META	NaN

dtype: float64

In [ ]:

s_1 + s_2
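# ==> it has the basic vectorized operations of numpy!!!
#     but the operation is not aligned purely by position,
#     it is aligned by index!!!
# +++ for the index, pandas in some cases effectively takes the union (OR) of both sides!!
#     rows you did not want can be created!!!!
#     always check the number of rows!!!!!!!!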

Out[ ]:

	0
APPL	2000.0
GOOGLE	NaN
META	NaN
MS	4000.0
TSLA	NaN

dtype: float64

In [ ]:

# Note) Handling missing data!!!
# --> Frequently used methods
#     .isnull()  : is this value missing? --> T/F
#     .notnull() : is this value present? --> T/F

In [ ]:

temp = s_1 + s_2
temp

Out[ ]:

	0
APPL	2000.0
GOOGLE	NaN
META	NaN
MS	4000.0
TSLA	NaN

dtype: float64

In [ ]:

temp.isnull()

Out[ ]:

	0
APPL	False
GOOGLE	True
META	True
MS	False
TSLA	True

dtype: bool

In [ ]:

# Check only the missing values in the data we actually received!!!!
# ==> let's look at the entries that have holes!!!!
temp[ temp.isnull() ]

Out[ ]:

	0
GOOGLE	NaN
META	NaN
TSLA	NaN

dtype: float64

In [ ]:

# Let's look at only the valid values..
temp[ temp.notnull() ]

Out[ ]:

	0
APPL	2000.0
MS	4000.0

dtype: float64

In [ ]:

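# Caution!!!!!!
# The isnull() / notnull() methods above are the by-the-book approach,
# and they do what they say!!!
# But when you process real data, some entries slip past these simple checks..
# They are not NaN but, for example:
#   ""      (an empty string: isnull() returns False)
#   "None"  (placeholder text; an actual Python None is treated as missing)
# ==> Do not rely on the code alone; look at the actual data as you work!!
#     For missing data, tell empty strings, None, and NaN apart!!!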
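
The sketch below shows which of these placeholders `isnull()` actually catches, on a made-up Series. In pandas, a real Python `None` and `NaN` are both reported as missing, while an empty string and the text `"None"` are ordinary values. The cleanup step with `mask` and `isin` is one possible approach and is not taken from the original notes.

```python
import numpy as np
import pandas as pd

raw = pd.Series(["1000", "", None, np.nan, "None"], index=list("abcde"))

# Only the real None (c) and NaN (d) are flagged as missing.
flags = raw.isnull()
print(flags)

# Turn the text placeholders into real missing values, then count again.
placeholders = ["", "None"]
cleaned = raw.mask(raw.isin(placeholders))
print(int(cleaned.isnull().sum()))  # 4
```

Which strings count as "missing" depends on the data source, so the `placeholders` list has to be decided by inspecting the actual data.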

In [ ]:

# Misc
temp.ndim

Out[ ]:

1

In [ ]:

temp.shape

Out[ ]:

(5,)

In [ ]:

len(temp)

Out[ ]:

5

In [ ]:

temp.dtype

Out[ ]:

dtype('float64')

In [ ]:

##### Accessing values in pandas: the by-the-book way #####

In [ ]:

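# The approach so far: put whatever you want inside the brackets!!!
#               -> an integer, a slice, a list, etc.
#               + my own label, a slice of my labels, a list of my labels, ...
#               + boolean indexing (filtering) based on a condition
# ==> conclusion so far: series[ throw in roughly anything!! ]
# With many values, the speed difference becomes significant!!!
# ==> you need the by-the-book accessors to be sure about speed!!
#     the difference is considerable!!!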

In [ ]:

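# pandas provides explicit, by-the-book accessors!!!
# Value access is viewed along two lines: a single value vs multiple values
#   single value     1) at   : one value by the label index I created!!!
#                    2) iat  : one value by the built-in integer position!!!!
#   multiple values  1) loc  : multiple values by the label index I created!!
#                    2) iloc : multiple values by the built-in integer position!!

# Note) this area keeps changing from version to version!!!!
#      ==> a plain web search can turn up old material that no longer applies!!
#      ==> go by the recent documentation..

# Bottom line!!!) access by the labels I created vs access by integer position!!!
#        + request a single value vs request multiple values!!!

# Note) a single value can also be fetched with iloc / loc!!!
#       with a lot of data, the speed difference shows!!!!!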
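
To check the speed claim on your own machine, something like the following sketch can be used. It relies on the standard-library `timeit` module; the Series size, the `k…` labels, and the repetition count are arbitrary choices for illustration, and the measured numbers will vary with the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000
s = pd.Series(np.arange(n), index=[f"k{i}" for i in range(n)])

# Four ways to fetch the same single element.
candidates = {
    's.at["k500"]': lambda: s.at["k500"],
    's.loc["k500"]': lambda: s.loc["k500"],
    "s.iat[500]": lambda: s.iat[500],
    "s.iloc[500]": lambda: s.iloc[500],
}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{name:<16} {seconds:.4f} s for 10,000 lookups")
```

All four calls return the same value, so the choice between them is about intent (label vs position, one value vs many) and per-call overhead.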

In [ ]:

stock_price_Series_index = pd.Series(
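    # Contents: the actual data values,
    #           plus an index for reaching the values I want!!!
    # (note: the last label is written "2024-08-5", without zero padding)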
    data = [10000, 10300, 9900, 10500, 11000],
    index = ["2024-08-01","2024-08-02","2024-08-03",
             "2024-08-04", "2024-08-5"]
)
stock_price_Series_index

Out[ ]:

	0
2024-08-01	10000
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500
2024-08-5	11000

dtype: int64

In [ ]:

stock_price_Series_index[0]
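# ==> the informal way: fine for quick checks while exploring!!
# ==> for processing a million rows... hopeless!!!! (the speed difference is large!!)
# ==> also deprecated in recent pandas; use .iloc[0] / .iat[0] for positional access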

<ipython-input-55-582ae4b44d1a>:1: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  stock_price_Series_index[0]

Out[ ]:

10000

In [ ]:

# ex) access by integer position & one value at position 0: by the book
stock_price_Series_index.iat[0]

Out[ ]:

10000

In [ ]:

stock_price_Series_index.iloc[0]
# Typical time per lookup: [0] > iloc[0] > iat[0] (iat is usually the fastest)
# If you are clearly accessing a single value, use at/iat!!!!!

Out[ ]:

10000

In [ ]:

# ex) What was the stock price on August 4, 2024?
# ==> my own label & one value requested: at
stock_price_Series_index.at["2024-08-04"]

Out[ ]:

10500

In [ ]:

stock_price_Series_index.iat["2024-08-04"]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-59-a36f9c91870e> in <cell line: 1>()
----> 1 stock_price_Series_index.iat["2024-08-04"]

/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
   2437                 raise ValueError("Invalid call for scalar access (getting)!")
   2438 
-> 2439         key = self._convert_key(key)
   2440         return self.obj._get_value(*key, takeable=self._takeable)
   2441 

/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py in _convert_key(self, key)
   2510         for i in key:
   2511             if not is_integer(i):
-> 2512                 raise ValueError("iAt based indexing can only have integer indexers")
   2513         return key
   2514 

ValueError: iAt based indexing can only have integer indexers

In [ ]:

# ex) show the stock data for the first four days (positions 0 to 3)!!!
stock_price_Series_index[:4]

Out[ ]:

	0
2024-08-01	10000
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500

dtype: int64

In [ ]:

stock_price_Series_index.iloc[:4]

Out[ ]:

	0
2024-08-01	10000
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500

dtype: int64

In [ ]:

# ex) show the stock prices from August 2, 2024 through August 4, 2024!!

In [ ]:

stock_price_Series_index["2024-08-02":"2024-08-04"]

Out[ ]:

	0
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500

dtype: int64

In [ ]:

stock_price_Series_index.loc["2024-08-02":"2024-08-04"]

Out[ ]:

	0
2024-08-02	10300
2024-08-03	9900
2024-08-04	10500

dtype: int64

In [ ]:

#  Note) it builds on numpy for numeric work!!! ==> simple computations come built in!!

In [ ]:

stock_price_Series_index.sum()

Out[ ]:

51700

In [ ]:

stock_price_Series_index.count()

Out[ ]:

5

In [ ]:

stock_price_Series_index.max()

Out[ ]:

11000

In [ ]:

# Note) besides pandas' own methods, Python built-ins and external functions also work...
sum(stock_price_Series_index)

Out[ ]:

51700

In [ ]:

max(stock_price_Series_index)

Out[ ]:

11000

In [ ]:

# Summary
# plain Python list --> numpy array ---> pandas Series
#                                             1D
#                                      index (integer position, user-defined labels)
#                                           at/iat/loc/iloc
#                                      vectorized ops built in!! (aligned by index)
