[2023 Spring NLP Seminar] Data-Centric AI : Chapter 1

AI 2023. 4. 5. 12:15

Why Data-Centric AI Is Needed

Fixing bad data by hand costs a huge amount of money!

-> Hence a variety of methods for dealing with noisy data.

A data-centric AI pipeline

Step 1: Explore the data, fix fundamental issues, and transform it to be ML appropriate. (Once you have data, explore it first, then transform it to suit ML!)
Step 2: Train a baseline ML model on the properly formatted dataset. (Run a first ML experiment on the transformed data.)
Step 3: Utilize this model to help you improve the dataset. (Before improving the model, apply data-centric AI techniques and run several experiments!)
Step 4: Try different modeling techniques to improve the model on the improved dataset and obtain the best model. (Now apply modeling techniques to the dataset improved in Step 3 and experiment!)
Strong reminder: never jump straight from Step 2 to Step 4; repeat Steps 3-4 to build a good system!
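
The four steps can be sketched on a toy problem. This is only an illustration under stated assumptions: the 1-D dataset, the nearest-centroid "model", and the helper names (`fit_centroids`, `predict`, `flag_suspects`) are all made up for this example and do not come from the seminar. Step 3 is approximated here by leave-one-out predictions, one simple way to let a model point at labels that look wrong.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Toy baseline model: one mean value per class (nearest-centroid)."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def flag_suspects(xs, ys):
    """Flag examples whose given label disagrees with a model
    trained on all the other examples (leave-one-out)."""
    suspects = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if predict(fit_centroids(rest_x, rest_y), xs[i]) != ys[i]:
            suspects.append(i)
    return suspects

# Step 1: explore the data and fix basic issues (here: drop a missing value).
raw = [(0.1, 0), (0.3, 0), (None, 0), (0.2, 0), (0.25, 1),
       (0.9, 1), (1.1, 1), (1.0, 1), (0.95, 1)]
data = [(x, y) for x, y in raw if x is not None]
xs = [x for x, _ in data]
ys = [y for _, y in data]

# Step 2: train a baseline model on the formatted dataset.
baseline = fit_centroids(xs, ys)

# Step 3: use the model to improve the dataset (remove suspicious labels).
suspects = flag_suspects(xs, ys)
clean_xs = [x for i, x in enumerate(xs) if i not in suspects]
clean_ys = [y for i, y in enumerate(ys) if i not in suspects]

# Step 4: only now work on the model, using the improved dataset.
final = fit_centroids(clean_xs, clean_ys)

print("suspect indices:", suspects)
print("class-1 centroid before/after:", baseline[1], final[1])
```

In this toy run the point `(0.25, 1)` sits among the class-0 values, so it is the one that gets flagged; removing it moves the class-1 centroid closer to the other class-1 points. In practice Steps 3 and 4 would be repeated, as the reminder above says.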

Confident Learning

Pruning noisy data (<-> fixing label errors or modifying the loss function)
Counting to estimate noise (<-> jointly learning noise rates during training)
Ranking examples to train with confidence (<-> weighting by exact probabilities)
: Rank the data to use during training based on the outputs of a decision model such as an SVM.
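
The three ideas above (count, prune, rank) can be sketched in plain Python. This is a minimal sketch, not the seminar's code: it assumes out-of-sample predicted probabilities are already available, uses the average self-confidence of each class as that class's threshold, and the names `class_thresholds` and `confident_joint` as well as the toy `labels` and `probs` are invented for illustration.

```python
def class_thresholds(labels, probs, n_classes):
    """t[j] = mean predicted probability of class j over examples labeled j."""
    thresholds = []
    for j in range(n_classes):
        own = [p[j] for y, p in zip(labels, probs) if y == j]
        thresholds.append(sum(own) / len(own))
    return thresholds

def confident_joint(labels, probs, n_classes):
    """Count examples by (given label, confidently predicted label)."""
    t = class_thresholds(labels, probs, n_classes)
    counts = [[0] * n_classes for _ in range(n_classes)]
    suspects = []
    for idx, (y, p) in enumerate(zip(labels, probs)):
        passing = [j for j in range(n_classes) if p[j] >= t[j]]
        if not passing:
            continue  # not confident about any class: leave it uncounted
        j = max(passing, key=lambda c: p[c])
        counts[y][j] += 1
        if j != y:
            suspects.append(idx)  # given label disagrees with confident class
    return counts, suspects

# Toy inputs (assumed): given labels and out-of-sample probabilities.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8],
         [0.1, 0.9], [0.3, 0.7], [0.15, 0.85], [0.75, 0.25]]

# Count: estimate noise from the off-diagonal entries of the confident joint.
counts, suspects = confident_joint(labels, probs, n_classes=2)
off_diagonal = sum(counts[i][j] for i in range(2) for j in range(2) if i != j)
noise_rate = off_diagonal / sum(map(sum, counts))

# Rank: order suspects by self-confidence (probability of the given label).
ranked = sorted(suspects, key=lambda i: probs[i][labels[i]])

# Prune: drop the suspects instead of relabeling or reweighting them.
kept = [i for i in range(len(labels)) if i not in suspects]

print(counts, noise_rate, ranked, kept)
```

Because the ranking only uses the order of the probabilities, not their exact values, it does not depend on the model being well calibrated, which is the contrast the third bullet draws with weighting by exact probabilities.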

Presentation by Daeun

https://velog.io/@delee12/Data-Centric-AI-1-IntroductionIAP-2023

[Data-Centric AI #1] Introduction (IAP, 2023)

page: https://dcai.csail.mit.edu/
1/17/23: Data-Centric AI vs. Model-Centric AI
1/18/23: Label Errors
1/19/23: Dataset Creation and Curation
1/20/23:

https://velog.io/@delee12/Data-Centric-AI-2-Data-Centric-AI-vs.-Model-Centric-AI

[Data-Centric AI #2] Data-Centric AI vs. Model-Centric AI

https://velog.io/@delee12/Data-Centric-AI-3-Label-Errors

[Data-Centric AI #3] Label Errors (Confident Learning)
