Hardness sampling: exploring instance hardness in pool-based active learning

Please use this link to identify or cite this item: https://bdm.unb.br/handle/10483/41100

Files in this item:

File	Description	Size	Format
2024_GabrielSCNogueira_tcc.pdf		3.83 MB	Adobe PDF

Full record

Dublin Core field	Value	Language
dc.contributor.advisor	Garcia, Luís Paulo Faina	-
dc.contributor.author	Nogueira, Gabriel da Silva Corvino	-
dc.identifier.citation	NOGUEIRA, Gabriel da Silva Corvino. Hardness sampling: exploring instance hardness in pool-based active learning. 2024. 47 f., il. Trabalho de Conclusão de Curso (Bacharelado em Ciência da Computação) — Universidade de Brasília, Brasília, 2024.	pt_BR
dc.description	Undergraduate thesis (Trabalho de Conclusão de Curso), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2024.	pt_BR
dc.description.abstract	Active Learning (AL) techniques enable the creation of efficient models with minimal annotation effort by deciding which portions of the available data are worth learning. Pool-based AL (PAL) is a specific scenario in which instances within a pool of unlabeled data must be selected, labeled by an oracle, and incorporated into a subset of the pool to be used as a training set. The goal of PAL is to build a growing subset that is increasingly representative of the problem at hand. However, the proper strategy for an optimal query of such instances is still an open question. In this paper, we resort to Hardness Measures (HMs) to enrich the current repertoire of PAL strategies available to address this question. HMs are metrics that employ the Instance Hardness (IH) concept to identify instances with a higher probability of being misclassified, and they have been successfully applied in areas such as meta-learning and explainable AI. This study adds to this collective effort by exploring the use of IH in the context of AL, examining HMs as informativeness criteria for PAL, which led to a new PAL strategy called Hardness Sampling (HardS). We tested HardS across multiple datasets and learners, demonstrating its competitive performance compared to classical strategies such as Uncertainty Sampling, Expected Error Reduction, and Density-weighted methods. The results also highlighted the success of neighborhood-based measures, especially the ratio of the intra-class and extra-class distances at an instance level. Additionally, some tree-based and likelihood-based measures also showed promising performance.	pt_BR
dc.rights	Open Access	pt_BR
dc.subject.keyword	Active learning	pt_BR
dc.subject.keyword	Informatics	pt_BR
dc.title	Hardness sampling: exploring instance hardness in pool-based active learning	pt_BR
dc.type	Undergraduate thesis (Trabalho de Conclusão de Curso), Bachelor's degree	pt_BR
dc.date.accessioned	2025-01-13T22:28:15Z	-
dc.date.available	2025-01-13T22:28:15Z	-
dc.date.submitted	2024-11-28	-
dc.identifier.uri	https://bdm.unb.br/handle/10483/41100	-
dc.language.iso	English	pt_BR
dc.rights.license	The license for this item refers to the printed authorization form signed by the author, which authorizes the Biblioteca Digital da Produção Intelectual Discente da Universidade de Brasília (BDM) to make the undergraduate thesis available through the site bdm.unb.br under the following conditions: available under a Creative Commons 4.0 International License, which permits copying, distributing, and transmitting the work, provided that the author and licensor are credited. It does not permit commercial use or adaptation of the work.	pt_BR
Appears in Collection:	Ciência da Computação
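
The pool-based loop described in the abstract (select an instance from the pool, have an oracle label it, add it to the training subset) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's HardS implementation: the function names `hardness` and `hardness_sampling` are hypothetical, the score is a simple ratio of intra-class to extra-class nearest-neighbor distances, and the unknown class of a pool instance is guessed from its nearest labeled neighbor. The actual measures and query procedure are defined in the thesis PDF.

```python
import math


def hardness(x, labeled):
    """Ratio of intra-class to extra-class nearest-neighbor distance.

    Assumption (not from the thesis): the unknown class of x is guessed
    as the class of its nearest labeled neighbor, so the ratio lies in
    [0, 1] and values near 1 mark instances close to a class boundary.
    """
    guess = min(labeled, key=lambda p: math.dist(x, p[0]))[1]
    intra = min(math.dist(x, p) for p, y in labeled if y == guess)
    extra = min(
        (math.dist(x, p) for p, y in labeled if y != guess),
        default=math.inf,  # only one class labeled so far
    )
    return intra / extra if extra > 0 else 1.0


def hardness_sampling(pool, oracle, seed_indices, budget):
    """Query the hardest pool instance, one at a time, up to `budget` queries."""
    seeds = set(seed_indices)
    labeled = [(pool[i], oracle(pool[i])) for i in seed_indices]
    unlabeled = [i for i in range(len(pool)) if i not in seeds]
    queried = []
    for _ in range(min(budget, len(unlabeled))):
        # Pick the pool instance with the highest hardness score.
        pick = max(unlabeled, key=lambda i: hardness(pool[i], labeled))
        unlabeled.remove(pick)
        # The oracle provides the true label; the instance joins the training subset.
        labeled.append((pool[pick], oracle(pool[pick])))
        queried.append(pick)
    return queried, labeled


# Toy one-dimensional pool; the oracle is a hypothetical ground-truth rule.
pool = [(0.0,), (1.0,), (4.0,), (5.0,), (6.0,), (9.0,), (10.0,)]
oracle = lambda x: 0 if x[0] < 5.5 else 1

queried, labeled = hardness_sampling(pool, oracle, seed_indices=[0, 6], budget=2)
print(queried)  # indices of the queried pool instances, in query order
```

In this toy run the first queries fall on the points lying between the two labeled seeds, which is where the guessed class is least reliable. A full strategy would retrain a learner on `labeled` after each query and evaluate it on held-out data.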
