Multi-level AI-driven analysis of software repository similarities

Honglin Zhang, Leyu Zhang, Lei Fang, Rosa Filgueira

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces significant enhancements to RepoSim4Py and RepoSnipy, advanced semantic tools for deep analysis of software repositories. The RepoSim4Py command-line toolbox now supports multi-level embedding, encompassing code, documentation, requirements, README, and comprehensive repository analysis, which enables the understanding of repository dynamics. Concurrently, the RepoSnipy web-based search engine facilitates sophisticated repository similarity searches and introduces clustering based on both repository tags (topic_cluster) and code embeddings (code_cluster). We also introduce SimilarityCal, a novel binary classification model trained on these clusters, to predict and quantify repository similarities with high accuracy. These developments provide researchers and developers with powerful tools to navigate the complex landscape of software repositories, improving efficiency in software development and fostering innovation through better reuse of existing resources.
Original language: English
Title of host publication: 2024 IEEE 20th International Conference on e-Science (e-Science)
Subtitle of host publication: September 16-20, 2024 | Osaka, Japan
Publisher: IEEE
Number of pages: 10
ISBN (Print): 9798350365610
DOIs
Publication status: Published - 16 Sept 2024
Event: 20th IEEE International eScience Conference (eScience 2024) - Osaka, Japan
Duration: 16 Sept 2024 to 20 Sept 2024
Conference number: 20
https://www.escience-conference.org/2024/

Publication series

Name: IEEE International Conference on e-Science (e-Science)
Publisher: IEEE
ISSN (Print): 2325-372X
ISSN (Electronic): 2325-3703

Conference

Conference: 20th IEEE International eScience Conference (eScience 2024)
Abbreviated title: eScience 2024
Country/Territory: Japan
City: Osaka
Period: 16/09/24 to 20/09/24
Internet address
