Abstract
This paper introduces significant enhancements to RepoSim4Py and RepoSnipy, advanced semantic tools for deep analysis of software repositories. The RepoSim4Py command-line toolbox now supports multi-level embedding, encompassing code, documentation, requirements, README, and comprehensive repository analysis, which enables a better understanding of repository dynamics. Concurrently, the RepoSnipy web-based search engine facilitates sophisticated repository similarity searches and introduces clustering based on both repository tags (topic_cluster) and code embeddings (code_cluster). We also introduce SimilarityCal, a novel binary classification model trained on these clusters, to predict and quantify repository similarities with high accuracy. These developments provide researchers and developers with powerful tools to navigate the complex landscape of software repositories, improving efficiency in software development and fostering innovation through better reuse of existing resources.
Original language | English |
---|---|
Title of host publication | 2024 IEEE 20th International Conference on e-Science (e-Science) |
Subtitle of host publication | September 16-20, 2024, Osaka, Japan |
Publisher | IEEE |
Number of pages | 10 |
ISBN (Print) | 9798350365610 |
Publication status | Published - 16 Sept 2024 |
Event | 20th IEEE International eScience Conference (eScience 2024), Osaka, Japan. Duration: 16 Sept 2024 → 20 Sept 2024. Conference number: 20. https://www.escience-conference.org/2024/ |
Publication series
Name | IEEE International Conference on e-Science (e-Science) |
---|---|
Publisher | IEEE |
ISSN (Print) | 2325-372X |
ISSN (Electronic) | 2325-3703 |
Conference
Conference | 20th IEEE International eScience Conference (eScience 2024) |
---|---|
Abbreviated title | eScience 2024 |
Country/Territory | Japan |
City | Osaka |
Period | 16/09/24 → 20/09/24 |