Shristi Shrestha

  • Ph.D. Candidate

  • Computer Science and Engineering

  • Louisiana State University

    Baton Rouge, LA

About Me

I am pursuing a Ph.D. in Computer Science under the supervision of Dr. Anas Mahmoud. My research focuses on leveraging Large Language Models (LLMs) to address the needs of mobile app users and developers. I apply natural language processing (NLP), qualitative analysis, and statistical methods to gain insights from user feedback. I also investigate mobile app store policies on app testing and feature requirements and examine their impact on app development workflows.

Publications

  • S. Shrestha and A. Mahmoud, Automated Software Engineering Journal (ASEJ), 2025

    LLM-based text summarization | Recall analysis | Performance analysis of multiple LLMs | Cost-latency analysis of LLMs | Review entity analysis | Readability user study | Non-parametric statistical analysis

    Problem
    Popular apps receive thousands of reviews daily. Users cannot read them all to make informed decisions, and existing star-rating systems fail to capture specific user needs. For example, stock trading app users expect seamless transaction execution, transparent pricing, no hidden fees, and strategy resources for maximizing their return on investment (ROI). Users of current app review systems are forced to navigate a diverse array of feedback, including bug reports and feature requests, to determine whether an app matches their goals.

    Solution
    We propose an LLM-based app review summarization technique that extracts and presents fixed-length summaries for apps. We employ Chain of Density (CoD) prompting, originally designed for news articles: from a large volume of reviews, the LLM is instructed to iteratively identify review entities and fuse them into a fixed-length summary. Unlike news articles, app reviews represent the voices of thousands of users and discuss a wide range of topics related to the app.
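    The iterative densification loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `call_llm` is a hypothetical client function, and the prompt wording is a paraphrase of the general CoD idea rather than the actual prompt used in the study.

```python
# Sketch of an iterative Chain of Density (CoD) loop over app reviews.
# `call_llm(prompt) -> str` is a placeholder for any LLM client (hypothetical).
COD_TEMPLATE = """You will generate an increasingly dense summary of the app reviews below.
Current summary: {summary}
Step 1: Identify 1-3 new review entities (bugs, feature requests, user goals)
missing from the current summary.
Step 2: Rewrite the summary to fuse them in without exceeding {max_words} words.
Reviews:
{reviews}"""

def chain_of_density(reviews, call_llm, iterations=5, max_words=80):
    """Iteratively densify a fixed-length summary of app reviews."""
    summary = "(empty)"
    for _ in range(iterations):
        prompt = COD_TEMPLATE.format(summary=summary,
                                     max_words=max_words,
                                     reviews="\n".join(reviews))
        # Each call adds newly identified entities while keeping the length fixed.
        summary = call_llm(prompt)
    return summary
```

    Each iteration feeds the previous summary back into the prompt, so entity density grows while the word budget stays constant.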

    Data
    We summarized a dataset of 37,245 reviews collected from eight popular apps across four diverse domains: Ride-hailing, Online Dating, Investing, and Mental Health. To fairly represent different ratings, we employed stratified sampling based on star ratings (1-5 stars) and used Hybrid TF-IDF ranking to sample 2.8k high-information reviews. We used GPT-4 to generate abstractive summaries of the sampled reviews.
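    The sampling step can be sketched roughly as follows: stratify reviews by star rating, score each stratum with a Hybrid TF-IDF weight (sum of word TF-IDF weights normalized by review length), and keep the top-k reviews per stratum. Function names, the tokenizer, and the length threshold are illustrative assumptions, not the study's code.

```python
import math
from collections import Counter, defaultdict

def hybrid_tfidf_scores(texts, min_len=5):
    """Hybrid TF-IDF: sum of word TF-IDF weights, normalized by review length
    (with a floor of `min_len` so very short reviews are not over-rewarded)."""
    docs = [t.lower().split() for t in texts]  # naive whitespace tokenizer (assumption)
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    scores = []
    for d in docs:
        tf = Counter(d)
        weight = sum(tf[w] * math.log(n / df[w]) for w in tf)
        scores.append(weight / max(len(d), min_len))
    return scores

def sample_reviews(reviews, k_per_stratum=2):
    """reviews: list of (star_rating, text); returns top-k texts per rating stratum."""
    strata = defaultdict(list)
    for rating, text in reviews:
        strata[rating].append(text)
    sampled = []
    for rating, texts in strata.items():
        ranked = sorted(zip(hybrid_tfidf_scores(texts), texts), reverse=True)
        sampled.extend(t for _, t in ranked[:k_per_stratum])
    return sampled
```

    Ranking within each star-rating stratum keeps the sample balanced across ratings while still favoring information-dense reviews.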

    Results
    The modified version of CoD prompting achieved an average recall of 81% for key review entities, significantly outperforming vanilla prompts (64%) and extractive summarization (60%) baselines. Our user study with 48 participants demonstrated that the generated summaries maintained high readability even as their semantic density increased. Comparative analysis revealed that while GPT-4 produced the most balanced summaries in terms of coherence and density, Gemini-1.5-Flash and Llama-3.1 offered lower-cost alternatives with varying trade-offs in summary length and sentiment retention.
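    The recall figures above measure how many annotated reference entities surface in a generated summary. A toy version of that metric, using a loose substring match (a simplification of the paper's analysis, which I label as an assumption here), looks like:

```python
def entity_recall(summary, reference_entities):
    """Fraction of reference entities that appear (case-insensitively) in the summary."""
    text = summary.lower()
    hits = sum(1 for e in reference_entities if e.lower() in text)
    return hits / len(reference_entities)
```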

  • G. Shrestha, S. Shrestha, and A. Mahmoud, ACM Transactions on Software Engineering and Methodology (ACM TOSEM), 2025

    Qualitative thematic analysis | Developer survey | Inter-rater reliability | Play Store App Testing Policy

    Problem
    Put yourself in the position of an indie developer. You are aiming to solve a niche problem, have spent months building a viable product, and want to release it to start attracting real users. Google's Play Store requires you to find 20 individuals willing to test (interact with) your app for 14 consecutive days. As per the documentation, you can reach out to your friends, family, colleagues, or social media forums to find those individuals.

    Solution
    We conducted a mixed-methods study to uncover how developers are coping with the policy. First, we performed a qualitative thematic analysis of Reddit comments about the policy. Second, we ran a 15-minute survey with developers who had faced the policy when releasing their own apps on the Play Store.

    Data
    We analyzed 897 comments from 564 users posted on 38 subreddit threads. We then surveyed 14 indie developers, asking them two questions about the policy: (a) their strategies for complying with it, and (b) their assessment of its requirements.

    Results
    We found developers expressing frustration in several Reddit posts and sharing the logistical barriers they faced in adopting the policy. Several job posts appeared on online gig platforms such as Fiverr calling for testers. Redditors reported concerns over such a growing market (e.g., `potential scams`, `black market`, `sweatshops`). Occasionally, developers expressed that the policy discriminated against them in favor of corporate developers.

  • S. Shrestha and A. Mahmoud, Proceedings of the IEEE/ACM International Conference on Mobile Software Engineering and Systems (ICSE MobileSoft), 2024

    Prompt engineering and analysis | Zero-shot prompting | NLTK | Gensim | GloVe embeddings | Qualitative thematic analysis | Recall analysis | Pandas | Scikit-learn

    Problem
    Finding an app that meets specific user criteria is difficult because the current `one-size-fits-all` 5-star rating system in app stores fails to capture domain-specific needs. This forces users to spend significant mental effort scrolling through thousands of unstructured reviews to locate relevant information.

    Solution
    We developed an automated, unsupervised pipeline that uses a large language model (LLM) to generate `Rate Features` from long, unstructured app reviews. `Rate Features` are neutral, domain-specific, concise summaries (2-3 words) of the reviews. We first filter informative reviews using the Hybrid TF-IDF extractive summarization method. Then, we use GPT-3.5-turbo to generate `Rate Features` from the filtered reviews.
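    The zero-shot generation step can be sketched as below. The prompt wording and `call_llm` client are hypothetical stand-ins, not the paper's actual prompt or code; only the input/output shape (filtered reviews in, short neutral phrases out) follows the description above.

```python
# Hedged sketch of zero-shot `Rate Feature` generation from filtered reviews.
# `call_llm(prompt) -> str` is a placeholder for any chat-completion client.
RATE_FEATURE_PROMPT = """From the app reviews below, extract domain-specific
user goals as neutral 2-3 word phrases (e.g., "ride availability").
Return one phrase per line.
Reviews:
{reviews}"""

def generate_rate_features(filtered_reviews, call_llm):
    """Return a list of short `Rate Feature` phrases, one per output line."""
    prompt = RATE_FEATURE_PROMPT.format(reviews="\n".join(filtered_reviews))
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]
```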

    Data
    We generated `Rate Features` from 167k reviews of 90 popular apps across three domains: Ride-hailing, Mental Health, and Investing.

    Results
    We found that user goals described in the reviews, as identified by the LLM, were the best candidates for `Rate Features`. The three most frequently appearing `Rate Features` recalled 95-100% of the user goals in the analyzed review dataset.

Research Interests

Software Engineering | Requirement Engineering | Natural Language Processing | App Store Requirement Analysis | User Interface Design Analysis | Human Computer Interactions | Qualitative Analysis

Education

  • Louisiana State University

    2022 - present

    Ph.D. Candidate in Computer Science; GPA: 3.91

    Dissertation title: “Leveraging Large Language Models to Enhance the Utility of Mobile App Store Rating Systems”

    Baton Rouge, LA, USA

  • Tribhuvan University, Pulchowk Campus

    2015 - 2019

    Bachelor in Computer Engineering; GPA: 3.75

    Lalitpur, Nepal

Teaching

Louisiana State University

Since 2022
  • Programming Lab Assistant (3 hr, 30 students)

    Supervised first-year undergraduate computer science students in their lab assignments for the “Introduction to the Java programming language” course.
  • Graduate Teaching Assistant (3 hr, 100+ students)

    Assist in grading midterm and final exams for two courses: “Software Systems Design” and “Programming Language.”

Work Experience

Sireto Technology

2019 - 2022
  • Software Developer (full-time, onsite)

    Engineered web and mobile applications for the company, including an art e-commerce platform, a survey form builder, and business profile verification tools.
  • Intern, Software Developer, QA (full-time, 3 months, onsite)

    Developed and executed unit and integration tests for Java-based applications to ensure functionality, reliability, and code quality.

Technical Skills

  • Programming Languages:

    Python, Java, Kotlin, JavaScript (JS), SQL, C/C++, Dart

  • Frameworks & Tools:

    SpringBoot, Next.js, React (library), Flutter (SDK), Figma, NLTK, Git

  • Cloud Technologies:

    Firebase (Auth, Functions, Firestore, Next.js integration), AWS

  • Database:

    PostgreSQL, MongoDB, Elasticsearch, HBase