id ndltd-OhioLink-oai-etd.ohiolink.edu-osu1565797455907422
record_format oai_dc
collection NDLTD
language English
sources NDLTD
topic Computer Engineering
Computer Science
graph embedding
graph hierarchy
asymmetric transitivity
question routing
expert finding
cold-start
breaking cycles
directed acyclic graph
spellingShingle Computer Engineering
Computer Science
graph embedding
graph hierarchy
asymmetric transitivity
question routing
expert finding
cold-start
breaking cycles
directed acyclic graph
Sun, Jiankai
Directed Graph Analysis: Algorithms and Applications
author Sun, Jiankai
author_facet Sun, Jiankai
author_sort Sun, Jiankai
title Directed Graph Analysis: Algorithms and Applications
title_short Directed Graph Analysis: Algorithms and Applications
title_full Directed Graph Analysis: Algorithms and Applications
title_fullStr Directed Graph Analysis: Algorithms and Applications
title_full_unstemmed Directed Graph Analysis: Algorithms and Applications
title_sort directed graph analysis: algorithms and applications
publisher The Ohio State University / OhioLINK
publishDate 2019
url http://rave.ohiolink.edu/etdc/view?acc_num=osu1565797455907422
work_keys_str_mv AT sunjiankai directedgraphanalysisalgorithmsandapplications
_version_ 1719456339946635264
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-osu15657974559074222021-08-03T07:12:25Z Directed Graph Analysis: Algorithms and Applications Sun, Jiankai Computer Engineering Computer Science graph embedding graph hierarchy asymmetric transitivity question routing expert finding cold-start breaking cycles directed acyclic graph Taxonomy graphs that capture hyponymy or meronymy relationships through directed edges are expected to be acyclic. However, in practice, they may have thousands of cycles, as they are often created in a crowd-sourced way. Since these cycles represent logical fallacies, they need to be removed for many web applications. In this thesis, we first address the problem of breaking cycles while preserving the logical structure (hierarchy) of a directed graph as much as possible. Existing approaches for this problem either need manual intervention or use heuristics that can critically alter the taxonomy structure. In contrast, our approach infers graph hierarchy using a range of features, including a Bayesian skill rating system and a social agony metric. We also devise several strategies to leverage the inferred hierarchy for removing a small subset of edges to make the graph acyclic. We then apply our breaking cycles technique to address the problem of Question Difficulty and Expertise Estimation (QDEE) in Community Question Answer (CQA) sites such as Yahoo! Answers and Stack Overflow. Our framework QDEE tackles a fundamental challenge in crowdsourcing: how to appropriately route and assign questions to users with suitable expertise. This problem domain has been the subject of much research and includes both language-agnostic as well as language conscious solutions. We bring to bear a key language-agnostic insight: that users gain expertise and therefore tend to ask as well as answer more difficult questions over time. We use this insight within the popular competition (directed) graph model to estimate question difficulty and user expertise by identifying key hierarchical structure within the said model.Difficulty levels of newly posted questions (the cold-start problem) are estimated by using our QDEE framework and additional textual features. We also propose a model to route newly posted questions to appropriate users based on the difficulty level of the question and the expertise of the user. The QDEE framework also allows us to characterize user expertise in novel ways by identifying interesting patterns and roles played by different users in such collaborative CQA systems.Each node in the competition graph will be assigned by QDEE a scalar value to represent its skill. The learned skills of question nodes are question difficulty scores, and the learned skills of user nodes are their expertise scores. However, the question difficulty and user expertise can vary on different topics. In Stack Exchange sites, each question can be assigned multi-tags to represent its most relevant topics. Hence using a solely scalar value to represent question difficulty level or user expertise is not thorough. We then propose to learn latent representations for question difficulty (question node) and user expertise (user node) by directed graph embedding with asymmetric transitivity preserving (ATP). Asymmetric transitivity is an essential property of directed graphs since it can play an important role in downstream graph inference and analysis. Maintaining such properties, while reducing the graph to a lower-dimensional vector space, has been the focus of much recent research. In this thesis, we propose a scalable methodology for embedding directed graphs with the goal of preserving such properties. The technique incorporates graph hierarchy and reachability information naturally by relying on a non-linear transformation that operates on the core reachability and implicit hierarchy within such graphs. Subsequently, the methodology levers a factorization-based approach to generate two embedding vectors for each node within the graph, to capture the asymmetric transitivity. However, ATP can only learn embedding for nodes which are already in the graph, which undermines its application in cold- start -- a phenomena observed when a new question is posted is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. In this thesis, we propose ColdRoute to handle the task of routing cold questions posted by new or existing askers to suitable experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach with well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight stack exchange sites, we are able to improve upon the routing metrics (precision$@1$, accuracy, MRR) over the state-of-the-art models such as semantic matching by $159.5\%$,$31.84\%$, and $40.36\%$ for cold questions posted by existing askers, and $123.1\%$, $27.03\%$, and $34.81\%$ for cold questions posted by new askers respectively.Finally, we propose to solve the cold question routing problem by marrying the merits of the above two types of methods ColdRoute and ATP. On one hand, we use graph embedding techniques to encode useful higher-order graph structure information, and on the other hand, we enable the incorporation of heterogeneous information (e.g. textual) in the node embedding as well. Comparing with the directed graph constructed in {\em ATP}, our proposed method allows more flexible node interactions and can enrich the question/user node embeddings with the textual information from question tags. And it's able to route newly posted questions no matter posted by existing askers or new askers to suitable users in CQAs. 2019 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1565797455907422 http://rave.ohiolink.edu/etdc/view?acc_num=osu1565797455907422 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.