CorpusStudio: Surfacing Emergent Patterns in a Corpus of Prior Work while Writing


Abstract: Many communities, including the scientific community, develop implicit writing norms. Understanding them is crucial for effective communication with that community. Writers gradually develop an implicit understanding of norms by reading papers and receiving feedback on their writing. However, it is difficult to both externalize this knowledge and apply it to one's own writing. We propose two new writing support concepts that reify document- and sentence-level patterns in a given text corpus: (1) an ordered distribution over section titles and (2) given the user's draft and cursor location, many retrieved contextually relevant sentences. Recurring words in the latter are algorithmically highlighted to help users see any emergent norms. Study results (N=16) show that participants revised the structure and content using these concepts, gaining confidence in aligning with or breaking norms after reviewing many examples. These results demonstrate the value of reifying distributions over other authors’ writing choices during the writing process.



Given a corpus of papers written for the same or similar audiences, e.g., papers previously published at ACM UIST, CorpusStudio supports writers by making visible the writing choices of previous authors in the corpus. To help the writer recognize common and uncommon paper structures, the left sidebar (A) shows an ordered distribution of clusters of section titles in the corpus using Positional Diction Clustering. Informed by (A), writers can draft their own outline in the center text editor (B). When fleshing out their outline with prose, to potentially see emergent patterns in previously written papers, writers can press TAB to retrieve analogous sentence examples from the corpus (C) based on their cursor's location within their own draft. To reveal emerging patterns, the writer can select different modes of highlighting commonalities and variation across retrieved sentences. The writer can hover over a retrieved sentence (D) to see more of the context in which it appeared, and save, annotate, and share retrieved sentence examples that they think fulfill a purpose particularly well or poorly.
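To make the retrieve-then-highlight interaction in (C) concrete, the following is a minimal sketch, not the CorpusStudio implementation. It assumes a plain bag-of-words cosine similarity as a stand-in for whatever retrieval method the system actually uses, and it treats a word as "recurring" when it appears in at least a chosen share of the retrieved sentences. All function names, the `min_share` threshold, the `**` highlight markers, and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(draft_sentence, corpus_sentences, k=3):
    """Return the k corpus sentences most similar to the draft sentence.

    Assumption: lexical overlap stands in for contextual relevance.
    """
    query = Counter(tokenize(draft_sentence))
    scored = [(cosine(query, Counter(tokenize(s))), s) for s in corpus_sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def recurring_words(sentences, min_share=0.5, stopwords=frozenset()):
    """Return words that occur in at least min_share of the sentences."""
    doc_freq = Counter()
    for s in sentences:
        # Count each word once per sentence, ignoring stopwords.
        doc_freq.update(set(tokenize(s)) - stopwords)
    threshold = min_share * len(sentences)
    return {w for w, n in doc_freq.items() if n >= threshold}


def highlight(sentence, words):
    """Wrap every occurrence of a recurring word in ** markers."""
    def mark(match):
        token = match.group(0)
        return f"**{token}**" if token.lower() in words else token
    return re.sub(r"[A-Za-z']+", mark, sentence)


# Hypothetical corpus sentences and draft sentence at the cursor.
corpus = [
    "We propose a new interface for writers.",
    "We propose a novel system for data analysis.",
    "The study had sixteen participants.",
    "We propose two techniques for text retrieval.",
]
draft = "We propose a tool for writers"

examples = retrieve(draft, corpus, k=3)
shared = recurring_words(examples, min_share=1.0, stopwords=frozenset({"we", "a", "for"}))
highlighted = [highlight(s, shared) for s in examples]
```

In this toy example the three retrieved sentences all open with "We propose", so that verb is the only non-stopword marked in each of them. Raising or lowering `min_share` is one simple way to shift the display between emphasizing commonalities and exposing variation.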