Peter Organisciak

Stuff-getting and People-joining

December 1, 2013, 1:58 pm

Based on an XKCD comic, the Up-Goer Five text editor only lets you write using the one thousand most common words in English. Here are my attempts to describe what I do in crowdsourcing and information...

Low-Effort Crowdsourcing

December 3, 2013, 1:57 pm

Sentence generation with choice-based typing. The program prompts a user to choose one of two words that are likely to come after the previous words, allowing them to generate a whole sentence by...
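
The choice-based typing idea can be sketched with a simple bigram model. This is a minimal sketch under stated assumptions, not the program described in the post: it assumes the "likely next words" come from bigram counts over a training text, and `build_bigrams`, `two_choices`, and `generate` are hypothetical names. The `choose` callback stands in for the user's pick; an interactive version could wrap `input()` there.

```python
from collections import Counter, defaultdict
from typing import Callable

def build_bigrams(text: str) -> dict[str, Counter]:
    """Count which word follows which in a training text (assumed model)."""
    words = text.lower().split()
    followers: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1
    return dict(followers)

def two_choices(followers: dict[str, Counter], prev: str) -> list[str]:
    """Offer the (up to) two most frequent words seen after `prev`."""
    counts = followers.get(prev)
    return [w for w, _ in counts.most_common(2)] if counts else []

def generate(followers: dict[str, Counter], start: str,
             choose: Callable[[list[str]], str], max_len: int = 10) -> str:
    """Build a sentence one user choice at a time."""
    sentence = [start]
    while len(sentence) < max_len:
        options = two_choices(followers, sentence[-1])
        if not options:
            break  # no known continuation, so the sentence ends here
        sentence.append(choose(options))
    return " ".join(sentence)

if __name__ == "__main__":
    model = build_bigrams("the cat sat on the mat the cat ate the fish")
    # Stand-in for a user who always picks the first of the two options.
    print(generate(model, "the", choose=lambda opts: opts[0], max_len=5))  # the cat sat on the
```

A real version would train on a much larger corpus and might rank candidates with a better language model; the two-option interface stays the same.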

Running Maps

January 16, 2014, 4:37 pm

Motorola’s now-discontinued MotoACTV sportswatch gives you the commendable option to download all your running routes. With a touch of data hacking, some manual editing to remove redundant routes, and...

Add user pseudonyms in data analysis

April 16, 2014, 11:08 am

When analyzing anonymous user data in a team, I often take an extra step to help discussion: converting user identifiers to popular English name pseudonyms. Pseudonyms tend to make the data more...
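
A minimal sketch of the pseudonym step, assuming names are handed out in order of first appearance so that each user ID always maps to the same, unique name. The `Pseudonymizer` class and the five-name `NAMES` list are hypothetical; the post's actual name source and assignment scheme may differ, and in practice a much longer list of popular names would be used.

```python
# Hypothetical short list; a real run would load many more popular names.
NAMES = ["Mary", "James", "Linda", "John", "Susan"]

class Pseudonymizer:
    """Map opaque user IDs to readable, stable, unique name pseudonyms."""

    def __init__(self, names: list[str]):
        if not names:
            raise ValueError("need at least one name")
        self._names = list(names)
        self._assigned: dict[str, str] = {}

    def __call__(self, user_id: str) -> str:
        # Reuse the pseudonym if this ID has been seen before.
        if user_id not in self._assigned:
            n = len(self._assigned)
            base = self._names[n % len(self._names)]
            cycle = n // len(self._names)
            # After the list is used up, add a number so names stay unique.
            self._assigned[user_id] = base if cycle == 0 else f"{base} {cycle + 1}"
        return self._assigned[user_id]

if __name__ == "__main__":
    pseudo = Pseudonymizer(NAMES)
    print([pseudo(u) for u in ["a9f3", "77bc", "a9f3"]])  # ['Mary', 'James', 'Mary']
```

Because the object is callable, a pandas column could be converted with `df["user_id"].map(pseudo)`.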

Old Slang: Appreciating Webster’s with Bots

June 2, 2014, 9:06 am

The richness of language can be under-appreciated because of its mundane nature. James Somers’s essay “You’re probably using the wrong dictionary” recently turned me on to old dictionaries, which – with...

Authors cited in dictionary definitions

September 30, 2014, 4:36 pm

US Names

November 18, 2014, 3:24 pm

I just put up a modest reference repository with various slices of data on US names. I included an estimate of names among US-born citizens today, by cross-referencing baby names data and population...

Popular Dish Prices, 1913-1970

December 3, 2014, 10:03 am

With data from NYPL Labs’ What’s on the Menu?

I’m on the Job Market!

January 7, 2015, 10:16 am

I’m an information scientist with a digital humanities background, specializing in large-scale text analysis, crowd systems, and information retrieval over novel datasets. Look at my CV, or contact me...

Your First Twitter Bot, in 20 minutes

October 27, 2015, 3:15 pm

“I think it was the Pres. at dawn with the Spin Back Knuckle.” (bad Clue guesses, @BadClues, September 6, 2015)

Creating a Twitter bot is a great exercise for formalizing a simple concept in a concrete...
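
The core of a bot like @BadClues is a small text generator; posting is a separate step. A minimal sketch, assuming a fill-in-the-blank template: the word lists, `bad_guess`, and `next_tweet` are hypothetical, and the code that actually posts to Twitter (which needs API credentials and a client library) is deliberately left out.

```python
import random

# Hypothetical word lists. The first entry of each comes from the example
# tweet above; the rest are made up for illustration.
WHO = ["Pres.", "Colonel", "Professor", "Reverend"]
WHERE = ["at dawn", "in the Library", "on the moon", "in the Conservatory"]
WHAT = ["Spin Back Knuckle", "Candlestick", "Lead Pipe", "Rope"]

TEMPLATE = "I think it was the {who} {where} with the {what}."
TWEET_LIMIT = 140  # Twitter's character limit at the time of the post (2015)

def bad_guess(rng: random.Random) -> str:
    """Fill the template with one random pick from each list."""
    return TEMPLATE.format(
        who=rng.choice(WHO), where=rng.choice(WHERE), what=rng.choice(WHAT)
    )

def next_tweet(rng: random.Random, max_tries: int = 10) -> str:
    """Draw guesses until one fits the limit (these short lists always fit)."""
    for _ in range(max_tries):
        text = bad_guess(rng)
        if len(text) <= TWEET_LIMIT:
            return text
    raise RuntimeError("no guess fit within the limit")

if __name__ == "__main__":
    print(next_tweet(random.Random()))
```

Passing in a `random.Random` instance keeps the generator testable: a fixed seed gives reproducible output, while the live bot can use an unseeded one.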

MARC Fields in the HathiTrust

December 7, 2015, 1:39 pm

At the HathiTrust Research Center, we’re often asked about metadata coverage for the nearly 15 million HathiTrust records. Though we provide additional derived features like author gender and language...
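
Metadata coverage, in the sense of how often each MARC field is filled, reduces to counting the share of records that contain each tag. A minimal sketch, assuming records have already been parsed into `(tag, value)` pairs (a library such as pymarc could do the parsing); `field_coverage` and the toy records are hypothetical, not HTRC code or data.

```python
from collections import Counter

def field_coverage(records: list[list[tuple[str, str]]]) -> dict[str, float]:
    """Share of records containing each MARC tag at least once, most common first."""
    present = Counter()
    for rec in records:
        present.update({tag for tag, _ in rec})  # a set, so repeated fields count once
    total = len(records)
    return {tag: n / total for tag, n in present.most_common()}

if __name__ == "__main__":
    # Toy records: 245 = title, 100 = personal-name main entry, 650 = topical subject.
    records = [
        [("245", "Title A"), ("100", "Author A"), ("650", "Whaling"), ("650", "Sea stories")],
        [("245", "Title B"), ("650", "Navigation")],
        [("245", "Title C")],
    ]
    for tag, share in field_coverage(records).items():
        print(tag, f"{share:.0%}")  # 245 100%, then 650 67%, then 100 33%
```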

Git tip: Automatically converting IPython notebook READMEs to Markdown

December 16, 2015, 1:50 pm

A small but useful tip today, on using IPython notebooks for a git project README while keeping an auto-generated version in the Markdown format that GitHub prefers. I’m in the midst of refreshing and...
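
With Jupyter installed, `jupyter nbconvert --to markdown README.ipynb` writes a `README.md` next to the notebook. To show what that conversion involves, here is a minimal stdlib-only sketch, not the method from the post: it reads the notebook JSON and emits Markdown cells as-is, code cells as fenced blocks, and plain-text outputs as unlabeled fences. `notebook_to_markdown` is a hypothetical name, and raw cells, images, and HTML outputs are skipped.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built so the code holds no literal triple backtick

def _text(src) -> str:
    """Notebook JSON stores text either as one string or as a list of lines."""
    return src if isinstance(src, str) else "".join(src)

def notebook_to_markdown(nb_json: str, lang: str = "python") -> str:
    """Convert markdown and code cells (plus plain-text outputs) to Markdown."""
    nb = json.loads(nb_json)
    parts = []
    for cell in nb.get("cells", []):
        src = _text(cell.get("source", "")).rstrip("\n")
        if cell["cell_type"] == "markdown":
            parts.append(src)
        elif cell["cell_type"] == "code":
            parts.append(f"{FENCE}{lang}\n{src}\n{FENCE}")
            for out in cell.get("outputs", []):
                # Stream outputs keep text under "text"; results under data["text/plain"].
                text = out.get("text") or out.get("data", {}).get("text/plain")
                if text:
                    parts.append(f"{FENCE}\n{_text(text).rstrip()}\n{FENCE}")
    return "\n\n".join(parts) + "\n"
```

Writing the returned string to `README.md`, for example from a git pre-commit hook, keeps the Markdown copy in step with the notebook.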

HTRC Feature Reader 2.0

March 2, 2016, 11:50 am

I’ve released an overhaul of the HTRC Feature Reader, a Python library that makes it easy to work with the Extracted Features (EF) dataset from the HathiTrust. EF provides page-level feature counts for...
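
Since EF counts are stored per page, a common first step is rolling them up to the volume level or locating the pages where a word appears. The sketch below does this with plain dictionaries rather than the Feature Reader, to avoid guessing at the library's method names; the `pages` data and both functions are hypothetical.

```python
from collections import Counter

# Hypothetical, simplified stand-in for page-level features: one {token: count}
# dict per page. Real EF files are richer (for example, counts are broken down
# by part of speech and by page section).
pages = [
    {"whale": 3, "the": 10},
    {"the": 8},
    {"whale": 1, "sea": 2},
]

def volume_counts(pages: list[dict[str, int]]) -> Counter:
    """Collapse page-level counts into one volume-level count."""
    total = Counter()
    for page in pages:
        total.update(page)  # update() adds counts instead of replacing them
    return total

def pages_containing(pages: list[dict[str, int]], token: str) -> list[int]:
    """Zero-based sequence numbers of pages where `token` occurs."""
    return [i for i, page in enumerate(pages) if page.get(token, 0) > 0]

if __name__ == "__main__":
    print(volume_counts(pages)["whale"])     # 4
    print(pages_containing(pages, "whale"))  # [0, 2]
```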

Term Weighting for Humanists

March 9, 2016, 7:30 am

This post is about words. Specifically, an intuition about words, one which is meant to grasp at the aboutness of a text. The idea is simple: that not all words are equally valuable. Here, I’ll...
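
One widely used way to encode the intuition that not all words are equally valuable is tf-idf, which scales a term's count in a document by how rare the term is across the collection. The post may discuss other schemes; this is a minimal sketch over pre-tokenized documents, with a hypothetical `tf_idf` function using the plain variant `tf * log(N / df)`.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each document by tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count a term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

if __name__ == "__main__":
    docs = [["the", "whale", "the"], ["the", "ship"], ["the", "whale", "sea"]]
    w = tf_idf(docs)
    print(w[0]["the"])  # 0.0, since "the" appears in every document
```

A word found in every document gets zero weight no matter how often it occurs, while a word confined to one document (like "ship" here) gets the largest boost.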

A Dataset of Term Stats in Literature

March 18, 2016, 7:01 am

Following up on “Term Weighting for Humanists”, I’m sharing data and code to apply term weighting to literature in the HTRC’s Extracted Features dataset. Crunched for 235,000 Language and Literature...

Understanding Classified Languages in the HathiTrust

June 14, 2016, 6:33 am

The HTRC Extracted Features (EF) dataset provides two forms of language information: the volume-level bibliographic metadata (what the library record says), as well as machine-classified tags for each...
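
Comparing the two forms of language information can start with a per-volume agreement score. A minimal sketch, assuming both sources have already been normalized to the same language codes (catalog records and classifiers may use different code schemes) and that each page carries at most one top classified language; the function names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Optional

def language_agreement(meta_lang: str, page_langs: list[Optional[str]]) -> float:
    """Share of classified pages whose top language matches the catalog record."""
    # Untagged pages (e.g., blank ones) are skipped.
    tagged = [lang for lang in page_langs if lang is not None]
    if not tagged:
        return math.nan
    return sum(lang == meta_lang for lang in tagged) / len(tagged)

def majority_language(page_langs: list[Optional[str]]) -> Optional[str]:
    """Most common classified language across pages, if any page is tagged."""
    tagged = [lang for lang in page_langs if lang is not None]
    return Counter(tagged).most_common(1)[0][0] if tagged else None

if __name__ == "__main__":
    pages = ["en", "en", None, "fr", "en"]
    print(language_agreement("en", pages))  # 0.75
    print(majority_language(pages))         # en
```

A low score can point either to a mis-cataloged volume or to a genuinely multilingual one, so flagged volumes still need a look.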

Pico Safari: Active Gaming in Integrated Environments

July 19, 2016, 2:06 pm

With the recent release of Pokémon Go, I’m posting my presentation notes for a similar game called Pico Safari, a collaboration with Lucio Gutierrez, Garry Wong, and Calen Henry in late 2009, advised...

Beyond tokens: what character counts say about a page

October 20, 2016, 7:17 am

When talking about quantitative features in text analysis the term token count is king, but other features can help infer the content and context of a page. I demonstrate visually how the characters at...
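
As an illustration of a non-token feature, the sketch below counts the first and last non-space character of each line on a page. Which character features matter is an assumption here; the EF dataset includes similar per-page counts of line-initial and line-final characters, but this code is not how they were produced, and `line_edge_chars` and `digit_end_share` are hypothetical names. A page where most lines end in a digit, for instance, looks like a table of contents or an index.

```python
from collections import Counter

def line_edge_chars(page_text: str) -> tuple[Counter, Counter]:
    """Count the first and last non-space character of every non-blank line."""
    begin, end = Counter(), Counter()
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped:  # skip blank lines
            begin[stripped[0]] += 1
            end[stripped[-1]] += 1
    return begin, end

def digit_end_share(end: Counter) -> float:
    """Fraction of lines ending in a digit (high for contents or index pages)."""
    total = sum(end.values())
    return sum(n for ch, n in end.items() if ch.isdigit()) / total if total else 0.0

if __name__ == "__main__":
    toc = "Chapter I ........ 1\nChapter II ....... 15\n\nIndex ........ 301"
    prose = "It was a dark and stormy\nnight; the rain fell in tor-\nrents."
    print(digit_end_share(line_edge_chars(toc)[1]))    # 1.0
    print(digit_end_share(line_edge_chars(prose)[1]))  # 0.0
```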