DH Project Foundations
Best Practices for Getting Started
Joey Takeda, Digital Humanities Innovation Lab, Simon Fraser University
University of British Columbia, January 07, 2025
Unceded territory of the səl̓ilw̓ətaʔɬ (Tsleil-Waututh), kʷikʷəƛ̓əm (Kwikwetlem), Sḵwx̱wú7mesh Úxwumixw (Squamish), and xʷməθkʷəy̓əm (Musqueam) Nations
Agenda
- Establishing best practices for file names
- Creating a plan for structure
- Getting GitHub accounts set up (CoLab)
DH Projects as the "Basic Unit of DH"
Projects are both nouns and verbs: a project is a kind of scholarship that requires design, management, negotiation, and collaboration. It is also scholarship that projects, in the sense of futurity, as something which is not yet.
[P]rojects are projective, involving iterative processes and many dimensions of coordination, experimentation, and production.
Anne Burdick, Johanna Drucker, Peter Lunenfeld, Todd Presner, and Jeffrey Schnapp. Digital_Humanities. The MIT Press, 2012. https://doi.org/10.7551/mitpress/9248.001.0001.
File management and structure
- Managing and wrangling data, files, etc. is a crucial dimension of this work
- Planning for the information you will collect
File Naming
Best Practices for Files
- File names should be unique
- File names should be consistent
- File names should be descriptive, but not deterministic
Best Practices for Files, continued
- File names should consist of numbers, letters, underscores, and/or hyphens
- No spaces, ampersands, colons, periods (other than the one before the file extension), et cetera
- Separators should be consistent (e.g. always use _ or -)
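As a minimal sketch of the character rules above, the check below tests whether the part of a file name before the extension uses only letters, numbers, underscores, and hyphens. The function name `is_safe_name` and the decision to treat everything after the last period as the extension are assumptions made for illustration.

```python
import re

# Assumed rule: the name before the extension may contain only
# ASCII letters, digits, underscores, and hyphens.
SAFE_STEM = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_name(filename: str) -> bool:
    """Return True if the name before the extension uses only safe characters."""
    # Split on the last period; everything before it is the stem.
    stem, dot, _extension = filename.rpartition(".")
    if not dot:
        stem = filename  # no extension present
    return SAFE_STEM.fullmatch(stem) is not None


if __name__ == "__main__":
    for name in ["my_file_name.xml", "my file name.xml", "my.file.name.xml"]:
        print(name, is_safe_name(name))
```

Under these assumptions, `my_file_name.xml` passes, while `my file name.xml` (spaces) and `my.file.name.xml` (extra periods) do not.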
Different Cases
- camelCase: myFileName.xml
- UpperCamelCase: MyFileName.xml
- snake_case: my_file_name.xml
- kebab-case: my-file-name.xml
- Train-Case: My-File-Name.xml
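To make the differences between these cases concrete, here is a small sketch that builds each form from the same list of words. The helper name `to_cases` is hypothetical; it only covers the five cases listed above.

```python
def to_cases(words: list[str]) -> dict[str, str]:
    """Build each naming case from a list of words such as ["my", "file", "name"]."""
    lower = [word.lower() for word in words]
    title = [word.capitalize() for word in lower]
    return {
        "camelCase": lower[0] + "".join(title[1:]),
        "UpperCamelCase": "".join(title),
        "snake_case": "_".join(lower),
        "kebab-case": "-".join(lower),
        "Train-Case": "-".join(title),
    }


if __name__ == "__main__":
    for case, stem in to_cases(["my", "file", "name"]).items():
        print(f"{case}: {stem}.xml")
```

Whichever case a project picks, the point is to pick one and apply it everywhere.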
Other Tips
- Numbers should be left-padded (e.g. not File_1 but File_001 if you think there may be more than 99)
- Prefer human-readable names (e.g. 1925-02-01_WinnipegTribune, not 19250201WT)
- Think about sortable order
- Strive for immutability; do not embed changeable states (more on this later)
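The effect of padding on sort order is easy to see in a short sketch. File names are sorted as text, so unpadded numbers fall out of numeric order, while padded numbers and year-month-day dates sort as expected. The dates other than 1925-02-01 are made up for illustration.

```python
numbers = (1, 2, 10, 100)

# Without padding, text sorting puts File_10 and File_100 before File_2.
unpadded = sorted(f"File_{n}" for n in numbers)
print(unpadded)  # ['File_1', 'File_10', 'File_100', 'File_2']

# Left-padded to three digits, text order matches numeric order.
padded = sorted(f"File_{n:03d}" for n in numbers)
print(padded)  # ['File_001', 'File_002', 'File_010', 'File_100']

# Year-month-day dates also sort chronologically as plain text.
# (The first and last dates here are hypothetical examples.)
dated = sorted([
    "1925-12-01_WinnipegTribune",
    "1925-02-01_WinnipegTribune",
    "1924-11-15_WinnipegTribune",
])
print(dated)
```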
Other Tips (for TEI Projects)
- Filenames should be the same as ids
- Should begin with a letter
- For entities, usually 4-5 letters and a number, based on some unique property (as a shorthand)
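One way to check the first two tips is sketched below, assuming that the id in question is the `xml:id` attribute on the root element of each file. That assumption, the function name `id_matches_filename`, and the sample document are all illustrative rather than a project requirement.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def id_matches_filename(path: str, xml_text: str) -> bool:
    """Return True if the root xml:id equals the file name (minus extension)
    and begins with a letter."""
    stem = PurePosixPath(path).stem
    root = ET.fromstring(xml_text)
    return root.get(XML_ID) == stem and stem[:1].isalpha()


if __name__ == "__main__":
    # Hypothetical minimal document for demonstration only.
    sample = '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="Text1"/>'
    print(id_matches_filename("texts/Text1.xml", sample))  # True
    print(id_matches_filename("texts/Text2.xml", sample))  # False
```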
Structuring Folders
- All about the tree
- But folders are just conveniences and should not be a primary way of categorizing data
The best actors in the world, either for tragedy, comedy, history, pastoral, pastoral-comical, historical-pastoral, tragical-historical, tragical-comical-historical-pastoral, scene individable, or poem unlimited.
Hamlet, TLN 1445-1450
Folders, continued
- Folders are not necessarily the place for interpretation
- Where possible, folders should reflect some kind of material reality or convenient structure
Files and Folders
- Redundancy is fine (good even!)
Common Structure for TEI Projects
├── facsimiles
│   ├── Text1.tiff
│   └── Text2.tiff
├── info
│   ├── about.xml
│   ├── acknowledgements.xml
│   ├── index.xml
│   ├── legal.xml
│   └── menu.xml
├── sch
│   ├── [PROJECT_NAME].odd
│   ├── [PROJECT_NAME].rng
│   └── [PROJECT_NAME].sch
├── texts
│   ├── Text1.xml
│   └── Text2.xml
├── bibliography.xml
├── organizations.xml
├── people.xml
├── places.xml
└── [PROJECT_NAME].xpr
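If it helps to start from an empty skeleton, the layout above could be created with a short script. This is only a sketch: the function name `scaffold` is hypothetical, the files it creates are empty placeholders (not valid XML until filled in), and the facsimile and text files are left out because they depend on the project's own materials.

```python
from pathlib import Path


def scaffold(root: str, project_name: str) -> Path:
    """Create the folders and empty placeholder files for a new project."""
    base = Path(root)
    for folder in ("facsimiles", "info", "sch", "texts"):
        (base / folder).mkdir(parents=True, exist_ok=True)

    # Site pages in info/
    for name in ("about", "acknowledgements", "index", "legal", "menu"):
        (base / "info" / f"{name}.xml").touch()

    # Schema files in sch/, named after the project
    for extension in ("odd", "rng", "sch"):
        (base / "sch" / f"{project_name}.{extension}").touch()

    # Entity lists and the project file at the top level
    for name in ("bibliography", "organizations", "people", "places"):
        (base / f"{name}.xml").touch()
    (base / f"{project_name}.xpr").touch()
    return base


if __name__ == "__main__":
    scaffold("my-project", "MyProject")
```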
Activity
https://joeytakeda.github.io/disa-workshops/
Version Control
- Data should be versioned using version control software
- Version control means every saved (committed) version of every file is retained
- So no need for Draft-NEW, Draft-NEW-Revised, My-File-REVISED-Final-JANUARY, etc.
GitHub
- Signing up for a GitHub account: https://github.com