NLP and Web Mining Portfolio

Your name or alias

YYYY-MM

This page summarizes my work on web mining and applied natural language processing (NLP) projects.

1. Getting Started

Repository Link

(clickable link to your nlp-01 repository)

Brief Overview of Project Tools and Choices

2. Text Preprocessing

Repository Link

(clickable link to your nlp-02 repository)

Techniques

(Describe the preprocessing steps you applied: tokenization, stopword removal, stemming, lemmatization, or other cleaning steps.)
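As one illustration of the kind of steps this section might describe, here is a minimal sketch in plain Python. The stopword list and the `crude_stem` helper are hypothetical stand-ins chosen for brevity, not the method used in any particular repository. A real project would more likely rely on an NLP library's tokenizer, stopword list, and stemmer or lemmatizer.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token: str) -> str:
    """Strip a few common suffixes. A naive stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]


print(preprocess("The cats are running in the gardens"))
```

The stem "runn" for "running" shows why naive suffix stripping is usually replaced by an established stemmer or a lemmatizer.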

Artifacts

(clickable link to the artifacts/ folder, with a short explanation of each result file)

Insights

(How did preprocessing change the text? What did you keep, remove, or transform and why?)

3. Text Exploration

Repository Link

(clickable link to your nlp-03 repository)

Techniques

(Explain how you analyzed word frequencies, n-grams, TF-IDF scores, or visualizations.)
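For reference, the three analyses named above can be sketched with the Python standard library alone. The function names and the two-document toy corpus are assumptions for illustration. The TF-IDF shown is one plain, unsmoothed variant (term frequency times the log of inverse document frequency), and library implementations often use smoothed or normalized forms that give different numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus (hypothetical data, already tokenized).
docs = [["web", "mining", "web"], ["text", "mining"]]

print(Counter(docs[0]).most_common(1))  # most frequent word in document 0
print(ngrams(docs[0], 2))               # bigrams in document 0
print(tf_idf(docs)[0])                  # TF-IDF scores for document 0
```

In this variant a term that occurs in every document, such as "mining" here, scores zero, which is the intended effect of the inverse document frequency factor.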

Artifacts

(clickable link to the artifacts/ folder, with a short explanation of each result file)

Insights

(What words, phrases, or patterns appeared most often? What did this reveal about the corpus?)

4. API Text Data

Repository Link

(clickable link to your nlp-04 repository)

Techniques

(Describe the API you used, how you authenticated, and how you collected and stored text data.)

Artifacts

(clickable link to the artifacts/ folder, with a short explanation of each result file)

Insights

(What data did you collect? What did you observe about its structure, volume, or quality?)

5. Web Documents

Repository Link

(clickable link to your nlp-05 repository)

Techniques

(Explain how you scraped or fetched web documents. What tools and approaches did you use?)
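As a hedged illustration of one possible approach, the sketch below extracts visible text from an HTML string using only the standard library's `html.parser`. The `TextExtractor` class, the `html_to_text` helper, and the sample page are hypothetical. Fetching the page is deliberately left out, and many projects use a dedicated HTML parsing library instead.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, used in place of a fetched document.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>web</b> data.</p>"
    "<script>var x=1;</script></body></html>"
)
print(html_to_text(sample))
```

Real pages add complications this sketch ignores, such as navigation menus, malformed markup, and content loaded by JavaScript.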

Artifacts

(clickable link to the artifacts/ folder, with a short explanation of each result file)

Insights

(What content did you retrieve? What challenges did you encounter with real web data?)

6. NLP Pipeline

Repository Link

(clickable link to your nlp-06 repository)

Techniques

(Describe how you combined preprocessing, exploration, and collection techniques into a complete pipeline.)

Artifacts

(clickable link to the artifacts/ folder, with a short explanation of each result file)

Assessment

(What did the full pipeline reveal about your text corpus? What decisions or actions could this analysis support?)

7. Toy GPT Exploration

Repository Link

(clickable link to your toy-gpt repository)

Techniques

(Describe which models you explored: unigram, bigram, trigram, or attention-based. What did you implement or modify?)
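To make the simplest of these model families concrete, here is a minimal sketch of a count-based bigram model. It is an illustration under stated assumptions and not the toy-gpt repository's implementation: the function names, the one-sentence training text, and the fixed random seed are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the follower counts for `prev` into a probability distribution."""
    following = counts.get(prev)
    if not following:
        return {}  # `prev` was never seen with a follower
    total = sum(following.values())
    return {tok: c / total for tok, c in following.items()}


def generate(counts, start, length, seed=0):
    """Sample up to `length` tokens, each conditioned only on the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        probs = next_token_probs(counts, out[-1])
        if not probs:
            break  # dead end: no observed continuation
        tokens, weights = zip(*probs.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return out


# Hypothetical training text.
counts = train_bigram("the cat sat on the mat".split())
print(next_token_probs(counts, "the"))
print(generate(counts, "cat", 3))
```

A unigram model drops the conditioning on the previous token and a trigram model conditions on the previous two. Attention-based models instead learn how much weight to give each earlier token in the context.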

Artifacts

(clickable link to the artifacts/ folder, with a short explanation of each result file)

Insights

(What did building or exploring a small language model reveal about how modern LLMs work?)