I recently came across a giant list of links to Udemy courses in the form shown below. I found it unwieldy and impossible to read, so I wrote a Python script to extract the titles and add formatting.

Coupons are valid for a limited time only, so grab them while they last.
WEB DEVELOPMENT
www.udemy.com/ultimate-web/learn/v4/?couponCode=LRNWEB
www.udemy.com/responsive-website-template-from-scratch-html-css/?couponCode=FREEFB
www.udemy.com/web-design-creating-websites-from-scratch/?couponCode=WEBFREE

My script adds a readable title above each link, pulled from the URL itself, and puts a dashed separator under each topic heading to show where the topic changes.


Coupons are valid for a limited time only, so grab them while they last.
WEB DEVELOPMENT
------------------------------------------------------------------------------
ultimate web
	www.udemy.com/ultimate-web/learn/v4/?couponCode=LRNWEB
responsive website template from scratch html css
	www.udemy.com/responsive-website-template-from-scratch-html-css/?couponCode=FREEFB
web design creating websites from scratch
	www.udemy.com/web-design-creating-websites-from-scratch/?couponCode=WEBFREE

The Code

I used a regular expression for the extraction, then wrote several output formats for the links: HTML anchor tags, Markdown, and the format shown above, where the URLs are tabbed in under their titles. I ended up with the tabbed format because Pastebin wouldn’t accept links with alternate text.
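To make the extraction step concrete before the full script, here is a minimal sketch that runs the same kind of pattern against one of the sample links. The helper name `extract_title` is made up for this illustration and isn't part of the script.

```python
import re

# Capture everything between 'www.udemy.com/' and the next slash.
PATTERN = re.compile(r'^www[.]udemy[.]com/([^/]+)/')

def extract_title(link):
    """Return the course slug with dashes turned into spaces, or None."""
    match = PATTERN.search(link)
    if match is None:
        return None
    return match.group(1).replace('-', ' ')

print(extract_title('www.udemy.com/ultimate-web/learn/v4/?couponCode=LRNWEB'))  # ultimate web
print(extract_title('WEB DEVELOPMENT'))  # None
```

Because `[^/]+` stops at the first slash, the extra `learn/v4/` path segments in the first link are ignored. The flip side is that a link with no slash after the course name would not match at all.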


import re

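def ProcessLine(pattern, line):
    # Drop the trailing newline so it doesn't leave a blank line after every link
    line = line.rstrip('\n')
    if not line:
        # Pass blank lines through without a separator
        return '\n'
    match = re.search(pattern, line)
    if match is None:
        # Not a course link, so treat it as a heading and put a dashed separator under it
        return f'{line}\n{"-" * 79}\n'
    # Turn the URL slug into a readable title
    words = match.group(1).replace("-", " ")
    # Put the title on its own line with the link tabbed in below it, so pastebin can use it
    return f'{words}\n\t{line}\n'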

# Example input line: www.udemy.com/applewatchcourse/?couponCode=EnrollFREE
if __name__ == '__main__':
    # Capture the text between '.com/' and the next slash
    pattern = re.compile(r'^www[.]udemy[.]com[/]([^/]+)[/]')
    with open('links.txt', 'r') as read, open('fixedlinks.txt', 'w') as write:
        for line in read:
            write.write(ProcessLine(pattern, line))
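
The HTML and Markdown output formats mentioned earlier aren't shown in the script above. As a rough sketch of what they might look like (this is not the original code), the two formatters below reuse the same pattern. The function names `title_of`, `as_html`, and `as_markdown` are hypothetical, and the `https://` prefix is an assumption, since the links in the list carry no scheme.

```python
import html
import re

PATTERN = re.compile(r'^www[.]udemy[.]com/([^/]+)/')

def title_of(link):
    """Return the readable title for a course link, or None if it isn't one."""
    match = PATTERN.search(link)
    return match.group(1).replace('-', ' ') if match else None

def as_html(link):
    """Format a link as an HTML anchor tag with the title as the link text."""
    title = title_of(link)
    if title is None:
        return link  # headings and other text pass through unchanged
    # Assumption: the links are served over https
    url = html.escape('https://' + link, quote=True)
    return f'<a href="{url}">{html.escape(title)}</a>'

def as_markdown(link):
    """Format a link as a Markdown inline link."""
    title = title_of(link)
    if title is None:
        return link
    return f'[{title}](https://{link})'

sample = 'www.udemy.com/web-design-creating-websites-from-scratch/?couponCode=WEBFREE'
print(as_html(sample))
print(as_markdown(sample))
```

Both formatters expect the line without its trailing newline, so strip it first when reading from a file.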