How to Extract Only the Content from a Web Page – olussier.net

October 5, 2010

Have you ever visited a web page and had to take a moment to find the content because the page was so heavily loaded with everything else? With the growing number of websites, each with its own design, you may wish to simply read a page’s content without dealing with the extras (navigation, ads, social features…).

The excellent folks at Arc90 have come up with a solution: the Readability bookmarklet. This easy-to-use bookmarklet extracts the main content from a web page and displays it in a simple yet pretty way. You can even customize the style, size and margins to make your reading as enjoyable as possible. The bookmarklet uses a generic algorithm that works on most pages that actually have content. While it is not 100% accurate, they do claim a success rate over 99%. Try it yourself on this page by clicking here!

Here’s a short video that shows how simple and effective it is:

Besides improving the reading experience, this bookmarklet has other great uses. First, websites do not always provide printer-friendly versions of their pages. With Readability, you get a clutter-free article ready to be printed. There is even a “Print” button. Also, if you use Evernote with the Web Clipper, try running Readability on a page before clipping it. You will end up clipping only the article, which is most likely what you wanted!

Using the Readability Algorithm in Your Applications

You can also use Readability in your own applications when you need to extract the content of web pages. Some nice folks have ported the algorithm to other languages. See Nirmal Patel’s Python port here, Keyvan Minoukadeh’s PHP port here and Immortal’s C# port here.
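To give a feel for what content extraction involves, here is a minimal sketch in JavaScript. It is not the Readability algorithm or any of the ports mentioned above; it is a much simpler heuristic, assumed here for illustration, that keeps paragraphs with enough text and drops those made up mostly of links. The function names and both thresholds are hypothetical.

```javascript
// Remove tags and collapse whitespace to approximate the visible text.
function textOf(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Keep paragraphs that are long enough and are not mostly link text.
// Both thresholds are arbitrary illustrative values, not Readability's.
function extractParagraphs(html, minLength = 40, maxLinkDensity = 0.5) {
  const paragraphs = html.match(/<p\b[^>]*>[\s\S]*?<\/p>/gi) || [];
  return paragraphs
    .map((p) => {
      const text = textOf(p);
      const links = p.match(/<a\b[^>]*>[\s\S]*?<\/a>/gi) || [];
      // Count how many visible characters sit inside links.
      const linkChars = links.reduce((n, a) => n + textOf(a).length, 0);
      return { text, linkDensity: text.length ? linkChars / text.length : 1 };
    })
    .filter((b) => b.text.length >= minLength && b.linkDensity <= maxLinkDensity)
    .map((b) => b.text);
}

// Made-up sample page: a navigation paragraph followed by an article paragraph.
const page =
  '<p><a href="/">Home</a> <a href="/about">About</a></p>' +
  "<p>This is the article body, a paragraph long enough to pass the length threshold.</p>";
console.log(extractParagraphs(page));
```

A regular expression is a crude way to read HTML; a real application would use one of the ports or a proper HTML parser.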

via How to Extract Only the Content from a Web Page – olussier.net.

http://vimeo.com/moogaloop.swf?clip_id=8798492&server=vimeo.com&show_title=1&show_byline=1&show_portrait=1&color=&fullscreen=1&autoplay=0&loop=0

Readability – Installation Video for Firefox, Safari & Chrome from Arc90 on Vimeo.

Online Ontology Visualisation: RDFa

October 5, 2010

jOWL status update

I packaged the latest development version of jOWL into a 0.5 release, available at Google Code. jOWL is an AJAX/JavaScript extension to jQuery that I am developing. The jOWL library parses and reasons with OWL-DL documents. Supported browsers for this release are Internet Explorer 7 and Firefox 2 & 3.

This release is accompanied by several new and, in my humble opinion, impressive demos. These make use of the new functionality incorporated so far. Below are some important highlights.

via Online Ontology Visualisation: RDFa.


How to Create a Bookmarklet

June 13, 2010

A bookmarklet is a browser bookmark (a Favorites item, if you use Internet Explorer) that contains a JavaScript call instead of a web address.

With this technique, we make the browser run JavaScript code of our choosing every time the user clicks that bookmark.

It can be put to all sorts of personal uses: changing the DOM, restyling the page, searching within the document, and so on. For us as web developers, though, its main use is to let users send our site the URL or data of whatever they are currently viewing.

The technique is widely used by aggregators and social bookmarking networks to make life easier for users: it captures the page they are viewing and sends it directly to the URL of the site that should receive it.
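As a rough sketch of the "send the current page to our site" case, the JavaScript below builds the text you would store as the bookmark's address. The endpoint https://example.com/submit and its url and title parameters are assumptions made up for this example, as is the buildBookmarklet helper; substitute whatever your own site expects.

```javascript
// Hypothetical endpoint: the page on your site that receives submissions.
const SUBMIT_URL = "https://example.com/submit";

// Builds the string to store as the bookmark's address.
function buildBookmarklet(submitUrl) {
  // This code runs inside the page the user is viewing when the bookmark is clicked.
  const body =
    "var u=encodeURIComponent(location.href)," +
    "t=encodeURIComponent(document.title);" +
    "window.open(" + JSON.stringify(submitUrl + "?url=") + "+u+'&title='+t);";
  // Wrapping the code in a function keeps its variables out of the page, and
  // the statement evaluates to undefined, so the browser does not replace the
  // current page with a result value.
  return "javascript:(function(){" + body + "})();";
}

console.log(buildBookmarklet(SUBMIT_URL));
```

Create a new bookmark and paste the printed string into its address field. Clicking it on any page then opens the submit page with that page's address and title passed in the query string.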

http://blog.ikhuerta.com/como-crear-un-bookmarklet

How to Create a Bookmarklet.


OpenURL ContextObject in SPAN COinS

January 8, 2010

OpenURL COinS: A Convention to Embed Bibliographic Metadata in HTML

stable version 1.0

Abstract

COinS (ContextObjects in Spans) is a simple, ad hoc community specification for publishing OpenURL references in HTML.
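In practice, a COinS is an empty span with class "Z3988" whose title attribute carries the URL-encoded ContextObject. The JavaScript below is a minimal sketch of generating one for a journal article. The helper names and the sample article are made up for illustration, and the sketch only prefixes each field with "rft." without checking that it is a valid key.

```javascript
// Escape characters that are special inside an HTML attribute value.
function escapeAttr(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a COinS span describing a journal article.
// `fields` maps referent keys (without the "rft." prefix) to their values.
function coinsSpan(fields) {
  const pairs = [
    ["ctx_ver", "Z39.88-2004"],
    ["rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"],
    ...Object.entries(fields).map(([key, value]) => ["rft." + key, value]),
  ];
  // Key/value pairs are URL-encoded and joined with "&".
  const kev = pairs
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
  // The "&" separators must appear as "&amp;" inside the HTML attribute.
  return '<span class="Z3988" title="' + escapeAttr(kev) + '"></span>';
}

// Made-up article used only as sample data.
console.log(coinsSpan({
  atitle: "An Example Article",
  jtitle: "Journal of Examples",
  date: "2010",
}));
```

The span is left empty so the page looks the same to readers, while COinS-aware tools can read the citation from the title attribute.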

Contents

Main Page

1. Introduction

2. Specification: OpenURL ContextObject in SPAN COinS - Embedding Citation Metadata in HTML

3. Discussion: How to use COinS in HTML

4. Details
   1. Empty SPANs
   2. Why “Z3988”?
   3. What is a ContextObject?
   4. Choosing the type of ContextObject for Compatibility
   5. XHTML
   6. Why the span element?
   7. Why class and title attributes?

5. Implementations
   1. Embedding Sites
   2. COinS Processors
   3. Other Software support for COinS

6. Links

7. Notes

Using COinS to Provide OpenURL links
COinS Generator
Brief Guide to Implementing ContextObjects for Journal Articles
Brief Guide to Implementing ContextObjects for Books

http://ocoins.info/

from OpenURL ContextObject in SPAN COinS.


Online QDA – Getting started with Qualitative Data Analysis Software

December 1, 2009

Getting started with Qualitative Data Analysis Software

Below are links to materials that tell you how to undertake some of the basic activities in qualitative data analysis software such as importing documents and starting projects, coding and its organisation, creating memos, text and code searching, reporting and retrieving information.

Many software developers now produce their own teaching materials that cover the basics of the software use. Where this is so, the links below are to their materials. In other cases there are links to materials on this site that cover older versions of the software to help those who are still using these versions.

from Online QDA – Getting started with Qualitative Data Analysis Software.


Planning a Semantic Web site

June 30, 2009

Rob Crowther (robert@crowther.info), Web developer, Freelance

Summary: The Semantic Web brings with it the opportunities for users to get smarter search results, and for site owners to get more targeted traffic as users find what they really want. But these benefits don’t just magically appear. This article leads you through the aspects of both information architecture and general infrastructure you need in place to truly take advantage of this burgeoning opportunity.

This article discusses what you need to know to make your Web site part of the Semantic Web. It starts with a discussion of the problems the Semantic Web tries to solve and then moves to the technologies involved, such as Resource Description Framework (RDF), Web Ontology Language (OWL), and SPARQL Protocol and RDF Query Language (SPARQL). You’ll see how the Semantic Web is layered on top of the existing Web. It then covers some issues that you want to know about when you plan a new Web site and also gives specific examples of how to use technologies like RDFa and Microformats to enable your existing Web site to become a part of the Semantic Web.


Aperture Framework

May 15, 2009

Aperture

a Java framework for getting data and metadata

Project name

From Merriam-Webster Online:

Main Entry: ap·er·ture

Pronunciation: ‘ap-&(r)-“chur, -ch&r, -“tyur, -“tur

Function: noun

Etymology: Middle English, from Latin apertura, from apertus, past participle of aperire to open

1. an opening or open space : HOLE

2. a : the opening in a photographic lens that admits the light

b : the diameter of the stop in an optical system that determines the diameter of the bundle of rays traversing the instrument

c : the diameter of the objective lens or mirror of a telescope

Download
Sourceforge project

from Aperture Framework.