zemanta.suggest
The suggest method lets developers query Zemanta for contextual metadata about a given text. The response currently has four main components (articles, keywords, images and in-text links) plus one optional component, categories.
The application submits text, either as HTML or plain text, and receives a number of suggestions back. Here HTML means text with inline markup such as links and bold; it does not mean you can send a whole page, navigation included, to the API. Send only the "pure content". This document is primarily concerned with the formatting of the input and output parameters of a request; for a precise explanation of how each suggestion is made, please refer to the Zemanta API Companion.
Function input parameters
| Parameter | Description | Required | Possible values |
| method | Method on the server | Yes | "zemanta.suggest" |
| api_key | Your API key | Yes | string |
| text | Input text (plain text or HTML) | Yes | string |
| format | Requested output format | Yes | "xml", "json", "wnjson" or "rdfxml" |
| return_rdf_links | Return URIs of Linking Open Data entities | No | 0, 1 |
| return_categories | Categorize into the specified categorization scheme | No | "dmoz" or partner ID |
| return_images | Return related images (default: yes); turning this off can speed up responses dramatically | No | 0, 1 |
| emphasis | Terms to "emphasise" (even when not present in the text) | No | string |
| personal_scope | Return only personalized related articles and images | No | 0, 1 |
| markup_limit | Number of in-text links to return (default: depends on the number of input words, 1 per 10 words, up to a maximum of 10) | No | number |
| images_limit | Number of images to return (default: 24) | No | number |
| articles_limit | Number of articles to return (default: 10) | No | number |
| articles_max_age_days | Maximum age of returned articles (default: no limit) | No | number |
| image_max_w | Maximum image width (default: 300) | No | number |
| image_max_h | Maximum image height (default: 300) | No | number |
| sourcefeed_ids | ID for personalized related articles | No | |
| flickr_user_id | Flickr ID of the user | No | |
| social_timestamp | ID for recognizing social links | No | |
| pixie | The chosen Zemanta signature icon | No | |
Parameters in green should be passed directly from the response of the zemanta.preferences call. Do not put your own values there.
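As a rough illustration of how the parameters above fit together, the sketch below assembles a request body. The helper names `default_markup_limit` and `build_suggest_args` are hypothetical and not part of the API. The default markup limit follows the rule stated in the table (one link per 10 input words, at most 10); the rounding used here is an assumption.

```python
from urllib.parse import urlencode

ALLOWED_FORMATS = {"xml", "json", "wnjson", "rdfxml"}

def default_markup_limit(text):
    """Approximate the documented default: 1 link per 10 words, at most 10.

    Integer division is an assumption; the server's rounding is not documented.
    """
    return min(len(text.split()) // 10, 10)

def build_suggest_args(api_key, text, fmt="xml", **optional):
    """Return a dict of request parameters for zemanta.suggest (hypothetical helper)."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("unsupported format: %r" % fmt)
    args = {"method": "zemanta.suggest", "api_key": api_key,
            "text": text, "format": fmt}
    # Optional parameters such as images_limit are passed through unchanged.
    args.update(optional)
    return args

args = build_suggest_args("key1234", "Some text to analyze.", fmt="json",
                          return_images=0, images_limit=5)
body = urlencode(args)  # form-encoded body, ready to POST
```

Validating the format on the client side is a design choice made for this sketch; the server remains the authority on which values it accepts.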
About response formats
- xml
XML is a common format for exchanging information on the internet. Zemanta offers a simple XML response format for its zemanta.suggest call.
- json
In scripting languages JSON is often a more natural format to parse; pass "json" as the format to get such a response. It is structured the same way as the "xml" format described above.
- wnjson
Calling the Zemanta API from JavaScript raises additional issues. You cannot send cross-domain POST requests from JavaScript, and POST is needed to send large chunks of text to the API. JavaScript frameworks have introduced a workaround: open the call inside an IFrame, then read the content of the window title to get the plain JSON. In general you should use a framework such as jQuery that supports this kind of call natively. We call this response format "wnjson".
- rdfxml
Since Zemanta is a semantic application, a proper semantic response is offered as well. When you specify the "rdfxml" format you get an RDF/XML structure as the response. We suggest using semantic libraries to read the triples encoded inside. All objects in this response are properly typed and documented on a separate page, the "Zemanta RDF response" document, which also describes possible uses of Zemanta in semantic software, projects and ecosystems.
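For the "json" format, a response can be read with any standard JSON parser. The sketch below uses a made-up payload that mirrors a few of the documented fields; the exact JSON layout returned by the server is an assumption here, so treat the key lookups as illustrative.

```python
import json

# Made-up payload mirroring documented fields; the real JSON layout
# returned by the server is assumed, not confirmed.
sample = '''{"status": "ok",
 "rid": "40b3d04b-5248-4256-a22b-c07ba38b2d9f",
 "keywords": [{"name": "Mars", "confidence": 0.506297},
              {"name": "Phoenix", "confidence": 0.296248}]}'''

response = json.loads(sample)
if response["status"] != "ok":
    raise RuntimeError("request failed")

keyword_names = [k["name"] for k in response["keywords"]]
print(keyword_names)
```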
Response structure (top level)
| Parameter | Description | Type | Possible Values |
| status | indicates the status of request | string | ok, fail |
| rid | unique request id that can be used with calls that require it | string | 36 chars UUID4 |
| articles | a list of objects | list | |
| keywords | a list of objects | list | |
| images | a list of objects | list | |
| markup | object | dict | |
| categories | a list of objects | list | optional (when using categorization) |
| signature | signature to use (HTML blob) | string | |
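Before reading any suggestions, a client would typically check the top-level status and keep the request id for later calls. The following is a minimal sketch for the "xml" format; the function name `read_status_and_rid` is hypothetical, and raising an exception on "fail" is just one possible policy.

```python
import uuid
import xml.etree.ElementTree as ET

def read_status_and_rid(xml_text):
    """Return (status, rid) from an XML response, raising if status is not ok."""
    root = ET.fromstring(xml_text)
    status = root.findtext("status")
    if status != "ok":
        raise RuntimeError("zemanta.suggest returned status %r" % status)
    rid = root.findtext("rid")
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    uuid.UUID(rid or "")
    return status, rid

status, rid = read_status_and_rid(
    "<rsp><status>ok</status>"
    "<rid>40b3d04b-5248-4256-a22b-c07ba38b2d9f</rid></rsp>")
```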
Articles substructure
Articles substructure is a list of article objects where each object has the following format:
| Article object | Description | Type |
| url | URL of the article | string |
| title | title of the article | string |
| published_datetime | date when article was published. If not available harvested date is used. In ISO 8601 format. | string |
| confidence | confidence on 0.0 to 1.0 scale | float |
| zemified | is the article zemified or not (1 or 0), optional | integer |
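Because published_datetime is an ISO 8601 string, it has to be parsed before articles can be compared by date. The sketch below assumes the "YYYY-MM-DDTHH:MM:SSZ" shape seen in the sample response further down; other ISO 8601 variants would need a more tolerant parser. The dictionaries stand in for already-parsed article objects.

```python
from datetime import datetime, timezone

def parse_published(value):
    """Parse a timestamp such as 2008-06-26T19:12:59Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

articles = [
    {"title": "Seeds of Life Found in Martian Soil",
     "published_datetime": "2008-06-26T19:12:59Z", "confidence": 0.048289},
    {"title": "NASA: Mars Lander short circuit pushes up ice test",
     "published_datetime": "2008-07-09T13:00:00Z", "confidence": 0.0479},
]

# Order the articles from newest to oldest.
newest_first = sorted(
    articles,
    key=lambda a: parse_published(a["published_datetime"]),
    reverse=True)
```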
Keywords substructure
Keywords substructure is a list of keyword objects where each object has the following format:
| Keyword object | Description | Type |
| name | keyword (can contain spaces, but not commas) | string |
| confidence | confidence on 0.0 to 1.0 scale | float |
| scheme | origin of the keyword (right now "general") | string |
Images substructure
Images substructure is a list of image objects where each object has the following format:
| Image object | Description | Type |
| url_l | URL of large version of the image | string |
| url_m | URL of medium version of the image | string |
| url_s | URL of small version of the image | string |
| url_l_w | width of large image | integer |
| url_l_h | height of large image | integer |
| url_m_w | width of medium image | integer |
| url_m_h | height of medium image | integer |
| url_s_w | width of small image | integer |
| url_s_h | height of small image | integer |
| source_url | URL of page that has more information about the image | string |
| license | license of image (HTML blob) | string |
| description | description of image (text) | string |
| attribution | attribution of image (HTML blob) | string |
| confidence | confidence on 0.0 to 1.0 scale | float |
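Since each image comes in three sizes with explicit dimensions, a client can pick the largest variant that fits its layout. This is a minimal sketch with a hypothetical helper, `pick_variant`; the dictionary stands in for a parsed image object, and the values are parsed leniently because dimensions may arrive as strings such as "100.0".

```python
def pick_variant(image, max_w, max_h):
    """Return (url, width, height) of the largest variant that fits, or None."""
    for size in ("l", "m", "s"):  # try large first, then medium, then small
        width = int(float(image["url_%s_w" % size]))
        height = int(float(image["url_%s_h" % size]))
        if width <= max_w and height <= max_h:
            return image["url_%s" % size], width, height
    return None

image = {
    "url_s": "http://farm4.static.flickr.com/3043/2530611038_f490407155_s.jpg",
    "url_s_w": "75", "url_s_h": "75",
    "url_m": "http://farm4.static.flickr.com/3043/2530611038_f490407155_m.jpg",
    "url_m_w": "220", "url_m_h": "240",
    "url_l": "http://farm4.static.flickr.com/3043/2530611038_f490407155.jpg",
    "url_l_w": "458", "url_l_h": "500",
}
choice = pick_variant(image, 300, 300)
```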
Markup substructure
The markup substructure has two fields:
| Markup object | Description | Type |
| text | HTML formatted text with links (DEPRECATED) | string |
| links | a list of objects | list |
Structure of each link object
| Link object | Description | Always present | Type |
| anchor | the word(s) in original text that should be anchored | yes | string |
| confidence | confidence on 0.0 to 1.0 scale | yes | float |
| target | a list of objects | yes | list |
| freebase_guid | Freebase GUID (given when input parameter freebase = 1 and data is available) (DEPRECATED, use return_rdf_links instead) | no | string |
Structure of each target object
| Target object | Description | Type |
| url | resource URL of the linked term | string |
| type | type of resource | string |
| title | title of resource | string |
Type can be one of the following strings:
- wikipedia
- amazon
- imdb
- youtube
- homepage
- geolocation
- blog
- crunchbase
- musicbrainz
- mybloglog
- myspace
- itis (Integrated Taxonomic Information System)
- ncbi (National Center for Biotechnology Information)
- stockexchange
- lastfm
- snooth
- rdf (for semantic links to DBpedia, Freebase, MusicBrainz and Semantic TechCrunch)
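Because the deprecated text field should no longer be relied on, a client can build its own in-text links from the links list. The sketch below picks one target per link by type and renders an anchor tag. The preference order in `PREFERRED_TYPES` and both helper names are arbitrary choices for illustration, not something the API prescribes.

```python
from html import escape

PREFERRED_TYPES = ("wikipedia", "homepage", "imdb")  # arbitrary example order

def choose_target(link, preferred=PREFERRED_TYPES):
    """Return the first target whose type is preferred, else the first target."""
    by_type = {target["type"]: target for target in link["target"]}
    for kind in preferred:
        if kind in by_type:
            return by_type[kind]
    return link["target"][0] if link["target"] else None

def render_anchor(link):
    """Render one link object as an HTML anchor around its anchor text."""
    target = choose_target(link)
    if target is None:
        return escape(link["anchor"])
    return '<a href="%s" title="%s">%s</a>' % (
        escape(target["url"], quote=True),
        escape(target["title"], quote=True),
        escape(link["anchor"]))

link = {"anchor": "Phoenix Mars Lander", "confidence": 0.084166, "target": [
    {"url": "http://www.youtube.com/watch?v=tR91HkTZ9VY",
     "type": "youtube", "title": "Phoenix (spacecraft)"},
    {"url": "http://en.wikipedia.org/wiki/Phoenix_%28spacecraft%29",
     "type": "wikipedia", "title": "Phoenix (spacecraft)"}]}
html_link = render_anchor(link)
```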
Categories substructure
Categories substructure is a list of category objects where each object has the following format:
| Category object | Description | Type |
| name | category name |
string |
| confidence | confidence on 0.0 to 1.0 scale | float |
| categorization |
what categorization this category comes from |
string |
If you don't have a special arrangement with Zemanta, you can only get "dmoz" as the categorization.
Examples of how the API can be used in different languages (PHP, Perl, C#, ...) are available in the wiki.
Sample call (python)
# Python 3
import urllib.parse
import urllib.request

gateway = 'http://api.zemanta.com/services/rest/0.0/'
args = {'method': 'zemanta.suggest',
        'api_key': 'key1234',
        'text': '''The Phoenix Mars Lander has successfully deployed its robotic arm and
tested other instruments including a laser designed to detect dust,
clouds, and fog. The arm will be used to dig up samples of the Martian
surface which will be analyzed as a possible habitat for life.''',
        'return_categories': 'dmoz',
        'format': 'xml'}
# Form-encode the arguments; passing a body makes urlopen send a POST request.
args_enc = urllib.parse.urlencode(args).encode('utf-8')
with urllib.request.urlopen(gateway, args_enc) as response:
    print(response.read().decode('utf-8'))
Sample response (Truncated for clarity)
<rsp>
<status>ok</status>
<articles>
<article>
<url>http://abcnews.go.com/Technology/story?id=5255072&amp;page=1</url>
<confidence>0.048289</confidence>
<published_datetime>2008-06-26T19:12:59Z</published_datetime>
<title>Seeds of Life Found in Martian Soil</title>
<zemified>0</zemified>
</article>
<article>
<url>http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;articleId=9108238&amp;source=rss_topic54</url>
<confidence>0.0479</confidence>
<published_datetime>2008-07-09T13:00:00Z</published_datetime>
<title>NASA: Mars Lander short circuit pushes up ice test</title>
<zemified>0</zemified>
</article>
</articles>
<markup>
<text>The <a class="zem_slink" href="http://en.wikipedia.org/wiki/Phoenix_%28spacecraft%29">Phoenix Mars Lander</a> has successfully deployed its <a class="zem_slink" href="http://en.wikipedia.org/wiki/Robotic_arm">robotic arm</a> and tested other instruments including a laser designed to detect dust, clouds, and fog. The arm will be used to dig up samples of the <a class="zem_slink" href="http://en.wikipedia.org/wiki/Mars">Martian</a> surface which will be analyzed as a possible habitat for life.</text>
<links>
<link>
<confidence>0.084166</confidence>
<anchor>Phoenix Mars Lander</anchor>
<target>
<url>http://www.youtube.com/watch?v=tR91HkTZ9VY</url>
<type>youtube</type>
<title>Phoenix (spacecraft)</title>
</target>
<target>
<url>http://en.wikipedia.org/wiki/Phoenix_%28spacecraft%29</url>
<type>wikipedia</type>
<title>Phoenix (spacecraft)</title>
</target>
</link>
<link>
<confidence>0.006165</confidence>
<anchor>robotic arm</anchor>
<target>
<url>http://en.wikipedia.org/wiki/Robotic_arm</url>
<type>wikipedia</type>
<title>Robotic arm</title>
</target>
</link>
</links>
</markup>
<images>
<image>
<description>PASADENA, CA - MAY 25: Phoenix principal investigator, University of Arizona, Peter Smith (L) and Phoenix project manager, JPL, Barry Goldstein address a final press conference before an illustrative video of the Phoenix Mars Lander approaching Mars...</description>
<attribution>Image by <a href="http://www.daylife.com/source/Getty_Images">Getty Images</a> via <a href="http://www.daylife.com">Daylife</a></attribution>
<license>Low resolution use allowed when backlinking</license>
<source_url>http://www.daylife.com/image/097c92HarS9oN</source_url>
<confidence>0.5</confidence>
<url_s>http://cache.daylife.com/imageserve/097c92HarS9oN/75x75.jpg</url_s>
<url_s_w>75</url_s_w>
<url_s_h>75</url_s_h>
<url_m>http://cache.daylife.com/imageserve/097c92HarS9oN/150x100.jpg</url_m>
<url_m_h>113</url_m_h>
<url_m_w>150</url_m_w>
<url_l>http://cache.daylife.com/imageserve/097c92HarS9oN/150x100.jpg</url_l>
<url_l_h>100.0</url_l_h>
<url_l_w>150</url_l_w>
</image>
<image>
<description>Day 2 14.19.40 Phoenix Mars Lander 3-D Anaglyphs</description>
<attribution>Image by <a href="http://www.flickr.com/photos/48836503@N00/2530611038">gate3003</a> via Flickr</attribution>
<license>License CreativeCommons Attribution only</license>
<source_url>http://www.flickr.com/photos/48836503@N00/2530611038</source_url>
<confidence>0.5</confidence>
<url_s>http://farm4.static.flickr.com/3043/2530611038_f490407155_s.jpg</url_s>
<url_s_w>75</url_s_w>
<url_s_h>75</url_s_h>
<url_m>http://farm4.static.flickr.com/3043/2530611038_f490407155_m.jpg</url_m>
<url_m_w>220</url_m_w>
<url_m_h>240</url_m_h>
<url_l>http://farm4.static.flickr.com/3043/2530611038_f490407155.jpg</url_l>
<url_l_w>458</url_l_w>
<url_l_h>500</url_l_h>
</image>
<image>
<description>An artist's rendition of the Phoenix Mars probe during landing. The sophisticated landing system on Phoenix allows the spacecraft to touch down within 10 km (6.2 miles) of the targeted landing area. Thrusters are started when the lander is 570 m (1900 feet) above the surface. The navigation system is capable of detecting and avoiding hazards on the surface of Mars.</description>
<attribution>Image via <a href="http://commons.wikipedia.org/wiki/Image:Phoenix_landing.jpg">Wikipedia</a></attribution>
<license>Public domain</license>
<source_url>http://commons.wikipedia.org/wiki/Image:Phoenix_landing.jpg</source_url>
<confidence>0.99</confidence>
<url_s>http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Phoenix_landing.jpg/75px-Phoenix_landing.jpg</url_s>
<url_s_w>75</url_s_w>
<url_s_h>69</url_s_h>
<url_m>http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Phoenix_landing.jpg/202px-Phoenix_landing.jpg</url_m>
<url_m_w>202</url_m_w>
<url_m_h>186</url_m_h>
<url_l>http://upload.wikimedia.org/wikipedia/commons/6/6a/Phoenix_landing.jpg</url_l>
<url_l_w>5200</url_l_w>
<url_l_h>4800</url_l_h>
</image>
</images>
<keywords>
<keyword>
<confidence>0.506297</confidence>
<name>Mars</name>
<scheme>general</scheme>
</keyword>
<keyword>
<confidence>0.296248</confidence>
<name>Phoenix</name>
<scheme>general</scheme>
</keyword>
</keywords>
<categories>
<category>
<confidence>0.231914</confidence>
<categorization>dmoz</categorization>
<name>Top/Science/Anomalies_and_Alternative_Science/Astronomy,_Alternative/Planetary_Anomalies</name>
</category>
<category>
<confidence>0.195886</confidence>
<categorization>dmoz</categorization>
<name>Top/Science/Astronomy/Solar_System/Planets/Mars</name>
</category>
</categories>
<signature><div class="zemanta-pixie"><a class="zemanta-pixie-a" href="http://reblog.zemanta.com/zemified/40b3d04b-5248-4256-a22b-c07ba38b2d9f/" title="Zemified by Zemanta"><img class="zemanta-pixie-img" src="http://img.zemanta.com/reblog_e.png?x-id=40b3d04b-5248-4256-a22b-c07ba38b2d9f" alt="Zemanta Pixie" /></a></div></signature>
<rid>40b3d04b-5248-4256-a22b-c07ba38b2d9f</rid>
</rsp>
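A response like the one above can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened excerpt of the sample (only keywords and one category), so the element paths are taken from the sample rather than from a formal schema.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the sample response above.
excerpt = """<rsp>
<status>ok</status>
<keywords>
<keyword><confidence>0.506297</confidence><name>Mars</name><scheme>general</scheme></keyword>
<keyword><confidence>0.296248</confidence><name>Phoenix</name><scheme>general</scheme></keyword>
</keywords>
<categories>
<category><confidence>0.195886</confidence><categorization>dmoz</categorization><name>Top/Science/Astronomy/Solar_System/Planets/Mars</name></category>
</categories>
</rsp>"""

root = ET.fromstring(excerpt)
# Collect (name, confidence) pairs for every keyword element.
keywords = [(k.findtext("name"), float(k.findtext("confidence")))
            for k in root.findall("keywords/keyword")]
categories = [c.findtext("name") for c in root.findall("categories/category")]
print(keywords)
print(categories)
```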
Fine print
The request size is limited: only the first 8 KB of text is processed.
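A client may prefer to trim the text itself so that it knows exactly what was analyzed. The sketch below assumes the limit means 8192 bytes of UTF-8 encoded text, which the documentation does not state explicitly; `truncate_utf8` is a hypothetical helper.

```python
MAX_BYTES = 8 * 1024  # assumption: "8 KB" means 8192 bytes of UTF-8 text

def truncate_utf8(text, limit=MAX_BYTES):
    """Cut text to at most `limit` UTF-8 bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    return encoded[:limit].decode("utf-8", errors="ignore")

trimmed = truncate_utf8("a" * 10000)
```

Cutting at a sentence or paragraph boundary below the limit would likely give the analysis cleaner input, at the cost of a slightly more involved helper.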
There are also limits on the number of requests per day (as specified in the Terms of Service) and per second. If you exceed these limits, the system returns the error message "403 Developer over quota". Contact us if you need to make more calls to our system.
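One way to cope with the per-second limit is to retry with an increasing delay. The sketch below assumes the over-quota error surfaces as an HTTP 403 status, which Python's urllib reports as `urllib.error.HTTPError`; that mapping is an assumption, as is the hypothetical `call_with_backoff` helper. A daily quota will not recover within seconds, so retries only help with short bursts.

```python
import time
import urllib.error

def call_with_backoff(send, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff on HTTP 403.

    `send` is any zero-argument callable that performs the request.
    Treating every 403 as "over quota" is an assumption of this sketch.
    """
    for attempt in range(retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise other errors, and give up after the last retry.
            if err.code != 403 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

In practice `send` could be a small function wrapping the `urllib.request.urlopen` call; passing `sleep` in makes the delay easy to replace in tests.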
While confidence information is available for certain analyses, comparing confidence values between documents is very seldom meaningful. A confidence value is a relative measure for a specific type of recommendation within a specific document, and it should not be interpreted as a probability. We do our best to return meaningful confidences, but you should generally consult us about their use.
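In practice this suggests ranking suggestions of one type within a single response instead of applying a fixed threshold across documents. A minimal sketch, with a hypothetical helper name and an arbitrary cut-off `k`:

```python
def top_suggestions(items, k=3):
    """Rank suggestions of one type from one response and keep the top k."""
    return sorted(items, key=lambda item: item["confidence"], reverse=True)[:k]

keywords = [
    {"name": "Phoenix", "confidence": 0.296248},
    {"name": "Mars", "confidence": 0.506297},
]
best = top_suggestions(keywords, k=1)
```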