Analysis of a Large-Scale Web Page Generation Mechanism and Shanghai Dragon Data


Because words are expanded through various channels, an expanded word may turn out to be completely unrelated to its root. So it is necessary to compute the string similarity between each expanded word and its root; words with high similarity are treated as belonging to the same category.

For example, the similarity value of "Audi used car price" and "Pentium used car price" is 0.71875.
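The similarity check can be sketched as follows. This is a minimal illustration, not the author's implementation: the post does not say which similarity measure it uses, so Python's standard `difflib.SequenceMatcher` stands in for it, and the function names and the `threshold` value are assumptions. The scores it produces will not necessarily match the 0.71875 quoted above.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0; higher means more alike."""
    return SequenceMatcher(None, a, b).ratio()


def keep_related(root: str, expanded: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only expanded words whose similarity to the root reaches the threshold.

    The threshold is a hypothetical tuning parameter, adjusted by hand.
    """
    return [word for word in expanded if similarity(root, word) >= threshold]


if __name__ == "__main__":
    root = "Audi used car price"
    candidates = ["Pentium used car price", "weather forecast tomorrow"]
    for word in candidates:
        print(word, round(similarity(root, word), 3))
```

In practice the threshold would be tuned against a sample of expanded words that have been labeled by hand.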

Another example is a small-scale site group I built at the beginning of this year with the same routine. After more than a year, my development ability was much stronger than before, so the speed and scale of the data were far greater. In one and a half months traffic went from 0 to 10 thousand. Then a newly updated system shipped a set of templates whose URL format was the same as that of the earlier pages, like domain贵族宝贝//.html, which caused URL routing conflicts, and a pile of pages returned 500 when opened. I did not find out until half a month later… Any Shanghai Dragon loss caused by breaking "site invariance" is hard to reverse.
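A routing conflict of this kind can be caught before release by testing sample paths against every route. The sketch below assumes routes are expressed as regular expressions; the route names, patterns, and sample paths are all hypothetical and are not taken from the site described here.

```python
import re


def find_route_conflicts(routes: dict[str, str], sample_paths: list[str]) -> dict[str, list[str]]:
    """Return every sample path that is fully matched by more than one route."""
    compiled = {name: re.compile(pattern) for name, pattern in routes.items()}
    conflicts: dict[str, list[str]] = {}
    for path in sample_paths:
        owners = [name for name, rx in compiled.items() if rx.fullmatch(path)]
        if len(owners) > 1:
            conflicts[path] = owners
    return conflicts


if __name__ == "__main__":
    # Hypothetical old and new templates that share the same URL shape.
    routes = {
        "old_detail": r"/\w+\.html",
        "new_keyword": r"/[a-z0-9]+\.html",
    }
    samples = ["/audi123.html", "/Audi_A4.html", "/list/2"]
    print(find_route_conflicts(routes, samples))
```

Running such a check against a sample of live URLs, together with monitoring for HTTP 500 responses, would shorten the time a conflict goes unnoticed.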

Dig out related words around each root.

Filter the dug-out words: remove borderline-sensitive words, suspected contraband words, duplicates, and words on a custom blacklist.
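The filtering step could look roughly like the sketch below. The function name, the normalization rule (collapse whitespace, lowercase), and the example term lists are assumptions made for illustration; the post does not describe how the filter is actually built.

```python
def filter_words(words: list[str], sensitive_terms: set[str], blacklist: set[str]) -> list[str]:
    """Drop empty, duplicate, blacklisted, and sensitive words, keeping input order."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in words:
        # Normalize so that trivially different spellings count as duplicates.
        word = " ".join(raw.split()).lower()
        if not word or word in seen:
            continue
        if word in blacklist:
            continue
        # A word is rejected if it contains any sensitive term as a substring.
        if any(term in word for term in sensitive_terms):
            continue
        seen.add(word)
        kept.append(word)
    return kept


if __name__ == "__main__":
    dug_out = ["Cheap Cars", "cheap  cars", "fake plates", "spam word", ""]
    print(filter_words(dug_out, sensitive_terms={"fake"}, blacklist={"spam word"}))
```

Substring matching is a deliberately blunt choice here: it over-filters, which is usually the safer failure mode for sensitive or contraband terms.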


Here is the process behind the "automatic page generation mechanism":

Take, for example, a small site from 2014. Its Shanghai Longfeng traffic was no more than 10 thousand. By the end of that year, part of the "automatic page generation mechanism" was in place, though it was very rough. After three or four months with no movement, traffic finally began to change and kept rising until it settled at about 100 thousand, with dips and rebounds along the way. After a year it began to drop and kept dropping, because the system had gone a long time without adjustment and had filled up with garbage words and garbage data.


Collect a number of industry root words from a variety of channels.

Automatic page generation refers to the whole process of importing words, going live, and tuning: pages are generated automatically and parameters are adjusted by hand. It suits sites that have a large amount of data. It has been in use for a long time and is an old pattern. Of course, a new site can use it too, but it takes resources and brings no return in the short term. An example is a single site I took on in 2014.



Import module

Automatic generation of keywords

For example, the expanded words for "Audi second-hand car prices" include "second-hand cars under 50 thousand